
AI/Paper - Theory (66)
[MoH Paper Review] - MULTI-HEAD ATTENTION AS MIXTURE-OF-HEAD ATTENTION *This is a paper review of MoH! Please leave any questions in the comments! MoH paper: [2410.11842] MoH: Multi-Head Attention as Mixture-of-Head Attention (arxiv.org). In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to improve efficiency while maintaining or surpassing the previous accuracy level. We show that multi-head attentio..
[Dense Connector Paper Review] - Dense Connector for MLLMs *This is a paper review of Dense Connector! Please leave any questions in the comments! Dense Connector paper: [2405.13800v1] Dense Connector for MLLMs (arxiv.org). Do we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the curre..
[LLaVA-Video Paper Review] - VIDEO INSTRUCTION TUNING WITH SYNTHETIC DATA *This is a paper review of LLaVA-Video! Please leave any questions in the comments! LLaVA-Video paper: https://arxiv.org/abs/2410.02713 Video Instruction Tuning With Synthetic Data. The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset ..
[LLaVA-OneVision Paper Review] - LLaVA-OneVision: Easy Visual Task Transfer *This is a paper review of LLaVA-OneVision! Please leave any questions in the comments! LLaVA-OneVision paper: https://arxiv.org/abs/2408.03326 LLaVA-OneVision: Easy Visual Task Transfer. We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVisi..
[LLaVA-NeXT Paper Review] - Improved Baselines with Visual Instruction Tuning *This is a paper review of LLaVA-NeXT! Please leave any questions in the comments! LLaVA-NeXT GitHub: https://github.com/LLaVA-VL/LLaVA-NeXT LLaVA-1.5 paper: https://arxiv.org/abs/2310.03744 LLaVA-NeXT (1.6) blog: https://llava-vl.github.io/blog/2024-01-30-llava-next/ Contents: 1. Simple Introduction 2. Background Knowl..
[LLaVA Paper Review] - Visual Instruction Tuning *This is a paper review of LLaVA! Please leave any questions in the comments! LLaVA GitHub: https://llava-vl.github.io/ Based on the COCO dataset, we interact with language-only GPT-4, and collect 158K unique language-image instruction-following samples in total, including 58K in conversations, 23K in detailed description, and 77K in complex reasoning, respectively. Contents: 1. Simple Introduction 2. Ba..
[MeshAnything Paper Review] - MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers *This is a paper review of MeshAnything! Please leave any questions in the comments! MeshAnything paper: https://arxiv.org/abs/2406.10163 MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because thes..
[Mamba Paper Review 5] - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model *This is part 5 of the Mamba paper review series! Please leave any questions in the comments! Series 1: HiPPO, Series 2: LSSL, Series 3: S4, Series 4: Mamba, Series 5: Vision Mamba. Vision Mamba paper: [2401.09417] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (arxiv.org). Recently the state space models (SSMs) with efficient hardware-aw..
[Mamba Paper Review 4] - Mamba: Linear-Time Sequence Modeling with Selective State Spaces *This is part 4 of the Mamba paper review series! Please leave any questions in the comments! Series 1: HiPPO, Series 2: LSSL, Series 3: S4, Series 4: Mamba, Series 5: Vision Mamba. Mamba paper: https://arxiv.org/abs/2312.00752 Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many sub..
[Mamba Paper Review 3] - S4: Efficiently Modeling Long Sequences with Structured State Spaces *This is part 3 of the Mamba paper review series! Please leave any questions in the comments! Series 1: HiPPO, Series 2: LSSL, Series 3: S4, Series 4: Mamba, Series 5: Vision Mamba. S4 paper: [2111.00396] Efficiently Modeling Long Sequences with Structured State Spaces (arxiv.org). A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modal..
[Mamba Paper Review 2] - LSSL: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers *This is part 2 of the Mamba paper review series! Please leave any questions in the comments! Series 1: HiPPO, Series 2: LSSL, Series 3: S4, Series 4: Mamba, Series 5: Vision Mamba. LSSL paper: [2110.13985] Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers (arxiv.org). Recurrent neural networks (RNNs), temporal convolutions, and neural d..
[Mamba Paper Review 1] - HiPPO: Recurrent Memory with Optimal Polynomial Projections *This is part 1 of the Mamba paper review series! Please leave any questions in the comments! Series 1: HiPPO, Series 2: LSSL, Series 3: S4, Series 4: Mamba, Series 5: Vision Mamba. HiPPO paper: https://arxiv.org/abs/2008.07669 HiPPO: Recurrent Memory with Optimal Polynomial Projections. A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the o..
