Attention

(5)
[MoH Paper Review] - MULTI-HEAD ATTENTION AS MIXTURE-OF-HEAD ATTENTION *This post is a paper review of MoH. If you have any questions, please leave a comment! MoH paper: [2410.11842] MoH: Multi-Head Attention as Mixture-of-Head Attention (arxiv.org) MoH: Multi-Head Attention as Mixture-of-Head Attention In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to improve efficiency while maintaining or surpassing the previous accuracy level. We show that multi-head attentio..
[Diffusion Transformer Paper Review 2] - High-Resolution Image Synthesis with Latent Diffusion Models *An A-to-Z paper review meant to help you understand DiT in one go(?). *The series has three parts; part 2 reviews the LDM paper as background for understanding DiT. *If you have any questions, please leave a comment! DiT paper: https://arxiv.org/abs/2212.09748 Scalable Diffusion Models with Transformers We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on..
AttributeError: 'GatedCrossAttentionBlock' object has no attribute 'clip_grad_norm_' Where the error occurred: https://github.com/mlfoundations/open_flamingo/tree/main/open_flamingo/train I ran into this error while trying to train the OpenFlamingo model with gloo only, without nccl. After two hours of struggling to figure out how to fix it, I found a solution! Method: use clip_grad_norm_ from the torch framework. This is the Flamingo class. In that class you can find a section like the code below! # set up clip_grad_norm_ function def clip_grad_norm_(max_norm): self.perceiver.clip_grad_norm_(max_norm) for layer in s..
[DAE-Former Paper Review] - DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation *This post is a paper review of DAE-Former. If you have any questions, please leave a comment! DAE-Former paper: [2212.13504] DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation (arxiv.org) DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. Howev..
[Transformer Paper Review] - Attention is All You Need (2017) *This post is a review of the Transformer paper. If you have any questions, feel free to leave a comment at any time! Transformer paper: https://arxiv.org/abs/1706.03762 Attention Is All You Need The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new arxiv.org ..
