본문 바로가기

AI/Paper - Theory

(76)

[OmniInsert 논문 리뷰] - Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models *OmniInsert를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! OmniInsert paper: https://phantom-video.github.io/OmniInsert/ OmniInsertMask-Free Video Insertion of Any Reference via Diffusion Transformer Models * Equal contribution, † Corresponding author, ‡ Project lead Intelligent Creation Lab, Bytedance Research Paper GitHubphantom-video.github.ioContents1. Simple Introduction2. Background Knowledge: Diffusi..

[FlowAlign 논문 리뷰] - Trajectory-Regularized, Inversion-Free Flow-based Image Editing *FlowAlign를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! FlowAlign paper: https://arxiv.org/abs/2505.23145 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image EditingRecent inversion-free, flow-based image editing methods such as FlowEdit leverages a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equatio..

[FlowDirector 논문 리뷰] - Training-Free Flow Steering for Precise Text-to-Video Editing *FlowDirector를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! FlowDirector paper: https://arxiv.org/abs/2506.05046 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video EditingText-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely o..

[FlowEdit 논문 리뷰] - Inversion-Free Text-Based Editing Using Pre-Trained Flow Models *FlowEdit를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! FlowEdit paper: https://arxiv.org/abs/2412.08629 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow ModelsEditing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory..

[Rectified Flow 간단한 설명] - Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow *Rectified flow를 위한 간단한 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! Rectified flow: https://arxiv.org/abs/2209.03003 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified FlowWe present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π_0 and π_1, hence providing a u..

[InfEdit 논문 리뷰 + DDIM Inversion] - Inversion-Free Image Editing with Natural Language *InfEdit를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! InfEdit paper: https://arxiv.org/abs/2312.04965 Inversion-Free Image Editing with Natural LanguageDespite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency witha..

[CogVideoX 논문 리뷰] - Text-to-Video Diffusion Models with An Expert Transformer *CogVideoX를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! CogVideoX paper: https://arxiv.org/abs/2408.06072 CogVideoX: Text-to-Video Diffusion Models with An Expert TransformerWe present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixel..

[TransPixar 논문 리뷰] - Advancing Text-to-Video Generation with Transparency *TransPixar를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! TransPixar paper: [2501.03006] TransPixar: Advancing Text-to-Video Generation with Transparency TransPixar: Advancing Text-to-Video Generation with TransparencyText-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alph..

[SigLip 논문 리뷰] - Sigmoid Loss for Language Image Pre-Training *SigLip를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! SigLip paper: https://arxiv.org/abs/2303.15343 Sigmoid Loss for Language Image Pre-TrainingWe propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise simarxiv.or..

[Skip-DiT 논문 리뷰] - Accelerating Vision Diffusion Transformers with Skip Branches *Skip-DiT를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! Skip-DiT paper: https://arxiv.org/abs/2411.17616 Accelerating Vision Diffusion Transformers with Skip BranchesDiffusion Transformers (DiT), an emerging image and video generation model architecture, has demonstrated great potential because of its high generation quality and scalability properties. Despite the impressive performance, its practical depl..

[MoH 논문 리뷰] - MULTI-HEAD ATTENTION AS MIXTURE-OF-HEAD ATTENTION *MoH를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! MoH paper: [2410.11842] MoH: Multi-Head Attention as Mixture-of-Head Attention (arxiv.org) MoH: Multi-Head Attention as Mixture-of-Head AttentionIn this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to improve efficiency while maintaining or surpassing the previous accuracy level. We show that multi-head attentio..

[Dense Connector 논문 리뷰] - Dense Connector for MLLMs *Dense Connector를 위한 논문 리뷰 글입니다! 궁금하신 점은 댓글로 남겨주세요! Dense Connector paper: [2405.13800v1] Dense Connector for MLLMs (arxiv.org) Dense Connector for MLLMsDo we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the curre..

이전 1 2 3 4 ··· 7 다음

티스토리툴바