
AI (118)
[WANAlign2.1⚡- Awesome-Training-Free-WAN2.1-Editing] WANAlign2.1⚡ is released!! An awesome training-free video editing open-source project built on WAN2.1! WANAlign2.1⚡ github: https://github.com/KyujinHan/Awesome-Training-Free-WAN2.1-Editing GitHub - KyujinHan/Awesome-Training-Free-WAN2.1-Editing: Training-Free (Inversion-Free) methods meet WAN2.1-T2V - KyujinHan/Awesome-Traini..
[FlowAlign Paper Review] - Trajectory-Regularized, Inversion-Free Flow-based Image Editing *This is a paper review post for FlowAlign! Please leave any questions in the comments! FlowAlign paper: https://arxiv.org/abs/2505.23145 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing - Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equatio..
[FlowDirector Paper Review] - Training-Free Flow Steering for Precise Text-to-Video Editing *This is a paper review post for FlowDirector! Please leave any questions in the comments! FlowDirector paper: https://arxiv.org/abs/2506.05046 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing - Text-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely o..
[FlowEdit Paper Review] - Inversion-Free Text-Based Editing Using Pre-Trained Flow Models *This is a paper review post for FlowEdit! Please leave any questions in the comments! FlowEdit paper: https://arxiv.org/abs/2412.08629 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models - Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory..
[Rectified Flow: A Brief Explanation] - Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow *This is a brief review post for rectified flow! Please leave any questions in the comments! Rectified flow: https://arxiv.org/abs/2209.03003 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow - We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π_0 and π_1, hence providing a u..
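
Since the excerpt above only names the idea, here is a minimal PyTorch sketch of the rectified flow training objective, not taken from the post: the velocity network `v_theta` and all shapes are illustrative assumptions. The model regresses onto the constant velocity x1 - x0 of the straight path between a coupled pair from π_0 and π_1.

```python
import torch

def rectified_flow_loss(v_theta, x0, x1):
    """Rectified flow objective: regress the predicted velocity onto the
    straight-line displacement between a pair (x0 ~ pi_0, x1 ~ pi_1)."""
    b = x0.shape[0]
    # sample t ~ U(0, 1) per example and broadcast over the data dims
    t = torch.rand(b, device=x0.device).view(b, *([1] * (x0.dim() - 1)))
    x_t = (1 - t) * x0 + t * x1     # point on the straight path at time t
    target = x1 - x0                # constant velocity of that path
    pred = v_theta(x_t, t.view(b))  # predicted velocity at (x_t, t)
    return torch.mean((pred - target) ** 2)
```

Sampling then integrates the learned ODE dx/dt = v_theta(x, t) from t = 0 to t = 1, e.g. with a few Euler steps; the straighter the learned trajectories, the fewer steps are needed.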
[KO-VQA Benchmark Build Log🤗] A benchmark for evaluating Korean VLM capabilities using a visualized-data VQA dataset. KO-VQA Benchmark Github: https://github.com/Marker-Inc-Korea/KO-VQA-Benchmark GitHub - Marker-Inc-Korea/KO-VQA-Benchmark: a VLM benchmark dataset built on the AIHUB visualized-data VQA dataset. Introduction😋 A build log for KO-VQA, a benchmark for evaluating Korean document-grounded VLM capabilities🔥 Hello! Half of 2025 has already flown by and the sweltering..
[InfEdit Paper Review + DDIM Inversion] - Inversion-Free Image Editing with Natural Language *This is a paper review post for InfEdit! Please leave any questions in the comments! InfEdit paper: https://arxiv.org/abs/2312.04965 Inversion-Free Image Editing with Natural Language - Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency with a..
[CogVideoX Paper Review] - Text-to-Video Diffusion Models with An Expert Transformer *This is a paper review post for CogVideoX! Please leave any questions in the comments! CogVideoX paper: https://arxiv.org/abs/2408.06072 CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer - We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixel..
[Gukbap-LMM🍚] - Building a Korean LMM with only text datasets Gukbap-LMM Series Models HumanF-MarkrAI/Gukbap-Gemma2-9B-VL🍚: https://huggingface.co/HumanF-MarkrAI/Gukbap-Gemma2-9B-VL HumanF-MarkrAI/Gukbap-Qwen2-34B-VL🍚: https://huggingface.co/HumanF-MarkrAI/Gukbap-Qwen2-34B-VL Gukbap-LMM Training Code☀️ Training Code (github): https://github.com/Marker-Inc-Korea/Ovis2-FFT-Korean GitHub - Marker-Inc-Korea/Ovis2-FFT-Korean: Korean Large MultiModal FFT Code..
[TransPixar Paper Review] - Advancing Text-to-Video Generation with Transparency *This is a paper review post for TransPixar! Please leave any questions in the comments! TransPixar paper: [2501.03006] TransPixar: Advancing Text-to-Video Generation with Transparency - Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alph..
[SigLIP Paper Review] - Sigmoid Loss for Language Image Pre-Training *This is a paper review post for SigLIP! Please leave any questions in the comments! SigLIP paper: https://arxiv.org/abs/2303.15343 Sigmoid Loss for Language Image Pre-Training - We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise sim..
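
The excerpt above describes the loss but gets cut off, so here is a minimal PyTorch sketch of the pairwise sigmoid loss as formulated in the paper; the variable names (`img_emb`, `txt_emb`, `t_prime`, `b`) are my own, not from the post.

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t_prime, b):
    """Pairwise sigmoid loss: row i of img_emb and txt_emb is a matching
    image-text pair; embeddings are assumed L2-normalized. t_prime is a
    learnable log-temperature and b a learnable bias (scalar tensors)."""
    n = img_emb.shape[0]
    logits = img_emb @ txt_emb.t() * t_prime.exp() + b
    # +1 on the diagonal (matching pairs), -1 everywhere else: every
    # entry is an independent binary decision, so no softmax over the
    # batch ("no global view of the pairwise similarities") is needed.
    labels = 2.0 * torch.eye(n, device=logits.device) - 1.0
    return -F.logsigmoid(labels * logits).sum() / n
```

Because each pair is scored as an independent binary classification, the loss avoids batch-wide normalization, which is what the abstract's "no global view" claim refers to.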
[Skip-DiT Paper Review] - Accelerating Vision Diffusion Transformers with Skip Branches *This is a paper review post for Skip-DiT! Please leave any questions in the comments! Skip-DiT paper: https://arxiv.org/abs/2411.17616 Accelerating Vision Diffusion Transformers with Skip Branches - Diffusion Transformers (DiT), an emerging image and video generation model architecture, has demonstrated great potential because of its high generation quality and scalability properties. Despite the impressive performance, its practical depl..