
AI (118)
[WANAlign2.1⚡- Awesome-Training-Free-WAN2.1-Editing] WANAlign2.1⚡ is released!! An awesome training-free video editing open-source project built on WAN2.1! WANAlign2.1⚡ github: https://github.com/KyujinHan/Awesome-Training-Free-WAN2.1-Editing GitHub - KyujinHan/Awesome-Training-Free-WAN2.1-Editing: Training-Free (Inversion-Free) methods meet WAN2.1-T2V - KyujinHan/Awesome-Traini..
[FlowAlign Paper Review] - Trajectory-Regularized, Inversion-Free Flow-based Image Editing *This is a paper review post for FlowAlign! Please leave any questions in the comments! FlowAlign paper: https://arxiv.org/abs/2505.23145 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing - Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equatio..
[FlowDirector Paper Review] - Training-Free Flow Steering for Precise Text-to-Video Editing *This is a paper review post for FlowDirector! Please leave any questions in the comments! FlowDirector paper: https://arxiv.org/abs/2506.05046 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing - Text-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely o..
[FlowEdit Paper Review] - Inversion-Free Text-Based Editing Using Pre-Trained Flow Models *This is a paper review post for FlowEdit! Please leave any questions in the comments! FlowEdit paper: https://arxiv.org/abs/2412.08629 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models - Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory..
[Rectified Flow: A Brief Explanation] - Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow *This is a brief review post for rectified flow! Please leave any questions in the comments! Rectified flow: https://arxiv.org/abs/2209.03003 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow - We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π_0 and π_1, hence providing a u..
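
Since the excerpt above only names the idea, here is a minimal PyTorch sketch of the rectified flow training objective, not taken from the post: the velocity network `v_theta` and all shapes are illustrative assumptions. The model regresses onto the constant velocity x1 - x0 of the straight path between a coupled pair from π_0 and π_1.

```python
import torch

def rectified_flow_loss(v_theta, x0, x1):
    """Rectified flow objective: regress the predicted velocity onto the
    straight-line displacement between a pair (x0 ~ pi_0, x1 ~ pi_1)."""
    b = x0.shape[0]
    # sample t ~ U(0, 1) per example and broadcast over the data dims
    t = torch.rand(b, device=x0.device).view(b, *([1] * (x0.dim() - 1)))
    x_t = (1 - t) * x0 + t * x1     # point on the straight path at time t
    target = x1 - x0                # constant velocity of that path
    pred = v_theta(x_t, t.view(b))  # predicted velocity at (x_t, t)
    return torch.mean((pred - target) ** 2)
```

Sampling then integrates the learned ODE dx/dt = v_theta(x, t) from t = 0 to t = 1, e.g. with a few Euler steps; the straighter the learned trajectories, the fewer steps are needed.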
[KO-VQA Benchmark Build Log🤗] A benchmark for evaluating Korean VLM capabilities using a visualized-data VQA dataset. KO-VQA Benchmark Github: https://github.com/Marker-Inc-Korea/KO-VQA-Benchmark GitHub - Marker-Inc-Korea/KO-VQA-Benchmark: a VLM benchmark dataset built on the AIHUB visualized-data VQA dataset. Introduction😋 A build log for KO-VQA, a benchmark for evaluating Korean document-grounded VLM capabilities🔥 Hello! Half of 2025 has already flown by and the sweltering..
[InfEdit Paper Review + DDIM Inversion] - Inversion-Free Image Editing with Natural Language *This is a paper review post for InfEdit! Please leave any questions in the comments! InfEdit paper: https://arxiv.org/abs/2312.04965 Inversion-Free Image Editing with Natural Language - Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency with a..
[CogVideoX Paper Review] - Text-to-Video Diffusion Models with An Expert Transformer *This is a paper review post for CogVideoX! Please leave any questions in the comments! CogVideoX paper: https://arxiv.org/abs/2408.06072 CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer - We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixel..
[Gukbap-LMM🍚] - Building a Korean LMM with only text datasets Gukbap-LMM Series Models HumanF-MarkrAI/Gukbap-Gemma2-9B-VL🍚: https://huggingface.co/HumanF-MarkrAI/Gukbap-Gemma2-9B-VL HumanF-MarkrAI/Gukbap-Qwen2-34B-VL🍚: https://huggingface.co/HumanF-MarkrAI/Gukbap-Qwen2-34B-VL Gukbap-LMM Training Code☀️ Training Code (github): https://github.com/Marker-Inc-Korea/Ovis2-FFT-Korean GitHub - Marker-Inc-Korea/Ovis2-FFT-Korean: Korean Large MultiModal FFT Code..
[TransPixar Paper Review] - Advancing Text-to-Video Generation with Transparency *This is a paper review post for TransPixar! Please leave any questions in the comments! TransPixar paper: [2501.03006] TransPixar: Advancing Text-to-Video Generation with Transparency - Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alph..
[SigLIP Paper Review] - Sigmoid Loss for Language Image Pre-Training *This is a paper review post for SigLIP! Please leave any questions in the comments! SigLIP paper: https://arxiv.org/abs/2303.15343 Sigmoid Loss for Language Image Pre-Training - We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise sim..
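
The excerpt above describes the loss but gets cut off, so here is a minimal PyTorch sketch of the pairwise sigmoid loss as formulated in the paper; the variable names (`img_emb`, `txt_emb`, `t_prime`, `b`) are my own, not from the post.

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t_prime, b):
    """Pairwise sigmoid loss: row i of img_emb and txt_emb is a matching
    image-text pair; embeddings are assumed L2-normalized. t_prime is a
    learnable log-temperature and b a learnable bias (scalar tensors)."""
    n = img_emb.shape[0]
    logits = img_emb @ txt_emb.t() * t_prime.exp() + b
    # +1 on the diagonal (matching pairs), -1 everywhere else: every
    # entry is an independent binary decision, so no softmax over the
    # batch ("no global view of the pairwise similarities") is needed.
    labels = 2.0 * torch.eye(n, device=logits.device) - 1.0
    return -F.logsigmoid(labels * logits).sum() / n
```

Because each pair is scored as an independent binary classification, the loss avoids batch-wide normalization, which is what the abstract's "no global view" claim refers to.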
[Skip-DiT Paper Review] - Accelerating Vision Diffusion Transformers with Skip Branches *This is a paper review post for Skip-DiT! Please leave any questions in the comments! Skip-DiT paper: https://arxiv.org/abs/2411.17616 Accelerating Vision Diffusion Transformers with Skip Branches - Diffusion Transformers (DiT), an emerging image and video generation model architecture, has demonstrated great potential because of its high generation quality and scalability properties. Despite the impressive performance, its practical depl..