clip (9)

[SigLIP Paper Review] - Sigmoid Loss for Language Image Pre-Training
*This is a review post for the SigLIP paper! If you have any questions, please leave a comment!
SigLIP paper: https://arxiv.org/abs/2303.15343
"We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise sim.." (a minimal sketch of this loss appears after the list)

[Dense Connector Paper Review] - Dense Connector for MLLMs
*This is a review post for the Dense Connector paper! If you have any questions, please leave a comment!
Dense Connector paper: [2405.13800v1] Dense Connector for MLLMs (arxiv.org)
"Do we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the curre.."

[LLaVA-NeXT Paper Review] - Improved Baselines with Visual Instruction Tuning
*This is a review post for the LLaVA-NeXT paper! If you have any questions, please leave a comment!
LLaVA-NeXT GitHub: https://github.com/LLaVA-VL/LLaVA-NeXT
LLaVA-1.5 paper: https://arxiv.org/abs/2310.03744
LLaVA-NeXT (1.6) blog: https://llava-vl.github.io/blog/2024-01-30-llava-next/
Contents: 1. Simple Introduction 2. Background Knowl..

[LLaVA Paper Review] - Visual Instruction Tuning
*This is a review post for the LLaVA paper! If you have any questions, please leave a comment!
LLaVA GitHub: https://llava-vl.github.io/
"Based on the COCO dataset, we interact with language-only GPT-4, and collect 158K unique language-image instruction-following samples in total, including 58K in conversations, 23K in detailed description, and 77k in complex reasoning, respectively. Please.."
Contents: 1. Simple Introduction 2. Ba..

[KO-stable-diffusion-anything] - Korean-based stable-diffusion-disney and KO-anything-v4-5
GitHub: https://github.com/KyujinHan/KO-stable-diffusion-anything (a diffusion-based Korean text-to-image generation model)
KO-anything-v4-5: https://huggingface.co/kyujinpy/KO-anythi..

[DietNeRF Paper Review] - Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
*This is a review post for the DietNeRF paper! If you have any questions, please leave a comment!
DietNeRF paper: [2104.00677] Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis (arxiv.org)
"We present DietNeRF, a 3D neural scene representation estimated from a few images. Neural Radiance Fields (NeRF) learn a continuous volumetric representation of a scene.."

[CLIP-NeRF Paper Review] - CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
*This is a review post for the CLIP-NeRF paper. If you have any questions, please leave a comment!
CLIP-NeRF paper: [2112.05139] CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields (arxiv.org)
"We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance fields (NeRF). By leveraging the joint language-image embedding space of t.."

[NeRF-Art Paper Review] - Text-Driven Neural Radiance Fields Stylization
*This is a review post for the NeRF-Art paper! If you have any questions, please leave a comment!
NeRF-Art paper: [2212.08070] NeRF-Art: Text-Driven Neural Radiance Fields Stylization (arxiv.org)
"As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially.."

[CLIP Paper Review] - Learning Transferable Visual Models From Natural Language Supervision
*This is a review post for the CLIP paper. If you have any questions, please leave a comment!
CLIP paper: [2103.00020] Learning Transferable Visual Models From Natural Language Supervision (arxiv.org)
"State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and.." (a minimal sketch of CLIP's contrastive objective appears below)
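As a supplement to the CLIP entry above: CLIP is trained with a symmetric softmax contrastive objective over a batch of image-text pairs, and the sketch below follows the pseudocode given in the CLIP paper. The function name clip_loss and the pre-normalized embedding inputs are assumptions for illustration; the encoders and tokenization are omitted.

```python
# Minimal sketch of CLIP's symmetric contrastive loss (after the paper's
# pseudocode). Assumes img_emb/txt_emb are L2-normalized (N, D) embeddings
# and logit_scale is the learnable log-temperature.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, logit_scale):
    logits = logit_scale.exp() * img_emb @ txt_emb.T        # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)             # match each image to its text
    loss_t2i = F.cross_entropy(logits.T, targets)           # and each text to its image
    return (loss_i2t + loss_t2i) / 2
```

The softmax cross-entropy couples every pair in the batch, which is exactly the global normalization that SigLIP removes.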
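For comparison, here is a sketch of the pairwise sigmoid loss described in the SigLIP abstract: each image-text pair is scored independently as a binary match/non-match, so no normalization across the whole batch is required. The helper name siglip_loss is hypothetical; the temperature and bias initializations are the values reported in the paper (t' = log 10, b = -10).

```python
# Minimal sketch of SigLIP's pairwise sigmoid loss, assuming L2-normalized
# (N, D) embeddings; t is a learnable log-temperature and b a learnable bias.
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t, b):
    logits = img_emb @ txt_emb.T * t.exp() + b               # (N, N) pairwise logits
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1      # +1 on diagonal, -1 elsewhere
    # independent binary term per pair; sum over all pairs, averaged per example
    return -F.logsigmoid(labels * logits).sum() / n

# illustrative usage with random embeddings
n, d = 8, 512
img = F.normalize(torch.randn(n, d), dim=-1)
txt = F.normalize(torch.randn(n, d), dim=-1)
t = torch.nn.Parameter(torch.tensor(10.0).log())  # t' = log 10 init (from the paper)
b = torch.nn.Parameter(torch.tensor(-10.0))       # b = -10 init (from the paper)
print(siglip_loss(img, txt, t, b))
```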