[SigLIP Paper Review] - Sigmoid Loss for Language Image Pre-Training *This is a paper review of SigLIP! Please leave any questions in the comments! SigLIP paper: https://arxiv.org/abs/2303.15343 We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise sim..
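The snippet above highlights SigLIP's key idea: every image-text pair gets an independent binary label (+1 if matched, -1 otherwise), so no batch-wide softmax normalization is needed. A minimal numpy sketch of that pairwise sigmoid loss follows; note that in the paper the temperature `t` and bias `b` are learnable scalars, and the values here are only illustrative initializations:

```python
import numpy as np

def siglip_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over a batch of image/text embeddings.

    t (temperature) and b (bias) are learnable in the paper; the
    defaults here are illustrative initial values only.
    """
    # L2-normalize both sets of embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * img @ txt.T + b            # all pairwise similarities
    n = img.shape[0]
    labels = 2.0 * np.eye(n) - 1.0          # +1 on the diagonal (matched), -1 elsewhere
    # -log sigmoid(z * logit) == log(1 + exp(-z * logit)), computed stably
    per_pair = np.logaddexp(0.0, -labels * logits)
    return per_pair.sum() / n               # normalize by batch size

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
loss = siglip_sigmoid_loss(img, img.copy())  # perfectly matched pairs
```

Because each of the n² pair terms is independent, the loss needs no global view of the batch, which is what makes SigLIP cheap to scale across devices.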
[LGM Paper Review] - Large Multi-View Gaussian Model for High-Resolution 3D Content Creation *This is a paper review of LGM! Please leave any questions in the comments! LGM github: LGM (kiui.moe) LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation, Arxiv 2024. Jiaxiang Tang1, Zhaoxi Chen2, Xiaokang Chen1, Tengfei Wang3, Gang Zeng1, Ziwei Liu2. 1 Peking University, 2 S-Lab, Nanyang Technological University, 3 Shanghai AI La.. Contents: 1. Simple Introduction 2. Background Knowledge: Gaussia..
[LRM Paper Review] - LARGE RECONSTRUCTION MODEL FOR SINGLE IMAGE TO 3D *This is a paper review of LRM! Please leave any questions in the comments! LRM paper: https://arxiv.org/abs/2311.04400 LRM: Large Reconstruction Model for Single Image to 3D. We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-spec..
[SORA Explained] - OpenAI's Video Generation AI (technical section translated, with added explanatory images) Technical Report: Video generation models as world simulators (openai.com) We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that oper.. SORA: https..
[ControlNet Paper Review] - Adding Conditional Control to Text-to-Image Diffusion Models *This is a paper review of ControlNet! Please leave any questions in the comments! ControlNet paper: [2302.05543] Adding Conditional Control to Text-to-Image Diffusion Models (arxiv.org) We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large..
[OpenFlaminKO] - Taking on Korean-language multimodality with Polyglot-KO! Github: https://github.com/Marker-Inc-Korea/OpenFlaminKO OpenFlamingo: https://github.com/mlfoundations/open_flamingo GitHub - mlfoundations/open_flamingo: An open-source framework for training large multimodal models. Op..
[MCCNet Paper Review] - Arbitrary Video Style Transfer via Multi-Channel Correlation *This is a paper review of MCCNet! Please leave any questions in the comments! MCCNet paper: [2009.08003] Arbitrary Video Style Transfer via Multi-Channel Correlation (arxiv.org) Video style transfer is getting more attention in the AI community for its numerous applications such as augmented reality and animation productions. Compared with traditional image style transfer, ..
[CLIP Paper Review] - Learning Transferable Visual Models From Natural Language Supervision *This is a paper review of CLIP. If you have any questions, please leave a comment! CLIP paper: [2103.00020] Learning Transferable Visual Models From Natural Language Supervision (arxiv.org) State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and..
Introduction to Linear Transformation * This post is a quick linear algebra refresher, so the explanations may be (very?) brief. Image: T(x); Range; Domain; Codomain. The key properties of a linear transformation: 1. T(u+v) = T(u) + T(v) 2. T(cu) = cT(u) must both hold.
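The two conditions above (additivity and homogeneity) can be checked numerically for any matrix map T(x) = Ax. A minimal sketch with an arbitrary 2×2 matrix (the matrix and vectors below are made-up examples):

```python
import numpy as np

# Every matrix A induces a linear transformation T(x) = A @ x
# from the domain R^n to the codomain R^m.
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

def T(x):
    return A @ x

u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
c = 4.0

additive = np.allclose(T(u + v), T(u) + T(v))   # 1. T(u+v) = T(u) + T(v)
homogeneous = np.allclose(T(c * u), c * T(u))   # 2. T(cu) = cT(u)
print(additive, homogeneous)                    # → True True
```

A map that adds a constant offset, say T(x) = Ax + b with b ≠ 0, would fail both checks, which is why such maps are called affine rather than linear.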
