
Anything else

[Papers to review later] - continuously updated



1. Segment Anything: https://ai.facebook.com/research/publications/segment-anything/


Segment Anything | Meta AI Research

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11



2. InstructGPT: https://arxiv.org/abs/2203.02155


Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not ali



3. DDPM (Denoising Diffusion Probabilistic Models): https://arxiv.org/abs/2006.11239


Denoising Diffusion Probabilistic Models

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound



4. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection: https://paperswithcode.com/paper/dino-detr-with-improved-denoising-anchor-1


Papers with Code - DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

🏆 SOTA for Object Detection on COCO 2017 val (box AP metric)



5. LLaMA: Open and Efficient Foundation Language Models: https://paperswithcode.com/paper/llama-open-and-efficient-foundation-language-1


Papers with Code - LLaMA: Open and Efficient Foundation Language Models

🏆 SOTA for Question Answering on PIQA (Accuracy metric)



6. Hypernetworks: https://arxiv.org/abs/1609.09106



This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a gen



7. PET-NeuS: https://paperswithcode.com/paper/pet-neus-positional-encoding-tri-planes-for


Papers with Code - PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces

Implemented in one code library.



8. LoRA: https://arxiv.org/abs/2106.09685


LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes le


9. SAM high quality: https://paperswithcode.com/paper/segment-anything-in-high-quality


Papers with Code - Segment Anything in High Quality

Implemented in one code library.


10. QLoRA: https://arxiv.org/abs/2305.14314


QLoRA: Efficient Finetuning of Quantized LLMs

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quan


11. LightGlue: https://paperswithcode.com/paper/lightglue-local-feature-matching-at-light


Papers with Code - LightGlue: Local Feature Matching at Light Speed

Implemented in 2 code libraries.


12. DragGAN: https://github.com/XingangPan/DragGAN 


GitHub - XingangPan/DragGAN: Official Code for DragGAN (SIGGRAPH 2023)




13. SDXL: https://github.com/stability-ai/generative-models


GitHub - Stability-AI/generative-models: Generative Models by Stability AI



14. TAV: https://github.com/showlab/Tune-A-Video


GitHub - showlab/Tune-A-Video: [ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation



15. CoDeF: https://paperswithcode.com/paper/codef-content-deformation-fields-for


Papers with Code - CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Implemented in one code library.


16. TF-ICON: https://paperswithcode.com/paper/tf-icon-diffusion-based-training-free-cross


Papers with Code - TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Implemented in one code library.


17. Point-Bind & Point-LLM: https://arxiv.org/pdf/2309.00615.pdf

19. DEVA: https://paperswithcode.com/paper/tracking-anything-with-decoupled-video


Papers with Code - Tracking Anything with Decoupled Video Segmentation

🏆 SOTA for Unsupervised Video Object Segmentation on DAVIS 2016 val (G metric)


20. Vote2Cap: https://github.com/ch3cook-fdu/vote2cap-detr


GitHub - ch3cook-fdu/Vote2Cap-DETR: Code release for "End-to-End 3D Dense Captioning with Vote2Cap-DETR" (CVPR 2023)


21. InstaFlow: https://github.com/gnobitab/instaflow


GitHub - gnobitab/InstaFlow: ⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow



22. DreamGaussian: https://paperswithcode.com/paper/dreamgaussian-generative-gaussian-splatting


Papers with Code - DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Implemented in one code library.


23. ProPainter: https://github.com/sczhou/propainter


GitHub - sczhou/ProPainter: [ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting



24. 3D Gaussian Splatting: https://arxiv.org/pdf/2308.04079.pdf

25. MetaCLIP: https://paperswithcode.com/paper/demystifying-clip-data


Papers with Code - Demystifying CLIP Data

Implemented in one code library.


26. From CLIP to DINO: https://paperswithcode.com/paper/from-clip-to-dino-visual-encoders-shout-in 


Papers with Code - From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

Implemented in one code library.


27. NEFTune: https://github.com/neelsjain/neftune 


GitHub - neelsjain/NEFTune: Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning



28. Cutie: https://github.com/hkchengrex/Cutie


GitHub - hkchengrex/Cutie: [arXiv 2023] Putting the Object Back Into Video Object Segmentation



29. PaLI-3: https://github.com/kyegomez/PALI3


GitHub - kyegomez/PALI3: Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"


30. Lion: https://github.com/lucidrains/lion-pytorch


GitHub - lucidrains/lion-pytorch: 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in PyTorch


31. Mustango: https://github.com/amaai-lab/mustango


GitHub - AMAAI-Lab/mustango: Mustango: Toward Controllable Text-to-Music Generation



32. OneLLM: https://paperswithcode.com/paper/onellm-one-framework-to-align-all-modalities 


Papers with Code - OneLLM: One Framework to Align All Modalities with Language

Implemented in one code library.


33. Alpha-CLIP: https://github.com/sunzey/alphaclip


GitHub - SunzeY/AlphaCLIP: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want



34. Ferret (LMM): https://github.com/apple/ml-ferret


GitHub - apple/ml-ferret



35. DreamGaussian4D: https://github.com/jiawei-ren/dreamgaussian4d


GitHub - jiawei-ren/dreamgaussian4d: [arXiv 2023] DreamGaussian4D: Generative 4D Gaussian Splatting



36. Auffusion: https://github.com/happylittlecat2333/Auffusion


GitHub - happylittlecat2333/Auffusion: Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"


37. InstantID: https://paperswithcode.com/paper/instantid-zero-shot-identity-preserving


Papers with Code - InstantID: Zero-shot Identity-Preserving Generation in Seconds

Implemented in one code library.


38. Scalable Diffusion Models with State Space Backbone: https://paperswithcode.com/paper/scalable-diffusion-models-with-state-space


Papers with Code - Scalable Diffusion Models with State Space Backbone

Implemented in one code library.


39. The boundary of neural network trainability is fractal: https://paperswithcode.com/paper/the-boundary-of-neural-network-trainability


Papers with Code - The boundary of neural network trainability is fractal

Implemented in one code library.


40. RoSA: https://github.com/ist-daslab/rosa?tab=readme-ov-file


GitHub - IST-DASLab/RoSA



41. VMamba: https://paperswithcode.com/paper/vmamba-visual-state-space-model


42. Latte: https://github.com/Vchitect/Latte


GitHub - Vchitect/Latte: Latte: Latent Diffusion Transformer for Video Generation.



43. SuGaR: https://github.com/Anttwo/SuGaR


GitHub - Anttwo/SuGaR: Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering (CVPR 2024)


44. TripoSR: https://paperswithcode.com/paper/triposr-fast-3d-object-reconstruction-from-a


Papers with Code - TripoSR: Fast 3D Object Reconstruction from a Single Image

Implemented in one code library.


45. ViewDiff: https://github.com/facebookresearch/viewdiff?tab=readme-ov-file


GitHub - facebookresearch/ViewDiff: ViewDiff generates high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings (CVPR 2024)


46. InstantStyle: https://github.com/instantstyle/instantstyle


GitHub - InstantStyle/InstantStyle: InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥



