

[Papers to review later] - continuously updated


2023.05.06

1. Segment Anything: https://ai.facebook.com/research/publications/segment-anything/

 

Segment Anything | Meta AI Research

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11

ai.facebook.com

 

2. InstructGPT: https://arxiv.org/abs/2203.02155

 

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not ali

arxiv.org

 

3. DDPM(Denoising Diffusion Probabilistic Models): https://arxiv.org/abs/2006.11239

 

Denoising Diffusion Probabilistic Models

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound

arxiv.org

 

4. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection: https://paperswithcode.com/paper/dino-detr-with-improved-denoising-anchor-1

 

Papers with Code - DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

🏆 SOTA for Object Detection on COCO 2017 val (box AP metric)

paperswithcode.com

 

5. LLaMA: Open and Efficient Foundation Language Models: https://paperswithcode.com/paper/llama-open-and-efficient-foundation-language-1

 

Papers with Code - LLaMA: Open and Efficient Foundation Language Models

🏆 SOTA for Question Answering on PIQA (Accuracy metric)

paperswithcode.com


2023.05.16

6. Hypernetworks: https://arxiv.org/abs/1609.09106

 

HyperNetworks

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a gen

arxiv.org

 

7. PET-NeuS: https://paperswithcode.com/paper/pet-neus-positional-encoding-tri-planes-for

 

Papers with Code - PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces

Implemented in one code library.

paperswithcode.com


2023.05.17

8. LoRA: https://arxiv.org/abs/2106.09685

 

LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes le

arxiv.org


9. SAM high quality: https://paperswithcode.com/paper/segment-anything-in-high-quality

 

Papers with Code - Segment Anything in High Quality

Implemented in one code library.

paperswithcode.com


10. QLoRA: https://arxiv.org/abs/2305.14314

 

QLoRA: Efficient Finetuning of Quantized LLMs

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quan

arxiv.org


11. LightGlue: https://paperswithcode.com/paper/lightglue-local-feature-matching-at-light

 

Papers with Code - LightGlue: Local Feature Matching at Light Speed

Implemented in 2 code libraries.

paperswithcode.com


12. DragGAN: https://github.com/XingangPan/DragGAN 

 

GitHub - XingangPan/DragGAN: Official Code for DragGAN (SIGGRAPH 2023)


github.com


2023.08.05

13. SDXL: https://github.com/stability-ai/generative-models

 

GitHub - Stability-AI/generative-models: Generative Models by Stability AI


github.com


14. Tune-A-Video (TAV): https://github.com/showlab/Tune-A-Video

 

GitHub - showlab/Tune-A-Video: [ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation


github.com

15. CoDeF: https://paperswithcode.com/paper/codef-content-deformation-fields-for

 

Papers with Code - CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Implemented in one code library.

paperswithcode.com


16. TF-ICON: https://paperswithcode.com/paper/tf-icon-diffusion-based-training-free-cross

 

Papers with Code - TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Implemented in one code library.

paperswithcode.com


17. Point-Bind & Point-LLM: https://arxiv.org/pdf/2309.00615.pdf


19. DEVA: https://paperswithcode.com/paper/tracking-anything-with-decoupled-video

 

Papers with Code - Tracking Anything with Decoupled Video Segmentation

🏆 SOTA for Unsupervised Video Object Segmentation on DAVIS 2016 val (G metric)

paperswithcode.com

20. Vote2Cap-DETR: https://github.com/ch3cook-fdu/vote2cap-detr

 

GitHub - ch3cook-fdu/Vote2Cap-DETR: Code release for "End-to-End 3D Dense Captioning with Vote2Cap-DETR" (CVPR2023)


github.com


21. InstaFlow: https://github.com/gnobitab/instaflow

 

GitHub - gnobitab/InstaFlow: InstaFlow! One-Step Stable Diffusion with Rectified Flow


github.com


2023.10.02

22. DreamGaussian: https://paperswithcode.com/paper/dreamgaussian-generative-gaussian-splatting

 

Papers with Code - DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Implemented in one code library.

paperswithcode.com

23. ProPainter: https://github.com/sczhou/propainter

 

GitHub - sczhou/ProPainter: [ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting


github.com

24. 3D Gaussian Splatting: https://arxiv.org/pdf/2308.04079.pdf


25. MetaCLIP: https://paperswithcode.com/paper/demystifying-clip-data

 

Papers with Code - Demystifying CLIP Data

Implemented in one code library.

paperswithcode.com


26. From CLIP to DINO: https://paperswithcode.com/paper/from-clip-to-dino-visual-encoders-shout-in 

 

Papers with Code - From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

Implemented in one code library.

paperswithcode.com

27. NEFTune: https://github.com/neelsjain/neftune 

 

GitHub - neelsjain/NEFTune: Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning


github.com


28. Cutie: https://github.com/hkchengrex/Cutie

 

GitHub - hkchengrex/Cutie: [arXiv 2023] Putting the Object Back Into Video Object Segmentation


github.com


29. PaLI-3: https://github.com/kyegomez/PALI3

 

GitHub - kyegomez/PALI3: Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"


github.com

30. Lion: https://github.com/lucidrains/lion-pytorch

 

GitHub - lucidrains/lion-pytorch: 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purported

🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch - GitHub - lucidrains/lion-pytorch: 🦁 Lion, new optimizer discovered by...

github.com


31. Mustango: https://github.com/amaai-lab/mustango

 

GitHub - AMAAI-Lab/mustango: Mustango: Toward Controllable Text-to-Music Generation


github.com


32. OneLLM: https://paperswithcode.com/paper/onellm-one-framework-to-align-all-modalities 

 

Papers with Code - OneLLM: One Framework to Align All Modalities with Language

Implemented in one code library.

paperswithcode.com


33. Alpha-CLIP: https://github.com/sunzey/alphaclip

 

GitHub - SunzeY/AlphaCLIP: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want


github.com


34. Ferret (LMM): https://github.com/apple/ml-ferret

 



35. DreamGaussian4D: https://github.com/jiawei-ren/dreamgaussian4d

 

GitHub - jiawei-ren/dreamgaussian4d: [arXiv 2023] DreamGaussian4D: Generative 4D Gaussian Splatting


github.com


36. Auffusion: https://github.com/happylittlecat2333/Auffusion

 

GitHub - happylittlecat2333/Auffusion: Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation" - GitHub - happylittlecat2333/Auffusion: Offic...

github.com


37. InstantID: https://paperswithcode.com/paper/instantid-zero-shot-identity-preserving

 

Papers with Code - InstantID: Zero-shot Identity-Preserving Generation in Seconds

Implemented in one code library.

paperswithcode.com


38. Scalable Diffusion Models with State Space Backbone: https://paperswithcode.com/paper/scalable-diffusion-models-with-state-space

 

Papers with Code - Scalable Diffusion Models with State Space Backbone

Implemented in one code library.

paperswithcode.com


39. The boundary of neural network trainability is fractal: https://paperswithcode.com/paper/the-boundary-of-neural-network-trainability

 

Papers with Code - The boundary of neural network trainability is fractal

Implemented in one code library.

paperswithcode.com

40. RoSA: https://github.com/ist-daslab/rosa?tab=readme-ov-file

 



41. VMamba: https://paperswithcode.com/paper/vmamba-visual-state-space-model

 


42. Latte: https://github.com/Vchitect/Latte

 

GitHub - Vchitect/Latte: Latte: Latent Diffusion Transformer for Video Generation.


github.com


43. SuGaR: https://github.com/Anttwo/SuGaR

 

GitHub - Anttwo/SuGaR: Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Recons

Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering (CVPR 2024) - GitHub - Anttwo/SuGaR: Official PyTo...

github.com


44. TripoSR: https://paperswithcode.com/paper/triposr-fast-3d-object-reconstruction-from-a

 

Papers with Code - TripoSR: Fast 3D Object Reconstruction from a Single Image

Implemented in one code library.

paperswithcode.com

45. ViewDiff: https://github.com/facebookresearch/viewdiff?tab=readme-ov-file

 

GitHub - facebookresearch/ViewDiff: ViewDiff generates high-quality, multi-view consistent images of a real-world 3D object in a

ViewDiff generates high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings. (CVPR2024). - facebookresearch/ViewDiff

github.com


46. InstantStyle: https://github.com/instantstyle/instantstyle

 

GitHub - InstantStyle/InstantStyle: InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥


github.com


47. XCube: https://arxiv.org/pdf/2312.03806

 

48. AiM: https://github.com/hp-l33/aim


49. LinFusion: https://github.com/huage001/linfusion?tab=readme-ov-file

 

GitHub - Huage001/LinFusion: Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"


github.com

50. KSID: https://github.com/axning/ksid


51. MambaST: https://github.com/FilippoBotti/MambaST

 

52. Direct3D: https://arxiv.org/abs/2405.14832

 

Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

Generating high-quality 3D assets from text and images has long been challenging, primarily due to the absence of scalable 3D representations capable of capturing intricate geometry distributions. In this work, we introduce Direct3D, a native 3D generative

arxiv.org


53. Intrinsic Image Decomposition: https://github.com/compphoto/Intrinsic

 

GitHub - compphoto/Intrinsic: Repo for the paper "Intrinsic Image Decomposition via Ordinal Shading" (TOG 2023)


github.com

 
