[Papers to Review Later]

2023.05.06

1. Segment Anything: https://ai.facebook.com/research/publications/segment-anything/

Segment Anything | Meta AI Research

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11

2. InstructGPT: https://arxiv.org/abs/2203.02155

Training language models to follow instructions with human feedback

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not ali

3. DDPM(Denoising Diffusion Probabilistic Models): https://arxiv.org/abs/2006.11239

Denoising Diffusion Probabilistic Models

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound

4. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection: https://paperswithcode.com/paper/dino-detr-with-improved-denoising-anchor-1

Papers with Code - DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

🏆 SOTA for Object Detection on COCO 2017 val (box AP metric)

5. LLaMA: Open and Efficient Foundation Language Models: https://paperswithcode.com/paper/llama-open-and-efficient-foundation-language-1

Papers with Code - LLaMA: Open and Efficient Foundation Language Models

🏆 SOTA for Question Answering on PIQA (Accuracy metric)

2023.05.16

6. HyperNetworks: https://arxiv.org/abs/1609.09106

HyperNetworks

This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a gen
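The idea in the abstract above, one network generating the weights of another, can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: the sizes, the single linear hypernetwork `H`, and the names `generate_weights` and `main_forward` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
EMBED_DIM, IN_DIM, OUT_DIM = 4, 3, 2

# Hypernetwork parameters: one linear map from a layer embedding z
# to the flattened weight matrix of a main-network layer.
H = rng.normal(scale=0.1, size=(EMBED_DIM, IN_DIM * OUT_DIM))


def generate_weights(z):
    """Hypernetwork: map embedding z to an (IN_DIM, OUT_DIM) weight matrix."""
    return (z @ H).reshape(IN_DIM, OUT_DIM)


def main_forward(x, z):
    """Main-network layer whose weights come from the hypernetwork."""
    W = generate_weights(z)
    return np.tanh(x @ W)


z = rng.normal(size=EMBED_DIM)      # embedding describing the layer
x = rng.normal(size=(5, IN_DIM))    # batch of 5 inputs
y = main_forward(x, z)              # output has shape (5, OUT_DIM)
```

In this sketch the trainable parameters would be `H` and `z` rather than `W` itself, so changing the embedding changes the layer's weights without storing a separate weight matrix.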

7. PET-NeuS: https://paperswithcode.com/paper/pet-neus-positional-encoding-tri-planes-for

Papers with Code - PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces

2023.05.17

8. LoRA: https://arxiv.org/abs/2106.09685

LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes le

9. SAM high quality: https://paperswithcode.com/paper/segment-anything-in-high-quality

Papers with Code - Segment Anything in High Quality

10. QLoRA: https://arxiv.org/abs/2305.14314

QLoRA: Efficient Finetuning of Quantized LLMs

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quan

11. LightGlue: https://paperswithcode.com/paper/lightglue-local-feature-matching-at-light

Papers with Code - LightGlue: Local Feature Matching at Light Speed

12. DragGAN: https://github.com/XingangPan/DragGAN

GitHub - XingangPan/DragGAN: Official Code for DragGAN (SIGGRAPH 2023)

2023.08.05

13. SDXL: https://github.com/stability-ai/generative-models

GitHub - Stability-AI/generative-models: Generative Models by Stability AI

14. TAV: https://github.com/showlab/Tune-A-Video

GitHub - showlab/Tune-A-Video: [ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

15. CoDeF: https://paperswithcode.com/paper/codef-content-deformation-fields-for

Papers with Code - CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

16. TF-ICON: https://paperswithcode.com/paper/tf-icon-diffusion-based-training-free-cross

Papers with Code - TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

17. Point-Bind & Point-LLM: https://arxiv.org/pdf/2309.00615.pdf

19. DEVA: https://paperswithcode.com/paper/tracking-anything-with-decoupled-video

Papers with Code - Tracking Anything with Decoupled Video Segmentation

🏆 SOTA for Unsupervised Video Object Segmentation on DAVIS 2016 val (G metric)

20. Vote2Cap: https://github.com/ch3cook-fdu/vote2cap-detr

GitHub - ch3cook-fdu/Vote2Cap-DETR: Code release for ''End-to-End 3D Dense Captioning with Vote2Cap-DETR'' (CVPR2023)

21. InstaFlow: https://github.com/gnobitab/instaflow

GitHub - gnobitab/InstaFlow: :zap: InstaFlow! One-Step Stable Diffusion with Rectified Flow

2023.10.02

22. DreamGaussian: https://paperswithcode.com/paper/dreamgaussian-generative-gaussian-splatting

Papers with Code - DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

23. ProPainter: https://github.com/sczhou/propainter

GitHub - sczhou/ProPainter: [ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

24. 3D Gaussian Splatting: https://arxiv.org/pdf/2308.04079.pdf

25. MetaCLIP: https://paperswithcode.com/paper/demystifying-clip-data

Papers with Code - Demystifying CLIP Data

26. From CLIP to DINO: https://paperswithcode.com/paper/from-clip-to-dino-visual-encoders-shout-in

Papers with Code - From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

27. NEFTune: https://github.com/neelsjain/neftune

GitHub - neelsjain/NEFTune: Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning

28. Cutie: https://github.com/hkchengrex/Cutie

GitHub - hkchengrex/Cutie: [arXiv 2023] Putting the Object Back Into Video Object Segmentation

29. PaLI-3: https://github.com/kyegomez/PALI3

GitHub - kyegomez/PALI3: Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

30. Lion: https://github.com/lucidrains/lion-pytorch

GitHub - lucidrains/lion-pytorch: 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch

31. Mustango: https://github.com/amaai-lab/mustango

GitHub - AMAAI-Lab/mustango: Mustango: Toward Controllable Text-to-Music Generation

32. OneLLM: https://paperswithcode.com/paper/onellm-one-framework-to-align-all-modalities

Papers with Code - OneLLM: One Framework to Align All Modalities with Language

33. Alpha-CLIP: https://github.com/sunzey/alphaclip

GitHub - SunzeY/AlphaCLIP: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

34. Ferret(LMM): https://github.com/apple/ml-ferret

GitHub - apple/ml-ferret

35. DreamGaussian4D: https://github.com/jiawei-ren/dreamgaussian4d

GitHub - jiawei-ren/dreamgaussian4d: [arXiv 2023] DreamGaussian4D: Generative 4D Gaussian Splatting

36. Auffusion: https://github.com/happylittlecat2333/Auffusion

GitHub - happylittlecat2333/Auffusion: Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

37. InstantID: https://paperswithcode.com/paper/instantid-zero-shot-identity-preserving

Papers with Code - InstantID: Zero-shot Identity-Preserving Generation in Seconds

38. https://paperswithcode.com/paper/scalable-diffusion-models-with-state-space

Papers with Code - Scalable Diffusion Models with State Space Backbone

39. https://paperswithcode.com/paper/the-boundary-of-neural-network-trainability

Papers with Code - The boundary of neural network trainability is fractal

40. RoSA: https://github.com/ist-daslab/rosa?tab=readme-ov-file

GitHub - IST-DASLab/RoSA

41. VMamba: https://paperswithcode.com/paper/vmamba-visual-state-space-model

42. Latte: https://github.com/Vchitect/Latte

GitHub - Vchitect/Latte: Latte: Latent Diffusion Transformer for Video Generation.

43. SuGaR: https://github.com/Anttwo/SuGaR

GitHub - Anttwo/SuGaR: Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering (CVPR 2024)

44. TripoSR: https://paperswithcode.com/paper/triposr-fast-3d-object-reconstruction-from-a

Papers with Code - TripoSR: Fast 3D Object Reconstruction from a Single Image

45. ViewDiff: https://github.com/facebookresearch/viewdiff?tab=readme-ov-file

GitHub - facebookresearch/ViewDiff: ViewDiff generates high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings. (CVPR2024)

46. InstantStyle: https://github.com/instantstyle/instantstyle

GitHub - InstantStyle/InstantStyle: InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥

47. XCube: https://arxiv.org/pdf/2312.03806

48. AiM: https://github.com/hp-l33/aim

49. LinFusion: https://github.com/huage001/linfusion?tab=readme-ov-file

GitHub - Huage001/LinFusion: Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"

50. KSID: https://github.com/axning/ksid

51. MambaST: https://github.com/FilippoBotti/MambaST

52. Direct3D: https://arxiv.org/abs/2405.14832

Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

Generating high-quality 3D assets from text and images has long been challenging, primarily due to the absence of scalable 3D representations capable of capturing intricate geometry distributions. In this work, we introduce Direct3D, a native 3D generative

53. Intrinsic Image Decomposition: https://github.com/compphoto/Intrinsic

GitHub - compphoto/Intrinsic: Repo for the paper "Intrinsic Image Decomposition via Ordinal Shading" (TOG 2023)

54. Anagram-MTL: https://github.com/pixtella/anagram-mtl

GitHub - Pixtella/Anagram-MTL: [WACV 2025] Official implementation for the paper "Diffusion-based Visual Anagram as Multi-task Learning"

55. Divot: https://paperswithcode.com/paper/divot-diffusion-powers-video-tokenizer-for

Papers with Code - Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

kyujinpy

[Papers to Review Later] - continuously updated
