
[NeRF-CAM Paper Review] - COORDINATE-AWARE MODULATION FOR NEURAL FIELDS

๋ฐ˜์‘ํ˜•

💰Happy New Year!!💰

 

*This is a paper review of NeRF-CAM! Please leave any questions in the comments!

 

NeRF-CAM paper: arxiv.org/pdf/2311.14993.pdf

NeRF-CAM github: Coordinate-Aware Modulation for Neural Fields (maincold2.github.io)

 



Contents

1. Simple Introduction

2. Background Knowledge: NeRF

3. Method

- CAM

- IMAGE, NERF, VIDEO with CAM

4. Result


Simple Introduction

CAM

NeRF is one of the hottest topics in 3D vision right now!

I recently got interested in INR (Implicit Neural Representation). While digging into it, someone pointed me to a paper that brings INR ideas into NeRF, which I (sort of?) love, so I gave it a read.

 

To be honest, before reading it, I saw that the paper calls its method CAM and assumed it was an XAI paper.

I even thought, "Did they really explain how NeRF learns using XAI!?" lol

 

Anyway! The CAM method lets neural fields such as NeRF handle many tasks, from 1D signals to 3D representations, with a single structure, and it reportedly improves performance as well.

 

Let's take a look!


Background Knowledge: NeRF

NeRF paper review: https://kyujinpy.tistory.com/16

 


 

*This post skips all NeRF background knowledge!

*๋…ผ๋ฌธ์˜ ์ €์ž๋Š” CAM์ด๋ผ๊ณ  ์†Œ๊ฐœํ•˜์ง€๋งŒ, ์ €๋Š” XAI์—์„œ ์ด๋ฏธ ์œ ๋ช…ํ•œ CAM ๋ชจ๋ธ๊ณผ ํ—ท๊ฐˆ๋ฆฌ๊ธฐ(?) ๋•Œ๋ฌธ์—.. NeRF-CAM์œผ๋กœ ๋ช…์นญํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค..ใ…Žใ…Ž


Method

Abstract


์ด ๋…ผ๋ฌธ์˜ abstract๋ฅผ ์ฝ์–ด๋ณด๋‹ˆ ์ข€ ์žฌ๋ฏธ์žˆ๋Š”(?) ๊ฒƒ ๊ฐ™์•„์„œ ๊ฐ€์ ธ์™”๋‹ค.

๋…ผ๋ฌธ์˜ ์ค‘์š”ํ•œ ํ•ต์‹ฌ์€ 

1. ๊ธฐ์กด์˜ MLP์™€ grid representation ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๋“ค์€ spectral bias๋กœ ์ธํ•ด ์„ฑ๋Šฅ์ด ๋‚ฎ์•„์ง€๊ณ , ๋‚ฎ์€ ์ˆ˜๋ ด ์†๋„๋ฅผ ๊ฐ€์ง„๋‹ค.

2. CAM์€ MLP์™€ grid representation์„ neural field ๋‚ด์—์„œ ์‚ฌ์šฉํ•  ๋•Œ spectral free-bias๋ฅผ feature ํ˜•ํƒœ๋กœ ์ฃผ์ž…ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค.

3. ์ด๊ฒƒ์€ dynamic scene๊ณผ static scene์— ๋Œ€ํ•ด SOTA๋ฅผ ๊ธฐ๋กํ–ˆ๊ณ , video compression์— ๋Œ€ํ•ด์„œ๋„ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค.

 

In my view, there are two things in this paper worth focusing on:

What is the spectral bias that hurts the performance of MLPs!!!

And what is the CAM structure that addresses the spectral bias problem!?

 

I'll work through the paper from these two angles!


CAM (Coordinate-Aware Modulation)

CAM

CAM์€ ์ด 3๊ฐ€์ง€ ํŒŒํŠธ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค.

1. Coordinate-aware modulation

2. Coordinate priority for CAM

3. Feature normalization

๊ฐ๊ฐ์„ ํ•œ๋ฒˆ ์ˆœ์ฐจ์ ์œผ๋กœ ์‚ดํŽด๋ณด๊ฒ ๋‹ค!


1. Coordinate-aware modulation

Coordinate-aware modulation equation (1D case)

Before writing out the equation, let's define the symbols.

1. n and c are the indices over the batch size N and the channel size C, respectively.

2. X is the input coordinate (shape N×D, where D is the coordinate dimension).

3. F and F~ are the intermediate feature tensor and the modulated output feature, respectively (F~ is the left-hand side of the equation).

4-1. γ() and β() are the scale and shift functions (the gamma and beta functions).

4-2. Each function projects the N×D input to an N-dimensional vector, i.e. one scalar per coordinate.

5. Γ (gamma) and B are single-channel grids; interpolating each grid at a given coordinate yields a scalar value.
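The equation image is not reproduced here, so for reference, here is a reconstruction of equation (1) from the symbols defined above. It writes γ and β for the scale and shift functions, Γ and B for the two single-channel grids, and interp(G, x) for interpolating grid G at coordinate x (interp is only notation introduced for this sketch):

```latex
\tilde{F}_{n,c} = \gamma(X_n)\,F_{n,c} + \beta(X_n),
\qquad
\gamma(X_n) = \operatorname{interp}(\Gamma, X_n),
\qquad
\beta(X_n) = \operatorname{interp}(\mathrm{B}, X_n)
```

Because γ(X_n) and β(X_n) depend only on the sample index n, all C channels of a sample share the same scale and shift.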


ํ•ด๋‹น ์ˆ˜์‹์„ ํ’€๋ฉด ๋‹ค์Œ์ฒ˜๋Ÿผ ํ•ด์„ ๊ฐ€๋Šฅํ•˜๋‹ค.

0. Coordinate-aware modulation is defined so that MLP-based and grid-based methods can be used in parallel.

1. Here F_{n,c} is the feature tensor coming out of the MLP. (Very important!)

2. The scalar values produced by γ() and β() are obtained from the grids! (Very important!)

3-1. The input coordinate fed into the model is passed through the MLP to extract the intermediate feature.

3-2. In parallel, the same input coordinate is used to sample the grids, producing the scale and shift factors.

4. Finally, F_{n,c} and the scale and shift factors are combined according to equation (1) to produce the modulated output feature!!
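The steps above can be sketched in NumPy for the 1D case. This is a minimal illustration under assumptions, not the authors' implementation: the one-layer "MLP", the grid resolution R, and the helper names `interp_grid_1d` and `cam_modulate_1d` are hypothetical, and the grids are fixed arrays here rather than trained parameters.

```python
import numpy as np


def interp_grid_1d(grid, x):
    """Linearly interpolate a single-channel 1D grid at coordinates x in [-1, 1].

    The R grid values are assumed to sit at evenly spaced positions from -1 to 1.
    """
    positions = np.linspace(-1.0, 1.0, grid.shape[0])
    return np.interp(x, positions, grid)


def cam_modulate_1d(features, coords, gamma_grid, beta_grid):
    """F~[n, c] = gamma(X[n]) * F[n, c] + beta(X[n]).

    features: (N, C) intermediate MLP features F
    coords: (N,) 1D input coordinates X in [-1, 1]
    gamma_grid, beta_grid: (R,) single-channel grids
    """
    gamma = interp_grid_1d(gamma_grid, coords)  # (N,): one scale per sample
    beta = interp_grid_1d(beta_grid, coords)    # (N,): one shift per sample
    # The per-sample scalars are broadcast across all C channels
    return gamma[:, None] * features + beta[:, None]


rng = np.random.default_rng(0)
N, C, R = 5, 8, 16
coords = np.linspace(-1.0, 1.0, N)

# Hypothetical one-layer "MLP": F = ReLU(X W + b), shape (N, C)
W, b = rng.normal(size=(1, C)), rng.normal(size=C)
features = np.maximum(coords[:, None] @ W + b, 0.0)

# Initialization: scale grid all 1, shift grid all 0
gamma_grid, beta_grid = np.ones(R), np.zeros(R)
out = cam_modulate_1d(features, coords, gamma_grid, beta_grid)
print(np.allclose(out, features))  # True: the modulation starts as a no-op
```

With the scale grid at 1 and the shift grid at 0, the output equals the MLP feature, so in this sketch the modulation starts out as the identity.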

 

๋…ผ๋ฌธ์˜ ์ €์ž๋“ค์€, ํ•ด๋‹น ์ˆ˜์‹์ด ๊ณ ์ฐจ์›์˜ coordinate์— ๋Œ€ํ•ด์„œ๋„ compactness๋ฅผ ๋ณด์กดํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์–ธ๊ธ‰ํ•œ๋‹ค.


Grid function

- Reading the grid function in the Appendix, the grids appear to be trained (learnable).

- Still, I don't fully get the grid function, so I'll dig into the code to see how the grid is built lol

 

Grid code
affine initial values

- ์œ„์˜ ์ฝ”๋“œ์—์„œ, ๋”ฑ ๋ณด๋‹ˆ, grid_sample์„ ํ†ตํ•ด์„œ gamma์™€ beta์— ๋Œ€ํ•œ ํ•จ์ˆ˜๊ฐ’์„ ์ •์˜ํ•˜๋Š” ๊ฒƒ ๊ฐ™๋‹ค.

- ์ผ๋‹จ ์ž…๋ ฅ๊ฐ’์œผ๋กœ, nn.Parameter์™€ coordination ๊ฐ’์ด ์žˆ๋‹ค๋Š” ๊ฑธ ๊ธฐ์–ตํ•ด๋ณด์ž.

 

grid_sample

- The grid_sample function is defined as above.

- Looking at the code, it samples the grid at the input coordinates, interpolating bilinearly when a coordinate falls between grid points (the same idea as PyTorch's F.grid_sample with mode='bilinear').

1. The learnable grid is stored as an nn.Parameter, and sampling is performed on it.

2. For each input coordinate, the matching location in the grid is found and the value at that location is read from the nn.Parameter grid.

3. The result is then reshaped with view(2,-1,1); index 0 is used as the gamma (scale) and index 1 as the beta (shift).

-> As initial values, the scale is set to all 1s and the shift to all 0s, so at initialization the modulation leaves the features unchanged (1·F + 0 = F).

+) The grid coordinates are normalized to [-1, 1], the convention PyTorch's grid_sample uses: -1 and +1 correspond to the two ends of each grid axis, which gives each coordinate a fixed geometric meaning.

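```python
# Normalizing pixel coordinates to [-1, 1]
import numpy as np

IMAGE_WIDTH, IMAGE_HEIGHT = 4, 3  # example image size (assumed for illustration)

# Pixel index coordinates: x in [0, W-1], y in [0, H-1], each of shape (H, W)
y, x = np.meshgrid(np.arange(IMAGE_HEIGHT, dtype=float),
                   np.arange(IMAGE_WIDTH, dtype=float), indexing="ij")

# Scale to [0, 1] using the image width and height
x /= IMAGE_WIDTH - 1
y /= IMAGE_HEIGHT - 1

# Then map [0, 1] -> [-1, 1]
x = (x - 0.5) * 2
y = (y - 0.5) * 2
```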
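To make the grid_sample steps concrete, here is a NumPy sketch of bilinear grid sampling on coordinates normalized to [-1, 1]. It is neither PyTorch's grid_sample nor the authors' code. It assumes the align_corners=True convention, where the corner grid values sit exactly at ±1 (PyTorch's F.grid_sample defaults to align_corners=False, where ±1 refers to the outer edges of the corner pixels instead). The `affine` array of shape (2, H, W) is a hypothetical stand-in for the nn.Parameter, with channel 0 as the gamma grid (initialized to 1) and channel 1 as the beta grid (initialized to 0).

```python
import numpy as np


def bilinear_sample(grid, coords):
    """Sample a (K, H, W) grid at N normalized (x, y) coordinates in [-1, 1].

    Assumes the align_corners=True convention: (-1, -1) is the top-left grid
    value and (1, 1) the bottom-right one. Requires H, W >= 2. Returns (K, N).
    """
    K, H, W = grid.shape
    # Normalized coordinates -> continuous (possibly fractional) grid indices
    ix = (coords[:, 0] + 1.0) / 2.0 * (W - 1)
    iy = (coords[:, 1] + 1.0) / 2.0 * (H - 1)
    # Top-left corner of the enclosing cell, clipped so the +1 neighbour exists
    x0 = np.clip(np.floor(ix).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(iy).astype(int), 0, H - 2)
    x1, y1 = x0 + 1, y0 + 1
    # Fractional parts act as interpolation weights
    wx, wy = ix - x0, iy - y0
    top = grid[:, y0, x0] * (1 - wx) + grid[:, y0, x1] * wx
    bottom = grid[:, y1, x0] * (1 - wx) + grid[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical stand-in for the "affine" nn.Parameter:
# channel 0 = gamma grid (init 1), channel 1 = beta grid (init 0)
H, W = 4, 4
affine = np.stack([np.ones((H, W)), np.zeros((H, W))])  # shape (2, H, W)

coords = np.array([[-1.0, -1.0], [0.3, -0.5], [1.0, 1.0]])  # (N, 2), in [-1, 1]
sampled = bilinear_sample(affine, coords)                  # (2, N)
gamma, beta = sampled.reshape(2, -1, 1)  # like view(2, -1, 1): each (N, 1)
# At initialization gamma is 1 and beta is 0 for every coordinate
```

Splitting along the first axis mirrors the view(2,-1,1) step: one grid lookup produces both the scale and the shift for every sample.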

2. Coordinate priority for CAM

Priority of CAM coordinates

ํ•ด๋‹น ๋ถ€๋ถ„์€ input coordination์ด ๋งŒ์•ฝ ๋†’์€ ์ฐจ์›์ด๋ผ๋ฉด, ์–ด๋–ค ๊ฒƒ์„ ์„ ํƒํ•˜์—ฌ์„œ grid๋ฅผ ํ†ตํ•ด scalar value๋ฅผ ๋ฝ‘์•„๋‚ด์•ผ ํ•˜๋Š”๊ฐ€์— ๋Œ€ํ•œ ์ €์ž์˜ ์„ค๋ช…์ด๋‹ค.

 

The key point: for NeRF, the view-direction coordinates are used, while for D-NeRF and video representation, the time axis takes priority.

 

The prioritized coordinates are then used to extract the scale and shift values from the grids.
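The priority rule can be summarized as a small lookup. The column layouts below are hypothetical (real pipelines order their inputs differently). The sketch only shows which slice of the input each task would pass to CAM's grids, for example an N×2 view direction for NeRF and a 1D time value for D-NeRF and video.

```python
import numpy as np

# Hypothetical column layouts of each task's input (assumed for illustration):
#   "image": (x, y)                 -> CAM grids use (x, y)
#   "nerf":  (x, y, z, theta, phi)  -> CAM grids use the view direction (theta, phi)
#   "dnerf": (x, y, z, t)           -> CAM grids use time t
#   "video": (t,)                   -> CAM grids use time t
CAM_COLUMNS = {
    "image": [0, 1],
    "nerf": [3, 4],
    "dnerf": [3],
    "video": [0],
}


def cam_coordinates(task, inputs):
    """Select the prioritized coordinates (N, D') that CAM's grids are sampled with."""
    return inputs[:, CAM_COLUMNS[task]]


# Usage: 6 NeRF samples with 5D inputs -> an N x 2 view-direction coordinate
rays = np.random.default_rng(0).uniform(-1.0, 1.0, size=(6, 5))
view_dirs = cam_coordinates("nerf", rays)
```

Because the grids only ever see this low-dimensional slice, they stay small even when the full input has more dimensions.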


3. Feature normalization

feature normalization

ํ•ด๋‹น ๋ถ€๋ถ„์€ ๋‚˜๋„ ๋ณด๊ณ  ์กฐ๊ธˆ(?) ๋†€๋ž๋‹ค.

The authors' hypothesis is that, when observing CAM's intermediate features, feature vectors with widely varying variances behave as if they are not being regularized(?) during training.

 

๋”ฐ๋ผ์„œ, ๊ฐ intermediate feature๋ฅผ ๊ทธ๋“ค์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ํ™œ์šฉํ•ด normalization์„ ํ•ด์ฃผ๊ณ  ๋„ฃ์–ด์ฃผ๋Š” ๋ฐฉ์‹์œผ๋กœ ํ•™์Šตํ•˜์˜€๊ณ , ์ด๋žฌ์„ ๋•Œ CAM์ด ๋” ์•ˆ์ •ํ™”๋˜๊ณ  ์ˆ˜๋ ด์†๋„๊ฐ€ ๋นจ๋ผ์กŒ๋‹ค๊ณ  ์„œ์ˆ ํ•˜์˜€๋‹ค.

 

๋ชจ๋ธ ์ž…๋ ฅ๊ฐ’ ์ค‘๊ฐ„์— normalization์„ ์ ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์€ ์•ฝ๊ฐ„ ์ƒˆ๋กญ๋‹ค(?)


normalization

- I thought I had misread it, so I checked the code, and it really does this?! Wow haha


IMAGE, NERF, VIDEO with CAM

IMAGE


์•ž์—์„œ ๋‚˜์˜จ, CAM ๋ฐฉ๋ฒ•๋ก ๊ณผ ๊ฑฐ์˜ ๋น„์Šทํ•œ ์–˜๊ธฐ์ด๋‹ค.

๋‹ค๋งŒ IMAGE์ด๊ธฐ ๋•Œ๋ฌธ์—, 2-dimensional coordinates๊ฐ€ ์ด์šฉ๋œ๋‹ค๋Š” ์ ์ด ์ฐจ๋ณ„๋œ๋‹ค!

 

CAM with image

Also, as explained in the feature normalization section, the intermediate feature F is normalized at each layer before being modulated.

As before, the scale and shift factors are extracted from the grid representation.

 

Source: PyTorch basic syntax, code, and tip snippets - gaussian37

While wondering how best to explain bilinear interpolation, I found an excellent resource.

It covers the grid_sample function I mentioned above in great detail, so reading it should help your understanding!

 

๋…ผ๋ฌธ์—์„œ ์–ธ๊ธ‰ํ•œ ๋‚ด์šฉ์„ ์‰ฝ๊ฒŒ ์„ค๋ช…ํ•˜๋ฉด, (์•„๊นŒ ์–ธ๊ธ‰ํ•œ grid_sample ์ฝ”๋“œ์— ๋‚˜์˜จ ๋ณ€์ˆ˜๋ช…์„ ๊ธฐ์ค€์œผ๋กœ) grid์˜ ์ขŒํ‘œ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ input์— ๋งค์นญ๋  ๋•Œ, ์†Œ์ˆ˜์ ์ด ์ƒ๊ธฐ๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋Š”๋ฐ ์ด ์†Œ์ˆ˜์ ์— ํ•ด๋‹นํ•˜๋Š” ๊ฐ’์„ bilinear interpolation์„ ์ด์šฉํ–ˆ๋‹ค๋Š” ๊ฒƒ ๊ฐ™๋‹ค.


NERF (Novel View Synthesis)

NeRF

NeRF๋„ ๋˜‘๊ฐ™๋‹ค!

๋‹ค๋งŒ, ์ž…๋ ฅ๊ฐ’๊ณผ grid representation์— ์ด์šฉ๋˜๋Š” X์˜ ๊ฐ’์ด ์‚ด์ง ๋‹ค๋ฅด๋‹ค๋Š” ์ ๋งŒ ์œ ์˜ํ•˜๋ฉด ๋œ๋‹ค.

MLP์— ๋“ค์–ด๊ฐˆ ๋•Œ๋Š”, ๊ธฐ์กด NeRF์ฒ˜๋Ÿผ 3์ฐจ์› ์ขŒํ‘œ๋กœ ๋ถ€ํ„ฐ sampling ๋œ S points์™€ camera parameters๋“ค์„ ํ™œ์šฉํ•˜์—ฌ์„œ, 5D feature๋ฅผ input์œผ๋กœ ํ•˜๋ฉด ๋œ๋‹ค.

๋‹ค๋งŒ, grid representation์—์„œ๋Š” camera parameter๋งŒ ์ด์šฉํ•ด์„œ Nx2์˜ shape์„ ๊ตฌ์„ฑํ•œ๋‹ค๋Š” ์ ! ์œ ์˜ํ•ด์•ผํ•œ๋‹ค.

 

CAM with NeRF
CAM with D-NeRF

For dynamic scenes such as D-NeRF, however, the time axis t is what matters, so the X used by the grid representation is 1-dimensional.


VIDEO

VIDEO๋„ D-NeRF์™€ ์œ ์‚ฌํ•˜๋‹ค.

CAM์—์„œ๋Š” NeRV ๋…ผ๋ฌธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ, ์‹œ๊ฐ„์ถ• t๋ฅผ ํ™œ์šฉํ•˜์—ฌ์„œ grid representation์„ ์ˆ˜ํ–‰ํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค.

 

CAM with video

๊ฑฐ์˜ ๋‹ค ๋™์ผํ•˜์ง€๋งŒ, ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ๊ณ„์‚ฐํ•  ๋•Œ, NxC๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๊ฐ๊ฐ์˜ ์ฑ„๋„๋งˆ๋‹ค ๊ณ„์‚ฐํ•œ ๊ฐ’์œผ๋กœ normalizationํ•œ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.


Result

- Performance on 1D signals is good too!

 

- It performs strongly not only on image generalization and novel view synthesis but also on video representation.

 

dynamic scene

- On dynamic scenes it also achieves the highest PSNR..! (And with few parameters, it is efficient.)


Ablation study

- It clearly shows that CAM's approach of combining the grid representation with the MLP is meaningful.


- Written by Kyujinpy, 2024.02.09.

(It's been a while since I read a somewhat(?) difficult paper; it took me about 2 to 3 hours! Please leave any questions in the comments.)

๋ฐ˜์‘ํ˜•