[🌸Sakura-SOLAR] - An LLM built on the SOLAR 10.7B base model using merge and DPO

Sakura

Github: https://github.com/KyujinHan/Sakura-SOLAR-DPO

 


 

🌸Huggingface: https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct

 


 


 

Hello!

I'm kyujinpy, and I do LLM research with support from (주)미디어그룹사람과숲 and (주)마커! 😄😄

 

Recently, the SOLAR-10.7B model took 1st place on the Open LLM Leaderboard using the Depth-Up-Scaling method!

It seems that Depth-Up-Scaling let it reach higher accuracy by sharing parameters between different models. 😉

 

I also kept seeing models gain a lot of performance from DPO, a recent alternative to reinforcement learning from human feedback, and from merge (slerp), which combines the parameter values of several models. That pushed me to try tuning a strong model on top of SOLAR!

 

## Merge

The merge (slerp) method is easy to run with the mergekit GitHub repository!

SLERP stands for Spherical Linear Interpolation.

Unlike plain linear interpolation along a straight line in Euclidean space, it interpolates along the arc between two vectors. A quick search turns up plenty of material, so it is easy to pick up!
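For readers who want the formula itself, the standard definition of SLERP between two unit vectors can be written as follows. The symbols $p$, $q$, $t$, and $\Omega$ are notation chosen here for illustration, not taken from mergekit.

```latex
\operatorname{slerp}(p, q; t)
  = \frac{\sin\big((1 - t)\,\Omega\big)}{\sin \Omega}\, p
  + \frac{\sin(t\,\Omega)}{\sin \Omega}\, q,
\qquad \Omega = \arccos(p \cdot q), \quad t \in [0, 1]
```

At $t = 0$ this returns $p$, at $t = 1$ it returns $q$, and in between it moves along the arc on the unit sphere. Linear interpolation $(1 - t)\,p + t\,q$ instead cuts through the inside of the sphere, so the result has a smaller norm.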

 

I applied the slerp method to the two models below! 😋

- SOLAR-10.7b-Instruct-v1.0

- SauerkrautLM-SOLAR-Instruct

The model produced this way is 🌸Sakura-SOLAR-Instruct🌸!
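To make the merge step concrete, here is a minimal sketch of what a slerp merge does to each pair of matching weight tensors. It is a toy illustration in plain Python, not mergekit's actual code: the function names `slerp` and `merge_state_dicts`, the tensor names, the values, and the choice of `t = 0.5` are all assumptions made for the example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def merge_state_dicts(sd_a, sd_b, t=0.5):
    """Merge two checkpoints tensor by tensor (same keys and shapes assumed)."""
    return {name: slerp(t, sd_a[name], sd_b[name]) for name in sd_a}


# Toy stand-ins for two fine-tuned checkpoints (hypothetical names and values).
model_a = {"layer.0.weight": [1.0, 0.0], "layer.1.weight": [0.0, 2.0]}
model_b = {"layer.0.weight": [0.0, 1.0], "layer.1.weight": [0.0, 2.0]}
merged = merge_state_dicts(model_a, model_b, t=0.5)
```

Here the first tensor pair is orthogonal, so the merged vector lands halfway along the arc and keeps unit length. The second pair is identical, so the merge leaves it unchanged.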

 

## DPO

DPO stands for Direct Preference Optimization!

An easy way to think about it: the method compares the response distributions of two models (the model being trained and a frozen reference copy) and shifts the distribution toward the responses the user prefers! (See the paper for the details! If you are comfortable with the math, it also helps to think of KL-divergence, haha.)
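As a rough sketch of that idea, the DPO loss for a single preference pair can be computed from four log-probabilities: the chosen and rejected responses, each scored by the model being trained and by the frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are assumptions for illustration, not the settings used for the Sakura models. `beta` controls how strongly the trained model is held close to the reference, which is where the KL-divergence intuition comes in.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed token log-probabilities."""
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls as the trained model raises the chosen response and lowers the rejected one relative to the reference, so minimizing it shifts the distribution toward preferred answers.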

 

While looking through various open-source models before uploading my own, I put together my own shortlist of DPO datasets suited to people who, like me, are short on compute! 😋

- Intel/orca_dpo_pairs

- argilla/distilabel-math-preference-dpo

- unalignment/toxic-dpo-v0.1

Of these, I used orca_dpo and math_dpo!

I wanted to use toxic_dpo too, but I'll leave that for others to try.. 🙃🙃
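When combining datasets like these, the rows first need a common shape. The sketch below maps each source to the `prompt` / `chosen` / `rejected` layout that DPO trainers commonly expect. The column names (`system`, `question`, `chosen`, `rejected` for orca_dpo; `instruction`, `chosen_response`, `rejected_response` for math_dpo) are my recollection of the dataset cards and should be checked against them. The helper name `to_dpo_example`, the way the system text is joined to the question, and the sample rows are all hypothetical.

```python
def to_dpo_example(row, source):
    """Map one raw dataset row to a prompt/chosen/rejected dict."""
    if source == "orca":
        # Assumed columns: system, question, chosen, rejected.
        system = row.get("system", "")
        prompt = f"{system}\n\n{row['question']}".strip()
        return {"prompt": prompt,
                "chosen": row["chosen"],
                "rejected": row["rejected"]}
    if source == "math":
        # Assumed columns: instruction, chosen_response, rejected_response.
        return {"prompt": row["instruction"],
                "chosen": row["chosen_response"],
                "rejected": row["rejected_response"]}
    raise ValueError(f"unknown source: {source}")


# Made-up rows that only imitate the assumed column layout.
orca_row = {"system": "", "question": "What is 2 + 2?",
            "chosen": "4", "rejected": "5"}
math_row = {"instruction": "Solve x + 1 = 3.",
            "chosen_response": "x = 2", "rejected_response": "x = 4"}

mixed = [to_dpo_example(orca_row, "orca"), to_dpo_example(math_row, "math")]
```

Once every row has the same three fields, the two datasets can be concatenated and shuffled into one training set.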

 

Here is the list of models I built with the different DPO datasets! 😋

- Math_dpo

-> 🌸 kyujinpy/Sakura-SOLAR-Instruct-DPO-v1

-> 🌸 kyujinpy/Sakura-SOLAR-Instruct-DPO-v2

- Orca_dpo

-> 🌸 kyujinpy/Sakura-SOLRCA-Instruct-DPO

- Orca_dpo + Math_dpo

-> 🌸 kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1

-> 🌸 kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2

 

The thing I agonized over most when building the models was the hyperparameters..!

To gather information on them, I stayed up all night going through what open-source models publish and their past logs, then combined that with my own earlier attempts to narrow things down to a set of candidate configurations 😇😇

 

Based on that, I tried out several hyperparameter settings on each model in an experimental spirit! 😁

The detailed hyperparameters are all published on GitHub! I hope they give you some useful insight 🥰🥰

 

## Wrapping up

After the KO-LLM leaderboard, I got to experience reaching 1st place on the EN-LLM leaderboard as well.. I'm truly grateful. (There were plenty of struggles too, of course..! haha)

Thank you to everyone who researches alongside me and cheers me on, and to (주)미디어그룹사람과숲 and (주)마커 for their generous support 🤗🤗

 

I'll keep working hard so I can come back with an even better open-source model next time 🤩🤩

Thank you!

 
