Chapter 2์—์„œ๋Š” 1D linear regression ํ™œ์šฉํ•œ supervised learning ์„ ์†Œ๊ฐœํ•œ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ๋ชจ๋ธ์€ ์ž…๋ ฅ/์ถœ๋ ฅ์˜ ๊ด€๊ณ„๋ฅผ ํ•˜๋‚˜์˜ โ€œlineโ€ ์œผ๋กœ๋งŒ ํ‘œํ˜„ํ•œ๋‹ค. ๋ณธ ์ฑ•ํ„ฐ์—์„œ๋Š” ์ด๋Ÿฌํ•œ โ€œlinesโ€ ๋Š” โ€œpiecewise linear funcsion(์กฐ๊ฐ ์„ ํ˜•ํ•จ์ˆ˜?)โ€ ๋กœ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ๊ณ , ์ด๋“ค์€ ์ž„์˜์˜ ๋ณต์žกํ•œ ๊ณ ์ฐจ์›์˜ ์ž…๋ ฅ/์ถœ๋ ฅ์˜ ๊ด€๊ณ„๋ฅผ ํ‘œํ˜„ํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์ธ๋‹ค.

3.1. Neural Network Example

Shallow neural networks are functions $\bold{y=f[x, \phi]}$ with parameters $\bold{\phi}$ that map multivariate inputs $\bold{x}$ to outputs $\bold{y}$. Their full definition is deferred to Section 3.4; before that, we first introduce the main ideas through a network $f[x, \phi]$ that maps a scalar input $x$ to a scalar output $y$ using ten parameters, $\{\phi_0, \phi_1, \phi_2, \phi_3, \theta_{10}, \theta_{11}, \theta_{20}, \theta_{21}, \theta_{30}, \theta_{31}\}$. It is defined as follows.

$$y = f[x, \phi] = \phi_0 + \phi_1 a[\theta_{10} + \theta_{11}x] + \phi_2 a[\theta_{20} + \theta_{21}x] + \phi_3 a[\theta_{30} + \theta_{31}x]$$

์œ„ ์‹์˜ ๊ณ„์‚ฐ์„ 3๋‹จ๊ณ„๋กœ ๋‚˜๋ˆ„์–ด ๋ณผ ์ˆ˜์žˆ๋‹ค.

  1. ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ 3๊ฐœ์˜ linear functions, $(\theta_{10}+\theta_{11}x), (\theta_{20}+\theta_{21}x), (\theta_{30}+\theta_{31}x)$.

  2. ์œ„์—์„œ ๊ณ„์‚ฐํ•œ 3๊ฐœ์˜ linear function์„ activation function, $a[\bullet]$ ์— ๋จน์ธ๋‹ค.

    ๋‹ค์–‘ํ•œ activation funcionts ์ค‘ ์ผ๋ถ€

    ๋‹ค์–‘ํ•œ activation funcionts ์ค‘ ์ผ๋ถ€

  3. Weight the three intermediate results from the activation function by $\phi_1, \phi_2, \phi_3$, sum them, and add the offset $\phi_0$.
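The three steps above can be sketched in Python as follows. The parameter values are arbitrary, chosen only for illustration, and the activation used here is ReLU (introduced just below).

```python
import numpy as np

def shallow_net(x, phi, theta):
    """Shallow network with three hidden units.

    phi   : array [phi_0, phi_1, phi_2, phi_3]
    theta : 3x2 array; row i holds [theta_i0, theta_i1]
    """
    # Step 1: three linear functions of the input
    pre = theta[:, 0] + theta[:, 1] * x
    # Step 2: pass each through the activation function (ReLU here)
    h = np.maximum(0.0, pre)
    # Step 3: weighted sum with phi_1..phi_3, plus the offset phi_0
    return phi[0] + phi[1:] @ h

# Arbitrary illustrative parameters
phi = np.array([-0.2, 1.0, -0.5, 0.8])
theta = np.array([[0.3, 1.0], [-0.4, 0.6], [0.1, -0.9]])
y = shallow_net(1.0, phi, theta)
```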

activation function์œผ๋กœ๋Š” ๋‹ค์–‘ํ•œ ์„ ํƒ์ง€๊ฐ€ ์žˆ๋Š”๋ฐ, ์šฐ๋ฆฌ๋Š” ๊ทธ ์ค‘์—์„œ rectified linear unit, ReLU๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ReLU๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋œ๋‹ค.

$$a[z] = \text{ReLU}[z] = \begin{cases} 0 & z < 0 \\ z & z \ge 0 \end{cases}$$

(Figure: plot of the ReLU function.)

ReLU๋Š” input์ด 0๋ณด๋‹ค ์ž‘์œผ๋ฉด 0์„, ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด input ๊ทธ๋Œ€๋กœ๋ฅผ return ํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ ์‹์„ ๋ณด๋ฉด ์–ด๋–ค ์‹์ด (family of equations) ์ž…๋ ฅ/์ถœ๋ ฅ์˜ ๊ด€๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š”์ง€ ๊ตฌ๋ถ„ํ•˜๊ธฐ ์–ด๋ ค์šด๋ฐ, ๊ทธ๋ƒฅ ๋ชจ๋“  10๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์— ๋Œ€ํ•œ ์‹, $\bold{\phi}$, ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์ž…๋ ฅ/์ถœ๋ ฅ์˜ ๊ด€๊ณ„๋ฅผ ์ •์˜ํ•œ๋‹ค๊ณ  ์ดํ•ดํ•˜๋„ ์ข‹๋‹ค.

๋งŒ์•ฝ ์šฐ๋ฆฌ๊ฐ€ ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด $y$๋ฅผ ์˜ˆ์ธก (inference)ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์ฃผ์–ด์ง„ dataset, $\{x_i, y_i\})_{i=1...I}$ ์— ๋Œ€ํ•ด์„œ ํŒŒ๋ผ๋ฏธํ„ฐ $\phi$๊ฐ€ ์ด๋“ค์„ ์–ผ๋งˆ๋‚˜ ์ž˜ ์ •์˜ํ•˜๋Š”์ง€์— ๋Œ€ํ•œ L2 Loss, $L[\phi]$๋„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋Ÿฌํ•œ ํ‰๊ฐ€ ์ง€ํ‘œ์— ๋”ฐ๋ผ ์ด๋Ÿฌํ•œ loss๋ฅผ ์ตœ์†Œํ™” ํ•˜๋Š” ์ตœ์ ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ, $\hat{\phi}$ ๋„ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

3.1.1. Neural Network Intuition

์‚ฌ์‹ค ์ฒซ๋ฒˆ์งธ ์‹์€ ์ตœ๋Œ€ 4๊ฐœ์˜ linear regions ์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” continuous piecewise linear functions ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค. ์•„๋ž˜ Figure๋ฅผ ๋ณด๋ผ. (๊ฐ๊ฐ์˜ region์„ ํ•˜๋‚˜์˜ member of family of equations์œผ๋กœ ๋ด๋„ ์ข‹๋‹ค.)

(Figure: a continuous piecewise linear function with up to four linear regions.)
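Each hidden unit's ReLU switches from inactive to active where its pre-activation crosses zero, so three hidden units contribute at most three "joints", and hence at most four linear regions. A sketch with arbitrary parameter values:

```python
import numpy as np

# Row i holds [theta_i0, theta_i1]; unit i's joint is where theta_i0 + theta_i1 * x = 0
theta = np.array([[0.3, 1.0], [-0.4, 0.6], [0.1, -0.9]])
joints = np.sort(-theta[:, 0] / theta[:, 1])
# Three joints split the x-axis into (at most) four linear regions
```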

์™œ ์ €๋ ‡๊ฒŒ ๋˜๋Š”์ง€ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•ด ์ฒซ๋ฒˆ์งธ ์‹์„ 2๋‹จ๊ณ„๋กœ ๋‹ค์‹œ ๋‚˜๋ˆˆ๋‹ค. ๋จผ์ €, ์•„๋ž˜์™€ ๊ฐ™์€ ์ค‘๊ฐ„ ๊ฐ’๋“ค์„ ๋จผ์ € ์†Œ๊ฐœํ•œ๋‹ค.

$$h_1 = a[\theta_{10} + \theta_{11}x], \quad h_2 = a[\theta_{20} + \theta_{21}x], \quad h_3 = a[\theta_{30} + \theta_{31}x]$$

์šฐ๋ฆฌ๋Š” $h_1, h_2, h_3$๋ฅผ hidden units ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ๋‘˜์งธ๋กœ ์•ž์„œ ๊ณ„์‚ฐํ•œ hidden units๋“ค์„ line functions๋กœ ๊ณ„์‚ฐํ•˜์—ฌ ์ตœ์ข… ์ถœ๋ ฅ์„ ๊ณ„์‚ฐํ•œ๋‹ค.

$$y = \phi_0 + \phi_1 h_1 + \phi_2 h_2 + \phi_3 h_3$$
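The two-stage view (hidden units first, then a linear combination) can be sketched as below; the parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np

phi = np.array([-0.2, 1.0, -0.5, 0.8])                    # [phi_0, phi_1, phi_2, phi_3]
theta = np.array([[0.3, 1.0], [-0.4, 0.6], [0.1, -0.9]])  # rows: [theta_i0, theta_i1]
x = 1.0

# Stage 1: hidden units h_i = a[theta_i0 + theta_i1 * x], with a = ReLU
h = np.maximum(0.0, theta[:, 0] + theta[:, 1] * x)
# Stage 2: linear combination of the hidden units
y = phi[0] + phi[1:] @ h
```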