Understanding the Differentiation Behind Back Propagation
Back Propagation
Consider a simple neural network whose neuron takes two inputs $x_{1}, x_{2}$. The update for each weight, $\frac{ \partial C}{ \partial w_{i}}$, can be split into two steps: a Forward Pass and a Backward Pass. The Forward Pass is very easy to compute, while the Backward Pass has to be worked out backwards from the last layer, which is where the name comes from.
\[y = \sigma (z)\]
\[z = \mathbf{w} \cdot \mathbf{x} + b = w_{1}x_{1} + w_{2}x_{2} + b\]
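A minimal Python sketch of this forward pass, using assumed (illustrative) values for the inputs, weights, and bias; the names `x1`, `w1`, etc. are not from the original and are only for demonstration:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (assumed) values for the two inputs, weights, and bias
x1, x2 = 0.5, -1.0
w1, w2, b = 0.3, 0.7, 0.1

# Forward Pass: z = w1*x1 + w2*x2 + b, then y = sigma(z)
z = w1 * x1 + w2 * x2 + b
y = sigmoid(z)
print(z, y)
```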
\[\textit{Weight update for } w\textit{: } \frac{ \partial C}{ \partial w} = \frac{ \partial z}{ \partial w} \frac{ \partial C}{ \partial z}\]
\[\textit{Forward Pass: } \frac{ \partial z}{ \partial w} = x\]
\[\textit{Backward Pass: } \frac{ \partial C}{ \partial z} = \frac{ \partial y}{ \partial z} \frac{ \partial C}{ \partial y} = \sigma'(z) \frac{ \partial C}{ \partial y}\]
\[\sigma'(z) = \left( \frac{1}{1 + e^{-z}} \right)' = \frac{e^{-z}}{(1+e^{-z})^{2}} = \sigma(z) (1 - \sigma(z))\]
\[\frac{ \partial C}{ \partial y} = \frac{ \partial Loss }{ \partial y}\]
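Continuing the sketch above, the backward pass multiplies the three chain-rule factors together. Since $\frac{ \partial C}{ \partial y}$ depends on the loss function (next section), it is left here as an assumed placeholder value:

```python
# Backward Pass for the same neuron, continuing the forward-pass sketch above:
# dC/dw_i = (dz/dw_i) * (dy/dz) * (dC/dy)

dC_dy = 1.0  # placeholder upstream gradient; its real form depends on the loss function

dy_dz = sigmoid(z) * (1.0 - sigmoid(z))  # sigma'(z) = sigma(z) * (1 - sigma(z))
dC_dz = dy_dz * dC_dy                    # dC/dz = (dy/dz) * (dC/dy)

dC_dw1 = x1 * dC_dz   # dz/dw1 = x1  (the Forward Pass term)
dC_dw2 = x2 * dC_dz   # dz/dw2 = x2
dC_db  = 1.0 * dC_dz  # dz/db  = 1
print(dC_dw1, dC_dw2, dC_db)
```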
Loss Function
The form of $\frac{ \partial C}{ \partial y}$ depends on which loss function is used, and this in turn affects how the model parameters are trained.
\[\textit{Cross Entropy} = -\frac{1}{N} \sum_{n=1}^{N} \left[ \hat{y}^{(n)} \ln y^{(n)} + (1 - \hat{y}^{(n)} ) \ln (1 - y^{(n)} ) \right]\]
\[\frac{ \partial C}{ \partial y^{(i)}} = \frac{1}{N} (- \frac{\hat{y}^{(i)}}{y^{(i)}} + \frac{1 - \hat{y}^{(i)}}{1 - y^{(i)}})\]
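A small sketch of this cross-entropy gradient, with a finite-difference check of the analytic formula; the array values for `y` and `y_hat` are assumed examples, not from the original:

```python
import numpy as np

def cross_entropy(y, y_hat):
    # C = -(1/N) * sum( y_hat * ln(y) + (1 - y_hat) * ln(1 - y) )
    return -np.mean(y_hat * np.log(y) + (1 - y_hat) * np.log(1 - y))

def dC_dy(y, y_hat):
    # dC/dy^(i) = (1/N) * ( -y_hat^(i)/y^(i) + (1 - y_hat^(i))/(1 - y^(i)) )
    return (-y_hat / y + (1 - y_hat) / (1 - y)) / y.size

# Assumed example: predictions y in (0, 1) and binary targets y_hat
y = np.array([0.9, 0.2, 0.7])
y_hat = np.array([1.0, 0.0, 1.0])

# Finite-difference check of the analytic gradient
eps = 1e-6
numeric = np.array([
    (cross_entropy(y + eps * np.eye(3)[i], y_hat) -
     cross_entropy(y - eps * np.eye(3)[i], y_hat)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(dC_dy(y, y_hat), numeric))  # expected: True
```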