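# Differentiating Softmax Cross Entropy
* Approach:
    1. Fix the prediction Y to shape 1x3 and the label T to shape 1x3. Derive the gradient of the loss in scalar form for this case, then generalize to other cases
    2. Because the softmax denominator is the sum over all elements, every output depends on every input, so the derivative with respect to one element must add up the contributions through all the outputs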

##
## Consider $
    Y = [y_1, y_2, y_3] \\
    S = e^{y_1} + e^{y_2} + e^{y_3}
$

##
## $
\begin{align*}
P & = softmax(Y) \\
  & = [P_1, P_2, P_3]  \\
  & = [e^{y_1} \cdot \frac{1}{S}, e^{y_2} \cdot \frac{1}{S}, e^{y_3} \cdot \frac{1}{S}]
\end{align*}
$
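
As a minimal sketch of the definition above, the following pure-Python snippet computes softmax for a three-element Y. The function name `softmax` and the sample values are assumptions for illustration. Subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the derivation; it cancels between numerator and denominator, so P is unchanged.

```python
import math

def softmax(y):
    """Return P_k = e^{y_k} / S for a list of scores y."""
    m = max(y)                            # shift by the max to avoid overflow
    exps = [math.exp(v - m) for v in y]   # e^{y_k - m}
    s = sum(exps)                         # S scaled by e^{-m}; the scale cancels below
    return [e / s for e in exps]

Y = [1.0, 2.0, 3.0]  # illustrative values, not from the original notes
P = softmax(Y)
print(P)       # three probabilities, increasing with y_k
print(sum(P))  # approximately 1
```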

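## The loss is therefore defined as follows:
### Label: $T = [T_1, T_2, T_3]$. For example, with one-hot encoding and a zero-based label of 2, $T=[0, 0, 1]$
### Loss (the second line is the single-sample case, M = 1, which the derivation below uses; the 1/M factor is restored at the end): $
\begin{align*}
L & = -\frac{1}{M}\sum_{i=1}^{M}\sum_{k=1}^{3}{T_k^{(i)} \ln(P_k^{(i)})} \\
  & = -[\ln(P_1) T_1 + \ln(P_2) T_2 + \ln(P_3) T_3] \quad (M = 1)
\end{align*}
$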
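
The loss above can be sketched in a few lines of pure Python. The function name `cross_entropy` and the sample probabilities and labels are assumptions chosen for illustration; terms with a zero label are skipped because they contribute nothing to the sum.

```python
import math

def cross_entropy(P_batch, T_batch):
    """Mean over M samples of -sum_k T_k * ln(P_k)."""
    M = len(P_batch)
    total = 0.0
    for P, T in zip(P_batch, T_batch):
        # Skip zero-label terms: they add 0 to the sum.
        total += -sum(t * math.log(p) for p, t in zip(P, T) if t != 0)
    return total / M

# Illustrative batch of M = 2 samples with one-hot labels.
P_batch = [[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]]
T_batch = [[0, 0, 1], [1, 0, 0]]
loss = cross_entropy(P_batch, T_batch)
print(loss)  # equals -(ln 0.7 + ln 0.5) / 2
```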

# Computing the derivative with respect to y1
### Consider $
\begin{align*}
\frac{\delta L}{\delta y_1} = \frac{\delta L}{\delta P_1} \cdot \frac{\delta P_1}{\delta y_1} + \frac{\delta L}{\delta P_2} \cdot \frac{\delta P_2}{\delta y_1} + \frac{\delta L}{\delta P_3} \cdot \frac{\delta P_3}{\delta y_1}
\end{align*}
$

### 
### Note 1: $\frac{\delta \ln(x)}{\delta x} = \frac{1}{x}$
### Note 2: $\frac{\delta e^x}{\delta x} = e^x$
### Note 3: $\frac{\delta x^{-1}}{\delta x} = -\frac{1}{x^2}$
### Note 4: $P_1 = e^{y_1} \cdot \frac{1}{S}$
### Computing the $\frac{\delta L}{\delta P_1} \cdot \frac{\delta P_1}{\delta y_1}$ term: $
\begin{align*}
\frac{\delta L}{\delta P_1} & = - T_1 \times \frac{1}{P_1} \\
\frac{\delta P_1}{\delta y_1} & = \frac{\delta P_1}{\delta e^{y_1}}\frac{\delta e^{y_1}}{\delta y_1} + \frac{\delta P_1}{\delta S}\frac{\delta S}{\delta y_1} \\
                              & = \frac{1}{S} \times e^{y_1} +  e^{y_1} \times \frac{-1}{S^2} \times e^{y_1} \\
                              & = P_1 - P_1^2 \\
                              & = P_1(1 - P_1)
\end{align*}
$
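
The result $\frac{\delta P_1}{\delta y_1} = P_1(1 - P_1)$ can be checked numerically. The sketch below compares it with a central finite difference; the helper `softmax`, the sample Y, and the step size `eps` are assumptions made for this check only.

```python
import math

def softmax(y):
    exps = [math.exp(v) for v in y]
    s = sum(exps)  # S
    return [e / s for e in exps]

Y = [0.5, -1.0, 2.0]  # illustrative values
eps = 1e-6            # finite-difference step (assumed)

P = softmax(Y)
analytic = P[0] * (1.0 - P[0])  # P_1 (1 - P_1)

# Central difference of P_1 with respect to y_1 (index 0).
P_plus = softmax([Y[0] + eps, Y[1], Y[2]])
P_minus = softmax([Y[0] - eps, Y[1], Y[2]])
numeric = (P_plus[0] - P_minus[0]) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```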

### 
### Therefore $
\begin{align*}
\frac{\delta L}{\delta P_1} \cdot \frac{\delta P_1}{\delta y_1} & = -T_1 \times \frac{1}{P_1} \times P_1(1 - P_1) \\
                                                                & = T_1 \times (P_1 - 1)
\end{align*}
$

---

### 
### Note 1: $P_2 = e^{y_2} \cdot \frac{1}{S}$
### Computing the $\frac{\delta L}{\delta P_2} \cdot \frac{\delta P_2}{\delta y_1}$ term: $
\begin{align*}
\frac{\delta L}{\delta P_2} & = -T_2 \times \frac{1}{P_2} \\
\frac{\delta P_2}{\delta y_1} & = \frac{\delta P_2}{\delta S}\frac{\delta S}{\delta y_1} \\
                              & = e^{y_2} \times \frac{-1}{S^2} \times e^{y_1} \\
                              & = -P_2 P_1
\end{align*}
$

### 
### Therefore $
\begin{align*}
\frac{\delta L}{\delta P_2} \cdot \frac{\delta P_2}{\delta y_1} & = -T_2 \times \frac{1}{P_2} \times (-P_2 P_1) \\
                                                                & = T_2 P_1
\end{align*}
$

### 
### By the same argument, with $\frac{\delta P_3}{\delta y_1} = -P_3 P_1$: $
\begin{align*}
\frac{\delta L}{\delta P_3} \cdot \frac{\delta P_3}{\delta y_1} & = -T_3 \times \frac{1}{P_3} \times (-P_3 P_1) \\
                                                                & = T_3 P_1
\end{align*}
$
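
The three chain-rule terms derived so far, $T_1(P_1 - 1)$, $T_2 P_1$ and $T_3 P_1$, can be checked one by one. In this sketch, $\frac{\delta L}{\delta P_k} = -\frac{T_k}{P_k}$ is taken analytically and $\frac{\delta P_k}{\delta y_1}$ is estimated by a central finite difference. The helper `softmax`, the sample Y, the step `eps`, and the soft label T (chosen so that no term is zero) are assumptions for illustration.

```python
import math

def softmax(y):
    exps = [math.exp(v) for v in y]
    s = sum(exps)  # S
    return [e / s for e in exps]

Y = [0.5, -1.0, 2.0]  # illustrative values
T = [0.2, 0.3, 0.5]   # soft label summing to 1, so every term is nonzero
eps = 1e-6            # finite-difference step (assumed)

P = softmax(Y)
P_plus = softmax([Y[0] + eps, Y[1], Y[2]])
P_minus = softmax([Y[0] - eps, Y[1], Y[2]])

# Numerical dP_k/dy_1 for k = 1, 2, 3.
dP_dy1 = [(a - b) / (2 * eps) for a, b in zip(P_plus, P_minus)]

# dL/dP_k * dP_k/dy_1 with dL/dP_k = -T_k / P_k.
numeric_terms = [-T[k] / P[k] * dP_dy1[k] for k in range(3)]

# Closed forms from the derivation.
analytic_terms = [T[0] * (P[0] - 1.0), T[1] * P[0], T[2] * P[0]]

for n, a in zip(numeric_terms, analytic_terms):
    print(n, a)  # each pair should agree closely
```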

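# Summary: the partial derivative with respect to y1
### Note that $T_1 + T_2 + T_3 = 1$, because the label probabilities sum to 1
### This gives the final derivative $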
\begin{align*}
\frac{\delta L}{\delta y_1} & = \frac{\delta L}{\delta P_1} \cdot \frac{\delta P_1}{\delta y_1} + \frac{\delta L}{\delta P_2} \cdot \frac{\delta P_2}{\delta y_1} + \frac{\delta L}{\delta P_3} \cdot \frac{\delta P_3}{\delta y_1}  \\
                            & = T_1 \times (P_1 - 1) + T_2 P_1 + T_3 P_1 \\
                            & = T_1 P_1 - T_1 + T_2 P_1 + T_3 P_1 \\
                            & = P_1(T_1 + T_2 + T_3) - T_1 \\
                            & = P_1 - T_1 \\
\text{With the batch average: } \frac{\delta L}{\delta y_1^{(i)}} & = \frac{1}{M} [P_1^{(i)} - T_1^{(i)}]
\end{align*}
$
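
The single-sample result $\frac{\delta L}{\delta y_1} = P_1 - T_1$ can be checked by differentiating the loss itself numerically. This is a sketch under assumptions: the helpers `softmax` and `loss`, the sample Y, the one-hot label T, and the step `eps` are all illustrative.

```python
import math

def softmax(y):
    exps = [math.exp(v) for v in y]
    s = sum(exps)  # S
    return [e / s for e in exps]

def loss(y, t):
    """Single-sample loss: -sum_k T_k * ln(P_k)."""
    p = softmax(y)
    return -sum(tk * math.log(pk) for tk, pk in zip(t, p))

Y = [0.5, -1.0, 2.0]  # illustrative values
T = [1, 0, 0]         # one-hot label for the first class
eps = 1e-6            # finite-difference step (assumed)

analytic = softmax(Y)[0] - T[0]  # P_1 - T_1

# Central difference of L with respect to y_1 (index 0).
numeric = (loss([Y[0] + eps, Y[1], Y[2]], T)
           - loss([Y[0] - eps, Y[1], Y[2]], T)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```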

# By the same reasoning, the partial derivatives for y1, y2, and y3

## 
## $
\begin{align*}
\frac{\delta L}{\delta Y} & = \frac{1}{M}(P - T) \\
                          & = \frac{1}{M}(softmax(Y) - T), \quad M = batch\_size
\end{align*}
$
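
As a hedged end-to-end sketch, the batch formula can be implemented and compared against finite differences of the mean loss. The names `softmax`, `mean_loss`, and `softmax_cross_entropy_grad`, the sample batch, and the step `eps` are assumptions for illustration, not part of the original notes.

```python
import math

def softmax(y):
    exps = [math.exp(v) for v in y]
    s = sum(exps)  # S
    return [e / s for e in exps]

def mean_loss(Y_batch, T_batch):
    """L = -(1/M) * sum_i sum_k T_k^(i) * ln(P_k^(i))."""
    M = len(Y_batch)
    total = 0.0
    for y, t in zip(Y_batch, T_batch):
        p = softmax(y)
        total += -sum(tk * math.log(pk) for tk, pk in zip(t, p))
    return total / M

def softmax_cross_entropy_grad(Y_batch, T_batch):
    """Gradient of the mean loss with respect to Y: (softmax(Y) - T) / M."""
    M = len(Y_batch)
    return [[(p - t) / M for p, t in zip(softmax(y), t_row)]
            for y, t_row in zip(Y_batch, T_batch)]

Y_batch = [[0.5, -1.0, 2.0], [1.0, 1.0, -0.5]]  # illustrative values
T_batch = [[0, 0, 1], [1, 0, 0]]                # one-hot labels
G = softmax_cross_entropy_grad(Y_batch, T_batch)

# Compare every entry with a central finite difference of the mean loss.
eps = 1e-6
max_err = 0.0
for i in range(len(Y_batch)):
    for k in range(3):
        Yp = [row[:] for row in Y_batch]
        Ym = [row[:] for row in Y_batch]
        Yp[i][k] += eps
        Ym[i][k] -= eps
        numeric = (mean_loss(Yp, T_batch) - mean_loss(Ym, T_batch)) / (2 * eps)
        max_err = max(max_err, abs(numeric - G[i][k]))

print(G)
print(max_err)  # should be very small
```

Each row of `G` sums to approximately zero, because both P and T sum to 1 for every sample.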