## sigmoid/logistic

### $f(x) = \sigma(x) = \frac{1}{1+e^{-x}}$

Plot:  
![image.png](attachment:image.png)

![image.png](attachment:image.png)

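Widely used, but it suffers from vanishing gradients: as $x$ tends to positive (or negative) infinity, $\sigma$ saturates at 1 (or 0). By the derivative formula above, $\dot{\sigma}=\sigma(1-\sigma)$, the gradient tends to 0 in both cases, so the parameters go a long time without being updated.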


In [6]:
import torch

In [2]:
a = torch.linspace(-100,100,10)

In [3]:
a

tensor([-100.0000,  -77.7778,  -55.5556,  -33.3333,  -11.1111,   11.1111,
          33.3333,   55.5556,   77.7778,  100.0000])

In [4]:
torch.sigmoid(a)

tensor([0.0000e+00, 1.6655e-34, 7.4564e-25, 3.3382e-15, 1.4945e-05, 9.9999e-01,
        1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00])
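
The saturation described above can be checked directly with autograd. The following is a minimal sketch, assuming a handful of hand-picked sample points (not from the original notebook); it compares the gradient reported by PyTorch with the closed form $\sigma(1-\sigma)$.

```python
import torch

# Sample points from the saturated tails and from the centre of the curve.
x = torch.tensor([-100.0, -10.0, 0.0, 10.0, 100.0], requires_grad=True)
s = torch.sigmoid(x)

# Summing gives a scalar; its gradient w.r.t. x is the element-wise derivative.
(grad,) = torch.autograd.grad(s.sum(), [x])

# Closed form: sigma'(x) = sigma(x) * (1 - sigma(x)).
closed_form = (s * (1 - s)).detach()

print(grad)
print(closed_form)
```

The gradient peaks at 0.25 for x = 0 and is practically zero in both tails, which is why weights feeding a saturated sigmoid barely move.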

## tanh

### $f(x)=\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}= 2\sigma(2x)-1$

Plot:
![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [5]:
a = torch.linspace(-1,1,10)

In [6]:
torch.tanh(a)

tensor([-0.7616, -0.6514, -0.5047, -0.3215, -0.1107,  0.1107,  0.3215,  0.5047,
         0.6514,  0.7616])
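
The identity in the heading, $\tanh(x) = 2\sigma(2x) - 1$, can be verified numerically. A small sketch, reusing the same `linspace` input as the cell above; the tolerance is an arbitrary choice for float32.

```python
import torch

a = torch.linspace(-1, 1, 10)

# Right-hand side of the identity: rescale sigmoid from (0, 1) to (-1, 1).
via_sigmoid = 2 * torch.sigmoid(2 * a) - 1

print(torch.tanh(a))
print(via_sigmoid)
print(torch.allclose(torch.tanh(a), via_sigmoid, atol=1e-5))
```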

## ReLU (rectified linear unit)

$$
f(x)= \begin{cases}
0, & x < 0 \\
x, & x \geq 0
\end{cases}
$$

Plot:
![image.png](attachment:image.png)

In [7]:
from torch.nn import functional as F

In [8]:
a = torch.linspace(-1,1,10)

In [9]:
torch.relu(a)

tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1111, 0.3333, 0.5556, 0.7778,
        1.0000])

In [10]:
F.relu(a)

tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1111, 0.3333, 0.5556, 0.7778,
        1.0000])
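
To connect the piecewise definition with its gradient, here is a minimal sketch on the same input. It assumes that comparing against `torch.clamp` is an acceptable stand-in for the definition, and it avoids x = 0, where the derivative is not defined mathematically.

```python
import torch

a = torch.linspace(-1, 1, 10, requires_grad=True)
out = torch.relu(a)

# ReLU is the same as clamping negative values to zero.
same_as_clamp = torch.equal(out, torch.clamp(a, min=0))

# Gradient of sum(relu(a)) w.r.t. a: 0 where a < 0, 1 where a > 0.
(grad,) = torch.autograd.grad(out.sum(), [a])

print(same_as_clamp)
print(grad)
```

Unlike sigmoid, the gradient stays at 1 for every positive input, however large.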

## typical loss

### Mean Squared Error (MSE)

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png) is the predicted value for the i-th sample,  
![image.png](attachment:image.png) is the true value for the i-th sample

* $\text{loss} = \sum [y-(xw+b)]^2$
* $\text{L2-norm} = \lVert y-(xw+b) \rVert_2$
* $\text{loss} = \lVert y-(xw+b) \rVert_2^2$
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
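
The three expressions in the list above can be compared numerically. This is a sketch on hypothetical toy data (the values of `x`, `y`, `w` and `b` are made up for illustration). Note that `F.mse_loss` averages over the elements by default, so `reduction='sum'` is needed to match the summed form.

```python
import torch
from torch.nn import functional as F

# Hypothetical toy data: 4 samples, a single weight and bias.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.5, 5.5, 8.0])
w, b = 2.0, 0.1

residual = y - (x * w + b)

sum_of_squares = (residual ** 2).sum()          # loss = sum [y - (xw + b)]^2
norm_squared = torch.norm(residual, p=2) ** 2   # loss = norm(y - (xw + b))^2
builtin_sum = F.mse_loss(x * w + b, y, reduction='sum')
builtin_mean = F.mse_loss(x * w + b, y)         # default reduction='mean'

print(sum_of_squares, norm_squared, builtin_sum, builtin_mean)
```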

### autograd.grad

In [11]:
from torch.nn import functional as F

In [7]:
x = torch.ones(1)  # pred = x*w+b, here x=1, w=2, b=0

In [16]:
w = torch.full([1],2.)

In [17]:
mse = F.mse_loss(x*w,torch.ones(1))  # arg 1 is pred, arg 2 is label (the true value)

In [18]:
mse  # loss = (pred-label)^2 = (2-1)^2

tensor(1.)

In [14]:
torch.autograd.grad(mse,[w])  # fails: w was never marked as requiring gradients

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

![image.png](attachment:image.png)

In [19]:
w.requires_grad_()

tensor([2.], requires_grad=True)

In [23]:
torch.autograd.grad(mse,[w])  # still fails: PyTorch builds the graph dynamically, so the graph behind mse predates the change to w and must be rebuilt

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.

In [25]:
mse = F.mse_loss(x*w,torch.ones(1))

In [26]:
torch.autograd.grad(mse,[w]) 

(tensor([2.]),)

Derivation:

$\text{pred} = x*w + b$  
$\text{loss} = (\text{pred} - \text{label})^2 = (1*2+0-1)^2 = 1$  
$\frac{\partial\, \text{loss}}{\partial w} = 2*(x*w+b-\text{label})*\frac{\partial (x*w+b)}{\partial w}$  
$=2*(2-1)*1 = 2$
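
The derivation can also be sanity-checked without autograd. The following is a sketch using a central finite difference; the helper `loss` and the step size `h` are illustrative choices, not part of the original notebook.

```python
def loss(w, x=1.0, b=0.0, label=1.0):
    """Squared error for a single sample: (x*w + b - label)^2."""
    return (x * w + b - label) ** 2

w, h = 2.0, 1e-4

# Analytic gradient from the derivation: 2 * (x*w + b - label) * x.
analytic = 2 * (1.0 * w + 0.0 - 1.0) * 1.0

# Central difference approximation of d(loss)/dw.
numeric = (loss(w + h) - loss(w - h)) / (2 * h)

print(analytic, numeric)
```

Both values agree with the `tensor([2.])` returned by `torch.autograd.grad`.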
    

### loss.backward

In [28]:
x = torch.ones(1)

In [33]:
w = torch.full([1],2.)

In [36]:
w.requires_grad_()

tensor([2.], requires_grad=True)

In [37]:
mse = F.mse_loss(x*w,torch.ones(1))

In [38]:
mse

tensor(1., grad_fn=<MseLossBackward>)

In [39]:
mse.backward()  
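# Backpropagation: walks the graph from the output backwards, computes the gradient of every tensor on the path that requires grad, and stores it in that tensor's grad attribute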

In [40]:
w.grad

tensor([2.])
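
The two approaches shown so far differ mainly in where the gradient ends up. A minimal side-by-side sketch, using the same `x`, `w` and label as above; `retain_graph=True` is only needed here because the same graph is differentiated twice.

```python
import torch
from torch.nn import functional as F

x = torch.ones(1)
w = torch.full([1], 2., requires_grad=True)
mse = F.mse_loss(x * w, torch.ones(1))

# autograd.grad returns the gradients as a tuple and leaves w.grad untouched.
# retain_graph=True keeps the graph alive so backward() can run afterwards.
(g,) = torch.autograd.grad(mse, [w], retain_graph=True)
grad_attr_before = w.grad  # still None at this point

# backward() writes the gradient into w.grad instead of returning it.
mse.backward()

print(g, grad_attr_before, w.grad)
```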

### softmax (activation function)

![image.png](attachment:image.png)

Derivation of the activation function's gradient:
![image.png](attachment:image.png)  
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)

### F.softmax

In [41]:
a = torch.rand(3)

In [42]:
a

tensor([0.7066, 0.4845, 0.3603])

In [43]:
a.requires_grad_()

tensor([0.7066, 0.4845, 0.3603], requires_grad=True)

In [60]:
p = F.softmax(a,dim=0)

In [61]:
p

tensor([0.3987, 0.3193, 0.2820], grad_fn=<SoftmaxBackward>)

In [62]:
torch.autograd.grad(p[1],[a],retain_graph=True) 
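# Partial derivatives of the second element of p (i=1) with respect to each a_j: positive where the index i of p equals the index j of a, negative otherwise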

(tensor([-0.1273,  0.2173, -0.0900]),)

In [63]:
torch.autograd.grad(p[2],[a])  # partial derivatives of the third element of p with respect to each a_j

(tensor([-0.1124, -0.0900,  0.2025]),)
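
The sign pattern in these outputs follows from the standard closed form $\frac{\partial p_i}{\partial a_j} = p_i(\delta_{ij} - p_j)$. A sketch that builds the full Jacobian row by row and compares it with that formula; the input values are copied from the printed tensor above, so they are rounded to four decimals.

```python
import torch
from torch.nn import functional as F

a = torch.tensor([0.7066, 0.4845, 0.3603], requires_grad=True)
p = F.softmax(a, dim=0)

# Row i of the Jacobian is the gradient of p[i] w.r.t. every a[j].
rows = [torch.autograd.grad(p[i], [a], retain_graph=True)[0] for i in range(3)]
jacobian = torch.stack(rows)

# Closed form: dp_i/da_j = p_i * (delta_ij - p_j) = diag(p) - p p^T.
pd = p.detach()
closed_form = torch.diag(pd) - pd[:, None] * pd[None, :]

print(jacobian)
print(closed_form)
```

The diagonal entries $p_i(1-p_i)$ are positive and the off-diagonal entries $-p_i p_j$ are negative, matching the two gradients printed above.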

### Cross Entropy Loss

* binary
* multi-class
* +softmax: used together with the softmax activation function
* leave it to logistic regression part
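
As a brief preview of the softmax pairing mentioned in the list, here is a sketch for the multi-class case. The logits and targets are hypothetical values chosen for illustration; the point is only that `F.cross_entropy` takes raw logits and applies the softmax step itself.

```python
import torch
from torch.nn import functional as F

# Hypothetical logits for 2 samples over 3 classes, with their class indices.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 2])

# Manual: softmax -> take the probability of the true class -> -log -> mean.
p = F.softmax(logits, dim=1)
manual = -torch.log(p[torch.arange(2), target]).mean()

# F.cross_entropy expects raw logits; it applies log-softmax internally.
builtin = F.cross_entropy(logits, target)

print(manual, builtin)
```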