2021 Takahiro Shinozaki @ Tokyo Tech

A quick introduction to neural networks

References:

    https://pytorch.org/docs/stable/nn.html#loss-functions




In [None]:
import torch
import torch.nn as nn
# Install torchviz to visualize the network structure
! pip install torchviz
from torchviz import make_dot 

# Define a linear layer

In [None]:
# Linear layer (ll): 2 input features, 2 output features
ll = nn.Linear(2, 2)
print('type(ll) =', type(ll))
# parameters are randomly initialized
print('weight= ', ll.weight)
print('bias= ', ll.bias)

An example affine transformation computed by the linear layer

$
y = \mathrm{ll}(x) =
\left[\begin{array}{cc}
1 & 0 \\
0 & 2 \\
\end{array}\right]
\left[\begin{array}{c}
1 \\
1 \\
\end{array}\right]
+
\left[\begin{array}{c}
10 \\
-10 \\
\end{array}\right]
=
\left[\begin{array}{c}
11 \\
-8 \\
\end{array}\right]
$

In [None]:
# Explicitly setting the parameter values
ll.weight = nn.Parameter(torch.tensor([[ 1.0, 0.0],[0.0, 2.0]]))
print('weight= ', ll.weight)
ll.bias = nn.Parameter(torch.tensor([10.0, -10.0]))
print('bias= ', ll.bias)

x = torch.tensor([1.0, 1.0])
print('x= ', x)
y = ll(x)
print('y =', y)
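In PyTorch terms, the layer computes `x @ W.T + b`, where `W` has shape `[out_features, in_features]`. The same formula also handles a batch stored as a matrix with one sample per row. The short check below is a sketch that reuses the parameter values above. It sets them in place with `copy_` under `torch.no_grad()`, which is an alternative to assigning new `nn.Parameter` objects.

```python
import torch
import torch.nn as nn

# Same layer and parameter values as in the cell above
ll = nn.Linear(2, 2)
with torch.no_grad():
    ll.weight.copy_(torch.tensor([[1.0, 0.0], [0.0, 2.0]]))
    ll.bias.copy_(torch.tensor([10.0, -10.0]))

# Single input vector: nn.Linear computes x @ W^T + b
x = torch.tensor([1.0, 1.0])
y_layer = ll(x)
y_manual = x @ ll.weight.T + ll.bias
print(y_layer)  # values 11 and -8

# A batch is a matrix with one sample per row; each row is transformed independently
X = torch.tensor([[1.0, 1.0], [2.0, 0.0]])
Y_batch = ll(X)
print(Y_batch)  # rows [11, -8] and [12, -10]
```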

# Introduce activation functions

In [None]:
# an example input: two samples, each a three-dimensional vector
x = torch.randn(2, 3)

# sigmoid activation function
h = nn.Sigmoid()
print(type(h))
# apply h to an input x
print('sigmoid output =', h(x))

# softmax activation function
h = nn.Softmax(1) # dim=1: normalize along dimension 1, i.e., over the 3 entries of each sample
print(type(h))
# apply h to an input x
print('softmax output =', h(x))
# check that the softmax outputs of each sample (row) sum to 1.0
print('sums of layer outputs =', torch.sum(h(x), 1))
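The two activation functions follow simple formulas. The sigmoid is $1/(1+e^{-x})$, applied element-wise. The softmax along dimension 1 is $e^{x_j} / \sum_k e^{x_k}$, computed within each row. The sketch below writes both out by hand and checks that they agree with the layers above. The random seed is only there to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3)  # two samples, three dimensions each

# sigmoid(x) = 1 / (1 + exp(-x)), applied element-wise
sig_manual = 1.0 / (1.0 + torch.exp(-x))

# softmax along dim 1: exp(x_j) / sum_k exp(x_k) within each row
exp_x = torch.exp(x)
soft_manual = exp_x / exp_x.sum(dim=1, keepdim=True)

print(torch.allclose(sig_manual, nn.Sigmoid()(x)))
print(torch.allclose(soft_manual, nn.Softmax(dim=1)(x)))
print(soft_manual.sum(dim=1))  # each row sums to 1
```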

# Define a neural network

In [None]:
######## Advanced topic : __call__ method #############

# nn.Sigmoid() etc. return a class instance, e.g., h = nn.Sigmoid()
# We then use the returned instance as if it were a function, e.g., h(x)
# Written in one line, this is: nn.Sigmoid()(x)
# This works because of Python's __call__ special method

# Example
class callMethodTest():
    def __init__(self):
        self.coef = 2.5

    # __call__ is a special method
    # Defining it in a class lets you call instances of that class like functions
    def __call__(self, x):
        return self.coef * x

cmt = callMethodTest() # Make an instance
y = cmt(5.0) # use the instance like a function.
print(y)
###############################################
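The same mechanism is why a PyTorch model can be called as `model(x)`. `nn.Module` defines `__call__`, and that method runs the module's `forward` method together with any registered hooks. Calling `forward` directly skips the hooks, so `model(x)` is the usual style. The sketch below uses `Doubler`, a hypothetical toy module introduced only for illustration.

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):
    # Hypothetical toy module: forward() multiplies its input by 2
    def forward(self, x):
        return 2 * x

m = Doubler()
x = torch.tensor([1.0, 2.0, 3.0])

# Calling the instance goes through nn.Module.__call__, which runs forward()
# (together with any registered hooks)
print(m(x))
print(torch.equal(m(x), m.forward(x)))
```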

In [None]:
# Define a neural network as a class
class myNN(nn.Module):
    def __init__(self):
        super(myNN, self).__init__()
        self.layer1 = nn.Linear(3, 4) # input size = 3 and output size = 4
        self.layer2 = nn.Linear(4, 2)

    def forward(self, input):
        l1out = nn.Sigmoid()(self.layer1(input))
        l2out = nn.Softmax(1)(self.layer2(l1out))
        return l2out

# make a class instance
model = myNN()

print('model =', model)

# obtain the model state as a dictionary
params = model.state_dict()
print('params =', params)
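The keys of the state dictionary come from the attribute names (`layer1`, `layer2`). Each `nn.Linear` stores its weight as `[out_features, in_features]`. The sketch below repeats the `myNN` definition so that it runs on its own, then lists the parameter shapes and counts the trainable parameters: $3 \cdot 4 + 4 + 4 \cdot 2 + 2 = 26$.

```python
import torch.nn as nn

class myNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 4)
        self.layer2 = nn.Linear(4, 2)

    def forward(self, input):
        l1out = nn.Sigmoid()(self.layer1(input))
        return nn.Softmax(1)(self.layer2(l1out))

model = myNN()

# state_dict keys follow the attribute names; weights are [out_features, in_features]
shapes = {name: tuple(t.shape) for name, t in model.state_dict().items()}
print(shapes)

# 3*4 + 4 + 4*2 + 2 = 26 trainable parameters
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```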

# Use the network

## Feed-forward

The network parameters have not been trained yet, so the output is essentially random at this point.

In [None]:
print(x)
y = model(x)
print(y)
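For a batch of two 3-dimensional inputs, the output contains one 2-category distribution per sample, and each row sums to 1. The sketch below rebuilds the same layer stack as `myNN` with `nn.Sequential`, on the assumption that this is equivalent to the class above. It also runs the forward pass under `torch.no_grad()`, because no gradients are needed when the network is only being evaluated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Same layer stack as myNN, written with nn.Sequential
model = nn.Sequential(
    nn.Linear(3, 4), nn.Sigmoid(),
    nn.Linear(4, 2), nn.Softmax(dim=1),
)

x = torch.randn(2, 3)      # batch of 2 samples
with torch.no_grad():      # no graph needed when only running inference
    y = model(x)

pred = y.argmax(dim=1)     # predicted category index for each sample
print(y.shape)             # torch.Size([2, 2]): one distribution per sample
print(y.sum(dim=1))        # each row sums to 1
print(pred)
```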

## Network structure visualization

In [None]:
print(dict(model.named_parameters()))
make_dot(y, params=dict(model.named_parameters()))
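`make_dot` draws the autograd graph that PyTorch records during the forward pass. The sketch below is a rough, text-only view of the same graph. It walks `grad_fn.next_functions` back from the output. It does not reproduce how torchviz renders the graph, and node class names such as `AddmmBackward0` can differ between PyTorch versions. A single `nn.Linear` layer is used here to keep the printout short.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)
y = torch.sigmoid(model(torch.randn(2, 3)))

def walk(fn, depth=0, lines=None):
    # Recursively list autograd nodes reachable from fn via next_functions
    if lines is None:
        lines = []
    if fn is None:  # inputs that do not require grad appear as None
        return lines
    lines.append('  ' * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, lines)
    return lines

lines = walk(y.grad_fn)
for line in lines:
    print(line)  # AccumulateGrad nodes mark the leaf parameters (weight, bias)
```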

## Cross entropy loss calculation

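Cross-entropy loss: the cross entropy between a reference distribution $p$ and a predicted distribution $q$, averaged over $N$ samples with $C$ categories, used as a loss to be minimized.

Cross entropy:
$ H(p, q) = \frac{1}{N} \sum_{i=1}^N \sum_{j=1}^C -p_{i,j} \log (q_{i,j})$.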

Example:

Consider two three-dimensional vectors
(e.g., the pre-softmax outputs of a neural network for two input samples):
$
Z = 
\left[\begin{array}{c}
Z_1 \\
Z_2 \\
\end{array}\right]
=
\left[\begin{array}{ccc}
z_{1,1} & z_{1,2} & z_{1,3} \\
z_{2,1} & z_{2,2} & z_{2,3} \\
\end{array}\right]
=
\left[\begin{array}{ccc}
1 & -1 & 2 \\
-1 & 0 & 1 \\
\end{array}\right]$.

By applying softmax to each row, we obtain a categorical distribution $Q_i$ over three categories for each $Z_i$, $i \in \left\{1,2\right\}$.

$Q = 
\left[\begin{array}{c}
Q_1 \\
Q_2 \\
\end{array}\right]
=
\left[\begin{array}{ccc}
q_{1,1} & q_{1,2} & q_{1,3} \\
q_{2,1} & q_{2,2} & q_{2,3} \\
\end{array}\right]
=
\mathrm{softmax}(Z)
=
\left[\begin{array}{ccc}
0.2594965 &  0.035119 &  0.7053845 \\
0.0900306 &  0.2447285 &  0.665241 \\
\end{array}\right]
$.

Assume that the reference distribution is $P$.

$P = 
\left[\begin{array}{c}
P_1 \\
P_2 \\
\end{array}\right]
=
\left[\begin{array}{ccc}
p_{1,1} & p_{1,2} & p_{1,3} \\
p_{2,1} & p_{2,2} & p_{2,3} \\
\end{array}\right]
=
\left[\begin{array}{ccc}
0.0 & 0.0 & 1.0 \\
0.0 & 1.0 & 0.0 \\
\end{array}\right]
$.

Then the cross-entropy loss $L$ becomes:

$L=\frac{1}{2}
  \sum_{i=1}^2 \sum_{j=1}^3 -p_{i,j} \log (q_{i,j}) = \frac{1}{2}\left\{-\log (q_{1,3}) - \log(q_{2,2})\right\} = \frac{1}{2} \left\{ -\log (0.7053845) - \log(0.2447285) \right\}
  = 0.878309
$.
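The worked example above can be checked numerically. The sketch below applies the loss formula directly to the matrices $Z$ and $P$ from the text. It computes $Q$ with softmax, sums $-p_{i,j} \log q_{i,j}$ over the categories, and averages over the two samples.

```python
import torch

Z = torch.tensor([[1.0, -1.0, 2.0], [-1.0, 0.0, 1.0]])
P = torch.tensor([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])  # reference distributions

Q = torch.softmax(Z, dim=1)                   # predicted distributions
L = (-(P * torch.log(Q)).sum(dim=1)).mean()   # average over the N = 2 samples
print(Q)
print(L)   # about 0.8783
```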

When the reference is instead given by the indices of the correct categories
$C=
\left[\begin{array}{c}
c_1 \\
c_2 \\
\end{array}\right]
=
\left[\begin{array}{c}
3 \\
2 \\
\end{array}\right]$,
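the cross-entropy loss reduces to the average negative log-probability of the correct categories, because $p_{i,c_i}=1$ and all other $p_{i,j}=0$:
$L=\frac{1}{2}
  \sum_{i=1}^2 -\log (q_{i,c_i}) = \frac{1}{2}\left\{-\log (q_{1,3}) - \log(q_{2,2})\right\} = 0.878309
$.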

In [None]:
z = torch.tensor([[1, -1, 2], [-1, 0, 1]], dtype=torch.float32)
print(z)
q = nn.Softmax(1)(z)
print(q)
c = torch.tensor([2, 1]) # indices start from 0, so the 3rd category is 2 and the 2nd category is 1
print(c)
# nn.NLLLoss expects log-probabilities, so it is given log(q)
loss = nn.NLLLoss()
loss(torch.log(q),c)
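With its default `reduction='mean'`, `nn.NLLLoss` picks $-\log q_{i,c_i}$ for each sample and averages the results. This matches the index-based formula above. The sketch below does the same selection by hand with `gather`. It uses `torch.log_softmax`, which computes the same log-probabilities as `torch.log(softmax(...))` but is numerically more stable.

```python
import torch
import torch.nn as nn

z = torch.tensor([[1, -1, 2], [-1, 0, 1]], dtype=torch.float32)
c = torch.tensor([2, 1])               # 0-based category indices
log_q = torch.log_softmax(z, dim=1)    # log-probabilities

# pick log q_{i, c_i} for each sample i, negate, and average
picked = log_q.gather(1, c.unsqueeze(1)).squeeze(1)
manual = -picked.mean()

builtin = nn.NLLLoss()(log_q, c)       # default reduction='mean'
print(manual, builtin)
```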

In [None]:
########## Implementation details : nn.CrossEntropyLoss #############
# PyTorch's nn.CrossEntropyLoss applies log-softmax internally
# (it combines nn.LogSoftmax and nn.NLLLoss).
# Therefore, when using nn.CrossEntropyLoss, you have to feed it the raw
# outputs of your neural network (logits), i.e., before applying softmax.
loss = nn.CrossEntropyLoss()
loss(z, c)
#####################################################################
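The sketch below checks the two points from the comment. First, `nn.CrossEntropyLoss` on the logits `z` gives the same value as `nn.NLLLoss` on `log_softmax(z)`. Second, feeding it softmax outputs, a common mistake, applies softmax twice and gives a different, incorrect loss.

```python
import torch
import torch.nn as nn

z = torch.tensor([[1, -1, 2], [-1, 0, 1]], dtype=torch.float32)  # logits
c = torch.tensor([2, 1])

ce = nn.CrossEntropyLoss()(z, c)
nll = nn.NLLLoss()(torch.log_softmax(z, dim=1), c)

# Common mistake: passing softmax outputs makes softmax get applied twice
wrong = nn.CrossEntropyLoss()(torch.softmax(z, dim=1), c)

print(ce, nll)  # equal (about 0.8783)
print(wrong)    # a different, incorrect value
```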

## Back-propagation

Obtain the gradients of the network parameters with respect to the cross-entropy loss; an optimizer uses these gradients to minimize the loss.

In [None]:
# an example input: two samples, each a three-dimensional vector
x = torch.randn(2, 3)
## if you want gradients of the input in addition to the gradient of network parameters:
## x = torch.randn(2, 3, requires_grad=True)

# an example reference
c = torch.tensor([1, 0])

# my defined neural network
model = myNN()

# initialize gradient
model.zero_grad()

# obtain the cross-entropy loss (the model outputs probabilities, so take their log for NLLLoss)
loss = nn.NLLLoss()
celoss = loss(torch.log(model(x)), c)

# apply back-propagation
celoss.backward() 

# obtained gradients
print(model.layer1.weight.grad)
print(model.layer1.bias.grad)
print(x.grad)  # None unless x was created with requires_grad=True
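The gradients are then used to update the parameters. The sketch below is a minimal training loop built on assumptions that go beyond the notebook. The network is a variant of `myNN` that returns logits (no final softmax), so that `nn.CrossEntropyLoss` can be used. The optimizer is plain SGD with an arbitrarily chosen learning rate of 0.5. On this tiny fixed batch the loss should decrease over the steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3)
c = torch.tensor([1, 0])

# Same architecture as myNN, but returning logits (no final softmax)
model = nn.Sequential(nn.Linear(3, 4), nn.Sigmoid(), nn.Linear(4, 2))
loss_fn = nn.CrossEntropyLoss()          # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

losses = []
for step in range(100):
    optimizer.zero_grad()                # clear gradients from the previous step
    loss = loss_fn(model(x), c)
    loss.backward()                      # back-propagation fills p.grad
    optimizer.step()                     # p <- p - lr * p.grad
    losses.append(loss.item())

print(losses[0], losses[-1])             # the loss should go down
```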