
## PyTorch Tutorial

The goal of this tutorial is to give students a formal introduction to PyTorch and to the functionality the library provides for building and testing neural networks.

## Importing PyTorch

This tutorial assumes a working installation of PyTorch is already available to you. If you don't have one, you should be able to run 'pip install torch' if you have a Python installation (the package on PyPI is named torch, not pytorch). If you have conda/anaconda installed you can run 'conda install pytorch'. The install page on pytorch.org lists the exact command for your platform.

To get started we simply import torch. 

In [0]:
import torch

In [0]:
print('PyTorch Version:', torch.__version__)


PyTorch Version: 1.4.0


Documentation can be found for PyTorch at [pytorch.org](https://pytorch.org/docs/). 

It can take some time to learn your way around, but most things you will use can
be found in one of torch, torch.nn, torch.Tensor, or torch.optim.

Now let's import some other useful tools in the library.

In [0]:
import torch.nn as nn
import torch.optim as optim
import numpy as np

You might have noticed that we imported NumPy as one of the useful libraries. PyTorch shares a lot of functionality with NumPy with quick and easy transitions from torch Tensors to NumPy arrays.

If you have some experience with NumPy you can think of PyTorch as an expansion on top of NumPy that allows you to easily build classifiers without having to manually calculate loss or derivatives.

## Tensors

Tensors are a collection of numbers represented as an array.

 - A scalar is a single number: `1`
 - A row vector is a vector $v$ with dimensions $1 \times d_c$: `[1, 1]`
 - A column vector is a vector $x$ with dimensions $d_r \times 1$: `[[1], [1]]`
 - A matrix can be thought of as a combination of the above two. The matrix $z$ is $d_r \times d_c$: `[[1, 1], [1, 1]]`
 - A 3-dimensional tensor $t$ is a cube of numbers with dimensions $d_r \times d_c \times d_d$: `[[[1, 1], [1, 1]], [[1, 1], [1, 1]]]`

Vectors, matrices, and tensors are the building blocks of all operations that occur within PyTorch.
The flow of operations and the predictions that you get are the result of chaining together matrix multiplications.
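To make the shapes above concrete, here is a small sketch that builds one tensor of each kind and chains two matrix multiplications. The variable names are illustrative only and are not used elsewhere in the tutorial.

```python
import torch

scalar = torch.tensor(1.0)         # 0 dimensions
row_vector = torch.ones(1, 2)      # shape 1 x d_c
column_vector = torch.ones(2, 1)   # shape d_r x 1
matrix = torch.ones(2, 2)          # shape d_r x d_c
cube = torch.ones(2, 2, 2)         # shape d_r x d_c x d_d

for name, t in [("scalar", scalar), ("row vector", row_vector),
                ("column vector", column_vector), ("matrix", matrix),
                ("3-D tensor", cube)]:
    print(f"{name}: dim={t.dim()}, shape={tuple(t.shape)}")

# Chaining matrix multiplications: (1 x 2) @ (2 x 2) @ (2 x 1) gives a 1 x 1 result
result = row_vector @ matrix @ column_vector
print(result.shape)
```

The inner dimensions must agree at every step of the chain, which is why shape mismatches are such a common source of errors.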
   

## Creating Tensors

Tensors can be created in dozens of ways (all of which are listed in the documentation), but we will only be discussing a few.

In [0]:
x = torch.Tensor(2,4)
print(x)
print(x.shape)
print(x.size())

tensor([[2.0533e-36, 0.0000e+00, 4.4842e-44, 0.0000e+00],
        [       nan, 0.0000e+00, 2.1276e+23, 6.7013e-10]])
torch.Size([2, 4])
torch.Size([2, 4])


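So we created a tensor and printed its contents and its size. If you run the above command a few times, you may find that your tensor isn't always filled with zeros. Any idea why that might be? (Hint: torch.Tensor(2,4) allocates memory without initializing it, so the tensor holds whatever values were already sitting in that memory.)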

To fix this problem we run the command below:

In [0]:
x.zero_()
x

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])

Or you can do this:

In [0]:
x = torch.zeros((2,4))
x

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])

You might have also noticed that we used the .shape or .size() calls to get the shape of the tensor.

When you do this, you'll find that the dimensions are indexed, like in a list, and can be referenced like we do below: 

In [0]:
print(x.shape)
print(x.shape[0])
print(x.shape[1])

torch.Size([2, 4])
2
4


## Data Types in PyTorch

|Type|Function Call|Comments|
|---|---|---|
|32-bit Float|torch.FloatTensor()|Normally used for storing inputs, weights, and gradients. GPUs are optimized for this data type.|
|64-bit Float|torch.DoubleTensor()|Used when single-precision floats are too imprecise and round off. Training times are significantly slower.|
|16-bit Float|torch.HalfTensor()|Can be used for storing inputs, weights, and gradients. Suffers from rounding problems. Tends to train faster on GPU than 32-bit floats.|
|64-bit Integer|torch.LongTensor()|Normally used to store Y values for when you calculate loss.|
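As a rough illustration of the table, the sketch below creates one tensor and converts it to each of the listed types. It uses `torch.tensor(..., dtype=...)` and the `.double()`, `.half()`, and `.long()` conversion methods rather than the constructor calls in the table; the variable names are made up for this example.

```python
import torch

x_float = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)  # 32-bit float
x_double = x_float.double()  # 64-bit float
x_half = x_float.half()      # 16-bit float
x_long = x_float.long()      # 64-bit integer

# element_size() reports how many bytes each value occupies
for t in (x_float, x_double, x_half, x_long):
    print(t.dtype, t.element_size(), "bytes per element")
```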



Creating a float tensor:

In [0]:
x = torch.FloatTensor(2, 2)
print(x)
print(x.type())
x = torch.Tensor(2, 2)
print(x)
print(x.type())

tensor([[2.0541e-36, 0.0000e+00],
        [0.0000e+00, 0.0000e+00]])
torch.FloatTensor
tensor([[2.0541e-36, 0.0000e+00],
        [0.0000e+00, 0.0000e+00]])
torch.FloatTensor


Creating a long tensor:


In [0]:
x = torch.LongTensor(2, 2)
print(x)
print(x.type())
x = torch.Tensor(2, 2).long()
print(x)
print(x.type())

tensor([[           70172032,                  32],
        [         4294967295, 3474584533648815920]])
torch.LongTensor
tensor([[0, 0],
        [0, 0]])
torch.LongTensor


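Remember, LongTensors store integers and are used for storing your Y values. That's why the values they store don't have decimals. (The large values in the first tensor above are uninitialized memory, just like the odd values we saw with torch.Tensor(2,4).)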

If we need to get information from a list or a NumPy array, we can easily convert them into torch tensors.

In [0]:
x = [[1,2], [3 ,4]]
print(x)
matrix = np.array(x)
print(matrix)


[[1, 2], [3, 4]]
[[1 2]
 [3 4]]
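The cell above stops at the NumPy array. One way to finish the trip into PyTorch is sketched below, assuming `torch.from_numpy` for a tensor that shares memory with the array and `torch.tensor` for an independent copy. The names `from_array` and `as_float` are illustrative.

```python
import numpy as np
import torch

array = np.array([[1, 2], [3, 4]])
from_array = torch.from_numpy(array)                 # shares memory with `array`
as_float = torch.tensor(array, dtype=torch.float32)  # independent float copy

array[0, 0] = 100
print(from_array[0, 0].item())  # reflects the change made to `array`
print(as_float[0, 0].item())    # the copy is unaffected
```

Sharing memory avoids a copy, but it also means that in-place changes on either side are visible on the other.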


Of course, we can go directly to a torch tensor from a list too.

In [0]:
x = [[1,2], [3 ,4]]
print(x)
matrix = torch.FloatTensor(x)
print(matrix)

[[1, 2], [3, 4]]
tensor([[1., 2.],
        [3., 4.]])


Here are a few other ways to create tensors:

In [0]:
t_zeros = torch.zeros((2,3))
t_ones = torch.ones(2, 3)            # creates a tensor with 1s
t_fives = torch.empty(2, 3).fill_(5) # creates a non-initialized tensor and fills it with 5
t_random = torch.rand(2, 3)          # creates a uniform random tensor
t_normal = torch.randn(2, 3)         # creates a normal random tensor

print(t_zeros)
print()
print(t_ones)
print()
print(t_fives)
print()
print(t_random)
print()
print(t_normal)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

tensor([[1., 1., 1.],
        [1., 1., 1.]])

tensor([[5., 5., 5.],
        [5., 5., 5.]])

tensor([[0.2184, 0.8291, 0.6408],
        [0.9656, 0.9591, 0.9560]])

tensor([[ 0.4583,  0.8787, -0.0888],
        [-2.5900, -1.5812,  0.1675]])


### Converting back to NumPy and Python Lists
Earlier we saw that we can easily go from a Python list or NumPy array to a PyTorch Tensor, but we didn't discuss how to get back.

To go from a tensor back to a NumPy array you can use the below command:

In [0]:
print(type(matrix))
print(matrix.numpy())
print(type(matrix.numpy()))

<class 'torch.Tensor'>
[[1. 2.]
 [3. 4.]]
<class 'numpy.ndarray'>


We can do the same thing but to a list:

In [0]:
print(matrix.tolist())
print(type(matrix.tolist()))


[[1.0, 2.0], [3.0, 4.0]]
<class 'list'>


Notice that when we did our conversion, the dimensions didn't change as we moved from torch to NumPy or from torch to a Python list.


### Reshaping Tensors
We can reshape tensors in a variety of ways, including adding dimensions, removing dimensions, or changing the size of a dimension.
This is particularly useful because some neural network operations expect inputs to be a certain size and will throw errors when the dimensions are incorrect.

In [0]:
x = torch.rand(3,3,3)

In [0]:
x.reshape(3,3,1,3).shape

torch.Size([3, 3, 1, 3])

In [0]:
x.reshape(3,3,1,-1).shape



torch.Size([3, 3, 1, 3])

In [0]:
x.reshape(-1).shape

torch.Size([27])
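The `-1` in the calls above asks PyTorch to infer that dimension from the total number of elements. The sketch below shows this alongside `unsqueeze` and `squeeze`, which add or remove a size-1 dimension; it uses `torch.arange` so the values are easy to follow.

```python
import torch

x = torch.arange(27).reshape(3, 3, 3)

flat = x.reshape(-1)        # -1 is inferred as 27, the total number of elements
grouped = x.reshape(9, -1)  # -1 is inferred as 27 / 9 = 3
added = x.unsqueeze(2)      # insert a size-1 dimension at index 2
removed = added.squeeze(2)  # drop that size-1 dimension again

print(flat.shape, grouped.shape, added.shape, removed.shape)
```

Only one dimension per call can be `-1`, since PyTorch needs the others to work out the missing size.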

In one of your assignments this semester you'll use Recurrent Neural Networks to build your system. 

RNNs in PyTorch expect the input to be:

$SentenceLength \times BatchSize \times EmbeddingDims$

So if you have a single sentence of 5 words and your word embeddings are 10 dimensions you'll have a matrix like this:

$5 \times 10$

Since you need a dimension for batch size, you'll want to call .reshape() to make a dimension for $BatchSize$

In [0]:
x = torch.Tensor(5, 10)
print(x.shape)
x = x.reshape(5, 1, 10)
print(x.shape)

torch.Size([5, 10])
torch.Size([5, 1, 10])


Now it's in the format that an RNN can take!
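As a quick check of that claim, the sketch below passes a tensor of this shape through `nn.RNN` and prints the shapes that come back. The hidden size of 8 is an arbitrary choice for illustration, and `unsqueeze(1)` is used as an alternative to `.reshape(5, 1, 10)`.

```python
import torch
import torch.nn as nn

sentence = torch.rand(5, 10)   # 5 words, 10-dimensional embeddings
batch = sentence.unsqueeze(1)  # same result as sentence.reshape(5, 1, 10)

rnn = nn.RNN(input_size=10, hidden_size=8)  # hidden_size=8 is arbitrary
out, h_n = rnn(batch)

print(batch.shape)  # sequence length x batch size x embedding dims
print(out.shape)    # one hidden state per word
print(h_n.shape)    # final hidden state: num layers x batch size x hidden size
```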

# Optimizers

PyTorch has many built-in optimization algorithms for updating your weights as you train. They live in the torch.optim module, with full documentation on each. Let's take a look at [Stochastic Gradient Descent](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD).

There are often many different settings that optimizers take as input but we normally use them like this:

```
import torch.optim as optim
model = MyModel() # create the model that we're training (MyModel is a placeholder for your own nn.Module subclass)
optimizer = optim.SGD(model.parameters(), lr=0.1) # give the optimizer the weights of the model
```
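To see what a single optimizer step does, here is a minimal sketch on one trainable weight with a toy loss. The loss $w^2$ is chosen only because its gradient, $2w$, is easy to check by hand.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)  # a single trainable weight
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # toy loss with gradient 2 * w
optimizer.zero_grad()  # clear any old gradients
loss.backward()        # compute d(loss)/dw, which is 2.0 here
optimizer.step()       # w <- w - lr * grad = 1.0 - 0.1 * 2.0

print(w.item())        # approximately 0.8
```

Real models follow the same three calls (`zero_grad`, `backward`, `step`), just with `model.parameters()` in place of the single weight.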

The mathematics behind each optimizer is out of the scope of this tutorial but I'd encourage you to research their differences and understand why some work better than others.


# Loss Functions

If you haven't dealt with loss functions in the past you can think of them as functions that quantify how wrong your algorithm is compared to what it should predict.

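Sum-of-Squares error and its averaged form, Mean Squared Error (MSE), are often used with Linear Regression. Sum-of-Squares is defined as:

$ \sum_{i=1}^{n} (y_i-\hat{y}_i)^2$

MSE divides this sum by $n$. In PyTorch we can use torch.nn.MSELoss(), which computes the mean by default; pass reduction='sum' to get the plain sum.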

In [0]:
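y = torch.FloatTensor([2])     # target; MSELoss expects floating-point tensors
yhat = torch.FloatTensor([0])  # prediction
loss = nn.MSELoss()

print(loss(yhat, y))  # PyTorch losses take (prediction, target)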

tensor(4.)
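With a single value the sum and the mean of the squared errors coincide, so here is a small worked example with three values where they differ. The numbers are made up for illustration.

```python
import torch
import torch.nn as nn

y = torch.tensor([1.0, 2.0, 3.0])     # targets
yhat = torch.tensor([1.0, 1.0, 1.0])  # predictions

manual_sum = ((y - yhat) ** 2).sum()            # 0 + 1 + 4 = 5
mse_mean = nn.MSELoss()(yhat, y)                # default reduction="mean": 5 / 3
mse_sum = nn.MSELoss(reduction="sum")(yhat, y)  # plain sum of squares: 5

print(manual_sum.item(), mse_mean.item(), mse_sum.item())
```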


Of course, we aren't limited to Sum-of-Squares error. PyTorch's library provides many other loss functions.

While Sum-of-Squares is good for regression, most of your work in this class will be multi-class classification problems.

Cross Entropy Loss (torch.nn.CrossEntropyLoss()) will be a good loss function to use for this problem.

In [0]:
y = torch.LongTensor([1])
print(y.shape)
yhat = torch.FloatTensor([[1,0]])
print(yhat.shape)

# in_ = torch.randn(3, 5, requires_grad=True)
# target = torch.empty(3, dtype=torch.long).random_(5)
loss = nn.CrossEntropyLoss()

print(loss(yhat,y))

torch.Size([1])
torch.Size([1, 2])
tensor(1.3133)
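The value 1.3133 can be reproduced by hand. `nn.CrossEntropyLoss` takes raw scores (logits), applies a softmax, and returns the negative log-probability of the correct class. The sketch below recomputes that for the same inputs; `builtin` and `manual` are illustrative names.

```python
import math
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 0.0]])  # unnormalized scores for 2 classes
target = torch.tensor([1])           # index of the correct class

builtin = nn.CrossEntropyLoss()(logits, target)

probs = torch.softmax(logits, dim=1)  # turn scores into probabilities
manual = -torch.log(probs[0, 1])      # negative log-probability of class 1

print(builtin.item(), manual.item(), math.log(1 + math.e))
```

All three printed numbers agree, because $-\log\frac{e^0}{e^1 + e^0} = \log(1 + e)$.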


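**Note:** Pay close attention to the order in which you provide y and yhat to the loss function. PyTorch's loss functions expect the prediction first and the target second, i.e. loss(yhat, y), while some other libraries (for example, scikit-learn's metrics) take y, yhat.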

# Neural Networks
At long last, we are finally talking about building neural networks!

All models/networks in PyTorch have nn.Module as their parent class. This lets PyTorch track your model's parameters so that they can be handed to an optimizer, moved between devices, and saved.

To build a model class you'll want to name it, initialize all layers in the `__init__()` function, and then define your algorithm in the `forward()` function.

A model for Logistic Regression can be found below:

In [0]:
class LogisticRegression(nn.Module):
    def __init__(self, dims):
        super(LogisticRegression, self).__init__()
        self.L1 = nn.Linear(dims,1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        z1 = self.L1(x)
        a1 = self.sigmoid(z1)
        return a1

To use this network, we initialize it and call the forward function with a tensor as input. Calling the model directly, as in model(x), also runs forward() and is the more common idiom.

In [0]:
model = LogisticRegression(10)
x = torch.rand(1,10)

yhat = model.forward(x)
yhat

tensor([[0.5566]], grad_fn=<SigmoidBackward>)
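To show what the two layers compute, the sketch below compares `nn.Linear` followed by a sigmoid against the same formula written out by hand, $\sigma(xW^\top + b)$. It builds a standalone `nn.Linear` rather than reusing the class above, and the seed is fixed only to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(10, 1)
x = torch.rand(1, 10)

via_modules = torch.sigmoid(linear(x))

z = x @ linear.weight.t() + linear.bias  # weight has shape (1, 10), bias has shape (1,)
by_hand = 1 / (1 + torch.exp(-z))        # the sigmoid function written out

print(via_modules, by_hand)
```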

A Feed Forward Neural Network can be defined very similarly:

In [0]:
class FeedForwardNeuralNetwork(nn.Module):
    def __init__(self, dims):
        super(FeedForwardNeuralNetwork, self).__init__()
        self.L1 = nn.Linear(dims,5)
        self.L2 = nn.Linear(5, 3)
        self.L3 = nn.Linear(3, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        a1 = self.sigmoid(self.L1(x))
        a2 = self.sigmoid(self.L2(a1))
        a3 = self.sigmoid(self.L3(a2))
        return a3

In [0]:
model = FeedForwardNeuralNetwork(10)
x = torch.rand(1,10)

yhat = model.forward(x)
yhat

tensor([[0.3282]], grad_fn=<SigmoidBackward>)

As well as a Recurrent Neural Network:

In [0]:
class RecurrentNeuralNetwork(nn.Module):
    def __init__(self, embedding_dim):
        super(RecurrentNeuralNetwork, self).__init__()
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=10) # Many other kwargs are available for nn.RNN. Number of layers, which activation function, dropout, uni or bidirectional
        self.linear = nn.Linear(10, 1)
        self.sigmoid = nn.Sigmoid()

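    def forward(self, x):
        # Remember, RNNs assume the input has this shape: [sequence_length, batch_size, embedding_dim]
        out, _ = self.rnn(x)
        # out[-1, 0, :] is the hidden state after the last word of the first (only) sequence in the batch;
        # out[0, 0, :] would only have seen the first word
        return self.sigmoid(self.linear(out[-1, 0, :]))  # Use Logistic Regression to get a probability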
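For completeness, here is a sketch of running a recurrent classifier on one 5-word sentence. `TinyRNNClassifier` is a hypothetical stand-in defined only for this example. Unlike the class above, it reads the hidden state after the final word and keeps the batch dimension, which is one common design choice rather than the only one.

```python
import torch
import torch.nn as nn

class TinyRNNClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, embedding_dim, hidden_size=10):
        super().__init__()
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_size)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, h_n = self.rnn(x)  # out: [seq_len, batch, hidden]
        last = out[-1]          # hidden state after the final word: [batch, hidden]
        return torch.sigmoid(self.linear(last))  # one probability per sequence: [batch, 1]

model = TinyRNNClassifier(embedding_dim=10)
x = torch.rand(5, 1, 10)  # [sequence_length, batch_size, embedding_dim]
prob = model(x)
print(prob.shape, prob.item())
```

For a single-layer, unidirectional `nn.RNN`, `out[-1]` holds the same values as the final hidden state `h_n[0]`.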

# Document Classification Using Logistic Regression

First, we import all of the libraries that we are going to need

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.feature_extraction.text import CountVectorizer # This is to create a Bag-of-Words representation for each sentence

Next, we create the documents, or sentences, that we're going to train on. Then convert them to their Bag-of-Words representation.

In [0]:
sentences = ['i love my cat',
             'dogs are the best',
             'my cat is always knocking things over',
             'dogs are man\'s best friend']

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(sentences)
x = bow.toarray()

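device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # fall back to the CPU when no GPU is present
x = torch.Tensor(x).to(device)  # Tensor of shape 4x14
y = torch.Tensor([[1], [0], [1], [0]]).to(device)  # Tensor of shape 4x1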

In [0]:
print(vectorizer.get_feature_names_out().tolist())  # get_feature_names() was removed in scikit-learn 1.2
print(x)

['always', 'are', 'best', 'cat', 'dogs', 'friend', 'is', 'knocking', 'love', 'man', 'my', 'over', 'the', 'things']
tensor([[0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.],
        [0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [1., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1.],
        [0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.]],
       device='cuda:0')
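To see where these counts come from, here is a plain-Python sketch of a Bag-of-Words encoder. It is a simplified stand-in, not scikit-learn's implementation: it lowercases the text and keeps tokens of two or more word characters, which mirrors `CountVectorizer`'s default behaviour and explains why 'i' and the 's' of "man's" are missing from the vocabulary. The helper names `tokenize` and `bag_of_words` are made up for this example.

```python
import re

sentences = ['i love my cat',
             'dogs are the best',
             'my cat is always knocking things over',
             "dogs are man's best friend"]

def tokenize(text):
    # Lowercase, then keep only tokens with at least two word characters
    return re.findall(r"\b\w\w+\b", text.lower())

# The vocabulary is the sorted set of all tokens; each token gets a column index
vocabulary = sorted({tok for s in sentences for tok in tokenize(s)})
index = {tok: i for i, tok in enumerate(vocabulary)}

def bag_of_words(text):
    # Count how often each vocabulary word occurs in the text
    counts = [0] * len(vocabulary)
    for tok in tokenize(text):
        counts[index[tok]] += 1
    return counts

bow = [bag_of_words(s) for s in sentences]
print(vocabulary)
print(bow[0])
```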


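Now we create our model. Despite the class name, it is a single linear layer followed by a sigmoid, which is exactly Logistic Regression.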

In [0]:
class FeedForwardNeuralNetwork(nn.Module):
    def __init__(self, dims):
        super(FeedForwardNeuralNetwork, self).__init__()
        self.L1 = nn.Linear(dims,1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        z1 = self.L1(x)
        a1 = self.sigmoid(z1)
        return a1

Lastly, we write the training loop for our system.

In [0]:
model = FeedForwardNeuralNetwork(x.shape[1])  # Logistic Regression with x.shape[1] dimensions
model.to(x.device)  # keep the model on the same device as the data
optimizer = optim.SGD(model.parameters(), lr=0.1)  # Optimizing with Stochastic Gradient Descent
loss = nn.BCELoss()  # Binary Cross Entropy Loss
for epoch in range(1000):  # Training Loop
    model.zero_grad()  # Zero all the Gradients
    yhat = model.forward(x)  # Compute forward pass
    output = loss(yhat, y)  # Compute loss
    output.backward()  # Back propagate loss
    optimizer.step()  # Update weights

In [0]:
print(model.forward(x))
print(vectorizer.get_feature_names_out().tolist())  # get_feature_names() was removed in scikit-learn 1.2
print(model.L1.weight)

tensor([[0.9894],
        [0.0066],
        [0.9971],
        [0.0049]], device='cuda:0', grad_fn=<SigmoidBackward>)
['always', 'are', 'best', 'cat', 'dogs', 'friend', 'is', 'knocking', 'love', 'man', 'my', 'over', 'the', 'things']
Parameter containing:
tensor([[ 0.5516, -1.4384, -1.4364,  1.4082, -1.3732, -0.6979,  0.3297,  0.4618,
          1.0960, -0.5583,  1.8470,  0.5741, -0.9420,  0.4994]],
       device='cuda:0', requires_grad=True)
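If you don't have a GPU or scikit-learn at hand, the same experiment can be sketched on the CPU by hard-coding the Bag-of-Words matrix printed earlier. This version uses `nn.Sequential` instead of a custom class and records the loss at every epoch. Those are choices made for brevity here, and the exact weights will differ from the output above because the initialization is random.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs behave the same
# Bag-of-Words rows copied from the output above (columns are in vocabulary order)
x = torch.tensor([[0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.],
                  [0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
                  [1., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1.],
                  [0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.]])
y = torch.tensor([[1.], [0.], [1.], [0.]])  # 1 = cat sentence, 0 = dog sentence

model = nn.Sequential(nn.Linear(x.shape[1], 1), nn.Sigmoid())  # logistic regression
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.BCELoss()

losses = []
for epoch in range(1000):
    optimizer.zero_grad()          # zero all the gradients
    loss = criterion(model(x), y)  # forward pass and loss
    loss.backward()                # back propagate loss
    optimizer.step()               # update weights
    losses.append(loss.item())

with torch.no_grad():
    predictions = (model(x) > 0.5).float()  # threshold probabilities at 0.5

weights = model[0].weight[0]
print(losses[0], losses[-1])
print(predictions.squeeze(1).tolist())
print("cat weight:", weights[3].item(), "dogs weight:", weights[4].item())
```

The sign of each weight shows which class a word pushes the prediction toward: 'cat' ends up positive and 'dogs' negative, matching the pattern in the weights printed above.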
