<a href="https://colab.research.google.com/github/zoldello/pytorch-presentation-nov-2018/blob/master/presentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---



## Philip Adenekan 
## Basic Deep Neural Network with Pytorch
## Cherry Lab Dev Meeting
## Cherry Lab
## November, 2018


---



In [0]:
# IGNORE. COLAB INITIALIZATION CODE 
# http://pytorch.org/
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision



---


# Overview 
- What is Pytorch?
- Levels of Abstraction
    - Tensors
    - Variable
    - Module
- MNIST dataset
- Artificial Neural Network
- Autograd


---





---


# Pytorch
It’s a **Python-based scientific computing package** targeted at two sets of audiences:
1.   A replacement for NumPy with added GPU support 
2.   A deep learning research platform that provides maximum flexibility and speed

**Presentation based on v0.4.1; v1.0 is in preview.**


---







---


# More on Pytorch
- Facebook's Python port of Torch (Torch itself is written in Lua) 

- Users: Facebook, Uber, Tesla, some labs at Stanford, and many others in industry and academia

- Andrej Karpathy quote- "*... using Pytorch... I've never felt better. I have more energy. My skin is clearer. My eye sight  has improved*" [4]


---






---


# Three Levels of Abstraction
- **Tensor**: n-dimensional array that can run on a GPU
- **Variable**: Stores data and its gradient
- **Module**: Neural network functionality, such as storing weights 

---





---
#  Tensor

- What is a Tensor
    - Simple definition: **an array/list of n rows and m columns**
    - Definition in mathematics: a tensor is an arbitrarily complex geometric object that maps, in a (multi-)linear manner, geometric vectors, scalars, and other tensors to a resulting tensor.

    **Let's stick with the simple definition**

- Nomenclature for a group of numbers in computer science vs. mathematics:
<table>
    <tr>
        <th></th>
        <th>Number</th>
        <th>1 x n Numbers</th>
        <th>n x m Numbers</th>
    </tr>
    <tr>
        <td>Computer scientists</td>
        <td>int</td>
        <td>array</td>
        <td>nd-array</td>
    </tr>
    <tr>
        <td>Mathematicians</td>
        <td>scalar</td>
        <td>vector</td>
        <td><b>tensor</b></td>
    </tr>
</table>

- **Tensors** are heavily used in PyTorch.
- Tensors are also used to model scalars and vectors (acceptable in machine learning)

- A slew of operations are possible: transpose, arithmetic, etc.
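
The operations mentioned above can be sketched on two small matrices. This is a minimal illustration; the values of `a` and `b` are made up for the example.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

transposed = a.t()      # swap rows and columns
summed = a + b          # element-wise addition
scaled = a * 2          # multiply every element by a scalar
product = a.mm(b)       # matrix multiplication

print(transposed)
print(summed)
print(scaled)
print(product)
```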

               

---



In [0]:
import torch # pytorch library
from torch import autograd 
from torch.autograd import Variable




---
# Pytorch Tensors

- Creating a Tensor


```
x = torch.tensor([1,2]) # 1-D tensor, vector
x = torch.tensor([[1,2], [3,4]]) # 2-D tensor, matrix
 

```

Many other means to create a tensor:
- x = torch.rand(1,2)
- x =  torch.zeros(1,2)
- x = torch.ones(1,2)
- x = torch.Tensor(1,2)
- x = torch.FloatTensor([1,2,3])
- x = torch.DoubleTensor([1,2,3])
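
As a rough comparison of the constructors listed above, the sketch below prints the shape and data type each one produces. The variable names are chosen for illustration. `torch.Tensor(1,2)` is left out because it allocates uninitialized memory, so its contents are unpredictable.

```python
import torch

zeros = torch.zeros(1, 2)                  # 1 x 2 tensor filled with 0
ones = torch.ones(1, 2)                    # 1 x 2 tensor filled with 1
random = torch.rand(1, 2)                  # uniform random values in [0, 1)
floats = torch.FloatTensor([1, 2, 3])      # 32-bit floating point
doubles = torch.DoubleTensor([1, 2, 3])    # 64-bit floating point
ints = torch.tensor([1, 2])                # dtype inferred from the data

for name, t in [('zeros', zeros), ('ones', ones), ('random', random),
                ('floats', floats), ('doubles', doubles), ('ints', ints)]:
    print(f'{name}: shape={tuple(t.shape)}, dtype={t.dtype}')
```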

---



In [3]:
# building a tensor
x = torch.tensor([[1.0,2.], [3.,4.]])
print(f'Tensor: {x}')
print(f'Type: {type(x)}')
print(f'DType: {x.dtype}')
print(f'Mean: {x.mean()}')



Tensor: tensor([[1., 2.],
        [3., 4.]])
Type: <class 'torch.Tensor'>
DType: torch.float32
Mean: 2.5


In [4]:
# tensor information
x = torch.tensor([1,2])
print(f'Shape: {x.shape}') # count of rows and columns
print(f'Size : {x.size()}') # equivalent to shape
print(f'Dimension: {x.dim()}')

Shape: torch.Size([2])
Size : torch.Size([2])
Dimension: 1










---


# Variables 
- ### Variable is deprecated since v0.4 (see [3]); a tensor with `requires_grad=True` works with autograd directly
- Tensor wrapper for storing the gradient (it also records the operations performed on the tensor)
- Three components
    - data: the wrapped tensor
    - grad: the gradient
    - grad_fn: the function object that created the variable
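
The three components can be seen in a small autograd sketch. It uses a plain tensor with `requires_grad=True` rather than a `Variable`, and the function `y = x1^2 + x2^2` is chosen only for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()      # y = x1^2 + x2^2
y.backward()            # compute dy/dx and store it in x.grad

print(x.data)           # the raw values
print(x.grad)           # the gradient, 2 * x
print(x.grad_fn)        # None: x was created by the user, not by an operation
print(y.grad_fn)        # the operation that produced y
```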





---





In [5]:
# creating a Variable
y = Variable(torch.tensor([1.0,2.0], requires_grad=True)) # tensor must be a float
print(y)

tensor([1., 2.])




---
# Module
- Contains methods for building various types of Neural Networks
    -  ### Multi-layer perceptron
    - Convolutional Neural Network
    - Recurrent Neural Networks


---





---

# MNIST Dataset
- Collection of handwritten digits from 0 to 9 [12], [13]
- Each image is stored as a 28 x 28 array of grayscale pixel values
- Used as a baseline benchmark in machine learning
- Sample: 
![alt text](http://ml4a.github.io/images/figures/mnist-input.png)

---





---

# Artificial Neural Network 
- Collection of nodes loosely based on the biological brain

- Purpose is to model non-linear functions

- Needs training

- Has three types of layers
    - Input 
    - Hidden 
    - Output

- Illustration: 

![alt text](https://i.stack.imgur.com/gzrsx.png)


$$Y = \mathrm{ReLU}(x_1 w_1 + x_2 w_2 + x_3 w_3)$$


Output as a tensor:

$$y = XW$$

Fig 1: A Simple Neural Network [12]

---
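
The single-node formula can be checked numerically with a matrix product. This is a sketch with made-up inputs and weights; `w_positive` and `w_negative` are hypothetical names used to show both outcomes of ReLU.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0]])                 # 1 x 3 input row (x1, x2, x3)
w_positive = torch.tensor([[0.5], [-1.0], [1.0]])   # 3 x 1 weight column (w1, w2, w3)
w_negative = torch.tensor([[-1.0], [-1.0], [0.5]])

sum_positive = x.mm(w_positive)    # x1*w1 + x2*w2 + x3*w3 as one matrix product
sum_negative = x.mm(w_negative)

y_positive = F.relu(sum_positive)  # positive sum passes through unchanged
y_negative = F.relu(sum_negative)  # negative sum is clipped to 0

print(sum_positive.item(), y_positive.item())
print(sum_negative.item(), y_negative.item())
```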





---


- Activation function
    - Introduces non-linearity
    - Determines if a node fires or not
    - Examples: ReLU, Sigmoid, Tanh
    - ReLU: returns 0 if the value is <= 0, otherwise returns the value

- Bias: an optional value added to a node's weighted sum; it can ensure that the sum is not zero


![Neural Network](https://cdn-images-1.medium.com/max/1200/1*0NKtEk20-qnaLkwOa8DlnA.png)
Fig 2. More detailed image of a Neural Network [5]

- **Input, weights and outputs of nodes can be treated as Tensor**
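
To make the activation functions concrete, the sketch below applies each one to the same three weighted sums, then repeats ReLU after adding a bias. The sums and the bias value are made up for illustration.

```python
import torch
import torch.nn.functional as F

weighted_sum = torch.tensor([-2.0, 0.0, 2.0])   # example weighted sums for three nodes
bias = 1.0                                      # hypothetical bias added to every node

relu_out = F.relu(weighted_sum)                 # negatives become 0
sigmoid_out = torch.sigmoid(weighted_sum)       # squashed into (0, 1)
tanh_out = torch.tanh(weighted_sum)             # squashed into (-1, 1)
relu_biased = F.relu(weighted_sum + bias)       # the bias shifts the sums before activation

print(relu_out)
print(sigmoid_out)
print(tanh_out)
print(relu_biased)
```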
---



# Learning in a Neural Network

- The goal is to set the weights so that the error (the difference between the expected output and the network's output) is minimized

- This can be treated as an optimization problem, typically solved with gradient descent (see [11] for details)
 

![Neural Network error vs weight](http://www.cs.cornell.edu/boom/2004sp/projectarch/appofneuralnetworkcrystallography/images/NeuralNetworkErrorCurve.jpg)


- Learning is about moving toward a **global minimum** of the error

- The learning rate sets how large each step toward the minimum is
    - Too low and it takes too long to converge
    - Too high and you get overshooting

- Note: Because there are many weights, the graph above is oversimplified
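
The effect of the learning rate can be sketched in plain Python on a one-weight problem. The error function `(w - 3)^2` and the three learning rates are assumptions chosen for illustration, not values from the demo.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize error(w) = (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)              # derivative of (w - 3)^2
        w = w - learning_rate * gradient    # step against the gradient
    return w

too_low = gradient_descent(0.01)     # still far from 3 after 20 steps
reasonable = gradient_descent(0.4)   # converges close to 3
too_high = gradient_descent(1.1)     # overshoots further on every step

print(too_low, reasonable, too_high)
```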



---


# Deep Neural Networks

- Have many hidden layers and a large number of nodes

- Better at modeling complex functions than shallow neural networks

- Popularity exploded due to great results
    - In some cases, better at image classification than humans [6]
    - Defeated the best Go players [7]
    - Gaining popularity in biomedical imaging [8]
    - Widely used in autonomous vehicles [9]
   

---





---


# Deep Neural Networks Not a Magic Bullet
- Need lots of data (thousands of training samples or more)

- Even with a GPU, training can take days

- Other methods are more effective on simpler classes of problems, such as linear problems

- Currently a black box (although research is ongoing to change this, and convolutional neural networks are technically somewhat interpretable)


---





---


# Pytorch-Neural Network Setup
- Layers
-  Forward Propagation
- Training
- Validation
- Quantification with test data (beyond scope)

- This demo will be based on work in [10]


---



In [0]:
import torch.nn as nn # deep learning class
import torchvision.datasets as dsets # MNIST data (repository of handwriting samples for image analysis [28 X 28 matrix])
import torchvision.transforms as transforms
from torch.autograd import Variable # autograd

In [7]:
# accessing MNIST data (torchvision downloads it into ./data on first use)
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


In [0]:
# Hyperparameters- Things you can adjust to better train network 
batch_size = 100       # Size of input data used for one iteration
input_size = 784       # The image size = 28 x 28 = 784
hidden_size = 500      # Number of nodes in the hidden layer
num_classes = 10       # Number of output classes. In this case, from 0 to 9
num_epochs = 5         # Number of times entire dataset is trained
learning_rate = 0.001  # The speed of convergence
        


In [0]:
#data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
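
To see what a data loader yields without downloading MNIST, the sketch below batches a synthetic dataset. `fake_images` and `fake_labels` are made-up stand-ins with the same shapes as MNIST samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 250 random 28 x 28 "images" with labels 0-9
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,), dtype=torch.long)
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=fake_dataset, batch_size=100, shuffle=True)

# Each iteration yields one (images, labels) pair for a whole batch
batch_sizes = [images.shape[0] for images, labels in loader]
print(batch_sizes)   # the last batch holds the remainder
```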


In [0]:
import torch.nn as nn # neural network building blocks (layers, activations, losses)
import torch.nn.functional as F

class Net(nn.Module): # Module, one of the three levels of Pytorch
    def __init__(self):
        super(Net, self).__init__()
        # setting up Deep Neural Network
        self.fc1 = nn.Linear(input_size, hidden_size) # hidden layer, input=784, output=500
        self.fc2 = nn.Linear(hidden_size, num_classes) # output layer, input=500, output=10
        self.activation_fn = nn.ReLU()
        
    # forward propagation
    def forward(self, x):
        out = self.fc1(x)
        out = self.activation_fn(out) # ReLU activation function on the hidden layer
        out = self.fc2(out) # output layer: raw class scores, no activation (CrossEntropyLoss applies softmax internally)
        return out


In [15]:
# lets preview the network
net = Net()
print(net)

Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (activation_fn): ReLU()
)
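
As a sanity check on the layer sizes printed above, the sketch below rebuilds the same two-layer network in a self-contained form, passes a synthetic batch through it, and counts the trainable parameters. The random batch stands in for real MNIST images.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size=784, hidden_size=500, num_classes=10):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.activation_fn = nn.ReLU()

    def forward(self, x):
        return self.fc2(self.activation_fn(self.fc1(x)))

net = Net()
batch = torch.rand(100, 784)     # synthetic batch of 100 flattened images
outputs = net(batch)             # one score per class for each image
num_parameters = sum(p.numel() for p in net.parameters())

print(outputs.shape)
print(num_parameters)            # (784*500 + 500) + (500*10 + 10)
```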


In [0]:
# run on the GPU if one is available
if torch.cuda.is_available():
    net.cuda()

In [0]:

criterion = nn.CrossEntropyLoss() # loss function used for determining error
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate) # way of updating weight
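
To illustrate what the loss function computes, the sketch below compares `nn.CrossEntropyLoss` with the same quantity worked out by hand. The logits and labels are made up: with all ten scores equal, every class has probability 1/10, so the loss should be ln(10).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

logits = torch.zeros(4, 10)            # 4 samples, 10 classes, all scores equal
labels = torch.tensor([0, 3, 5, 9])    # true class index for each sample

loss = criterion(logits, labels)

# Same value by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs.gather(1, labels.unsqueeze(1)).mean()

print(loss.item(), manual.item(), math.log(10))
```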

In [14]:
# training


device = next(net.parameters()).device                    # CPU, or the GPU if net.cuda() was called

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch of images with their labels
        images = images.view(-1, 28*28).to(device)        # Flatten each 28 x 28 image into a vector of size 784 (no Variable wrapper needed since v0.4)
        labels = labels.to(device)
        
        optimizer.zero_grad()                             # Clear the gradients left over from the previous step
        outputs = net(images)                             # Forward pass: compute the class scores for each image
        loss = criterion(outputs, labels)                 # Compute the loss: difference between the predicted scores and the true labels
        loss.backward()                                   # Backward pass: compute the gradient of the loss with respect to each weight
        optimizer.step()                                  # Optimizer: update the weights using the gradients
        
        if (i+1) % 100 == 0:                              # Logging
            print(f'Epoch: [{epoch+1}/{num_epochs}], Step: [{i+1}/{len(train_dataset)//batch_size}], Loss: [{loss.item()}]')



Epoch: [1/5], Step: [100/600], Loss: [0.479817658662796]
Epoch: [1/5], Step: [200/600], Loss: [0.2667689621448517]
Epoch: [1/5], Step: [300/600], Loss: [0.2991608679294586]
Epoch: [1/5], Step: [400/600], Loss: [0.4048100709915161]
Epoch: [1/5], Step: [500/600], Loss: [0.3851563334465027]
Epoch: [1/5], Step: [600/600], Loss: [0.09433604031801224]
Epoch: [2/5], Step: [100/600], Loss: [0.16431690752506256]
Epoch: [2/5], Step: [200/600], Loss: [0.10400703549385071]
Epoch: [2/5], Step: [300/600], Loss: [0.1864813268184662]
Epoch: [2/5], Step: [400/600], Loss: [0.09160412847995758]
Epoch: [2/5], Step: [500/600], Loss: [0.0849151685833931]
Epoch: [2/5], Step: [600/600], Loss: [0.10740596801042557]
Epoch: [3/5], Step: [100/600], Loss: [0.11431995034217834]
Epoch: [3/5], Step: [200/600], Loss: [0.10467308014631271]
Epoch: [3/5], Step: [300/600], Loss: [0.09521561861038208]
Epoch: [3/5], Step: [400/600], Loss: [0.10452988743782043]
Epoch: [3/5], Step: [500/600], Loss: [0.2231658548116684]
Epoch:
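
The validation step listed in the setup could look roughly like the sketch below, which measures classification accuracy over a data loader. The `accuracy` helper, `fake_loader`, and `untrained_net` are hypothetical names introduced here; synthetic data is used so the sketch runs without MNIST, and the printed value says nothing about the trained network above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():                          # no gradients needed for evaluation
        for images, labels in loader:
            outputs = model(images.view(-1, 28 * 28))
            predicted = outputs.argmax(dim=1)      # class with the highest score
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Synthetic stand-ins: random images, random labels, untrained weights
fake_loader = DataLoader(
    TensorDataset(torch.rand(200, 1, 28, 28),
                  torch.randint(0, 10, (200,), dtype=torch.long)),
    batch_size=100)
untrained_net = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))

result = accuracy(untrained_net, fake_loader)
print(result)   # a fraction between 0 and 1
```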

# References


1. PyTorch documentation (stable, v0.4.1): https://pytorch.org/docs/stable/index.html

2. PyTorch documentation (preview, v1.0): https://pytorch.org/docs/master/

3. Variable deprecation: https://pytorch.org/docs/stable/autograd.html#variable-deprecated

4. Andrej Karpathy quote: https://twitter.com/karpathy/status/868178954032513024

5. Image of Neural Network: https://hackernoon.com/everything-you-need-to-know-about-neural-networks-8988c3ee4491

6. 6 areas where artificial neural networks outperform humans: https://venturebeat.com/2017/12/08/6-areas-where-artificial-neural-networks-outperform-humans/

7. AlphaGo: https://en.wikipedia.org/wiki/AlphaGo

8. Deep Learning Applications in Medical Imaging: https://www.techemergence.com/deep-learning-applications-in-medical-imaging/ 

9. How important is deep learning in autonomous driving?: https://www.quora.com/How-important-is-deep-learning-in-autonomous-driving

10. A Simple Starter Guide to Build a Neural Network: https://towardsdatascience.com/a-simple-starter-guide-to-build-a-neural-network-3c2cf07b8d7c

11. Gradient descent, how neural networks learn | Deep learning, chapter 2: https://www.youtube.com/watch?v=IHZwWFHWa-w&t=1063s

12. MNIST: http://yann.lecun.com/exdb/mnist/

13. MNIST Wiki: https://en.wikipedia.org/wiki/MNIST_database

# Resources

## Primary Documentation
- v0.4.1 (stable): https://pytorch.org/docs/stable/index.html
- v1.0 (preview): https://pytorch.org/docs/master/

## YouTube

- Deep Learning with PyTorch: Building a Simple Neural Network| packtpub.com: https://www.youtube.com/watch?v=VZyTt1FvmfU

- Deep Lizard Training in Pytorch: https://www.youtube.com/watch?v=v5cngxo4mIg&list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG

- 3Blue1Brown (a lot of information about mathematics relevant to machine learning): https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw


- Siraj Raval (produces a lot of videos explaining machine learning): https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A


## Online Conference
- PyTorch developer conference part 1: https://www.youtube.com/watch?v=KJAnSyB6mME&t=4593s

- PyTorch developer conference part 2: https://www.youtube.com/watch?v=8881p8p3Guk&t=90s

- PyTorch developer conference part 3: https://www.youtube.com/watch?v=JVT4XvixNvs



## Books
- Deep Learning by Goodfellow et al.: http://www.deeplearningbook.org



There are a lot of blogs and videos on Machine Learning. 