


---



## Philip Adenekan 
## Overview of Pytorch
## Cherry Lab Dev Meeting
## Cherry Lab
## November, 2018


---





---


# Overview 
- What is Pytorch?
-  Mathematics
- Tensors
- Autograd


---





---


# Pytorch
It’s a <u>Python-based scientific computing package</u> targeted at two sets of audiences:
1.   A replacement for NumPy with added GPU support 
2.   A deep learning research platform that provides maximum flexibility and speed

**Presentation based on v0.4.1; v1.0 is in preview.**


---







---


# More on Pytorch
- Facebook's Python port of Torch (the original Torch is written in Lua)

- Users: Facebook, Uber, Stanford and many others in industry and academia

- Andrej Karpathy quote: "*... using Pytorch... I've never felt better. I have more energy. My skin is clearer. My eye sight has improved*" [4]


---






---


# Theme of Pytorch Operations
- Heavy focus on **Tensor** manipulation
- Built-in methods to support Tensor operations
    - Arithmetic
    - Neural network tasks
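A short sketch of both kinds of built-in operations on small example tensors (the values are arbitrary; the neural-network operations come from `torch.nn.functional`):

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

# Arithmetic
elementwise_sum = a + b     # element by element
elementwise_prod = a * b    # element by element (not matrix multiplication)
matrix_prod = a @ b         # matrix multiplication, same as torch.matmul(a, b)
total = a.sum()             # 0-dimensional tensor holding the sum of all elements

# Neural-network style operations
z = torch.tensor([-1.0, 0.0, 2.0])
relu_out = F.relu(z)          # negative values are clipped to 0
probs = F.softmax(z, dim=0)   # non-negative values that sum to 1

print(elementwise_sum, elementwise_prod, matrix_prod, total, relu_out, probs, sep="\n")
```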
    

---





---
# Mathematics 

| | Number | 1 x n numbers | n x m numbers |
|---|---|---|---|
| **Computer scientists** | int | array | nd-array |
| **Mathematicians** | scalar | vector | matrix (tensor in general) |

- **Tensor** is the term used throughout Pytorch.
- Tensors are also used to model scalars and vectors (acceptable usage in machine learning)
- A wide range of operations is possible: transpose, arithmetic, etc.
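A minimal sketch of a scalar, a vector, and a matrix all represented as tensors, followed by a transpose and a broadcasted arithmetic operation (the example values are arbitrary):

```python
import torch

scalar = torch.tensor(3.0)                  # 0-D tensor
vector = torch.tensor([1.0, 2.0, 3.0])      # 1-D tensor, shape (3,)
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])    # 2-D tensor, shape (2, 3)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "dims:", t.dim(), "shape:", tuple(t.shape))

# Transpose swaps rows and columns: shape (2, 3) -> (3, 2)
print(matrix.t())

# The scalar multiplies every element; the vector is added to each row (broadcasting)
combined = matrix * scalar + vector
print(combined)
```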

               

---





---


# Mathematical Operations to Keep in Mind

- Gradient (just think of it as slope)
- Chain rule (find a gradient by first finding the gradient of something else: the derivative of a composed function is the product of the derivatives of its parts)
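Pytorch's autograd applies the chain rule automatically. A minimal sketch with an arbitrary example function y = (3x + 1)^2, whose hand-computed derivative at x = 2 is 2 * (3 * 2 + 1) * 3 = 42:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
u = 3 * x + 1      # inner function: u = 7 at x = 2, du/dx = 3
y = u ** 2         # outer function: dy/du = 2u = 14

y.backward()       # autograd multiplies dy/du * du/dx for us
print(x.grad)      # tensor(42.)
```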


---



```python
# INITIALIZATION CODE NEEDED BY COLAB
# http://pytorch.org/
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
import torch
from torch import autograd
from torch.autograd import Variable
```




---
# Pytorch Tensors

- Creating a Tensor


```python
x = torch.tensor([1,2]) # 1-D tensor, vector
x = torch.tensor([[1,2], [3,4]]) # 2-D tensor, matrix
```

Many other ways to create a tensor:
- `x = torch.rand(1,2)` (uniform random values in [0, 1))
- `x = torch.zeros(1,2)`
- `x = torch.ones(1,2)`
- `x = torch.Tensor(1,2)` (uninitialized values)
- `x = torch.FloatTensor([1,2,3])`
- `x = torch.DoubleTensor([1,2,3])`
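A sketch that prints the shape and dtype produced by each constructor in the list. `torch.rand` values are random and `torch.Tensor(1, 2)` returns uninitialized memory, so only shapes and dtypes are meaningful here. The extra `torch.tensor([1, 2])` entry shows that integer literals give an integer dtype, which is why gradient examples use float values:

```python
import torch

examples = {
    "torch.rand(1, 2)": torch.rand(1, 2),
    "torch.zeros(1, 2)": torch.zeros(1, 2),
    "torch.ones(1, 2)": torch.ones(1, 2),
    "torch.Tensor(1, 2)": torch.Tensor(1, 2),          # uninitialized values
    "torch.FloatTensor([1, 2, 3])": torch.FloatTensor([1, 2, 3]),
    "torch.DoubleTensor([1, 2, 3])": torch.DoubleTensor([1, 2, 3]),
    "torch.tensor([1, 2])": torch.tensor([1, 2]),      # dtype inferred from integers
}

for expr, t in examples.items():
    print(f"{expr:32} shape={tuple(t.shape)} dtype={t.dtype}")
```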

---



```python
# building a tensor
x = torch.tensor([[1.0,2.], [3.,4.]])
print(f'Tensor: {x}')
print(f'Type: {type(x)}')
print(f'DType: {x.dtype}')
print(f'Mean: {x.mean()}')
```

```
Tensor: tensor([[1., 2.],
        [3., 4.]])
Type: <class 'torch.Tensor'>
DType: torch.float32
Mean: 2.5
```


```python
# tensor information
x = torch.tensor([1,2])
print(f'Shape: {x.shape}') # size of each dimension
print(f'Size : {x.size()}') # equivalent to shape
print(f'Dimension: {x.dim()}')
```

```
Shape: torch.Size([2])
Size : torch.Size([2])
Dimension: 1
```










---


# Variables 
- ## Deprecated since v0.4.0 (tensors now support autograd directly) and not needed in v1.0
- Much existing sample code still uses it, so it is good to be aware of it
- A wrapper around a Tensor that records the operations performed on it, so its gradient can be computed
- Three components
    - data: the wrapped tensor
    - grad: the gradient
    - grad_fn: the function that created the Variable (None if it was created by the user)
- Used commonly in deep learning
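In v0.4 and later, the three components listed above are attributes of an ordinary tensor created with `requires_grad=True`. A minimal sketch (the loss function is an arbitrary example):

```python
import torch

# No Variable wrapper needed: the tensor itself tracks gradients
w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()      # loss = w0^2 + w1^2

print(w.data)              # the underlying values
print(w.grad_fn)           # None: w was created by the user, not by an operation
print(loss.grad_fn)        # backward node of the sum operation that produced loss

loss.backward()
print(w.grad)              # d(loss)/dw = 2 * w -> tensor([2., 4.])
```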




---





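```python
# creating a Variable; requires_grad is an argument of Variable here,
# because Variable(t) otherwise returns a copy with requires_grad=False
y = Variable(torch.tensor([1.0,2.0]), requires_grad=True) # tensor must be a float
print(y)
```

```
tensor([1., 2.], requires_grad=True)
```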




---

# Artificial Neural Network 
- Collection of nodes loosely based on the biological brain

- Purpose is to model non-linear functions

- Needs training

- Has three kinds of layers
    - Input
    - Hidden
    - Output

- Illustration:
    - Each input (**x**i) is multiplied by its weight (wi), and the products are summed
    - The total sum is passed through an activation function
        - ReLU, Sigmoid, tanh
    - The activation function determines the output **y**
    - ReLU: 0 if its input is <= 0, otherwise the input itself

![Neural Network](https://cdn-images-1.medium.com/max/1200/1*0NKtEk20-qnaLkwOa8DlnA.png)
Fig 1. Image of a Neural Network [5]
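The illustration can be sketched as a single node. The input, weight, and bias values below are arbitrary examples, and the bias term is an assumption not shown in the illustration (`nn.Linear`, used later in the demo, adds one by default):

```python
import torch

def neuron(x, w, b):
    z = torch.dot(w, x) + b    # weighted sum of the inputs plus a bias
    return torch.relu(z)       # ReLU activation: max(0, z)

w = torch.tensor([0.5, 0.25, -1.0])   # example weights w_i
b = torch.tensor(0.5)                 # example bias (assumption)

x_neg = torch.tensor([1.0, -2.0, 3.0])   # weighted sum = 0.5 - 0.5 - 3.0 + 0.5 = -2.5
x_pos = torch.tensor([2.0, 1.0, 0.0])    # weighted sum = 1.0 + 0.25 + 0.0 + 0.5 = 1.75

print(neuron(x_neg, w, b))   # tensor(0.)
print(neuron(x_pos, w, b))   # tensor(1.7500)
```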


## Learning: adjust the weights until the error in the Neural Network's output is as low as possible

---



# Learning

![Neural Network error vs weight](http://www.cs.cornell.edu/boom/2004sp/projectarch/appofneuralnetworkcrystallography/images/NeuralNetworkErrorCurve.jpg)


- Want to be at the global minimum
- Accept with hand-waving. 
- Watch [11] for details
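A minimal sketch of gradient descent, the "walk downhill on the error curve" idea. It assumes a hypothetical one-weight model whose error is (w - 3)^2, which has a single (global) minimum at w = 3; real networks have many weights and can have several local minima:

```python
import torch

w = torch.tensor(0.0, requires_grad=True)   # start far from the minimum
step_size = 0.1                             # learning rate (illustrative value)

for step in range(100):
    error = (w - 3.0) ** 2
    error.backward()                  # compute d(error)/dw, the slope at w
    with torch.no_grad():
        w -= step_size * w.grad       # move against the slope (downhill)
    w.grad.zero_()                    # clear the gradient before the next step

print(w.item())   # very close to 3.0
```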



---


# Deep Neural Networks

- Has many hidden layers, and therefore a large number of nodes

- Better at modeling complex functions than shallow Neural Networks

- Popularity has exploded due to great results
    - Better at image classification than humans in some cases [6]
    - Beat the best Go players [7]
    - Gaining popularity in biomedical imaging [8]
    - Commonly used in autonomous vehicles [9]
   

---





---


# Deep Neural Networks Not a Magic Bullet
- Need lots of data (100,000 training samples is nice)

- Even with a GPU, can take days to train

- Other solutions are more effective for simpler problems, such as linear ones

- Currently a black box (although research is ongoing to change this, and convolutional neural networks already allow some interpretation)


---





---
# Deep Neural Network of the Demo [10]

![Neural Network from Adventures in Machine Learning](http://adventuresinmachinelearning.com/wp-content/uploads/2017/07/CNTK-Dense-example-architecture.jpg)
Fig 2. A deep neural network [10]


---








---


# Pytorch-Neural Network Setup
- Layers
-  Forward Propagation
- Training
- Validation
- Quantification with test data (beyond scope)

- This demo will be based on work in [10]


---



```python
import torch.nn as nn # neural-network layers and the nn.Module base class
import torchvision.datasets as dsets # datasets such as MNIST (handwritten digits, each a 28 x 28 grayscale image)
import torchvision.transforms as transforms
from torch.autograd import Variable # autograd
```

```python
# accessing MNIST data (torchvision downloads it on first use)
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())
```

```
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
```


```python
batch_size = 100       # Number of samples taken for one iteration (mini-batch)

# data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
```
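To see what a loader yields, the sketch below uses a hypothetical stand-in dataset of 250 random 1 x 28 x 28 "images" instead of downloading MNIST. Each batch has shape (batch_size, 1, 28, 28), and `view(-1, 28*28)` flattens it to (batch_size, 784), the shape the first linear layer expects; the last batch is smaller because 250 is not a multiple of 100:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (random pixels, random labels 0-9)
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
demo_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=100, shuffle=True)

flat_shapes = []
for batch_images, batch_labels in demo_loader:
    flat = batch_images.view(-1, 28 * 28)   # flatten each image into a 784-long vector
    flat_shapes.append(tuple(flat.shape))
    print(tuple(batch_images.shape), "->", tuple(flat.shape), "labels:", tuple(batch_labels.shape))
```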


```python
import torch.nn as nn # neural-network layers and the nn.Module base class
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Hyperparameters: things you can adjust to better train the network
        input_size = 784       # The image size = 28 x 28 = 784
        hidden_size = 500      # Number of nodes in the hidden layer
        num_classes = 10       # Number of output classes. In this case, from 0 to 9

        # setting up Deep Neural Network
        self.fc1 = nn.Linear(input_size, hidden_size)  # input -> hidden layer: input=784, output=500
        self.fc2 = nn.Linear(hidden_size, num_classes) # hidden -> output layer: input=500, output=10
        self.activation_fn = nn.ReLU()

    # forward propagation (must be a method of the class so nn.Module can call it)
    def forward(self, x):
        out = self.fc1(x)
        out = self.activation_fn(out) # ReLU activation on the hidden layer
        out = self.fc2(out)           # raw class scores (logits); no activation on the output layer
        return out
```

```python
# let's preview the network
net = Net()
print(net)
```

```
Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (activation_fn): ReLU()
)
```
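The printed layers also determine how many weights the network learns: each `nn.Linear(in_features, out_features)` with `bias=True` holds an out_features x in_features weight matrix plus out_features biases. A sketch that counts them for the same layer sizes:

```python
import torch.nn as nn

# Same layer sizes as the Net above: 784 -> 500 -> 10
fc1 = nn.Linear(784, 500)
fc2 = nn.Linear(500, 10)

def count_params(*layers):
    """Total number of learnable values (weights and biases) in the given layers."""
    return sum(p.numel() for layer in layers for p in layer.parameters())

print(count_params(fc1))        # 784*500 weights + 500 biases = 392500
print(count_params(fc2))        # 500*10 weights + 10 biases = 5010
print(count_params(fc1, fc2))   # 397510
```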


```python
# run on the GPU if (and only if) one is available
if torch.cuda.is_available():
    net.cuda()
```

```python
learning_rate = 0.001  # The speed of convergence (step size used by the optimizer)

criterion = nn.CrossEntropyLoss() # loss function used for determining error
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate) # way of updating the weights
```
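A common practice is to start with a higher learning rate and decrease it gradually. One way to do this is a scheduler from `torch.optim.lr_scheduler`. The sketch below uses `StepLR` on a small stand-in model; the `demo_` names and the `step_size`/`gamma` values are illustrative choices, not part of the original demo:

```python
import torch
import torch.nn as nn

demo_model = nn.Linear(784, 10)   # stand-in model for illustration
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)
# Multiply the learning rate by gamma=0.5 every 2 epochs
demo_scheduler = torch.optim.lr_scheduler.StepLR(demo_optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    # ... one epoch of training (forward, backward, optimizer.step()) would go here ...
    demo_optimizer.step()                           # no gradients here, so no weights change
    lrs.append(demo_optimizer.param_groups[0]["lr"])
    demo_scheduler.step()                           # advance the schedule once per epoch

print(lrs)   # [0.001, 0.001, 0.0005, 0.0005, 0.00025, 0.00025]
```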

```python
# training
num_epochs = 5         # Number of times the entire dataset is trained on
device = next(net.parameters()).device   # GPU if net.cuda() was called above, otherwise CPU

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch of (images, labels)
        images = images.view(-1, 28*28).to(device)        # Flatten each 1 x 28 x 28 image into a vector of size 784
        labels = labels.to(device)                        # Variable wrappers are no longer needed (v0.4+)

        optimizer.zero_grad()                             # Reset the gradients accumulated by the previous step
        outputs = net(images)                             # Forward pass: compute class scores for each image
        loss = criterion(outputs, labels)                 # Compute the loss: difference between the output and the given label
        loss.backward()                                   # Backward pass: compute the gradient of the loss for every weight
        optimizer.step()                                  # Optimizer: update the weights using those gradients

        if (i+1) % 100 == 0:                              # Logging
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))
```

# Status Report
- Made a Neural Network
- Next Step: Training

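```python
from torch import optim


# Speed of learning
# Too high: fails to converge to a minimum error
# Too low: takes a very long time to learn
# Best to set high initially and gradually decrease
learning_rate = 0.01

# momentum reduces perturbation (oscillation) of the updates while learning
momentum = 0.9

# create a stochastic gradient descent optimizer
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=momentum)


# create a loss function
# net returns raw scores (logits), so use CrossEntropyLoss (log_softmax + NLLLoss);
# nn.NLLLoss() on its own expects log-probabilities, e.g. F.log_softmax in forward()
criterion = nn.CrossEntropyLoss()
```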
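`nn.NLLLoss` expects log-probabilities, while `nn.CrossEntropyLoss` applies `log_softmax` itself and then computes the negative log-likelihood. A sketch on random example scores confirming that the two formulations agree, and that passing raw scores straight to `NLLLoss` gives a different value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)              # raw scores: 4 samples, 10 classes (random example)
targets = torch.tensor([3, 0, 9, 1])     # true class index for each sample

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
nll_on_raw = nn.NLLLoss()(logits, targets)   # wrong: raw scores are not log-probabilities

print(ce.item(), nll.item(), nll_on_raw.item())
```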

```python
# number of times to go through data
epochs = 1000
log_interval = 10   # print the loss every 10 batches
device = next(net.parameters()).device   # GPU if net.cuda() was called above, otherwise CPU

# run the main training loop
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        # resize data from (batch_size, 1, 28, 28) to (batch_size, 28*28)
        data = data.view(-1, 28*28).to(device)
        target = target.to(device)
        optimizer.zero_grad()
        net_out = net(data)
        loss = criterion(net_out, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                           100. * batch_idx / len(train_loader), loss.item()))
```
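The setup lists validation, but the demo stops after training. A minimal sketch of an accuracy check that could be run on `test_loader`, e.g. `evaluate(net, test_loader)`, assuming the network and the data are on the CPU. `evaluate` is a hypothetical helper; here it is checked on a tiny hand-built example rather than MNIST:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()                                   # inference mode for layers such as dropout
    correct, total = 0, 0
    with torch.no_grad():                          # gradients are not needed for evaluation
        for images, labels in loader:
            outputs = model(images.view(images.size(0), -1))   # flatten each sample
            predicted = outputs.argmax(dim=1)      # highest score = predicted class
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    model.train()                                  # restore training mode
    return correct / total

# Toy check: identity weights, so the predicted class is the position of the largest input
toy_model = nn.Linear(3, 3, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(3))
toy_inputs = torch.tensor([[5., 1., 0.], [0., 2., 1.], [1., 0., 3.], [4., 0., 0.]])
toy_labels = torch.tensor([0, 1, 2, 1])            # the last label is deliberately wrong
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=2)

print(evaluate(toy_model, toy_loader))   # 0.75
```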

# References


1. Pytorch documentation (stable, v0.4.1): https://pytorch.org/docs/stable/index.html

2. Pytorch documentation (preview, v1.0): https://pytorch.org/docs/master/

3. Variable deprecation: https://pytorch.org/docs/stable/autograd.html#variable-deprecated

4. Andrej Karpathy quote: https://twitter.com/karpathy/status/868178954032513024

5. Image of Neural Network: https://hackernoon.com/everything-you-need-to-know-about-neural-networks-8988c3ee4491

6. 6 areas where artificial neural networks outperform humans: https://venturebeat.com/2017/12/08/6-areas-where-artificial-neural-networks-outperform-humans/

7. AlphaGo: https://en.wikipedia.org/wiki/AlphaGo

8. Deep Learning Applications in Medical Imaging: https://www.techemergence.com/deep-learning-applications-in-medical-imaging/

9. How important is deep learning in autonomous driving?: https://www.quora.com/How-important-is-deep-learning-in-autonomous-driving

10. A PyTorch tutorial – deep learning in Python: http://adventuresinmachinelearning.com/pytorch-tutorial-deep-learning/

11. Gradient descent, how neural networks learn | Deep learning, chapter 2: https://www.youtube.com/watch?v=IHZwWFHWa-w&t=1063s

# Resources
## YouTube

- 3Blue1Brown (lots of information about the mathematics relevant to machine learning): https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw


- Siraj Raval (produces many videos explaining machine learning): https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A


## Books
- Deep Learning by Goodfellow et al.: http://www.deeplearningbook.org



There are a lot of blogs and videos on Machine Learning. 

# Delete prior to presentation

- Explain softmax
- Explain why ReLU is non-linear