# The material presented in this notebook is for use in the Introduction to Deep Learning (CEE 690/ECE 590) course, Duke University, Fall 2019.

## __Modules and packages in PyTorch__
* ### __torch :__  a Tensor library like NumPy, with strong GPU support
* ### __torch.autograd :__ an automatic differentiation library that supports all differentiable Tensor operations in torch
* ### __torch.nn :__ a neural networks library integrated with autograd
* ### __torch.nn.functional :__ implementations of many useful mathematical functions such as ReLU, Tanh, and so on.
* ### __torch.optim :__ an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, Adam, and so on.
* ### __torch.multiprocessing :__ Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training.
* ### __torch.utils :__ DataLoader and other utility functions for convenience
* ### __torchvision :__ consists of popular datasets, model architectures, and common image transformations for computer vision.
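
As a rough illustration of how the packages listed above fit together, here is a minimal sketch of a single training step. The toy data, the layer sizes, and the learning rate are all made up for this example and are not part of the course material.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Toy data (hypothetical): 8 samples with 4 features and 1 target each
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

model = nn.Linear(4, 1)                              # torch.nn: a layer with learnable weights
optimizer = optim.SGD(model.parameters(), lr=0.01)   # torch.optim: the update rule

loss_before = F.mse_loss(model(inputs), targets)     # torch.nn.functional: stateless functions
optimizer.zero_grad()                                # clear any old gradients
loss_before.backward()                               # torch.autograd: fills .grad of each parameter
optimizer.step()                                     # apply one SGD update

loss_after = F.mse_loss(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

With a small learning rate like this one, the loss after the step should be slightly lower than before.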

## __Importing modules and packages__

In [1]:
import torch
import torch.nn.functional as F
import torch.nn as nn

## __Different levels of abstraction__ 
* ### __Tensor:__ Like an array in NumPy, but can also run on a GPU to accelerate computing
* ### __Module:__ A neural network layer that stores state or learnable weights
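
To make the distinction concrete, the sketch below contrasts a plain tensor with an `nn.Linear` module. The layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

data = torch.ones(2, 3)      # a Tensor: just numbers, no learnable state
layer = nn.Linear(3, 4)      # a Module: owns a weight (4x3) and a bias (4)

# List the learnable tensors stored inside the module
param_names = []
for name, param in layer.named_parameters():
    param_names.append(name)
    print(name, tuple(param.shape), param.requires_grad)

output = layer(data)         # calling the module applies it to a tensor
print(output.shape)          # torch.Size([2, 4])
```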

## __Getting started with Tensors__ 
https://pytorch.org/docs/stable/tensors.html

## Construct a 5x3 matrix, uninitialized:

In [2]:
x = torch.empty(5, 3)

In [3]:
x

tensor([[2.2421e-44, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.8788e+31, 1.7220e+22, 1.9152e+23],
        [1.0489e-08, 1.2858e-11, 1.3370e+22],
        [2.1512e+23, 3.2797e-09, 2.6396e-09]])

## Construct a randomly initialized $5\times 3$ matrix:

In [4]:
x = torch.rand(5, 3)

In [5]:
x

tensor([[0.6432, 0.1583, 0.4193],
        [0.6671, 0.4247, 0.1156],
        [0.4556, 0.1092, 0.2307],
        [0.9508, 0.7426, 0.2145],
        [0.3243, 0.7161, 0.4106]])

## Construct a zero matrix of dtype long:

In [6]:
x = torch.zeros(5, 3, dtype=torch.long)

In [7]:
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

## Construct a tensor from data (a 3-d tensor):

In [8]:
x = torch.tensor([[[5.5, 3], [-3.8, 100]], [[1, 30], [23.8, 10]]])

In [9]:
x

tensor([[[  5.5000,   3.0000],
         [ -3.8000, 100.0000]],

        [[  1.0000,  30.0000],
         [ 23.8000,  10.0000]]])

## The size of a tensor

In [10]:
x.size()

torch.Size([2, 2, 2])

## Construct a tensor based on an existing tensor
### This reuses the properties of the input tensor (e.g. dtype) unless new values are provided by the user

In [11]:
x = torch.randn_like(x, dtype=torch.float) 

In [12]:
x

tensor([[[0.6509, 0.6294],
         [0.4869, 0.9770]],

        [[0.1911, 0.4818],
         [0.9057, 1.3463]]])
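
A small sketch of what "reusing the properties" means in practice: the `*_like` constructors copy shape and dtype from their input unless told otherwise. The tensor names here are illustrative only.

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

same = torch.randn_like(base)                         # inherits shape and dtype
override = torch.ones_like(base, dtype=torch.int32)   # shape inherited, dtype overridden

print(same.shape, same.dtype)          # torch.Size([2, 3]) torch.float64
print(override.shape, override.dtype)  # torch.Size([2, 3]) torch.int32
```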

# Getting started with Operations

## Adding two tensors

In [13]:
x = torch.randn(5,3)
y = torch.ones(5,3)
a = x + y
a

tensor([[ 2.2010,  0.0128,  0.2890],
        [ 0.3676,  1.4776, -1.4211],
        [ 2.7436,  1.2102,  2.0648],
        [ 2.4490,  1.4178,  1.1915],
        [ 0.2621,  2.3272,  0.8021]])
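
Besides the `+` operator, addition can also be written as a function call or as an in-place method. The sketch below uses fixed values (chosen for this example) so the results are easy to check.

```python
import torch

x = torch.tensor([1.0, 2.0])
y = torch.tensor([10.0, 20.0])

out_of_place = torch.add(x, y)   # same result as x + y; x and y are unchanged
print(out_of_place)              # tensor([11., 22.])

y.add_(x)                        # trailing underscore: modifies y in place
print(y)                         # tensor([11., 22.])
```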

## item() to get the value as a Python number

In [14]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([0.7765])
0.7764625549316406
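
Note that `item()` only works on tensors holding exactly one element. The sketch below shows `tolist()` as an alternative for larger tensors; the exact exception type raised by `item()` is assumed to vary between PyTorch versions, so both likely types are caught.

```python
import torch

scalar = torch.tensor([3.5])
print(scalar.item())       # 3.5 as a Python float

vector = torch.tensor([1.0, 2.0, 3.0])
print(vector.tolist())     # [1.0, 2.0, 3.0] as a Python list

try:
    vector.item()          # more than one element: not allowed
    item_failed = False
except (ValueError, RuntimeError) as err:
    item_failed = True
    print("item() failed:", err)
```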


## Converting a Torch Tensor to a NumPy array and vice versa
https://pytorch.org/docs/stable/notes/cuda.html

### From Tensor to NumPy

In [15]:
a = torch.ones(5)
x = a.numpy()

In [16]:
a

tensor([1., 1., 1., 1., 1.])

In [17]:
x

array([1., 1., 1., 1., 1.], dtype=float32)

### From NumPy to Tensor

In [18]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
a

array([1., 1., 1., 1., 1.])

In [19]:
b

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
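
One detail worth knowing about these conversions: for CPU tensors, both `.numpy()` and `torch.from_numpy()` share memory with the original object, so an in-place change on one side is visible on the other. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

t = torch.ones(3)
arr = t.numpy()                    # shares memory with t (CPU tensors only)
t.add_(1)                          # in-place change on the tensor...
print(arr)                         # ...is visible through the array: [2. 2. 2.]

source = np.zeros(3)
shared = torch.from_numpy(source)  # shares memory with source
copied = torch.tensor(source)      # makes an independent copy
source[0] = 5.0
print(shared[0].item(), copied[0].item())   # 5.0 0.0
```

Use `torch.tensor(...)` when an independent copy is wanted.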

## Matrix multiplication (similar to np.dot() for 2-D arrays)

In [20]:
a = torch.randn(2,3)
b = torch.randn(3,3)
torch.mm(a,b)

tensor([[-2.3453,  0.7212, -0.0333],
        [-1.5945,  0.3131, -0.3624]])
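
For 2-D tensors, `torch.mm`, `torch.matmul`, and the `@` operator should all give the same product. The sketch below checks this on random inputs; the shapes are arbitrary.

```python
import torch

torch.manual_seed(0)
a = torch.randn(2, 3)
b = torch.randn(3, 4)

c_mm = torch.mm(a, b)          # strictly 2-D matrix product
c_op = a @ b                   # operator form
c_matmul = torch.matmul(a, b)  # also handles batched and 1-D inputs

print(c_mm.shape)              # torch.Size([2, 4])
print(torch.allclose(c_mm, c_op), torch.allclose(c_mm, c_matmul))

elementwise = a * a            # note: * is the elementwise product, not a matrix product
print(elementwise.shape)       # torch.Size([2, 3])
```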

## Removing a dimension of size 1 from a Tensor

In [21]:
a = torch.rand(5,2,1)
print(a.size())
b = torch.squeeze(a, dim= 2)
print(b.size())

torch.Size([5, 2, 1])
torch.Size([5, 2])
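
A few variations on the same idea, as a sketch: `squeeze` without `dim` removes every size-1 dimension, `squeeze` on a dimension whose size is not 1 leaves the tensor unchanged, and `unsqueeze` does the reverse by inserting a size-1 dimension.

```python
import torch

a = torch.rand(5, 1, 2, 1)

all_removed = torch.squeeze(a)          # every size-1 dim removed
one_removed = torch.squeeze(a, dim=1)   # only dim 1 removed
unchanged = torch.squeeze(a, dim=0)     # dim 0 has size 5, so nothing happens
restored = torch.unsqueeze(torch.rand(5, 2), dim=2)   # insert a size-1 dim at position 2

print(all_removed.shape)   # torch.Size([5, 2])
print(one_removed.shape)   # torch.Size([5, 2, 1])
print(unchanged.shape)     # torch.Size([5, 1, 2, 1])
print(restored.shape)      # torch.Size([5, 2, 1])
```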


## Transferring Tensors to different devices, e.g., GPU, CPU using the .to method.
### To run on GPU, just cast tensors to a cuda data type!

### First check if the CUDA interface exists on your system

In [22]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    dtype = torch.cuda.FloatTensor         # casting tensors to a cuda data type
else:
    device = torch.device("cpu")

### Directly create a tensor on GPU, or use strings ``.to("cuda")``

In [23]:
y = torch.ones_like(b, device=device, dtype=torch.float)  # created directly on `device`
# Use the `device` object rather than hard-coding 'cuda': on a CPU-only build,
# .to('cuda') raises "AssertionError: Torch not compiled with CUDA enabled"
x = torch.randn(y.size()).to(device)  # or .to(y.device)

In [24]:
z = x + y  # both operands are now tensors on the same device
print(z)
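
A device-agnostic version of this pattern can be sketched as follows. It picks the device once and passes it everywhere, so the same code should run whether or not CUDA is available. The tensor names are illustrative.

```python
import torch

# Choose the device once; fall back to CPU when CUDA is unavailable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)   # create directly on the chosen device
b = torch.randn(2, 3).to(device)      # or create on CPU and move it

c = a + b                             # operands must live on the same device
print(c.device.type)

result = c.cpu().numpy()              # move back to CPU before converting to NumPy
print(result.shape)
```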

## AUTOGRAD: AUTOMATIC DIFFERENTIATION
https://pytorch.org/docs/stable/autograd.html

* ### Central to all neural networks in PyTorch is the autograd package
* ### Every Tensor has an attribute called __.requires_grad__ which can be True or False
* ### When it is True, autograd tracks all operations on the tensor
* ### After finishing the computation, one can call __.backward()__ and have all the gradients computed automatically
* ### The gradients are accumulated into each tensor's __.grad__ attribute
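
The points above can be sketched in a few lines. The function $\sum_i w_i^2$ is chosen here only because its gradient, $2w$, is easy to verify by hand.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # ask autograd to track w

loss = (w ** 2).sum()    # d(loss)/dw = 2 * w
print(loss.grad_fn is not None)   # True: loss remembers how it was computed

loss.backward()          # compute gradients of loss w.r.t. tracked leaves
print(w.grad)            # tensor([2., 4., 6.])
```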

### __To stop a tensor from tracking history:__

* ## First approach:
###    - calling .detach() to detach it from the computation history
###    - It also prevents future computation from being tracked

* ## Second approach:
###   - wrap the code block in __with torch.no_grad():__ 
###   - This disables gradient tracking inside the block, even for tensors with __requires_grad=True__
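
Both approaches can be compared side by side in a short sketch (variable names are illustrative):

```python
import torch

x = torch.ones(2, requires_grad=True)

tracked = x * 2                  # part of the autograd graph
detached = tracked.detach()      # first approach: cut the result out of the graph
after_detach = detached * 3      # later computation on it is not tracked either

with torch.no_grad():            # second approach: no tracking inside the block
    untracked = x * 2

print(tracked.requires_grad)       # True
print(detached.requires_grad)      # False
print(after_detach.requires_grad)  # False
print(untracked.requires_grad)     # False
```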

## Each tensor has a .grad_fn attribute that references a Function that has created the Tensor 
(grad_fn is None for Tensors created by the user)

In [25]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

### Applying an operation to x gives the resulting tensor a __grad_fn__ attribute

In [26]:
y = x + 2
y

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

### __requires_grad_(.)__ changes an existing Tensor's requires_grad flag in-place. 
### A tensor's requires_grad flag defaults to False when the tensor is created; the argument of requires_grad_() itself defaults to True

In [27]:
x = torch.arange(4).float().view(2,2)
print(x)
print(x.requires_grad)

tensor([[0., 1.],
        [2., 3.]])
False


In [28]:
x.requires_grad_(True)
print(x.requires_grad)

True


In [29]:
y = (x * x).sum()
print(y)                # y was computed from x, so it carries a grad_fn
print(y.grad_fn)
print(y.requires_grad)

tensor(14., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x...>
True


In [30]:
if x.grad is not None:
    x.grad.zero_()  # commenting out this line lets gradients accumulate across backward() calls
z = x + 2
print(z)
out = (x * x).sum()
out.backward()
print(out)
print(x.grad)

tensor([[2., 3.],
        [4., 5.]], grad_fn=<AddBackward0>)
tensor(14., grad_fn=<SumBackward0>)
tensor([[0., 2.],
        [4., 6.]])
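
To see why zeroing matters, the sketch below calls `backward()` twice without clearing `.grad` in between, then once more after zeroing. The input values are chosen for this example.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

(x * x).sum().backward()
first = x.grad.clone()       # tensor([2., 4.])

(x * x).sum().backward()     # no zeroing: the new gradient is added to the old one
second = x.grad.clone()      # tensor([4., 8.])

x.grad.zero_()               # reset the accumulated gradient
(x * x).sum().backward()
third = x.grad.clone()       # tensor([2., 4.]) again

print(first, second, third)
```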


In [48]:
# to explicitly compute gradients
z = x + 2
print(z)
out = (x * x).sum()
x_grad = torch.autograd.grad(out,x)
print(x_grad)

tensor([[2., 3.],
        [4., 5.]], grad_fn=<AddBackward0>)
(tensor([[0., 2.],
        [4., 6.]]),)
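
One difference from `.backward()` is that `torch.autograd.grad` returns the gradients as a tuple and does not write them into `.grad`. A minimal sketch on a fresh tensor:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
out = (x * x).sum()

(grad_x,) = torch.autograd.grad(out, x)   # returns a tuple, one entry per input
print(grad_x)    # tensor([2., 4.])
print(x.grad)    # None: nothing was accumulated into .grad
```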
