# The material presented in this notebook is for use in the Introduction to Deep Learning (CEE 690/ECE 590) course, Duke University, Fall 2019.

## __Modules and packages in PyTorch__
* ### __torch :__  a Tensor library like NumPy, with strong GPU support
* ### __torch.autograd :__ an automatic differentiation library that supports all differentiable Tensor operations in torch
* ### __torch.nn :__ a neural networks library integrated with autograd
* ### __torch.nn.functional :__ implementations of many useful mathematical functions such as ReLU, tanh, and so on.
* ### __torch.optim :__ an optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, Adam, and so on.
* ### __torch.multiprocessing :__ Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training.
* ### __torch.utils :__ DataLoader and other utility functions for convenience
* ### __torchvision :__ consists of popular datasets, model architectures, and common image transformations for computer vision.
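
As a rough sketch of how these packages fit together, the snippet below runs a single training step on made-up data. The layer sizes, learning rate, and variable names are illustrative assumptions, not part of the course material.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: 8 samples with 4 features each, and 8 scalar targets
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

model = nn.Linear(4, 1)                             # torch.nn: a layer with learnable weights
optimizer = optim.SGD(model.parameters(), lr=0.1)   # torch.optim: updates those weights

prediction = model(inputs)
loss = F.mse_loss(prediction, targets)              # torch.nn.functional: a stateless loss function

optimizer.zero_grad()                               # clear any old gradients
loss.backward()                                     # torch.autograd: fills .grad on each parameter
optimizer.step()                                    # apply one SGD update
```

After `loss.backward()`, every parameter of `model` carries a `.grad` tensor of the same shape as the parameter itself.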

## __Importing modules and packages__

In [58]:
import torch
import torch.nn.functional as F
import torch.nn as nn

## __Different levels of abstraction__
* ### __Tensor:__ Like an array in NumPy, but can also run on a GPU to accelerate computing
* ### __Module:__ A neural network layer that stores state such as learnable weights
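
To make the distinction concrete, here is a minimal sketch that applies a Module to a Tensor. The choice of `nn.Linear` and the sizes used are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)          # a Module mapping 3 input features to 2 outputs

# The learnable state lives in Parameters, which are Tensors with requires_grad=True
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

x = torch.rand(5, 3)             # a plain Tensor: 5 samples, 3 features
out = layer(x)                   # calling the Module applies it to the Tensor
print(out.shape)
```

The loop lists a `weight` of shape `(2, 3)` and a `bias` of shape `(2,)`, and the output has shape `(5, 2)`.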

## __Getting started with Tensors__ 
https://pytorch.org/docs/stable/tensors.html

## Construct a 5x3 matrix, uninitialized:

In [144]:
x = torch.empty(5, 3)

In [145]:
x

tensor([[-1.7405e+24,  8.8282e-43, -1.7405e+24],
        [ 8.8282e-43, -1.7405e+24,  8.8282e-43],
        [-1.7405e+24,  8.8282e-43, -1.7405e+24],
        [ 8.8282e-43, -1.7405e+24,  8.8282e-43],
        [-1.7405e+24,  8.8282e-43, -1.7405e+24]])

## Construct a randomly initialized $5\times 3$ matrix:

In [146]:
x = torch.rand(5, 3)

In [147]:
x

tensor([[0.7111, 0.2253, 0.3121],
        [0.2034, 0.1472, 0.4908],
        [0.0198, 0.3489, 0.1472],
        [0.5495, 0.9788, 0.9325],
        [0.2847, 0.0625, 0.9936]])

## Construct a zero matrix of dtype long:

In [148]:
x = torch.zeros(5, 3, dtype=torch.long)

In [149]:
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

## Construct a tensor from data (a 3-d tensor):

In [150]:
x = torch.tensor([[[5.5, 3], [-3.8, 100]], [[1, 30], [23.8, 10]]])

In [151]:
x

tensor([[[  5.5000,   3.0000],
         [ -3.8000, 100.0000]],

        [[  1.0000,  30.0000],
         [ 23.8000,  10.0000]]])

## The size of a tensor

In [152]:
x.size()

torch.Size([2, 2, 2])

## Construct a tensor based on an existing tensor
###  This reuses the properties of the input tensor, e.g. dtype, unless new values are provided by the user

In [153]:
x = torch.randn_like(x, dtype=torch.float) 

In [154]:
x

tensor([[[-1.0354, -0.6432],
         [-0.0378, -0.1270]],

        [[ 1.0058,  1.3536],
         [-1.0403,  0.3718]]])
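
The same inheritance rule applies to the other `*_like` constructors. A small sketch, using `torch.ones_like` and an arbitrarily chosen `float64` base tensor as assumptions:

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

same = torch.ones_like(base)                          # inherits shape and dtype
override = torch.ones_like(base, dtype=torch.int32)   # shape kept, dtype replaced

print(same.dtype, override.dtype)
print(same.shape == override.shape)
```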

# Getting started with Operations

## Adding two tensors

In [155]:
x = torch.randn(5,3)
y = torch.ones(5,3)
a = x + y
a

tensor([[ 0.2444,  1.8104,  2.0608],
        [ 1.0121,  1.1090,  0.4099],
        [-0.5658,  2.5439,  1.0580],
        [ 1.9664,  1.1493,  1.6734],
        [-0.5262,  1.0086,  1.8097]])
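
Addition can also be written as a function call or as an in-place method. The sketch below compares the three forms on small hand-picked tensors (the values are illustrative only).

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 3.0)

a = x + y               # operator form
b = torch.add(x, y)     # function form
y.add_(x)               # in-place form: the trailing underscore means y is modified

print(torch.equal(a, b), torch.equal(a, y))
```

All three produce the same values; only `add_` overwrites one of its operands.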

## item() to get the value as a Python number

In [156]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([0.9974])
0.997427225112915


## Converting a Torch Tensor to a NumPy array and vice versa
https://pytorch.org/docs/stable/notes/cuda.html

### From Tensor to NumPy

In [157]:
a = torch.ones(5)
x = a.numpy()

In [158]:
a

tensor([1., 1., 1., 1., 1.])

In [159]:
x

array([1., 1., 1., 1., 1.], dtype=float32)

### From NumPy to Tensor

In [160]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
a

array([1., 1., 1., 1., 1.])

In [161]:
b

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
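
One detail worth knowing: for CPU tensors, both conversions share the underlying memory instead of copying it, so an in-place change on one side is visible on the other. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()             # no copy: n views the same memory as t
t.add_(1)                 # in-place change on the tensor
print(n)                  # the NumPy array reflects the change

m = np.zeros(3)
s = torch.from_numpy(m)   # again shares memory
m[0] = 5.0                # in-place change on the array
print(s)                  # the tensor reflects the change
```

If an independent copy is needed, calling `.clone()` on the tensor or `.copy()` on the array before modifying it avoids this coupling.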

## Matrix product (similar to np.dot() for 2-D arrays)

In [162]:
a = torch.randn(2,3)
b = torch.randn(3,3)
torch.mm(a,b)

tensor([[-0.6189,  1.0974, -4.3025],
        [ 0.9510, -1.4415, -3.0914]])
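
For 2-D inputs, `torch.mm` gives the same result as `torch.matmul` and the `@` operator. The sketch below checks this on random matrices whose shapes are chosen arbitrarily.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

p1 = torch.mm(a, b)        # strictly 2-D matrix product
p2 = torch.matmul(a, b)    # also handles batched and 1-D inputs
p3 = a @ b                 # operator form of matmul

print(p1.shape)
print(torch.allclose(p1, p2), torch.allclose(p1, p3))
```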

## Removing a dimension of size 1 from a Tensor

In [163]:
a = torch.rand(5,2,1)
print(a.size())
b = torch.squeeze(a, dim= 2)
print(b.size())

torch.Size([5, 2, 1])
torch.Size([5, 2])
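
The sketch below shows how the `dim` argument changes the behavior of `squeeze`, together with the inverse operation `unsqueeze`. The tensor shapes are chosen only for illustration.

```python
import torch

a = torch.rand(1, 5, 1, 2)
all_removed = torch.squeeze(a)            # no dim given: every size-1 dimension is removed
first_removed = torch.squeeze(a, dim=0)   # only dimension 0 is removed
unchanged = torch.squeeze(a, dim=1)       # dimension 1 has size 5, so nothing changes

b = torch.unsqueeze(torch.rand(5, 2), dim=2)   # the inverse: insert a size-1 dimension

print(all_removed.size(), first_removed.size(), unchanged.size(), b.size())
```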


## Transferring Tensors between devices, e.g., GPU and CPU, using the .to method
### To run on a GPU, just move tensors to a CUDA device!

### First check whether CUDA is available on your system

In [164]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
else:
    device = torch.device("cpu")           # fall back to the CPU

### Create a tensor directly on the GPU, or move an existing one with ``.to(...)``, which also accepts strings such as ``"cuda"``

In [165]:
y = torch.ones_like(b, device=device, dtype=torch.float)  
x = torch.randn(y.size()).to(device)  # or .to(y.device); .to('cuda') fails on machines without a GPU

In [166]:
z = x + y
print(z)

tensor([[ 1.4831,  1.4877],
        [-0.3847, -0.3739],
        [ 0.9239,  0.2595],
        [ 0.2597,  1.5604],
        [ 0.9537,  2.0322]], device='cuda:0')
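
Moving data back is the same call in the other direction. The sketch below is written to run with or without a GPU, and shows that a tensor has to be on the CPU before it can be converted to a NumPy array. Variable names are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

z = torch.ones(2, 2, device=device) * 3
z_cpu = z.to("cpu")          # returns the tensor itself if it is already on the CPU
z_np = z_cpu.numpy()         # .numpy() only works on CPU tensors
print(z_np)
```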


## AUTOGRAD: AUTOMATIC DIFFERENTIATION
https://pytorch.org/docs/stable/autograd.html

* ### Central to all neural networks in PyTorch is the autograd package
* ### Every Tensor has an attribute called __.requires_grad__ which can be True or False
* ### When it is True, autograd starts to track all operations on the tensor
* ### When the computation finishes, one can call __.backward()__ and have all the gradients computed automatically
* ### The gradient for each tensor is accumulated into its __.grad__ attribute

### __To stop a tensor from tracking history:__

* ## First approach:
###    - Call .detach() to detach the tensor from the computation history
###    - This also prevents future computation on the detached tensor from being tracked

* ## Second approach:
###   - Wrap the code block in __with torch.no_grad():__
###   - This disables gradient tracking inside the block, even for trainable parameters with __requires_grad=True__
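
Both approaches can be sketched in a few lines. The tensor `w` and the operation `w * 2` below are stand-ins chosen for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First approach: detach() returns a tensor that shares data but has no history
d = (w * 2).detach()
print(d.requires_grad)

# Second approach: operations inside no_grad() are not recorded
with torch.no_grad():
    n = w * 2
print(n.requires_grad)

# Outside the block, tracking works as usual
t = w * 2
print(t.requires_grad)
```

The first two prints give `False` and the last gives `True`.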

## Each tensor has a .grad_fn attribute that references the Function that created the Tensor
(grad_fn is None for Tensors created by the user)

In [167]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

### Applying an operation to x gives the resulting tensor a non-None __grad_fn__ attribute

In [168]:
y = x + 2
y

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

### __requires_grad_(.)__ changes an existing Tensor's requires_grad flag in-place.
### A tensor's flag is False unless set at creation; calling requires_grad_() with no argument sets it to True

In [169]:
x = torch.arange(4).float().view(2,2)
print(x)
print(x.requires_grad)

tensor([[0., 1.],
        [2., 3.]])
False


In [170]:
x.requires_grad_(True)
print(x.requires_grad)

True


In [171]:
y = (x * x).sum()
print(y)
print(y.grad_fn)
print(y.requires_grad)

tensor(14., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x...>
True


In [172]:
if x.grad is not None:
    x.grad.zero_() # commenting out this line lets gradients accumulate across runs
z = x + 2
print(z)
out = (x * x).sum()
out.backward()
print(out)
print(x.grad)

tensor([[2., 3.],
        [4., 5.]], grad_fn=<AddBackward0>)
tensor(14., grad_fn=<SumBackward0>)
tensor([[0., 2.],
        [4., 6.]])
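
The accumulation behavior can be seen directly by calling `backward()` twice without zeroing the gradient in between. This is a minimal sketch that reuses the same 2x2 input; the names `first`, `second`, and `third` are illustrative.

```python
import torch

x = torch.arange(4).float().view(2, 2).requires_grad_(True)

(x * x).sum().backward()
first = x.grad.clone()        # d/dx sum(x^2) = 2x

(x * x).sum().backward()      # no zeroing in between: the new gradient is added
second = x.grad.clone()

x.grad.zero_()                # reset before the next backward pass
(x * x).sum().backward()
third = x.grad.clone()

print(first)
print(second)
print(third)
```

Here `second` is twice `first`, while `third` matches `first` again because the gradient was zeroed beforehand.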


In [173]:
# compute gradients explicitly and return them as a tuple, without writing to x.grad
z = x + 2
print(z)
out = (x * x).sum()
x_grad = torch.autograd.grad(out,x)
print(x_grad)

tensor([[2., 3.],
        [4., 5.]], grad_fn=<AddBackward0>)
(tensor([[0., 2.],
        [4., 6.]]),)
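
Unlike `backward()`, `torch.autograd.grad` returns the gradients and leaves `.grad` untouched. A small sketch on a fresh tensor, with the function $\sum_i x_i^3$ chosen only as an example:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
out = (x ** 3).sum()

(g,) = torch.autograd.grad(out, x)   # one gradient per input, returned as a tuple
print(g)                             # derivative is 3 * x^2
print(x.grad)                        # still None: nothing was accumulated
```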
