# PyTorch Introduction
## PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
### It can be seen as a substitute for NumPy with GPU capabilities
A tensor is a generalization of vectors and matrices to any number of dimensions.

Four tensor types come up most often in PyTorch: Byte, Float, Double, and Long. Each type determines the kind of number, and more importantly its size and precision, stored in every element of the tensor.
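
To make the link between tensor type and element size concrete, here is a minimal sketch that builds a one-element tensor of each of the four types mentioned above and records how many bytes one element occupies. It assumes a reasonably recent PyTorch release, where types are written as `torch.dtype` values such as `torch.uint8`; the dictionary names are only illustrative.

```python
import torch

# Byte, Float, Double and Long tensors expressed as torch.dtype values.
dtypes = {
    "Byte": torch.uint8,
    "Float": torch.float32,
    "Double": torch.float64,
    "Long": torch.int64,
}

# element_size() reports the number of bytes used by a single element.
element_sizes = {
    name: torch.zeros(1, dtype=dt).element_size() for name, dt in dtypes.items()
}

for name, size in element_sizes.items():
    print(name, size)
```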


## NumPy:
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
* For the NumPy docs, see https://docs.scipy.org/doc/numpy-1.13.0/reference/

In [0]:
import torch

## A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
It is very similar to `numpy.ndarray` and supports almost every major NumPy computation.
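
As a small sketch of that similarity, the same reduction and reshape can be written almost identically in both libraries. The example assumes a recent PyTorch release (it uses `torch.tensor` to build the tensor), and the variable names are only illustrative.

```python
import numpy as np
import torch

a_np = np.array([[1.0, 2.0], [3.0, 4.0]])
a_t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Column sums: PyTorch uses `dim` where NumPy uses `axis`.
sum_np = a_np.sum(axis=0)
sum_t = a_t.sum(dim=0)

# Flattening works the same way in both libraries.
flat_np = a_np.reshape(-1)
flat_t = a_t.reshape(-1)
```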

### Let's dive in and see some basic Pythonic operations

In [2]:
x = torch.zeros(2,3)
x


 0  0  0
 0  0  0
[torch.FloatTensor of size 2x3]

In [3]:
x = torch.ones(2,3)
x


 1  1  1
 1  1  1
[torch.FloatTensor of size 2x3]

In [4]:
# torch.arange(start,end,step=1) -> [start,end) with step
x = torch.arange(0,3,step=0.5)
x



 0.0000
 0.5000
 1.0000
 1.5000
 2.0000
 2.5000
[torch.FloatTensor of size 6]

In [6]:
# torch.FloatTensor(size or list)
# Given a size, the tensor is left uninitialized, so its values are arbitrary.
x = torch.FloatTensor(2,3)
x



1.00000e-36 *
  1.6396  0.0000  0.0000
  0.0000  0.0000  0.0000
[torch.FloatTensor of size 2x3]

## Convert NumPy to PyTorch and vice-versa
You can convert a PyTorch tensor to a NumPy array, and back, at almost no computational cost. On the CPU the two objects share the same underlying memory, so any change to the converted NumPy array is reflected in the original PyTorch tensor.

In [7]:
import numpy as np

# torch.from_numpy(ndarray) -> tensor

x1 = np.ndarray(shape=(2,3), dtype=int,buffer=np.array([1,2,3,4,5,6]))
x2 = torch.from_numpy(x1)

x2


 1  2  3
 4  5  6
[torch.LongTensor of size 2x3]

In [8]:
# tensor.numpy() -> ndarray
x3 = x2.numpy()
x3

# Defining a numpy array and converting it to a Torch tensor
# a = np.ndarray(shape=(2,3), dtype=float)
# a = torch.from_numpy(a)
# a

array([[1, 2, 3],
       [4, 5, 6]])
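
The memory sharing described for these conversions can be checked directly. The following is a minimal sketch, assuming a CPU tensor and a recent PyTorch release; `torch.tensor` is used at the end only to contrast sharing with an explicit copy.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)   # shares memory with `arr`

arr[0] = 10                 # modify the NumPy array in place
t[1] = 20                   # modify the tensor in place

copy = torch.tensor(arr)    # torch.tensor() copies the data instead
arr[2] = 30                 # visible through `t`, but not through `copy`
```

After these statements both `arr` and `t` hold the same three values, while `copy` still holds the values from the moment it was created.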

In [9]:
x = torch.FloatTensor([[1,2,3],[4,5,6]])
x


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]

In [0]:
# x_gpu = x.cuda()
# x_gpu

In [12]:
# tensor.size() -> indexing also possible

x = torch.FloatTensor(10,12,3,3)

x.size()[:]

# x.size()[:2]
# x.size()

torch.Size([10, 12, 3, 3])

## The contents of a tensor can be accessed and modified using Python’s indexing and slicing notation:

In [13]:
# torch.index_select(input, dim, index)

x = torch.rand(4,3)
out = torch.index_select(x,0,torch.LongTensor([0,3]))

x,out


(
  0.4400  0.8690  0.1604
  0.2888  0.4016  0.6627
  0.0776  0.9876  0.1926
  0.4662  0.3892  0.5683
 [torch.FloatTensor of size 4x3], 
  0.4400  0.8690  0.1604
  0.4662  0.3892  0.5683
 [torch.FloatTensor of size 2x3])

### Pythonic Indexing

In [14]:
# pythonic indexing also works

x[:,0],x[0,:],x[0:2,0:2]

# name = 'abhishek'
# name = name[0:2]
# name

(
  0.4400
  0.2888
  0.0776
  0.4662
 [torch.FloatTensor of size 4], 
  0.4400
  0.8690
  0.1604
 [torch.FloatTensor of size 3], 
  0.4400  0.8690
  0.2888  0.4016
 [torch.FloatTensor of size 2x2])

### Torch masking

In [15]:
# torch.masked_select(input, mask)

x = torch.randn(2,3)
mask = torch.ByteTensor([[0,0,1],[0,1,0]])
out = torch.masked_select(x,mask)

x, mask, out

(
  0.2176 -0.4504  1.0259
 -0.2393  0.7895 -0.3230
 [torch.FloatTensor of size 2x3], 
  0  0  1
  0  1  0
 [torch.ByteTensor of size 2x3], 
  1.0259
  0.7895
 [torch.FloatTensor of size 2])
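
In recent PyTorch releases a mask is usually a boolean tensor rather than a `ByteTensor`. Assuming such a release, a comparison produces a suitable mask directly, as in this short sketch (the values are chosen only for illustration):

```python
import torch

x = torch.tensor([[0.5, -1.0, 2.0],
                  [-0.25, 3.0, -4.0]])

mask = x > 0                               # boolean tensor, same shape as x
positives = torch.masked_select(x, mask)   # 1-D tensor of the selected values
same = x[mask]                             # indexing with the mask is equivalent
```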

In [16]:
# torch.cat(seq, dim=0) -> concatenate tensor along dim

x = torch.FloatTensor([[1,2,3],[4,5,6]])
y = torch.FloatTensor([[-1,-2,-3],[-4,-5,-6]])
z1 = torch.cat([x,y],dim=0)
z2 = torch.cat([x,y],dim=1)

x,y,z1,z2


(
  1  2  3
  4  5  6
 [torch.FloatTensor of size 2x3], 
 -1 -2 -3
 -4 -5 -6
 [torch.FloatTensor of size 2x3], 
  1  2  3
  4  5  6
 -1 -2 -3
 -4 -5 -6
 [torch.FloatTensor of size 4x3], 
  1  2  3 -1 -2 -3
  4  5  6 -4 -5 -6
 [torch.FloatTensor of size 2x6])

## Math Functions

### Torch provides MATLAB-like functions for manipulating Tensor objects. 

`torch.add(tensor, value)`
Add the given value to all elements in the `Tensor`.


`y = torch.add(x, value)` returns a new `Tensor`.

`x.add_(value)` adds `value` to all elements in place.
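
In PyTorch's Python API, in-place variants of an operation carry a trailing underscore. A minimal sketch of the difference between the two forms, with illustrative values:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

y = torch.add(x, 10)   # returns a new tensor; x is unchanged
x_before = x.clone()   # snapshot of x before the in-place call

x.add_(10)             # trailing underscore: modifies x in place
```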

In [17]:
# torch.add()

x1 = torch.FloatTensor([[1,2,3],[4,5,6]])
x2 = torch.FloatTensor([[1,2,3],[4,5,6]])
add = torch.add(x1,x2)

x1,x2,add,x1+x2,x1-x2


(
  1  2  3
  4  5  6
 [torch.FloatTensor of size 2x3], 
  1  2  3
  4  5  6
 [torch.FloatTensor of size 2x3], 
   2   4   6
   8  10  12
 [torch.FloatTensor of size 2x3], 
   2   4   6
   8  10  12
 [torch.FloatTensor of size 2x3], 
  0  0  0
  0  0  0
 [torch.FloatTensor of size 2x3])

### Matrix-matrix product of `mat1` and `mat2`
If `mat1` is an n × m matrix and `mat2` is an m × p matrix, the result is an n × p matrix.

`torch.mm(x, y)` puts the result in a new Tensor.

`torch.mm(x, y, out=M)` puts the result in `M`.

`x.mm(y)` is the method form and also returns a new Tensor.

In [18]:
# torch.mm(mat1, mat2) -> matrix multiplication
# x1 and x2 are uninitialized here, so the product holds arbitrary values (including nan and inf).

x1 = torch.FloatTensor(3,4)
x2 = torch.FloatTensor(4,5)

torch.mm(x1,x2)


 0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00         nan
-1.4584e-23  2.5023e+36         inf  2.6521e+31         nan
 0.0000e+00 -2.0736e-28  0.0000e+00  0.0000e+00         nan
[torch.FloatTensor of size 3x5]
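
Because the inputs above are uninitialized, the result is hard to verify by eye. A sketch with explicit values makes the shape rule easier to check; it assumes a recent PyTorch release and uses the `out=` keyword to write into an existing tensor.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # 2 x 3
y = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])           # 3 x 2

res = torch.mm(x, y)                     # new 2 x 2 tensor

M = torch.empty(2, 2)
torch.mm(x, y, out=M)                    # writes the product into M
```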

In [1]:
# torch.eig(a,eigenvectors=False) -> eigen_value, eigen_vector

import torch

x1 = torch.FloatTensor(4,4)

x1 = torch.eig(x1,True)

x1

(
  0.0000e+00  0.0000e+00
  1.0141e+31  0.0000e+00
  0.0000e+00  0.0000e+00
  0.0000e+00  0.0000e+00
 [torch.FloatTensor of size 4x2], 
 -1.0000 -0.2425  0.0000  0.0000
  0.0000  0.0000  1.0000 -1.0000
  0.0000 -0.9701  0.0000  0.0000
  0.0000  0.0000  0.0000  0.0000
 [torch.FloatTensor of size 4x4])
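
The matrix above is uninitialized, so its eigenvalues are arbitrary. Here is a sketch with a known matrix that is easier to check. It assumes a recent PyTorch release, in which eigendecomposition is available as `torch.linalg.eig` and returns complex-valued tensors; that differs from the `torch.eig` call shown in the cell above.

```python
import torch

A = torch.tensor([[2.0, 0.0],
                  [0.0, 3.0]])

# eigenvalues has shape (2,), eigenvectors has shape (2, 2); both are complex.
eigenvalues, eigenvectors = torch.linalg.eig(A)

# Check the defining property A v = lambda v for every column v.
lhs = A.to(eigenvectors.dtype) @ eigenvectors
rhs = eigenvectors * eigenvalues   # scales column j by eigenvalue j
```

For a diagonal matrix the eigenvalues are simply the diagonal entries, which gives a quick sanity check on the output.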

## PyTorch Autograd

In [0]:
from torch.autograd import Variable

### Autograd is now a core torch package for automatic differentiation. It uses a tape based system for automatic differentiation.

In autograd, there is a Variable class, which is a very thin wrapper around a Tensor. 
You can access the raw tensor through the `.data` attribute, and after computing the backward pass, a gradient w.r.t. this variable is accumulated into the `.grad` attribute.

#### We wrap our PyTorch Tensors in Variable objects; a Variable represents a node in a computational graph. If x is a Variable then `x.data` is a Tensor, and `x.grad` is another Variable holding the gradient of x with respect to some scalar value.
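
A minimal sketch of the gradient mechanism on a tiny example. It assumes a recent PyTorch release, in which a tensor created with `requires_grad=True` can be used directly without an explicit `Variable` wrapper; the function being differentiated is chosen only for illustration.

```python
import torch

# requires_grad=True asks autograd to track operations on x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # scalar: 1 + 4 + 9
y.backward()         # populates x.grad with dy/dx = 2 * x

print(y.item(), x.grad)
```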

In [0]:
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

Create random Tensors to hold input and outputs, and wrap them in Variables.
Setting `requires_grad=False` indicates that we do not need to compute gradients with respect to these Variables during the backward pass.


In [0]:
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

#### Create random Tensors for weights, and wrap them in Variables.
#### Setting requires_grad=True indicates that we want to compute gradients with respect to these Variables during the backward pass.

In [0]:
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

### Forward pass: 
compute predicted y using operations on Variables


### Use autograd:
to compute the backward pass. This call will compute the gradient of loss with respect to all Variables with requires_grad=True.

After this call w1.grad and w2.grad will be Variables holding the gradient of the loss with respect to w1 and w2 respectively.


### Update weights:
using gradient descent; w1.data and w2.data are Tensors, w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are Tensors.

#### Manually zero the gradients after running the backward pass

In [6]:
learning_rate = 1e-6
for t in range(500):
  y_pred = x.mm(w1).clamp(min=0).mm(w2)
  
  # Compute and print loss using operations on Variables.
  # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
  # (1,); loss.data[0] is a scalar value holding the loss.
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.data[0])


  loss.backward()

  w1.data -= learning_rate * w1.grad.data
  w2.data -= learning_rate * w2.grad.data

  # Manually zero the gradients after running the backward pass
  w1.grad.data.zero_()
  w2.grad.data.zero_()

0 50867244.0
1 57149276.0
2 59574760.0
3 43956764.0
4 21238236.0
5 7760386.0
6 3358652.75
7 2050725.125
8 1531127.125
9 1229176.875
10 1013300.1875
11 846491.3125
12 713454.875
13 605582.8125
14 517306.40625
15 444503.15625
16 383980.75
17 333238.09375
18 290512.15625
19 254387.984375
20 223613.859375
21 197239.8125
22 174497.671875
23 154867.0625
24 137839.84375
25 123038.9296875
26 110122.6796875
27 98811.515625
28 88866.5546875
29 80097.0390625
30 72335.2265625
31 65446.3359375
32 59321.10546875
33 53864.28125
34 48992.8125
35 44632.23828125
36 40721.296875
37 37206.52734375
38 34041.65234375
39 31188.408203125
40 28610.720703125
41 26275.166015625
42 24158.037109375
43 22236.06640625
44 20487.91796875
45 18896.255859375
46 17446.169921875
47 16122.447265625
48 14914.0849609375
49 13808.0078125
50 12794.35546875
51 11864.44140625
52 11010.41796875
53 10224.904296875
54 9502.67578125
55 8837.1943359375
56 8223.6123046875
57 7657.494140625
58 7134.43212890625
59 6651.041015625
60 6203
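
The reason for zeroing the gradients in the loop above is that `backward()` adds to `.grad` rather than overwriting it. A small sketch of that accumulation, assuming a recent PyTorch release where tensors carry `requires_grad` directly:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()          # gradient is 3 for each element

(w * 3).sum().backward()        # no zeroing in between
accumulated = w.grad.clone()    # gradients were added: 3 + 3

w.grad.zero_()                  # reset before the next backward pass
(w * 3).sum().backward()
after_zero = w.grad.clone()     # back to 3 for each element
```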

# PyTorch NN module

The nn package defines a set of Modules. You can think of each Module as a neural network layer that produces output from input and may have some trainable weights.



In [0]:
N, D_in, H, D_out = 64, 1000, 100, 10

### Create random Tensors to hold inputs and outputs, and wrap them in Variables.


In [0]:
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

### Use the nn package to define our model as a sequence of layers.
nn.Sequential is a Module which contains other Modules, and applies them in sequence to produce its output. 
Each Linear Module computes output from input using a linear function, and holds internal Variables for its weight and bias.


In [0]:
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        )
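
To see the weights and biases that each Linear Module holds, the parameters of the model can be listed by name. This is a sketch that rebuilds the same model with the dimensions used in this notebook; `shapes` is just an illustrative variable.

```python
import torch

D_in, H, D_out = 1000, 100, 10

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# named_parameters() yields (name, tensor) pairs for every trainable tensor.
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

The ReLU layer contributes no entries because it has no trainable parameters, and each Linear weight is stored with shape (output features, input features).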

### Loss Function:
The nn package also contains definitions of popular loss functions; in this case we will use Mean Squared Error (MSE) as our loss function.
We'll also initialise the learning rate.


In [0]:
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
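
With `size_average=False` the loss is the sum of squared errors rather than their mean. The sketch below checks this on a small example. It assumes a recent PyTorch release, where the summed form is requested with `reduction="sum"` instead; the input values are illustrative.

```python
import torch

y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [1.0, 1.0]])

# reduction="sum" adds up the squared errors instead of averaging them.
loss_fn = torch.nn.MSELoss(reduction="sum")
loss = loss_fn(y_pred, y)

manual = (y_pred - y).pow(2).sum()   # 1 + 0 + 4 + 9
```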

## Optimization:
Use the optim package to define an Optimizer that will update the weights of the model for us. 

Here we will use Adam; the optim package contains many other optimization algorithms.
The first argument to the Adam constructor tells the optimizer which Variables it should update.

In [0]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
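
A sketch of what a single optimizer update does to the parameters it was given. It uses a small stand-in `Linear` model and random data rather than the model defined in this notebook, and assumes a recent PyTorch release.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(8, 3)
y = torch.randn(8, 1)

weight_before = model.weight.detach().clone()

loss = (model(x) - y).pow(2).sum()
optimizer.zero_grad()   # clear any stale gradients
loss.backward()         # fill .grad on every parameter
optimizer.step()        # update the parameters registered above

weight_after = model.weight.detach().clone()
max_change = (weight_after - weight_before).abs().max().item()
```

With Adam, the size of the first update to each weight is close to the learning rate, so `max_change` should be on the order of `1e-4` here.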

### Forward pass: 
compute predicted y by passing x to the model. Module objects override the `__call__` operator so you can call them like functions.
When doing so you pass a Variable of input data to the Module and it produces a Variable of output data.

### Compute Loss

### Zero the gradients before running the backward pass.

### Backward pass: 
compute gradient of the loss with respect to all the learnable parameters of the model. Internally, the parameters of each Module are stored in Variables with requires_grad=True, so this call will compute gradients for all learnable parameters in the model.

### Update the weights using the optimizer.

In [12]:
for t in range(500):
  y_pred = model(x)

  # Compute and print loss.
  loss = loss_fn(y_pred, y)
  print(t, loss.data[0])
  
  # Before the backward pass, use the optimizer object to zero all of the
  # gradients for the variables it will update (which are the learnable weights
  # of the model)
  optimizer.zero_grad()

  # Backward pass: compute gradient of the loss with respect to model parameters
  loss.backward()

  # Calling the step function on an Optimizer makes an update to its parameters
  optimizer.step()

0 667.74169921875
1 651.0390625
2 634.7952880859375
3 619.0105590820312
4 603.671875
5 588.7374267578125
6 574.2692260742188
7 560.21728515625
8 546.5621337890625
9 533.320556640625
10 520.5311279296875
11 508.0982666015625
12 496.0144348144531
13 484.30792236328125
14 472.955810546875
15 461.9398193359375
16 451.2313232421875
17 440.77545166015625
18 430.5967712402344
19 420.6881408691406
20 411.11181640625
21 401.78167724609375
22 392.6910400390625
23 383.80694580078125
24 375.1127624511719
25 366.630615234375
26 358.36395263671875
27 350.2791748046875
28 342.35455322265625
29 334.6158447265625
30 327.08428955078125
31 319.7550048828125
32 312.5661315917969
33 305.5269470214844
34 298.61810302734375
35 291.86126708984375
36 285.2330017089844
37 278.7474365234375
38 272.3753967285156
39 266.1274108886719
40 260.01263427734375
41 254.0327606201172
42 248.1527099609375
43 242.40402221679688
44 236.77120971679688
45 231.2448272705078
46 225.8367462158203
47 220.51226806640625
48 215.2927