In [1]:
import torch

In [2]:
torch.empty((1,2),dtype=torch.double)
a=torch.ones((4,50))
a.dtype

torch.float32

In [3]:
b=torch.tensor([2,3,3,2])
c=torch.tensor([1])
b
c.item()


1

In [9]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 2, out=a)
print(a)
print(b)

[3. 3. 3. 3. 3.]
tensor([3., 3., 3., 3., 3.], dtype=torch.float64)


In [10]:
b.new_ones(1)
b.size()

torch.Size([5])

### AUTOGRAD: AUTOMATIC DIFFERENTIATION
Central to all neural networks in PyTorch is the ```autograd``` package. Let’s first briefly visit this, and we will then go to training our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
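To make "define-by-run" concrete, here is a minimal sketch in which ordinary Python control flow decides which operations get recorded, so the gradient differs from call to call. The function name `f` and the branch condition are illustrative assumptions, not part of the original notes.

```python
import torch

def f(t):
    # Ordinary Python control flow: the graph is built as this code runs.
    if t.item() > 0:
        return t * t      # derivative: 2t
    return -3 * t         # derivative: -3

a = torch.tensor(2.0, requires_grad=True)
f(a).backward()
grad_pos = a.grad.item()   # 2 * 2.0 = 4.0

b = torch.tensor(-1.0, requires_grad=True)
f(b).backward()
grad_neg = b.grad.item()   # -3.0

print(grad_pos, grad_neg)
```

Each call to `f` records only the branch that was actually executed, which is why the two inputs end up with different gradient formulas.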

+ ### Note 

>Our torch.Tensor constructor is overloaded to do the same thing as both ```torch.tensor``` and ```torch.empty```. We thought this overload would make code confusing, so we split ```torch.Tensor``` into ```torch.tensor``` and ```torch.empty```.

In short: to some extent, ```torch.tensor``` works similarly to ```torch.Tensor``` (when you pass in data), and neither should be more efficient than the other. It's just that ```torch.empty``` and ```torch.tensor``` have a nicer API than the legacy ```torch.Tensor``` constructor.

In [11]:
a=torch.Tensor(1,2) #similar to torch.empty()
print(a)
a.dtype

tensor([[1.4013e-45, 0.0000e+00]])


torch.float32

In [12]:
b=torch.tensor([1,2]) # converts a list to a tensor
print(b)
b.dtype

tensor([1, 2])


torch.int64

## Tensor
**```torch.Tensor``` is the central class of the package.** If you set its attribute ```.requires_grad``` to ```True```, it starts to track all operations on it. When you finish your computation you can call ```.backward()``` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its ```.grad``` attribute.

To stop a tensor from tracking history, you can call ```.detach()``` to detach it from the computation history, and to prevent future computation from being tracked.

>**Important note: only Tensors of floating point dtype can require gradients.**
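The dtype restriction can be seen directly. The following is a small sketch (the variable names are illustrative): asking an integer tensor to require gradients raises a `RuntimeError`, while the same data as a float tensor is accepted.

```python
import torch

try:
    torch.tensor([1, 2, 3], requires_grad=True)   # int64 data
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# The same values stored as floats can require gradients.
xf = torch.tensor([1, 2, 3], dtype=torch.float, requires_grad=True)
print(xf.requires_grad)
```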

In [35]:
x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)  # x was created by the user, not by an operation, so grad_fn is None

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None


In [36]:
y = x+2
print(y)
print(y.grad_fn)  # y was created by an addition, so it has a grad_fn

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x0000001044F8EF28>


In [37]:
z = y * y*3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


In [38]:
print(x.requires_grad)
a=x.detach()
a.requires_grad 

True


False

+ **IMPORTANT: grad can be implicitly created only for scalar outputs**

Here, calling ```z.backward()``` or ```y.backward()``` without arguments raises this error, because ```z``` and ```y``` are not **_scalars_**.
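A minimal sketch of that error and two common ways around it. The tensors `xs` and `zs` are fresh stand-ins so that the notebook's own `x` and `z` are left untouched.

```python
import torch

xs = torch.ones(2, 2, requires_grad=True)
zs = (xs + 2) * (xs + 2) * 3      # shape (2, 2), not a scalar

try:
    zs.backward()                 # no gradient argument for a non-scalar
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# Option 1: reduce to a scalar first.
# Option 2 (not run here): pass an explicit gradient, e.g. zs.backward(torch.ones_like(zs)).
zs.sum().backward()
print(xs.grad)                    # 6 * (xs + 2) = 18 everywhere
```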

In [39]:
out.backward()


In [40]:
print(x.grad)  # derivative of mean(y*y*3) w.r.t. x, where y = x+2 and x = torch.ones(2,2)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
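The value 4.5 can be checked by hand. Writing $o$ for ```out``` and $z_i$ for the four entries of ```z```:

$$o = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2$$

$$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \qquad \left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5$$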


In [71]:
help(torch.ones)

Help on built-in function ones:

ones(...)
    ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a tensor filled with the scalar value `1`, with the shape defined
    by the variable argument :attr:`sizes`.
    
    Args:
        sizes (int...): a sequence of integers defining the shape of the output tensor.
            Can be a variable number of arguments or a collection like a list or tuple.
        out (Tensor, optional): the output tensor
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
        layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
            Default: ``torch.strided``.
        device (:class:`torch.device`, optional): the desired device of returned tensor.
            Default: if ``None``, uses the current device for the default t

```.requires_grad_( ... )``` changes an existing Tensor's ```requires_grad``` flag in-place. A newly created tensor's flag defaults to ```False``` if not given; calling ```.requires_grad_()``` with no argument sets it to ```True```.

In [41]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x0000001044F92470>


## Gradients
Let’s backprop now. Because out contains a single scalar, ```out.backward()``` is equivalent to ```out.backward(torch.tensor(1.))```.

In [42]:
x=torch.tensor(3.0,requires_grad=True)
y=x+2*x
print(y)
y.backward()
print(x.grad)


tensor(9., grad_fn=<AddBackward0>)
tensor(3.)


In [43]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:  # y.data.norm() is the L2 norm of y's values, computed outside the autograd graph
    y = y * 2

print(y)

tensor([ 562.7458,  695.3374, -876.3957], grad_fn=<MulBackward0>)


Now in this case y is no longer a scalar. ```torch.autograd``` could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to ```backward``` as argument:


In [44]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
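To see what "vector-Jacobian product" means here, the sketch below compares ```backward(v)``` against an explicitly built Jacobian. It uses `torch.autograd.functional.jacobian` and a stand-in function `g` (cubing each element); both choices are assumptions made for illustration, not part of the original notes.

```python
import torch
from torch.autograd.functional import jacobian

def g(t):
    return t ** 3                     # elementwise, so the Jacobian is diagonal

xv = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
vec = torch.tensor([0.1, 1.0, 0.0001])

g(xv).backward(vec)                   # accumulates vec^T J into xv.grad

J = jacobian(g, xv.detach())          # full 3x3 Jacobian, built explicitly
explicit = vec @ J                    # the same product, computed by hand

print(xv.grad)
print(explicit)
```

Passing `vec` to ```backward``` gives the product directly, without ever materialising the full matrix `J`.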


In [45]:
x = torch.tensor([1,2,3], dtype=torch.float ,requires_grad=True)

y = x **2+2*x-19*x

In [46]:
print(y.grad_fn)


<SubBackward0 object at 0x0000001044F925C0>


In [47]:
v = torch.ones(x.size(), dtype=torch.float)
y.backward(v)

In [48]:
print(x.grad)

tensor([-15., -13., -11.])


You can also stop autograd from tracking history on Tensors with ```.requires_grad=True``` by wrapping the code block in ```with torch.no_grad():```

In [49]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


# _NEURAL NETWORKS_
Neural networks can be constructed using the ```torch.nn``` package.

Now that you have had a glimpse of ```autograd```: ```nn``` depends on ```autograd``` to define models and differentiate them. An ```nn.Module``` contains layers, and a method ```forward(input)``` that returns the output.

For example, look at this network that classifies digit images:
<img src="mnist.png" />

A typical training procedure for a neural network is as follows:

+ Define the neural network that has some learnable parameters (or weights)
+ Iterate over a dataset of inputs
+ Process input through the network
+ Compute the loss (how far is the output from being correct)
+ Propagate gradients back into the network’s parameters
+ Update the weights of the network, typically using a simple update rule: ```weight = weight - learning_rate * gradient```
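The steps above can be sketched end to end on a toy problem. Everything in this block (the single linear layer, the synthetic data, the learning rate, and names such as `toy_model`) is an assumption chosen to keep the example small; it is not the digit classifier described here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. Define a network with learnable parameters (a tiny stand-in model).
toy_model = nn.Linear(4, 1)
toy_criterion = nn.MSELoss()
toy_lr = 0.05

# Synthetic dataset: each target is the sum of the four input features.
features = torch.randn(64, 4)
labels = features.sum(dim=1, keepdim=True)

history = []
for epoch in range(50):                              # 2. iterate over the dataset
    for i in range(0, 64, 16):
        xb, yb = features[i:i + 16], labels[i:i + 16]
        pred = toy_model(xb)                         # 3. process input through the network
        batch_loss = toy_criterion(pred, yb)         # 4. compute the loss
        toy_model.zero_grad()                        #    clear old gradients
        batch_loss.backward()                        # 5. propagate gradients back
        with torch.no_grad():                        # 6. weight = weight - lr * gradient
            for p in toy_model.parameters():
                p -= toy_lr * p.grad
    history.append(batch_loss.item())

print(history[0], history[-1])
```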

## Define the network
Let’s define this network:

##### Base class for all neural network modules.

    Your models should also subclass this class.

    Modules can also contain other Modules, allowing to nest them in
    a tree structure. You can assign the submodules as regular attributes::

        import torch.nn as nn
        import torch.nn.functional as F

        class Model(nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv1 = nn.Conv2d(1, 20, 5)
                self.conv2 = nn.Conv2d(20, 20, 5)

            def forward(self, x):
               x = F.relu(self.conv1(x))
               return F.relu(self.conv2(x))

    Submodules assigned in this way will be registered, and will have their
    parameters converted too when you call :meth:`to`, etc.

In [124]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 3 input image channels, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(3, 6, 3)  # applies a 2D convolution over an input signal composed of several input planes
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)  # in_features must equal the previous layer's out_features
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [114]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 3, 3, 3])


In [115]:
print(len(params))
for i in range(len(params)):
    print(params[i].size())

10
torch.Size([6, 3, 3, 3])
torch.Size([6])
torch.Size([16, 6, 3, 3])
torch.Size([16])
torch.Size([120, 576])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])


Let's try a random ```32x32``` input. Note: the expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

+ **How do we find the flattened feature size for an arbitrary input size?**
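One way to approach this question is to compute the spatial size layer by layer. The sketch below assumes the usual output-size formula for convolution and pooling with dilation 1, `floor((size + 2*padding - kernel) / stride) + 1`; the helper names `conv_out` and `flat_features` are hypothetical. It then cross-checks the result with a dummy forward pass through layers shaped like those of ```Net```.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one dimension for a conv or pooling layer (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(side):
    side = conv_out(side, 3)             # conv1: 3x3 kernel
    side = conv_out(side, 2, stride=2)   # 2x2 max pool
    side = conv_out(side, 3)             # conv2: 3x3 kernel
    side = conv_out(side, 2, stride=2)   # 2x2 max pool
    return 16 * side * side              # conv2 produces 16 channels

print(flat_features(32))                 # 576 = 16 * 6 * 6

# Cross-check: push a dummy tensor through equivalent layers and count elements.
c1, c2 = nn.Conv2d(3, 6, 3), nn.Conv2d(6, 16, 3)
with torch.no_grad():
    t = torch.zeros(1, 3, 32, 32)
    t = F.max_pool2d(F.relu(c1(t)), 2)
    t = F.max_pool2d(F.relu(c2(t)), 2)
measured = t.numel()
print(measured)
```

For a different input size, the value returned by `flat_features` is what the first linear layer's `in_features` would need to be.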

In [126]:
input= torch.randn(1, 3, 32, 32)
print(input.size())
out = net(input)
print(out)

torch.Size([1, 3, 32, 32])
tensor([[-0.0821, -0.1117,  0.0056, -0.0974, -0.1437, -0.0574,  0.1433, -0.0515,
         -0.0215, -0.0584]], grad_fn=<AddmmBackward>)


In [117]:
class Model(nn.Module):
    
    def __init__(self):
        super(Model, self).__init__()
        self.conv1=nn.Conv2d(3,32,5)
        self.pool1=nn.MaxPool2d(3,3)
        self.conv2=nn.Conv2d(32,64,5)
        self.pool2=nn.MaxPool2d(3,3)
        self.conv3=nn.Conv2d(64,128,5)
        self.pool3=nn.MaxPool2d(3,3)
        self.fc1=nn.Linear(1024,7)
        
    def forward(self,x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))  # use pool3, the pooling layer defined for conv3
        x = x.view(-1, 1024)
        x = F.relu(self.fc1(x))
        return x
        
mynet=Model()      

In [118]:
print(mynet)

Model(
  (conv1): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (pool2): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(64, 128, kernel_size=(5, 5), stride=(1, 1))
  (pool3): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1024, out_features=7, bias=True)
)


In [119]:
params1 = list(mynet.parameters())
print(len(params1))
print(params1[0].size())  # conv1's .weight

8
torch.Size([32, 3, 5, 5])


In [120]:
print(len(params1))
for i in range(len(params1)):
    print(params1[i].size())

8
torch.Size([32, 3, 5, 5])
torch.Size([32])
torch.Size([64, 32, 5, 5])
torch.Size([64])
torch.Size([128, 64, 5, 5])
torch.Size([128])
torch.Size([7, 1024])
torch.Size([7])


```torch.nn``` only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, ```nn.Conv2d``` will take in a 4D Tensor of ```nSamples x nChannels x Height x Width```.

If you have a single sample, just use ```input.unsqueeze(0)``` to add a fake batch dimension.
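A short sketch of adding the batch dimension; the layer sizes and the names `single` and `batched` are chosen only for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 6, 3)
single = torch.randn(3, 32, 32)       # one image: nChannels x Height x Width
batched = single.unsqueeze(0)         # add a batch dimension of size 1
print(batched.shape)                  # torch.Size([1, 3, 32, 32])

result = conv(batched)
print(result.shape)                   # torch.Size([1, 6, 30, 30])
```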

Before proceeding further, let’s recap all the classes you’ve seen so far.

###### Recap:
+ ```torch.Tensor``` - A multi-dimensional array with support for autograd operations like ```backward()```. Also holds the gradient w.r.t. the tensor.
+ ```nn.Module``` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
+ ```nn.Parameter``` - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
+ ```autograd.Function``` - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.

## Loss Function
**A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.**

There are several different <a href='https://pytorch.org/docs/stable/nn.html'>loss functions</a> under the ```nn``` package. A simple one is ```nn.MSELoss```, which computes the mean-squared error between the input and the target.

For example:

In [144]:
output = net(input)
print(input.size())
print(output.size())
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
print(target.size())
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

torch.Size([1, 3, 32, 32])
torch.Size([1, 10])
torch.Size([1, 10])
tensor(0.2938, grad_fn=<MseLossBackward>)
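As a sanity check on what ```nn.MSELoss``` computes with its default settings, the sketch below compares it with the mean of squared differences written out by hand. The small tensors `pred` and `tgt` are made-up values.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.5, -1.0, 2.0]])
tgt = torch.tensor([[1.0, 0.0, 2.0]])

builtin = nn.MSELoss()(pred, tgt)
manual = ((pred - tgt) ** 2).mean()   # (0.25 + 1.0 + 0.0) / 3

print(builtin, manual)
```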


In [145]:
print(loss.grad_fn)

<MseLossBackward object at 0x0000001045DA2898>


In [146]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad (a leaf parameter: fc3's bias)
print(loss.grad_fn.next_functions[0][0].next_functions[1][0])  # ReLU

<MseLossBackward object at 0x0000001045C21C18>
<AddmmBackward object at 0x0000001045DA2128>
<AccumulateGrad object at 0x0000001045DA2358>
<ReluBackward0 object at 0x0000001045DA2128>


### Backprop
To backpropagate the error, all we have to do is call ```loss.backward()```. You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.

Now we shall call ```loss.backward()```, and have a look at conv1’s bias gradients before and after the backward.

In [147]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0019, -0.0028,  0.0012, -0.0047,  0.0059,  0.0049])
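The accumulation behaviour is easy to see on a single scalar parameter. This is a minimal sketch with a made-up tensor `w`: calling ```backward()``` twice without resetting doubles the stored gradient.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * 2).backward()
first = w.grad.clone()        # tensor(2.)

(w * 2).backward()            # no reset: the new gradient is added to the old one
second = w.grad.clone()       # tensor(4.)

w.grad.zero_()                # manual reset of the gradient buffer
(w * 2).backward()
third = w.grad.clone()        # tensor(2.)

print(first, second, third)
```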


### Update the weights
The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):

```weight = weight - learning_rate * gradient```


We can implement this using simple Python code:

In [150]:
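learning_rate = 0.01
with torch.no_grad():  # update in place without recording the step in the graph
    for f in net.parameters():
        print("before", f)
        f.sub_(f.grad * learning_rate)  # weight = weight - learning_rate * gradient
        print("after", f)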

... (long before/after printouts of the preceding parameter tensors omitted) ...
before Parameter containing:
tensor([-0.1277, -0.0906,  0.0798,  0.0852, -0.0608, -0.0266, -0.0492, -0.0152,
         0.0487, -0.1313, -0.0628,  0.0072, -0.0134,  0.0911,  0.1181,  0.0550],
       requires_grad=True)
after Parameter containing:
tensor([-0.1277, -0.0906,  0.0798,  0.0852, -0.0608, -0.0266, -0.0492, -0.0153,
         0.0487, -0.1313, -0.0628,  0.0072, -0.0133,  0.0911,  0.1181,  0.0550],
       requires_grad=True)
... (long before/after printouts of the intermediate parameter tensors omitted) ...
before Parameter containing:
tensor([-0.1040, -0.0670, -0.0107, -0.0175, -0.1063,  0.0668,  0.0749, -0.0290,
        -0.0521,  0.0009], requires_grad=True)
after Parameter containing:
tensor([-0.1037, -0.0681, -0.0088, -0.0170, -0.1066,  0.0672,  0.0730, -0.0282,
        -0.0512, -0.0004], requires_grad=True)


However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package: ```torch.optim``` that implements all these methods. Using it is very simple:

In [151]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

**Note**: Observe how the gradient buffers had to be manually set to zero using ```optimizer.zero_grad()```. This is because gradients are accumulated, as explained in the Backprop section.
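To connect the two approaches, the sketch below checks that one step of plain ```optim.SGD``` (no momentum or weight decay) gives the same weights as the manual rule ```weight = weight - learning_rate * gradient```. The two small linear models and the random data are assumptions made for the comparison.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)          # identical starting weights
data, goal = torch.randn(5, 3), torch.randn(5, 2)
lr = 0.01

# Manual rule applied to model_a.
nn.MSELoss()(model_a(data), goal).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim applied to model_b.
opt = optim.SGD(model_b.parameters(), lr=lr)
opt.zero_grad()
nn.MSELoss()(model_b(data), goal).backward()
opt.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```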