# Intro

[PyTorch](https://pytorch.org/) is a very powerful machine learning framework. Central to PyTorch are [tensors](https://pytorch.org/docs/stable/tensors.html), a generalization of matrices to higher ranks. One intuitive example of a tensor is an image with three color channels: A 3-channel (red, green, blue) image which is 64 pixels wide and 64 pixels tall is a $3\times64\times64$ tensor. You can access the PyTorch framework by writing `import torch` near the top of your code, along with all of your other import statements.

This guide will help introduce you to the functionality of PyTorch, but don't worry too much about memorizing it: the assignments will link to relevant documentation where necessary.

In [1]:
import torch

In [2]:
print(torch.__version__)

1.3.1
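
To make the image example from the start of this section concrete, here is a small sketch that builds a placeholder tensor with that shape. The name `image` and the use of `torch.zeros` are assumptions made for illustration; a real image would be loaded from a file rather than filled with zeros.

```python
import torch

# A stand-in for a 3-channel, 64x64 image (all zeros rather than real pixels).
image = torch.zeros(3, 64, 64)

print(image.shape)    # channels, height, width
print(image.numel())  # total number of values: 3 * 64 * 64
```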


# Why PyTorch?

One important question worth asking is: why is PyTorch being used for this course? There is a great breakdown by [the Gradient](https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/) looking at the state of machine learning frameworks as of 2019. In part, as highlighted by the article, PyTorch is generally more pythonic than alternative frameworks, easier to debug, and is the most-used framework in machine learning research by a large and growing margin. While PyTorch's primary alternative, TensorFlow, has attempted to integrate many of PyTorch's features, TensorFlow's implementations come with some inherent limitations highlighted in the article.

Notably, while PyTorch's industry usage has grown, TensorFlow is still (for now) a slight favorite in industry. In practice, the features that make PyTorch attractive for research also make it attractive for education, and the general trend of machine learning research and practice toward PyTorch makes it the more forward-looking choice.

# Tensor Properties
One way to create tensors from a list or an array is to use `torch.Tensor`. It'll be used to set up examples in this notebook, but you'll never need to use it in the course - in fact, if you find yourself needing it, that's probably not the correct answer. 

In [3]:
example_tensor = torch.Tensor(
    [
     [[1, 2], [3, 4]], 
     [[5, 6], [7, 8]], 
     [[9, 0], [1, 2]]
    ]
)

You can view the tensor in the notebook by simply printing it out (though some larger tensors will be cut off):

In [4]:
example_tensor

tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]],

        [[9., 0.],
         [1., 2.]]])

## Tensor Properties: Device

One important property is the device of the tensor - throughout this notebook you'll be sticking to tensors which are on the CPU. However, throughout the course you'll also be using tensors on GPU (that is, a graphics card which will be provided for you to use for the course). To view the device of the tensor, all you need to write is `example_tensor.device`. To move a tensor to a new device, you can write `new_tensor = example_tensor.to(device)` where device will be either `cpu` or `cuda`.

In [5]:
example_tensor.device

device(type='cpu')
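
As a minimal sketch of moving a tensor between devices, the snippet below picks `cuda` only when a GPU is actually available, so it also runs on a CPU-only machine. The names `cpu_tensor` and `moved_tensor` are made up for this example.

```python
import torch

# Use the GPU when one is available; otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

cpu_tensor = torch.ones(2, 2)
moved_tensor = cpu_tensor.to(device)  # returns a tensor on the target device

print(cpu_tensor.device)    # the original tensor stays on the CPU
print(moved_tensor.device)  # cpu or cuda, depending on the machine
```

Note that `.to(device)` does not modify `cpu_tensor`; the result has to be assigned to a variable to be used.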

## Tensor Properties: Shape

You can get the number of elements in each dimension by printing out the tensor's shape, using `example_tensor.shape`, something you're likely familiar with if you've used numpy. For example, this tensor is a $3\times2\times2$ tensor, since it has 3 elements, each of which is $2\times2$.

In [6]:
example_tensor.shape

torch.Size([3, 2, 2])

You can also get the size of a particular dimension $n$ using `example_tensor.shape[n]` or equivalently `example_tensor.size(n)`:

In [7]:
print("shape[0] =", example_tensor.shape[0])
print("size(1) =", example_tensor.size(1))

shape[0] = 3
size(1) = 2


Finally, it is sometimes useful to get the number of dimensions (rank) or the number of elements, which you can do as follows:

In [8]:
print("Rank =", len(example_tensor.shape))
print("Number of elements =", example_tensor.numel())

Rank = 3
Number of elements = 12


# Indexing Tensors

As with numpy, you can access specific elements or subsets of elements of a tensor. To access the $n$-th element, you can simply write `example_tensor[n]` - as with Python in general, these dimensions are 0-indexed. 

In [9]:
example_tensor[1]

tensor([[5., 6.],
        [7., 8.]])

In addition, if you want to access the $j$-th row of the $i$-th example, you can write `example_tensor[i, j]`, and you can add a third index to pick out a single element:

In [10]:
example_tensor[1, 1, 0]

tensor(7.)

Note that if you'd like to get a Python scalar value from a tensor, you can use `example_scalar.item()`

In [11]:
example_tensor[1, 1, 0].item()

7.0

In addition, you can select index $i$ along the second dimension for every entry of the first dimension by using `x[:, i]`. For example, if you want the top-left element of each element in `example_tensor`, which is the `0, 0` element of each matrix, you can write:

In [12]:
example_tensor[:, 0, 0]

tensor([1., 5., 9.])
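
Slices work along any dimension as well. The sketch below uses a small tensor built with `torch.arange` (an assumption for the example, chosen so the values are easy to follow) to show a few common patterns.

```python
import torch

# Values 0..11 arranged as three 2x2 matrices.
t = torch.arange(12.).reshape(3, 2, 2)

first_two = t[:2]          # the first two matrices, shape (2, 2, 2)
right_column = t[:, :, 1]  # the right-hand column of every matrix, shape (3, 2)
last_matrix = t[-1]        # negative indices count from the end

print(first_two.shape)
print(right_column)
print(last_matrix)
```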

# Initializing Tensors

There are many ways to create new tensors in PyTorch, but in this course, the most important ones are: 

[`torch.ones_like`](https://pytorch.org/docs/master/generated/torch.ones_like.html): creates a tensor of all ones with the same shape and device as `example_tensor`.

In [13]:
torch.ones_like(example_tensor)

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])

[`torch.zeros_like`](https://pytorch.org/docs/master/generated/torch.zeros_like.html): creates a tensor of all zeros with the same shape and device as `example_tensor`

In [14]:
torch.zeros_like(example_tensor)

tensor([[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]])

[`torch.randn_like`](https://pytorch.org/docs/stable/generated/torch.randn_like.html): creates a tensor with every element sampled from a [Normal (or Gaussian) distribution](https://en.wikipedia.org/wiki/Normal_distribution) with the same shape and device as `example_tensor`


In [15]:
torch.randn_like(example_tensor)

tensor([[[-1.1212, -0.9564],
         [ 1.0508,  0.7532]],

        [[ 1.9562, -0.5611],
         [ 0.8843,  0.0194]],

        [[-0.7667,  1.3042],
         [ 1.8819,  0.7251]]])

Sometimes (though less often than you'd expect), you might need to initialize a tensor knowing only the shape and device, without a tensor for reference for `ones_like` or `randn_like`. In this case, you can create a $2\times2$ tensor as follows:

In [16]:
torch.randn(2, 2, device='cpu') # Alternatively, for a GPU tensor, you'd use device='cuda'

tensor([[ 0.3150, -0.6664],
        [-0.1546, -0.8158]])

# Basic Functions

There are a number of basic functions that you should know to use PyTorch - if you're familiar with numpy, most commonly used functions exist in PyTorch, usually with the same name. You can perform element-wise multiplication / division by a scalar $c$ by simply writing `c * example_tensor`, and element-wise addition / subtraction by a scalar by writing `example_tensor + c`.

Note that most operations are not in-place in PyTorch, which means that they don't change the original variable's data. However, you can reassign the same variable name to the changed data if you'd like, such as `example_tensor = example_tensor + 1`.

In [17]:
(example_tensor - 5) * 2

tensor([[[ -8.,  -6.],
         [ -4.,  -2.]],

        [[  0.,   2.],
         [  4.,   6.]],

        [[  8., -10.],
         [ -8.,  -6.]]])
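
To illustrate the difference between out-of-place and in-place operations, here is a small sketch. It relies on PyTorch's convention that methods ending in an underscore, such as `add_`, modify the tensor in place; the variable names are made up for the example.

```python
import torch

original = torch.tensor([1., 2., 3.])

# Out-of-place: the result is a new tensor and `original` keeps its values.
shifted = original + 1
unchanged = torch.equal(original, torch.tensor([1., 2., 3.]))
print(unchanged)

# In-place: the trailing underscore means `original` itself is modified.
original.add_(1)
print(torch.equal(original, shifted))
```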

You can calculate the mean or standard deviation of a tensor using [`example_tensor.mean()`](https://pytorch.org/docs/stable/generated/torch.mean.html) or [`example_tensor.std()`](https://pytorch.org/docs/stable/generated/torch.std.html). 

In [18]:
print("Mean:", example_tensor.mean())
print("Stdev:", example_tensor.std())

Mean: tensor(4.)
Stdev: tensor(2.9848)


You might also want to find the mean or standard deviation along a particular dimension. To do this you can simply pass the number corresponding to that dimension to the function. For example, if you want to get the average $2\times2$ matrix of the $3\times2\times2$ `example_tensor` you can write:

In [19]:
example_tensor.mean(0)

# Equivalently, you could also write:
# example_tensor.mean(dim=0)
# example_tensor.mean(axis=0)
# torch.mean(example_tensor, 0)
# torch.mean(example_tensor, dim=0)
# torch.mean(example_tensor, axis=0)

tensor([[5.0000, 2.6667],
        [3.6667, 4.6667]])
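
Reducing along a dimension removes that dimension from the result. The sketch below, using an illustrative tensor built with `torch.arange`, shows how the shape changes, how `keepdim=True` keeps the reduced dimension with size 1, and how several dimensions can be reduced at once.

```python
import torch

# Values 0..11 arranged as three 2x2 matrices.
t = torch.arange(12.).reshape(3, 2, 2)

mean_over_first = t.mean(0)             # average matrix, shape (2, 2)
mean_keepdim = t.mean(0, keepdim=True)  # same values, shape (1, 2, 2)
mean_per_matrix = t.mean((1, 2))        # one mean per matrix, shape (3,)

print(mean_over_first)
print(mean_keepdim.shape)
print(mean_per_matrix)
```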

PyTorch has many other powerful functions, but these should be all of the PyTorch functions you need for this course outside of its neural network module (`torch.nn`).

# PyTorch Neural Network Module (`torch.nn`)

PyTorch has a lot of powerful classes in its `torch.nn` module (usually imported as simply `nn`). These classes allow you to create a new function which transforms a tensor in a specific way, often retaining information when called multiple times.

In [20]:
import torch.nn as nn

## `nn.Linear`

To create a linear layer, you need to pass it the number of input dimensions and the number of output dimensions. The linear object initialized as `nn.Linear(10, 2)` will take in an $n\times10$ matrix and return an $n\times2$ matrix, where all $n$ elements have had the same linear transformation performed. For example, you can initialize a linear layer which performs the operation $Ax + b$, where $A$ and $b$ are initialized randomly when you generate the [`nn.Linear()`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) object.

In [21]:
linear = nn.Linear(10, 2)
example_input = torch.randn(3, 10)
example_output = linear(example_input)
example_output

tensor([[-0.3066,  0.3297],
        [-0.0424,  0.1532],
        [ 0.0383, -0.1404]], grad_fn=<AddmmBackward>)
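
As a quick check of what the layer computes, the sketch below reproduces its output by hand from the layer's `weight` and `bias` attributes. Because each input is a row of the batch, the $Ax + b$ operation is written as `x @ W.T + b` in code. The name `manual_output` is made up for this example.

```python
import torch
import torch.nn as nn

linear = nn.Linear(10, 2)
x = torch.randn(3, 10)

# nn.Linear stores its weight with shape (out_features, in_features),
# so for a batch of row vectors the layer computes x @ W^T + b.
manual_output = x @ linear.weight.t() + linear.bias

print(linear.weight.shape)
print(torch.allclose(linear(x), manual_output, atol=1e-6))
```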

## `nn.ReLU`

[`nn.ReLU()`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) will create an object that, when receiving a tensor, will perform a ReLU activation function. This will be reviewed further in lecture, but in essence, a ReLU non-linearity sets all negative numbers in a tensor to zero. In general, the simplest neural networks are composed of series of linear transformations, each followed by activation functions. 

In [22]:
relu = nn.ReLU()
relu_output = relu(example_output)
relu_output

tensor([[0.0000, 0.3297],
        [0.0000, 0.1532],
        [0.0383, 0.0000]], grad_fn=<ReluBackward0>)
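
With hand-picked values the effect is easy to see. This small sketch also compares the result against `torch.clamp(values, min=0)`, which computes the same thing for these inputs; the tensor `values` is an assumption chosen for the example.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
values = torch.tensor([-2., -0.5, 0., 1.5])

activated = relu(values)              # negatives become 0, the rest pass through
clamped = torch.clamp(values, min=0)  # an equivalent way to write the same rule

print(activated)
print(torch.equal(activated, clamped))
```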

## `nn.BatchNorm1d`

[`nn.BatchNorm1d`](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html) is a normalization technique that will rescale a batch of $n$ inputs to have a consistent mean and standard deviation between batches.  

As indicated by the `1d` in its name, this is for situations where you expect a set of inputs, where each of them is a flat list of numbers. In other words, each input is a vector, not a matrix or higher-dimensional tensor. For a set of images, each of which is a higher-dimensional tensor, you'd use [`nn.BatchNorm2d`](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), discussed later on this page.

`nn.BatchNorm1d` takes an argument of the number of input dimensions of each object in the batch (the size of each example vector).

In [23]:
batchnorm = nn.BatchNorm1d(2)
batchnorm_output = batchnorm(relu_output)
batchnorm_output

tensor([[-0.6965,  1.2522],
        [-0.6965, -0.0576],
        [ 1.3930, -1.1946]], grad_fn=<NativeBatchNormBackward>)
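
To see what the normalization does, the sketch below reproduces the output of a freshly created `nn.BatchNorm1d` by hand. It assumes the module is in training mode (the default for a new module) with its initial scale of 1 and shift of 0, in which case each feature is normalized using the mean and biased variance of the current batch. The names `batch` and `manual_output` are made up for this example.

```python
import torch
import torch.nn as nn

batchnorm = nn.BatchNorm1d(2)      # a new module starts in training mode
batch = torch.randn(8, 2) * 3 + 5  # 8 examples with 2 features each

# Normalize each feature (column) with the statistics of this batch.
batch_mean = batch.mean(0)
batch_var = batch.var(0, unbiased=False)
manual_output = (batch - batch_mean) / torch.sqrt(batch_var + batchnorm.eps)

output = batchnorm(batch)
print(torch.allclose(output, manual_output, atol=1e-5))
print(output.mean(0))  # close to 0 for each feature
```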

## `nn.Sequential`

[`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) creates a single operation that performs a sequence of operations. For example, you can write a neural network layer with a batch normalization as:

In [24]:
mlp_layer = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)

test_example = torch.randn(5,5) + 1
print("input: ")
print(test_example)
print("output: ")
print(mlp_layer(test_example))

input: 
tensor([[ 3.0064,  1.2974,  0.0238,  1.1194,  1.2301],
        [ 1.2429,  2.0650,  1.5826,  3.0315,  0.4943],
        [ 2.0930,  0.9832,  0.6951, -0.8614,  0.9691],
        [ 1.6045,  1.7765, -0.0591,  0.6480, -0.9884],
        [ 1.1689,  1.2136,  2.5438,  2.2091,  2.1822]])
output: 
tensor([[0.8386, 0.2292],
        [0.0729, 0.8468],
        [0.0000, 0.0000],
        [0.0000, 0.0000],
        [1.1538, 1.2178]], grad_fn=<ReluBackward0>)
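
A `nn.Sequential` object is equivalent to calling its sub-modules one after another. The sketch below checks this by iterating over the layers manually; the variable names are chosen for the example only.

```python
import torch
import torch.nn as nn

mlp_layer = nn.Sequential(nn.Linear(5, 2), nn.BatchNorm1d(2), nn.ReLU())
x = torch.randn(4, 5)

# Apply each sub-module in order, feeding every output into the next layer.
manual_output = x
for layer in mlp_layer:
    manual_output = layer(manual_output)

print(torch.allclose(mlp_layer(x), manual_output))
print(mlp_layer[0])  # sub-modules can also be accessed by position
```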


# Optimization

One of the most important aspects of essentially any machine learning framework is its automatic differentiation library. 

## Optimizers

To create an optimizer in PyTorch, you'll need to use the `torch.optim` module, often imported as `optim`. [`optim.Adam`](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam) corresponds to the Adam optimizer. To create an optimizer object, you'll need to pass it the parameters to be optimized and the learning rate, `lr`, as well as any other parameters specific to the optimizer.

For all `nn` objects, you can access their parameters as an iterator using their `parameters()` method (wrap it in `list(...)` if you need an actual list), as follows:

In [25]:
import torch.optim as optim
adam_opt = optim.Adam(mlp_layer.parameters(), lr=1e-1)

## Training Loop

A (basic) training step in PyTorch consists of four basic parts:


1.   Set all of the gradients to zero using `opt.zero_grad()`
2.   Calculate the loss, `loss`
3.   Calculate the gradients with respect to the loss using `loss.backward()`
4.   Update the parameters being optimized using `opt.step()`

That might look like the following code (and you'll notice that if you run it several times, the loss goes down):


In [27]:
train_example = torch.randn(100,5) + 1
adam_opt.zero_grad()

# We'll use a simple loss function of mean distance from 1
# torch.abs takes the absolute value of a tensor
cur_loss = torch.abs(1 - mlp_layer(train_example)).mean()

cur_loss.backward()
adam_opt.step()
print('loss= ',cur_loss.item())
print('mean pred= ', mlp_layer(train_example).mean().item())

loss=  0.7572625875473022
mean pred=  0.4149680733680725
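
The four steps can also be placed in a loop. The sketch below is deliberately simpler than the cell above: it optimizes a single made-up value `w` toward 3 with `optim.SGD` (swapped in here instead of Adam so that the updates are easy to follow by hand), and records the loss at each step.

```python
import torch
import torch.optim as optim

# A single trainable value, starting at 0, that should move toward 3.
w = torch.zeros(1, requires_grad=True)
opt = optim.SGD([w], lr=0.1)

losses = []
for step in range(50):
    opt.zero_grad()              # 1. reset the gradients to zero
    loss = ((w - 3) ** 2).sum()  # 2. calculate the loss
    loss.backward()              # 3. calculate the gradients of the loss
    opt.step()                   # 4. update the parameter
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss shrinks over the loop
print(w.item())               # ends up close to 3
```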


In [28]:
bn2 = mlp_layer[1]
print(bn2.momentum)
print(bn2.running_mean)
print(bn2.running_var)
print(bn2.weight)
print(bn2.bias)

0.1
tensor([0.2491, 0.4900])
tensor([0.8042, 0.8648])
Parameter containing:
tensor([0.9000, 0.9000], requires_grad=True)
Parameter containing:
tensor([0.1000, 0.1000], requires_grad=True)


## `requires_grad_()`

You can also tell PyTorch that it needs to calculate the gradient with respect to a tensor that you created by saying `example_tensor.requires_grad_()`, which will change it in-place. This means that even if PyTorch wouldn't normally store a grad for that particular tensor, it will for that specified tensor. 
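
A minimal sketch of this, using a hand-made tensor `x` and the function $\sum x^2$ (both chosen for illustration), whose gradient with respect to `x` is $2x$:

```python
import torch

x = torch.tensor([1., 2., 3.])
print(x.requires_grad)  # tensors created this way do not track gradients

x.requires_grad_()      # switch gradient tracking on, in place
y = (x ** 2).sum()
y.backward()

print(x.grad)           # the gradient of sum(x^2) is 2 * x
```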

## `with torch.no_grad():`

PyTorch will usually track the information needed to calculate gradients as it proceeds through a set of operations on tensors. This often costs unnecessary computation and memory, especially if you're only performing an evaluation. To avoid this, you can wrap a piece of code in a `with torch.no_grad():` block, which prevents gradients from being tracked inside it.
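
As a small sketch, the same layer is called below once normally and once inside `torch.no_grad()`. The values are identical, but only the first result is connected to the layer's parameters for gradient purposes. The names `tracked` and `untracked` are made up for the example.

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 1)
x = torch.randn(2, 4)

tracked = linear(x)        # gradient information is recorded

with torch.no_grad():
    untracked = linear(x)  # same computation, nothing is recorded

print(tracked.requires_grad)
print(untracked.requires_grad)
```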


## `detach():`

Sometimes, you want to calculate and use a tensor's value without calculating its gradients. For example, if you have two models, A and B, and you want to directly optimize the parameters of A with respect to the output of B, without calculating the gradients through B, then you could feed the detached output of B to A. There are many reasons you might want to do this, including efficiency or cyclical dependencies (i.e. A depends on B depends on A).
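
The two-model scenario can be sketched as follows. `model_a` and `model_b` are stand-in linear layers invented for this example; after the backward pass only `model_a` has gradients, because the output of `model_b` was detached before being passed on.

```python
import torch
import torch.nn as nn

model_b = nn.Linear(3, 3)
model_a = nn.Linear(3, 1)
x = torch.randn(5, 3)

# detach() returns a tensor with the same values but no gradient history.
b_output = model_b(x).detach()

loss = model_a(b_output).mean()
loss.backward()

print(model_a.weight.grad is not None)  # A received gradients
print(model_b.weight.grad is None)      # B was cut out of the graph
```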

# New `nn` Classes

You can also create new classes which extend `nn.Module`. For these classes, all attributes assigned on `self`, as in `self.layer` or `self.param`, will automatically be treated as parameters if they are themselves `nn` objects or if they are tensors wrapped in `nn.Parameter` which are initialized with the class.

The `__init__` function defines what will happen when the object is created. The first line of the `__init__` function of a class named, for example, `WellNamedClass` should be `super(WellNamedClass, self).__init__()` (in Python 3, `super().__init__()` is equivalent).

The `forward` function defines what runs if you create that object `model` and pass it a tensor `x`, as in `model(x)`. If you choose the function signature `(self, x)`, then each call of the forward function gets two pieces of information: `self`, which is a reference to the object with which you can access all of its parameters, and `x`, which is the current tensor for which you'd like to return `y`.

One class might look like the following:

In [29]:
class ExampleModule(nn.Module):
    def __init__(self, input_dims, output_dims):
        super(ExampleModule, self).__init__()
        self.linear = nn.Linear(input_dims, output_dims)
        self.exponent = nn.Parameter(torch.tensor(1.))

    def forward(self, x):
        x = self.linear(x)

        # This is the notation for element-wise exponentiation, 
        # which matches python in general
        x = x ** self.exponent 
        
        return x

And you can view its parameters as follows:

In [30]:
example_model = ExampleModule(10, 2)
list(example_model.parameters())

[Parameter containing:
 tensor(1., requires_grad=True),
 Parameter containing:
 tensor([[-0.2713,  0.2671, -0.1566,  0.3014,  0.0366,  0.0370,  0.2419,  0.2567,
          -0.0135, -0.2507],
         [ 0.1538, -0.0921, -0.0921,  0.0208, -0.1309,  0.0814, -0.1973, -0.1937,
           0.1678,  0.3116]], requires_grad=True),
 Parameter containing:
 tensor([-0.1461, -0.0995], requires_grad=True)]

And you can print out their names too, as follows:

In [31]:
list(example_model.named_parameters())

[('exponent',
  Parameter containing:
  tensor(1., requires_grad=True)),
 ('linear.weight',
  Parameter containing:
  tensor([[-0.2713,  0.2671, -0.1566,  0.3014,  0.0366,  0.0370,  0.2419,  0.2567,
           -0.0135, -0.2507],
          [ 0.1538, -0.0921, -0.0921,  0.0208, -0.1309,  0.0814, -0.1973, -0.1937,
            0.1678,  0.3116]], requires_grad=True)),
 ('linear.bias',
  Parameter containing:
  tensor([-0.1461, -0.0995], requires_grad=True))]

And here's an example of the class in action:

In [32]:
# Named model_input rather than input to avoid shadowing Python's built-in input()
model_input = torch.randn(2, 10)
example_model(model_input)

tensor([[-0.2826,  0.1715],
        [-0.0986,  0.2645]], grad_fn=<PowBackward1>)

# 2D Operations

You won't need these for the first lesson, and the theory behind each of these will be reviewed more in later lectures, but here is a quick reference: 


*   2D convolutions: [`nn.Conv2d`](https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html) requires the number of input and output channels, as well as the kernel size.
*   2D transposed convolutions (aka deconvolutions): [`nn.ConvTranspose2d`](https://pytorch.org/docs/master/generated/torch.nn.ConvTranspose2d.html) also requires the number of input and output channels, as well as the kernel size
*   2D batch normalization: [`nn.BatchNorm2d`](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html) requires the number of input channels
*   Resizing images: [`nn.Upsample`](https://pytorch.org/docs/master/generated/torch.nn.Upsample.html) requires the final size or a scale factor. Alternatively, [`nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate) takes the same arguments. 
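
As a rough shape reference for the operations listed above, the sketch below runs a random batch through each of them. The channel counts, kernel size, and scale factor are arbitrary choices for illustration, and the shapes in the comments assume the default stride of 1 and no padding.

```python
import torch
import torch.nn as nn

images = torch.randn(1, 3, 64, 64)  # (batch, channels, height, width)

conv = nn.Conv2d(3, 8, kernel_size=3)
features = conv(images)             # (1, 8, 62, 62): a 3x3 kernel trims 2 pixels

batchnorm = nn.BatchNorm2d(8)       # one entry per channel
normed = batchnorm(features)        # shape is unchanged

deconv = nn.ConvTranspose2d(8, 3, kernel_size=3)
restored = deconv(normed)           # (1, 3, 64, 64): back to the input size

upsample = nn.Upsample(scale_factor=2)
bigger = upsample(images)           # (1, 3, 128, 128)

print(features.shape, normed.shape, restored.shape, bigger.shape)
```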





# Practice

* import
* make tensor from list
    * check shape
    * check device
* indexing
* init ones_like, randn_like
* use in basic functions (add, mult, mean, std)

In [33]:
import torch

In [34]:
t_list = torch.tensor([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]]
])

In [35]:
t_list

tensor([[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]],

        [[ 9, 10],
         [11, 12]]])

In [36]:
t_list.shape

torch.Size([3, 2, 2])

In [37]:
t_list.device

device(type='cpu')

In [38]:
t_list[0,1,0]

tensor(3)

In [39]:
t_ones = torch.ones_like(t_list)
t_ones

tensor([[[1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1]]])

In [40]:
t_zeros = torch.zeros_like(t_list)
t_zeros

tensor([[[0, 0],
         [0, 0]],

        [[0, 0],
         [0, 0]],

        [[0, 0],
         [0, 0]]])

In [41]:
t_randint = torch.randint_like(t_list, low=0, high=99)
t_randint

tensor([[[ 7, 86],
         [83, 39]],

        [[39, 26],
         [61, 22]],

        [[38, 78],
         [10, 11]]])

In [42]:
t_list_float = t_list.float()
t_list_float

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])

In [43]:
t_rand = torch.rand_like(t_list_float)
t_rand

tensor([[[0.9454, 0.0895],
         [0.5591, 0.8790]],

        [[0.0400, 0.0909],
         [0.9680, 0.3886]],

        [[0.7130, 0.0811],
         [0.6781, 0.5874]]])

In [44]:
t_randn = torch.randn_like(t_list_float)
t_randn

tensor([[[-0.7170, -0.5156],
         [-0.2708, -1.7794]],

        [[-0.8899, -1.8661],
         [-0.3738, -0.0905]],

        [[ 1.0038,  1.6794],
         [-0.2170, -0.3778]]])

In [45]:
t_list + t_list_float

tensor([[[ 2.,  4.],
         [ 6.,  8.]],

        [[10., 12.],
         [14., 16.]],

        [[18., 20.],
         [22., 24.]]])

In [46]:
t_list + t_zeros

tensor([[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]],

        [[ 9, 10],
         [11, 12]]])

In [47]:
t_list + t_ones

tensor([[[ 2,  3],
         [ 4,  5]],

        [[ 6,  7],
         [ 8,  9]],

        [[10, 11],
         [12, 13]]])

In [48]:
t_list_float + t_zeros

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])

In [49]:
t_list_float + t_ones

tensor([[[ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.]],

        [[10., 11.],
         [12., 13.]]])

In [50]:
t_list + t_rand

tensor([[[ 1.9454,  2.0895],
         [ 3.5591,  4.8790]],

        [[ 5.0400,  6.0909],
         [ 7.9680,  8.3886]],

        [[ 9.7130, 10.0811],
         [11.6781, 12.5874]]])

In [51]:
t_list + t_randint

tensor([[[ 8, 88],
         [86, 43]],

        [[44, 32],
         [68, 30]],

        [[47, 88],
         [21, 23]]])

In [52]:
t_list + t_randn

tensor([[[ 0.2830,  1.4844],
         [ 2.7292,  2.2206]],

        [[ 4.1101,  4.1339],
         [ 6.6262,  7.9095]],

        [[10.0038, 11.6794],
         [10.7830, 11.6222]]])

In [53]:
t_list * t_list_float

tensor([[[  1.,   4.],
         [  9.,  16.]],

        [[ 25.,  36.],
         [ 49.,  64.]],

        [[ 81., 100.],
         [121., 144.]]])

In [54]:
t_list * t_randn

tensor([[[ -0.7170,  -1.0312],
         [ -0.8124,  -7.1177]],

        [[ -4.4495, -11.1965],
         [ -2.6168,  -0.7243]],

        [[  9.0340,  16.7940],
         [ -2.3871,  -4.5332]]])

In [55]:
t_list **2

tensor([[[  1,   4],
         [  9,  16]],

        [[ 25,  36],
         [ 49,  64]],

        [[ 81, 100],
         [121, 144]]])

In [56]:
t_list_float **2

tensor([[[  1.,   4.],
         [  9.,  16.]],

        [[ 25.,  36.],
         [ 49.,  64.]],

        [[ 81., 100.],
         [121., 144.]]])

In [57]:
t_zeros **2

tensor([[[0, 0],
         [0, 0]],

        [[0, 0],
         [0, 0]],

        [[0, 0],
         [0, 0]]])

In [58]:
t_ones **2

tensor([[[1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1]]])

In [59]:
(t_ones+2) **2

tensor([[[9, 9],
         [9, 9]],

        [[9, 9],
         [9, 9]],

        [[9, 9],
         [9, 9]]])

In [60]:
(t_randn+2) **2

tensor([[[ 1.6462,  2.2035],
         [ 2.9902,  0.0487]],

        [[ 1.2323,  0.0179],
         [ 2.6444,  3.6460]],

        [[ 9.0227, 13.5380],
         [ 3.1791,  2.6316]]])

In [61]:
(t_randn) **2

tensor([[[0.5141, 0.2658],
         [0.0733, 3.1663]],

        [[0.7919, 3.4823],
         [0.1397, 0.0082]],

        [[1.0076, 2.8204],
         [0.0471, 0.1427]]])

In [62]:
t_list.float().mean()

tensor(6.5000)

In [63]:
t_list_float.mean()

tensor(6.5000)

In [64]:
t_zeros.float().mean()

tensor(0.)

In [65]:
t_ones.float().mean()

tensor(1.)

In [66]:
t_rand.mean()

tensor(0.5017)

In [67]:
t_randint.float().mean()

tensor(41.6667)

In [68]:
t_randn.mean()

tensor(-0.3679)

In [69]:
t_list_float.std()

tensor(3.6056)

In [70]:
t_rand.std()

tensor(0.3548)

In [71]:
t_randn.std()

tensor(0.9925)

In [72]:
t_list_float

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])

In [73]:
t_list_float.mean([0,2])

tensor([5.5000, 7.5000])

# Practice BatchNorm

In [74]:
bn1 = nn.BatchNorm1d(4)

In [75]:
inp = torch.cat((torch.randn((1000,2,2))*2 + 4, torch.randn((1000,2,2))*10 + 1000), dim=1)
inp.shape

torch.Size([1000, 4, 2])

In [76]:
print('mean= ',inp.mean())
print('std= ',inp.std())
print('var= ',inp.var())
print('mean[:,half]=', inp.mean(0))
print('std[:,half]=', inp.std(0))
print('var[:,half]=', inp.var(0))

mean=  tensor(501.9691)
std=  tensor(498.0547)
var=  tensor(248058.5000)
mean[:,half]= tensor([[   3.9793,    4.0487],
        [   3.9663,    3.9914],
        [1000.0323,  999.4709],
        [ 999.7770, 1000.4872]])
std[:,half]= tensor([[ 2.0153,  1.9064],
        [ 1.9644,  2.0318],
        [ 9.7743, 10.0047],
        [ 9.8067,  9.8677]])
var[:,half]= tensor([[  4.0614,   3.6343],
        [  3.8590,   4.1282],
        [ 95.5367, 100.0944],
        [ 96.1718,  97.3715]])


In [77]:
out_bn1 = bn1(inp)
print('out_bn1.size=', out_bn1.size())
print('mean= ',out_bn1.mean())
print('std= ',out_bn1.std())
print('mean[:,half]=', out_bn1.mean(0))
print('std[:,half]=', out_bn1.std(0))

out_bn1.size= torch.Size([1000, 4, 2])
mean=  tensor(3.0708e-07, grad_fn=<MeanBackward0>)
std=  tensor(1.0001, grad_fn=<StdBackward0>)
mean[:,half]= tensor([[-0.0177,  0.0177],
        [-0.0063,  0.0063],
        [ 0.0284, -0.0284],
        [-0.0361,  0.0361]], grad_fn=<MeanBackward1>)
std[:,half]= tensor([[1.0277, 0.9722],
        [0.9835, 1.0172],
        [0.9884, 1.0117],
        [0.9967, 1.0029]], grad_fn=<StdBackward1>)


In [78]:
bn1.momentum

0.1

In [79]:
bn1.running_mean

tensor([  0.4014,   0.3979,  99.9752, 100.0132])

In [80]:
bn1.running_var

tensor([ 1.2847,  1.2992, 10.6845, 10.5849])

In [81]:
bn1.weight

Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)

In [82]:
bn1.bias

Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)

# Practice Sequential

In [83]:
seq2 = nn.Sequential(
    nn.Linear(4, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)

In [84]:
inp_seq = torch.randn((100, 4))*5 + 10
test_seq = torch.randn((100, 4))*5 + 10

out_seq = seq2(inp_seq)
print('inp_seq.size:', inp_seq.size())
print('out_seq.size:', out_seq.size())

inp_seq.size: torch.Size([100, 4])
out_seq.size: torch.Size([100, 2])


In [85]:
opt_seq = torch.optim.Adam(seq2.parameters(), lr=1e-2)
criterion = torch.mean(torch.abs(1 - seq2(inp_seq)))

In [86]:
print('inp_seq.mean: ',inp_seq.mean(0))
print('inp_seq.std: ',inp_seq.std(0))
print()
print('criterion:', criterion.mean().item(), 'pred.mean: ',seq2(inp_seq).mean(0).tolist())

inp_seq.mean:  tensor([10.5579, 10.1710, 10.0206,  9.2985])
inp_seq.std:  tensor([5.6316, 4.9133, 4.7359, 4.5245])

criterion: 0.7662103772163391 pred.mean:  [0.3879261016845703, 0.3877398371696472]


In [87]:
for i in range(200):
    opt_seq.zero_grad()
    criterion = torch.mean(torch.abs(1 - seq2(inp_seq)))
    criterion.backward()
    opt_seq.step()
    print('criterion:', criterion.mean().item(), 'pred.mean: ',seq2(inp_seq).mean(0).tolist())

criterion: 0.7662103772163391 pred.mean:  [0.3922105133533478, 0.39067542552948]
criterion: 0.7589684128761292 pred.mean:  [0.3953389823436737, 0.39262640476226807]
criterion: 0.7521691918373108 pred.mean:  [0.399160236120224, 0.3942054510116577]
criterion: 0.7442622184753418 pred.mean:  [0.40493297576904297, 0.39621275663375854]
criterion: 0.7359932065010071 pred.mean:  [0.4099179208278656, 0.3980749845504761]
criterion: 0.7282418012619019 pred.mean:  [0.4138473868370056, 0.3995427191257477]
criterion: 0.7211025953292847 pred.mean:  [0.4176526367664337, 0.40096619725227356]
criterion: 0.7148005962371826 pred.mean:  [0.4211721122264862, 0.40232983231544495]
criterion: 0.7102056741714478 pred.mean:  [0.42448052763938904, 0.40393170714378357]
criterion: 0.7067551612854004 pred.mean:  [0.4274815320968628, 0.40607306361198425]
criterion: 0.7033290863037109 pred.mean:  [0.4301227927207947, 0.4088502824306488]
criterion: 0.6991634964942932 pred.mean:  [0.43250346183776855, 0.4121593236923218