# Assignment 1 - All about *torch.Tensor*

### Deep Learning with PyTorch: Zero to GANs

PyTorch is a popular deep learning framework developed and open-sourced by Facebook, Inc. It is based on the Torch library and was originally developed by Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan. Parts of the PyTorch code are written in Python, and the rest is written in C++ and CUDA.

At present, PyTorch competes with TensorFlow, an open-source deep learning framework from Google that, based on GitHub data, is the most widely used framework in the deep learning community.

PyTorch was introduced as a dynamic framework, which means tensor operations are computed as they are defined. TensorFlow was introduced as a static framework, meaning all the required computations are defined first, transformed into TensorFlow computational graphs, and only then executed. More recently, TensorFlow 2 made eager execution the default, giving it a dynamic mode similar to PyTorch.

PyTorch comprises powerful modules such as Autograd (automatic differentiation of tensors), Optim (optimization routines), and NN (neural network library functions), and also provides a Multiprocessing library and utility functions (Utils).

In this notebook, we cover a few basic functions to familiarize ourselves with PyTorch.

In [1]:
# Import torch and other required modules
import torch
import numpy as np

In [2]:
# Find out which version of Torch we are using
print(torch.version.__version__)
print(torch.__version__)

1.5.0
1.5.0


The class torch.Tensor is the core class of the package, and torch.tensor() is the function that constructs a tensor from data. In this section, we cover a few functions that operate on tensors. Below is a sample of how to create a tensor in PyTorch.

In [3]:
points = torch.tensor([[[1, 2., 5], [3, 4.0, 1], [-1.0, 2, 0]], [[0, 0, 0], [10, 4, 1], [0, 1, 2]]])

In [4]:
points

tensor([[[ 1.,  2.,  5.],
         [ 3.,  4.,  1.],
         [-1.,  2.,  0.]],

        [[ 0.,  0.,  0.],
         [10.,  4.,  1.],
         [ 0.,  1.,  2.]]])

## Function 1 - check if a given object is a tensor

We may encounter many kinds of objects in a program, such as lists, NumPy arrays, and tensors, so it is useful to check whether a given object is a PyTorch tensor (torch.is_tensor()) or a PyTorch storage object (torch.is_storage()) before using it in computations.

In [5]:
# Let's define two objects, a list data type x and a standard normal distributed random tensor y.
x = [1, 5, 8.3, 2] # x is a list
y = torch.randn(3, 3) # y is random tensor of standard normal distribution
z = torch.tensor([[1, 3], [2, 1]])

In [6]:
# Check if the object is a tensor using the is_tensor() function
torch.is_tensor(x)

False

In [7]:
# Check if the object is a PyTorch storage object.
torch.is_storage(x)

False

In [8]:
# Check if the object y is a tensor and if it is a storage object -- using print() to do both in the same cell

print(torch.is_tensor(y))
print(torch.is_storage(y))

True
False


In the above examples, x is a list and not a tensor, and hence torch.is_tensor() yields False. torch.is_storage() is also False for x, because a list is not a PyTorch storage object.

The tensor y yields True for torch.is_tensor() but False for torch.is_storage(), and the same holds for z, which was defined using torch.tensor(). This can look surprising, since a tensor's data is certainly stored somewhere. However, torch.is_storage() checks whether the object itself is a storage object, i.e. the underlying one-dimensional data buffer, and a tensor is a view on top of such a buffer rather than the buffer itself.
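
One way to see what torch.is_storage() is checking is to pass it the buffer behind a tensor instead of the tensor. The sketch below is only an illustration: it assumes that newer PyTorch releases expose `untyped_storage()` while older ones (such as the 1.5.0 used here) only have `storage()`, and it picks whichever is available.

```python
import torch

y = torch.randn(3, 3)

# Assumption: newer releases provide untyped_storage(); older ones only storage().
storage = y.untyped_storage() if hasattr(y, "untyped_storage") else y.storage()

print(torch.is_tensor(y))         # the tensor itself is a tensor
print(torch.is_storage(y))        # a tensor is not a storage object
print(torch.is_storage(storage))  # the underlying buffer is a storage object
```

So a False from torch.is_storage(y) says nothing about whether the data of y is stored; it only says that y is not the storage object itself.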

In [9]:
# Example 3 - breaking (to illustrate when it breaks)
torch.tensor([[1, 2], [3, 4, 5]])

ValueError: expected sequence of length 2 at dim 1 (got 3)

All rows of a tensor must have the same length. The above gives an error because the first row has 2 elements and the second row has 3.

## Function 2 - torch.stack()

Sometimes we need to join two or more tensors together, and how we join them depends on the intention. Joining also lets us keep several small same-sized tensors in one single tensor instead of handling them separately.

Say we don't want to add them numerically, just stack them side by side, like playing cards of 4 different suits. We use torch.stack() to stack them all along a new dimension, which plays the role of the tuck box. The individual tensors must have the same shape for the stack operation, i.e. (1, 13) in the case of cards: each suit has 13 cards, and the 4 suits are the 4 tensors being stacked by torch.stack().

The other possibility is to join similar categories together, i.e. concatenation. This is like taking 2 decks (2 boxes of 52 cards) and joining them while keeping the suits separate. This way, we get 4 suits of 2 times 13 = 26 cards, or 4 tensors each of shape (1, 26). In this case, we use torch.cat() to join them along an existing dimension.

Yet another instance of joining would be to add them numerically, which is done by torch.add().
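
The three ways of joining described above can be sketched with the playing-card picture. The encoding here is hypothetical: each suit is represented as a (1, 13) tensor of ranks 1 to 13, purely to make the shapes visible.

```python
import torch

# Hypothetical encoding: each suit is a (1, 13) tensor of card ranks 1..13.
ranks = torch.arange(1, 14).reshape(1, 13)
suits = [ranks.clone() for _ in range(4)]  # the four suits of one deck

# torch.stack(): a new leading dimension (the "tuck box") holds the 4 suits.
deck = torch.stack(suits, dim=0)
print(deck.shape)  # torch.Size([4, 1, 13])

# torch.cat(): the same suit from two decks is joined along an existing dimension.
double_suit = torch.cat([suits[0], suits[0]], dim=1)
print(double_suit.shape)  # torch.Size([1, 26])

# torch.add(): element-wise numerical addition, the shape stays the same.
summed = torch.add(suits[0], suits[1])
print(summed.shape)  # torch.Size([1, 13])
```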

In [10]:
# Defining dummy tensors to work with the demo of examples

a = torch.randn(1, 3)
b = torch.randn(1, 3)

# We created two tensors of shape (1, 3) by sampling from the standard normal distribution,
# i.e. a distribution with mean 0 and standard deviation 1 (the sampled values themselves vary).

print(a)
print('\n')
print(b)

tensor([[-0.2903,  0.0726, -1.5728]])


tensor([[ 0.0995, -1.4382,  0.3038]])


In [11]:
# Example 1 -- default setting parameter dim=0

c = torch.stack([a, b], dim=0)
print(c)
print(c.shape)

tensor([[[-0.2903,  0.0726, -1.5728]],

        [[ 0.0995, -1.4382,  0.3038]]])
torch.Size([2, 1, 3])


As seen above, stacking a (1, 3) tensor with another (1, 3) tensor with dim=0 creates the new dimension at index position 0 and places each input along it, giving a new tensor of shape (2, 1, 3).

In [12]:
# Example 2 - dimension parameter dim set to 1

d = torch.stack([a, b], dim=1)
print(d)
print(d.shape)

tensor([[[-0.2903,  0.0726, -1.5728],
         [ 0.0995, -1.4382,  0.3038]]])
torch.Size([1, 2, 3])


With dim=1, the new dimension is inserted at index position 1, so the two (1, 3) tensors become the two rows of a single (1, 2, 3) tensor. Note that torch.stack() always adds a dimension and requires all inputs to have the same shape. Rejoining pieces along an existing dimension is a different operation: for example, if a gray-scaled image tensor P is cut horizontally into upper and lower parts of equal width but not necessarily equal height, the original P is reproduced with torch.cat() along the height dimension, not with torch.stack().
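
A quick check of the image-halves scenario, assuming a hypothetical 4 x 6 gray-scaled image: parts of unequal height can be rejoined with torch.cat() along the height dimension, whereas torch.stack() needs equal shapes and adds a new dimension.

```python
import torch

# Hypothetical 4 x 6 gray-scaled image (height x width).
image = torch.arange(24.0).reshape(4, 6)

# Cut horizontally into two parts of unequal height but equal width.
upper, lower = image[:1], image[1:]
print(upper.shape, lower.shape)  # torch.Size([1, 6]) torch.Size([3, 6])

# torch.cat() rejoins them along the existing height dimension (dim=0).
restored = torch.cat([upper, lower], dim=0)
print(torch.equal(restored, image))  # True

# torch.stack() on two equally shaped (1, 3) tensors inserts a new dimension.
a = torch.randn(1, 3)
b = torch.randn(1, 3)
stacked = torch.stack([a, b], dim=1)
print(stacked.shape)  # torch.Size([1, 2, 3])
```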

In [13]:
# Just a quick-check for torch.add() function

a1 = torch.eye(3) # creates an identity matrix of size 3 x 3
b1 = torch.ones(3) # creates a ones tensor of shape (3,), broadcast against the 3 x 3 matrix below
torch.add(b1, -a1) # adds them numerically, note the minus sign

tensor([[0., 1., 1.],
        [1., 0., 1.],
        [1., 1., 0.]])

In [14]:
# Example 3 - breaking (to illustrate when torch.stack() breaks)

e = torch.stack([c, d], dim=0)

RuntimeError: stack expects each tensor to be equal size, but got [2, 1, 3] at entry 0 and [1, 2, 3] at entry 1

The stacking operation can be done only on tensors of the same size. PyTorch offers another function, torch.cat(), which joins tensors along an existing dimension and allows their sizes to differ in that dimension.

An alternative in this case is to make the shapes match before calling torch.stack(), either by padding the smaller tensor up to the size of the largest one, or by using unsqueeze() or reshape().
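
The padding route can be sketched as follows. This is a minimal illustration with made-up tensors, using `torch.nn.functional.pad` with zero padding; note that padding changes the data, so whether zeros are an acceptable filler depends on the application.

```python
import torch
import torch.nn.functional as F

small = torch.tensor([[1., 2.]])      # shape (1, 2)
large = torch.tensor([[3., 4., 5.]])  # shape (1, 3)

# Pad the last dimension of the smaller tensor on the right with zeros
# until it matches the larger one. The pad tuple is (left, right).
missing = large.shape[-1] - small.shape[-1]
padded = F.pad(small, (0, missing))
print(padded)  # tensor([[1., 2., 0.]])

# Now both tensors have shape (1, 3) and can be stacked.
stacked = torch.stack([padded, large], dim=0)
print(stacked.shape)  # torch.Size([2, 1, 3])
```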

In [15]:
f = c.reshape(1, 2, 3)
e = torch.stack([f, d], dim=0)
print(e)
print(e.shape)

tensor([[[[-0.2903,  0.0726, -1.5728],
          [ 0.0995, -1.4382,  0.3038]]],


        [[[-0.2903,  0.0726, -1.5728],
          [ 0.0995, -1.4382,  0.3038]]]])
torch.Size([2, 1, 2, 3])


## Function 3 - torch.cat()

As explained before, PyTorch provides torch.cat() to join (concatenate) tensors whose sizes differ. The catch is that the sizes may differ only in the dimension along which the concatenation happens; all the other dimensions must match.

For example, say we have a batch of 3 gray-scaled images, i.e. a tensor of shape (3, 32, 32, 1), and another batch of 2 gray-scaled images, i.e. a tensor of shape (2, 32, 32, 1). Let's see how torch.cat() can be used here.

In [16]:
# Example 1 - along dim=0 (default)
batch1 = torch.randn(3, 32, 32, 1)
batch2 = torch.randn(2, 32, 32, 1)

add_batches = torch.cat([batch1, batch2], dim=0)
print(add_batches.shape)

torch.Size([5, 32, 32, 1])


In the above, we joined two batches of images into a single set. Let's see what happens if we have one set containing images belonging to 3 categories of objects, and another set containing images belonging to 2 categories that are different from the previous set. We want to join these sets to create a batch of size (1, 5, 32, 32, 1), i.e. 1 batch containing gray-scaled 32x32 images of 5 categories of objects.

In [17]:
# Example 2 - dim=1, the use case doesn't make any sense though.

set1 = torch.randn(1, 3, 32, 32, 1)
set2 = torch.randn(1, 2, 32, 32, 1)

# Lets concatenate in the dim=1 dimension
batch = torch.cat([set1, set2], dim=1)
print(batch.shape)

torch.Size([1, 5, 32, 32, 1])


The above example only demonstrates the use of torch.cat(). In real scenarios, we don't carry the category information with the images; it either appears as a target variable or is unknown.

In [18]:
# Example 3 - Only one dimension can be different i.e. the dimension
# where the concatenation happens; rest of the dimensions should be matching.

set3 = torch.randn(100, 32, 32, 1)
set4 = torch.randn(100, 32, 32, 3)
set5 = torch.randn(100, 64, 64, 1)
set6 = torch.randn(50, 32, 32, 3)

batch1 = torch.cat([set3, set6], dim=3)
print(batch1.shape)

RuntimeError: Sizes of tensors must match except in dimension 3. Got 100 and 50 in dimension 0

In [19]:
batch2 = torch.cat([set3, set5], dim=1)
print(batch2.shape)

RuntimeError: Sizes of tensors must match except in dimension 1. Got 32 and 64 in dimension 2

Both calls above produce an error because the tensors differ in size in a dimension other than the one specified by the 'dim' parameter.
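
The relationship between torch.stack() and torch.cat() can also be checked directly. The sketch below, with illustrative tensors, suggests that stacking along a new dimension gives the same result as adding that dimension to each input with unsqueeze() and then concatenating along it.

```python
import torch

a = torch.randn(1, 3)
b = torch.randn(1, 3)

stacked = torch.stack([a, b], dim=0)

# unsqueeze(0) adds a leading dimension of size 1, giving shape (1, 1, 3),
# so torch.cat() can then join the inputs along that dimension.
joined = torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0)

print(stacked.shape)                 # torch.Size([2, 1, 3])
print(torch.equal(stacked, joined))  # True
```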

## Function 4 - Convert numpy array to tensor and vice versa

In this section, the functions used to convert numpy arrays to a tensor as well as to convert a tensor to a numpy array are shown with examples. These are simple to use but very useful functions.

In [20]:
# Example 1 - Numpy array to a Tensor

n1 = np.array([[1, 3, 5], [-2, 1, 2], [0, 2, 1]])
print(n1)
print(type(n1))

n1_mod = torch.from_numpy(n1)
print(n1_mod)
print(type(n1_mod))

[[ 1  3  5]
 [-2  1  2]
 [ 0  2  1]]
<class 'numpy.ndarray'>
tensor([[ 1,  3,  5],
        [-2,  1,  2],
        [ 0,  2,  1]])
<class 'torch.Tensor'>


The above example shows how a numpy array n1 is converted into a tensor using a torch function torch.from_numpy().

In [21]:
# Example 2 - from Tensor to numpy array

n1_backto_array = n1_mod.numpy()
print(n1_backto_array)
print(type(n1_backto_array))

assert np.array_equal(n1, n1_backto_array)  # element-wise comparison of the two arrays

[[ 1  3  5]
 [-2  1  2]
 [ 0  2  1]]
<class 'numpy.ndarray'>


The tensor created in the previous operation is converted back to its original form, i.e. a numpy array. Here we don't need a separate function; the conversion is available as the numpy() method of the torch tensor class.

In [22]:
# Just a point to be noted: The numpy array and tensor share the same memory
# Any changes in one reflect in the other too.

print(n1) # original numpy array -- if we execute the cell twice, we won't see the difference
n1[0,1] = -2
print(n1)
print(n1_mod) # tensor -- converted from numpy

[[ 1  3  5]
 [-2  1  2]
 [ 0  2  1]]
[[ 1 -2  5]
 [-2  1  2]
 [ 0  2  1]]
tensor([[ 1, -2,  5],
        [-2,  1,  2],
        [ 0,  2,  1]])


In [23]:
# Example 3 - breaking (to illustrate when it breaks)

a = np.array(['1', '0'])
print(a)
b = torch.from_numpy(a)
print(b)

['1' '0']


TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

This is a contrived example just to show that torch.from_numpy() can fail. The reason for the failure is that the data is not in a supported numeric format such as int or float but is given as strings. While numpy has its own functions to convert such data (numbers presented as strings), torch doesn't accept raw string data.
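
One possible workaround, sketched here under the assumption that every string really does hold an integer, is to convert the array to a numeric dtype on the numpy side first and only then hand it to torch.from_numpy().

```python
import numpy as np
import torch

a = np.array(['1', '0'])

# Parse the strings as 64-bit integers; this fails if a string is not a number.
numeric = a.astype(np.int64)

b = torch.from_numpy(numeric)
print(b)        # tensor([1, 0])
print(b.dtype)  # torch.int64
```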

Other than such unsupported data types, torch.from_numpy() works reliably and is often used to move data between numpy and torch.
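
Because torch.from_numpy() shares memory with the source array, a copy is needed whenever the array and the tensor should be independent. The following sketch, with illustrative names, contrasts torch.from_numpy() with torch.tensor(), which copies the data.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # makes an independent copy of the data

arr[0] = 100                    # modify the numpy array in place

print(shared[0].item())  # 100, the change is visible through the shared tensor
print(copied[0].item())  # 1, the copy is unaffected
```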

## Function 5 - working with gradients

This section covers a few functions (from the autograd module) that we need to compute gradients. Autograd records the operations performed on tensors that have requires_grad=True as the code runs, and uses that record to compute the gradients needed for backward propagation; since the record is rebuilt on every run, each iteration can be different depending on the flow/conditions of the code. Calling .backward() on a tensor kickstarts the computation of derivatives through the recorded history. If the tensor holds a single element, no arguments are necessary for .backward(). If the tensor has more elements, a gradient argument of matching shape is required. Finally, the gradients are accumulated into the .grad attribute of the tensors that require them.

We can stop the history tracking either by using .detach() to get a tensor that is separated from the computational graph, or by wrapping the code block in `with torch.no_grad():`. The latter method is frequently used when evaluating a model whose trainable parameters have `requires_grad=True`, since no gradients are needed during evaluation.
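
The evaluation use case can be sketched as follows. The model here is a hypothetical stand-in (a single `torch.nn.Linear` layer with random weights), not a trained network; it only serves to show the effect of the context manager on the output.

```python
import torch

# Hypothetical stand-in for a trained model: one linear layer.
model = torch.nn.Linear(3, 1)
inputs = torch.randn(4, 3)

# Normal forward pass: the output is tracked because the weights require grad.
tracked = model(inputs)
print(tracked.requires_grad)  # True

# Evaluation: operations inside the block are not recorded for autograd.
with torch.no_grad():
    preds = model(inputs)
print(preds.requires_grad)  # False
```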

One primary requirement for computing gradients with respect to a tensor is that its requires_grad attribute is set to True.

`g1 = torch.tensor([1., 0.3])` --> here requires_grad=False by default. This means that there are no gradients to be computed for g1.

`g2 = torch.tensor([1., 0.3], requires_grad=True)`

We can change the requires_grad attribute of an existing tensor in place using requires_grad_(value), where value can be True or False.

In [24]:
g1 = torch.tensor([1.])
print(g1.requires_grad) # prints the default value when tensor is created

g1.requires_grad_(True) # changes the value of the attribute in place

print(g1)

False
tensor([1.], requires_grad=True)


In [25]:
y = g1 + 2
print(y)

tensor([3.], grad_fn=<AddBackward0>)


In [26]:
# Example 1 - evaluating gradients based on the definitions g1, y, and z.

z = y * y
print(z)
out = z.mean()
print(z, out)

tensor([9.], grad_fn=<MulBackward0>)
tensor([9.], grad_fn=<MulBackward0>) tensor(9., grad_fn=<MeanBackward0>)


In [27]:
out.backward()

In [28]:
# print out the gradient d(out)/d(g1)
print(g1.grad)

tensor([6.])


The above gradient, stored in the tensor attribute .grad, can be checked by computing the derivative manually with the chain rule: (d[out]/dz) times (d[z]/dy) times (d[y]/dg1) = 1 * (2 * y) * 1 = 1 * (2 * 3) * 1 = 6.
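
The example above calls .backward() on a single-element tensor. For an output with several elements, a gradient argument is needed; the sketch below, with illustrative names, passes a tensor of ones, which has the same effect as summing the output before calling .backward().

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x * x  # three elements, so y.backward() without arguments would raise an error

# A tensor of ones with the shape of y weights every element equally.
y.backward(gradient=torch.ones_like(y))

print(x.grad)  # tensor([2., 4., 6.]), i.e. 2 * x
```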

In [29]:
# Example 2 - Another simple function

# g2 = 1, g3 = g2^2, out2 = 2 * g3

# d(out2)/dg2 = (dout2/dg3) (dg3/dg2) = (2) (2 * g2) = (2) (2 * 1) = 4

g2 = torch.tensor([1.], requires_grad=True)

g3 = g2 * g2
print(g3.requires_grad)
out2 = g3 * 2
out2.backward()
print(g2.grad)

True
tensor([4.])


In [30]:
# Example 2 -- How to deactivate the gradient computation.
# Either include the part inside a torch.no_grad() like below
# Or use .detach() to get a new tensor (sharing the same data) with requires_grad=False
g2 = torch.tensor([1.], requires_grad=True)
g3 = g2 * g2

print(g3.requires_grad)

with torch.no_grad():
    g3 = g2 * g2
    print(g3.requires_grad)
g3.requires_grad_(True)
print(g3.requires_grad)
dummy = g3.detach()
print(dummy.requires_grad)
assert torch.equal(g3, dummy)  # same shape and same values

True
False
True
False


First we create g2 with requires_grad=True. Then we define g3 as g2^2, and by default g3 inherits requires_grad, which is True, and hence the first print() gives True.

Then we redefine g3 inside `with torch.no_grad():`, so the operation is not tracked and requires_grad is False; thus the second print() gives False.

Next we explicitly set the requires_grad attribute using requires_grad_(True), which is an in-place operation, and the following print() gives True for the requires_grad attribute.

Finally, we use .detach() to get a new tensor, stored in dummy, that is detached from the computational graph. It is not a clone: it shares the same data as g3 but has requires_grad set to False, and hence the print() gives False.

Other than requires_grad attribute both the dummy and g3 are same, and assert is used to confirm this.
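
A small check, with illustrative tensor names, of how the result of .detach() relates to the original tensor: it points to the same underlying memory, and an independent copy needs an additional .clone().

```python
import torch

g = torch.tensor([1., 2.], requires_grad=True)
h = g * g

detached = h.detach()
print(detached.requires_grad)                # False
print(detached.data_ptr() == h.data_ptr())   # True: same underlying memory

cloned = h.detach().clone()
print(cloned.data_ptr() == h.data_ptr())     # False: separate memory
print(torch.equal(cloned, detached))         # True: same values
```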

In [31]:
# Example 3 - breaking (to illustrate when it breaks)

with torch.no_grad():
    g2 = torch.tensor([1.], requires_grad=True)
    g3 = g2 * g2
    print(g3.requires_grad)
    out2 = g3 * 2
    out2.backward()
    print(g2.grad)

False


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The definition of g3 is inside `with torch.no_grad():`, and hence its requires_grad is False; the print() confirms this.

The following operations, specifically out2.backward(), don't work because no history was recorded and there are no gradients to compute, hence the error.

We discussed a couple of functions from the autograd module: 1) the requires_grad attribute of a tensor should be set to True to compute gradients with respect to it, 2) the examples showed how the chain rule is used to compute the gradient of the output w.r.t. the input, 3) the .grad attribute stores the gradients thus evaluated, 4) we can deactivate the gradient calculations by using .detach() to get a tensor separated from the graph, or by moving the parts of the calculation that don't require gradients inside `with torch.no_grad():`

## Conclusion

There are a lot of functions to read and understand. I think it is better to learn these functions and their usage along with the deep learning algorithms and domain applications. Learning the functions first and applying them to deep learning applications only later may not be as beneficial. So learn the deep learning algorithms / techniques such as CNNs, RNNs, DNNs, etc., and apply these tensor functions wherever required.

## Reference Links
* Official documentation for `torch.Tensor`: https://pytorch.org/docs/stable/tensors.html
* https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
* https://medium.com/@Geeks_Today/five-magical-function-in-pytorch-956b3c7665a1

In [32]:
!pip install jovian --upgrade --quiet

In [33]:
import jovian

In [35]:
jovian.commit()

[jovian] Attempting to save notebook..
[jovian] Updating notebook "vgops75/01-tensor-operations" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/vgops75/01-tensor-operations


'https://jovian.ml/vgops75/01-tensor-operations'