# PyTorch - tensors tutorial

Contents are taken from:

[PyTorch - Intro to tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)

[PyTorch - Another intro to tensors](https://pytorch.org/tutorials/beginner/nlp/pytorch_tutorial.html)


## Creating tensors 

Creating a tensor from a nested Python list is straightforward:

In [1]:
import torch
import numpy as np

data = [
    [1,2],
    [3,4],
]

x_data = torch.tensor(data)
print(f"{x_data=} {x_data.dtype=}")


x_data=tensor([[1, 2],
        [3, 4]]) x_data.dtype=torch.int64


Tensors have data types and it's important to keep track of them for neural nets to work correctly.

In [15]:
x_data = torch.Tensor(data)
print(f"{x_data=} {x_data.dtype=}")

x_data=tensor([[1., 2.],
        [3., 4.]]) x_data.dtype=torch.float32


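`torch.Tensor` creates a tensor of the default floating-point type, which is float32 unless the default has been changed.

`torch.tensor` infers the data type from the data, so a list of Python integers gives an int64 tensor and a list of Python floats gives a float32 tensor. You can also pass `dtype=` to set the type explicitly.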
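
To make the difference concrete, here is a minimal sketch that builds the same data four ways and prints the resulting dtypes. The variable names are illustrative only, and the comments assume the global default dtype has not been changed from float32.

```python
import torch

# dtype is inferred from the Python values
ints = torch.tensor([[1, 2], [3, 4]])            # Python ints   -> torch.int64
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # Python floats -> default float dtype

# dtype can be forced explicitly
forced = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# the torch.Tensor constructor ignores the input's integer type
legacy = torch.Tensor([[1, 2], [3, 4]])          # default float dtype

print(ints.dtype, floats.dtype, forced.dtype, legacy.dtype)
print(torch.get_default_dtype())
```

Passing `dtype=` to `torch.tensor` is usually the clearest way to get the type you need for a neural net input.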

In [18]:
npa = np.array(data)
torch.from_numpy(npa)


tensor([[1, 2],
        [3, 4]])

In [19]:
x_ones = torch.ones_like(x_data)
x_ones

tensor([[1., 1.],
        [1., 1.]])

In [20]:
x_rand = torch.rand_like(x_data)
x_rand

tensor([[0.8694, 0.4451],
        [0.3105, 0.8647]])

`torch.ones_like` keeps the shape and data type of another tensor but fills the new tensor with ones.

`torch.rand_like` keeps the shape and data type of another tensor but fills the new tensor with random values drawn uniformly from the interval [0, 1).
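
Because the `*_like` functions copy the data type, an integer source tensor needs some care: uniform values in [0, 1) cannot be stored as integers, so `rand_like` needs a floating-point `dtype` in that case. A small sketch, with illustrative variable names:

```python
import torch

x_int = torch.tensor([[1, 2], [3, 4]])  # dtype is torch.int64

# ones_like keeps both the shape and the integer dtype
ones_int = torch.ones_like(x_int)

# rand_like keeps the shape, but the dtype is overridden so that
# values in [0, 1) can be represented
rand_float = torch.rand_like(x_int, dtype=torch.float32)

print(ones_int.dtype, rand_float.dtype, rand_float.shape)
```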

In [22]:
shape = (2,3)

rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"{rand_tensor=}")
print(f"{ones_tensor=}")
print(f"{zeros_tensor=}")

rand_tensor=tensor([[0.6626, 0.9979, 0.5738],
        [0.8015, 0.2067, 0.0500]])
ones_tensor=tensor([[1., 1., 1.],
        [1., 1., 1.]])
zeros_tensor=tensor([[0., 0., 0.],
        [0., 0., 0.]])


Tensors are also stored on a device (CPU or GPU).

They are created on the CPU by default and have to be moved to the GPU explicitly.

In [24]:
tensor = torch.rand((3, 4))

print(tensor.device)


cpu


In [23]:
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
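
A common way to write the same thing so it runs with or without a GPU is to pick the device once and reuse it. This is a sketch of that pattern; the `device` and `direct` names are just illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor = torch.rand((3, 4))   # created on the CPU by default
tensor = tensor.to(device)    # .to() returns a tensor on the target device

# Tensors can also be created directly on the target device
direct = torch.rand((3, 4), device=device)

print(tensor.device, direct.device)
```

Note that `.to()` does not move the tensor in place, so the result has to be assigned back.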

## Indexing and slicing

There are many ways to select elements from a tensor.
Generally, the indexing rules are similar to those of Python lists and NumPy arrays.

In [30]:
tensor = torch.arange(12).reshape(3, 4)

print(f"{tensor=}")

# print the first row
print("FIRST ROW:")
print(tensor[0])
print(tensor[0, :])
print(tensor[0, ...])

# print the first column
print("FIRST COL:")
print(tensor[:, 0])

# print the last column
print("LAST COL:")
print(tensor[:, -1])
print(tensor[..., -1])

# print every even column
print("EVERY EVEN COL:")
print(tensor[:, ::2])

tensor=tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
FIRST ROW:
tensor([0, 1, 2, 3])
tensor([0, 1, 2, 3])
tensor([0, 1, 2, 3])
FIRST COL:
tensor([0, 4, 8])
LAST COL:
tensor([ 3,  7, 11])
tensor([ 3,  7, 11])
EVERY EVEN COL:
tensor([[ 0,  2],
        [ 4,  6],
        [ 8, 10]])


In [32]:
# modify a column
tensor[:, 1] = 0
tensor

tensor([[ 0,  0,  2,  3],
        [ 4,  0,  6,  7],
        [ 8,  0, 10, 11]])
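
One place where tensors differ from Python lists: basic indexing and slicing return a view that shares memory with the original tensor, so writing to the slice also changes the original. A minimal sketch (variable names are illustrative):

```python
import torch

t = torch.arange(12).reshape(3, 4)

# Basic indexing returns a view: writing to it changes t as well
first_row = t[0]
first_row[0] = 99
print(t[0, 0])

# clone() gives an independent copy: t is not affected
copy_row = t[1].clone()
copy_row[0] = -1
print(t[1, 0])
```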

## Combining tensors with `cat` and `stack`

In [37]:
# torch.cat

print(f"{tensor.shape=}")
t1 = torch.cat([tensor, tensor, tensor, tensor], dim=1)
print(f"{t1.shape=}")
t1

tensor.shape=torch.Size([3, 4])
t1.shape=torch.Size([3, 16])


tensor([[ 0,  0,  2,  3,  0,  0,  2,  3,  0,  0,  2,  3,  0,  0,  2,  3],
        [ 4,  0,  6,  7,  4,  0,  6,  7,  4,  0,  6,  7,  4,  0,  6,  7],
        [ 8,  0, 10, 11,  8,  0, 10, 11,  8,  0, 10, 11,  8,  0, 10, 11]])

In [33]:
input2 = torch.ones((2, 3, 4))
t2 = torch.cat([input2, input2, input2], dim=1)
print(f"{t2.shape=}")
t2

t2.shape=torch.Size([2, 9, 4])


tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

In [34]:
# torch.stack

tensor = torch.ones((3, 4))
t1 = torch.stack([tensor, tensor], dim=0)
print(f"{t1.shape=}")
t1

t1.shape=torch.Size([2, 3, 4])


tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

`torch.stack` is different from `torch.cat`:

`torch.cat` joins the inputs along an existing dimension (the one you specify), so the result has the same number of dimensions as the inputs.
`torch.stack` joins the inputs along a new dimension (inserted at the position you specify), so the result has one more dimension than the inputs.

Concatenating two 3x4 tensors along `dim=1` gives a new tensor of size 3x8 (along `dim=0` it would be 6x4).
Stacking two 3x4 tensors along `dim=0` gives a new tensor of size 2x3x4.

[Reference explanation on Stack Overflow](https://stackoverflow.com/questions/54307225/whats-the-difference-between-torch-stack-and-torch-ca)
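
The shape rules for `cat` and `stack` can be checked side by side. The sketch below also shows one way to think about `stack`: it behaves like adding a new axis to every input with `unsqueeze` and then concatenating along that axis. Variable names are illustrative.

```python
import torch

a = torch.zeros((3, 4))
b = torch.ones((3, 4))

cat_rows = torch.cat([a, b], dim=0)        # existing dim 0 grows: (6, 4)
cat_cols = torch.cat([a, b], dim=1)        # existing dim 1 grows: (3, 8)
stacked = torch.stack([a, b], dim=0)       # new leading dim:      (2, 3, 4)
stacked_last = torch.stack([a, b], dim=2)  # new trailing dim:     (3, 4, 2)

print(cat_rows.shape, cat_cols.shape, stacked.shape, stacked_last.shape)

# stack is equivalent to unsqueezing each input and concatenating
manual = torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0)
print(torch.equal(stacked, manual))
```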

## randn vs rand

In [48]:
x = torch.randn((2, 3, 4))
x

tensor([[[ 1.2626, -0.3817,  1.3986, -0.6001],
         [-0.7493,  0.8892,  0.0382, -0.9166],
         [-0.9668, -0.2828, -0.6239,  0.2364]],

        [[ 0.2725, -0.8735,  0.5722, -1.0876],
         [ 0.6604, -0.8016,  0.4640,  1.4917],
         [-1.1043, -1.4910,  0.1564, -0.7772]]])

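`torch.randn` and `torch.rand` sample from different distributions.

`rand` samples uniformly from the interval [0, 1), so every value is non-negative and below 1.

`randn` samples from a standard normal distribution (mean 0, variance 1), so values can be negative or larger than 1, as in the output above.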
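
A quick empirical check of the two distributions, as a sketch: draw many samples from each and look at the range and summary statistics. The seed and sample size are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
n = 100_000

uniform = torch.rand(n)   # uniform on [0, 1)
normal = torch.randn(n)   # standard normal

# rand stays inside [0, 1); its mean should be close to 0.5
print(uniform.min().item(), uniform.max().item(), uniform.mean().item())

# randn is unbounded; its mean should be close to 0 and its std close to 1
print(normal.min().item(), normal.max().item())
print(normal.mean().item(), normal.std().item())
```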

## Auto differentiation

In [None]:
x = torch.tensor([1, 2, 3], requires_grad=True, dtype=torch.float32)
y = torch.tensor([4, 5, 6], requires_grad=True, dtype=torch.float32)

z = x + y
print(z)

print(z.grad_fn)

tensor([5., 7., 9.], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x11f553d00>


When you perform operations on tensors that require gradients, PyTorch remembers how each value was produced.

Here `z.grad_fn=AddBackward0` is telling us that `z` was produced by adding two tensors.

In [55]:
s = z.sum()
print(s)
print(s.grad_fn)

tensor(21., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x11f069180>


Here `s.grad_fn=SumBackward0` is telling us that `s` was produced by summing up the values of a tensor.

In [57]:
s.backward()
print(x.grad)

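tensor([1., 1., 1.])

Since `s = sum(x + y)`, each element of `x` enters `s` with coefficient 1, so every entry of the gradient of `s` with respect to `x` is 1.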
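
One thing to watch for when re-running notebook cells: `backward()` adds to `.grad` rather than overwriting it, so calling it more than once without resetting gives larger values (for this example, 2s instead of 1s after a second call). A minimal sketch of that behaviour, with illustrative variable names:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

# First backward pass: d(sum(x + y))/dx is 1 for every element
(x + y).sum().backward()
first = x.grad.clone()

# Second backward pass on a fresh graph: the new gradient is ADDED to x.grad
(x + y).sum().backward()
second = x.grad.clone()

print(first)   # all ones
print(second)  # all twos

# Reset the accumulated gradient in place before the next pass
x.grad.zero_()
print(x.grad)
```

This is why training loops typically zero the gradients before each backward pass.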


### A complex auto differentiation code block

In [36]:
x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print("Before we require gradients, z.grad_fn is", z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print("After we require gradients, z.grad_fn is", z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(f"{z.requires_grad=}")

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(f"After detaching {new_z.grad_fn=}")
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history

False False
Before we require gradients, z.grad_fn is None
After we require gradients, z.grad_fn is <AddBackward0 object at 0x110c0fbe0>
z.requires_grad=True
After detaching new_z.grad_fn=None
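
The "shares the same storage" part of the comments above has a practical consequence: an in-place change to the detached tensor is visible through the original. The sketch below illustrates this and shows `clone()` as one way to get an independent copy. Variable names other than `x`, `y`, `z`, and `new_z` are illustrative.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = torch.randn(2, 2, requires_grad=True)
z = x + y

new_z = z.detach()
print(new_z.requires_grad, new_z.grad_fn)

# The detached tensor points at the same memory as z
same_memory = new_z.data_ptr() == z.data_ptr()
print(same_memory)

# An in-place change through new_z is therefore visible in z
new_z.zero_()
print(z)

# detach() followed by clone() gives a copy with its own memory
independent = z.detach().clone()
print(independent.data_ptr() == z.data_ptr())
```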


## view and reshape

In [43]:
a = torch.arange(24).reshape(2, 3, 4)
a

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

Both `reshape` and `view` return a tensor with the same contents as the input but with a new shape.

The difference between `view` and `reshape` is not so obvious.

`view` always returns a view into the existing tensor and never copies data. Because the result shares memory with the original, modifying one modifies the other. If the requested shape is not compatible with the tensor's memory layout, `view` raises an error.

`reshape` returns a view if it can, but if it can't, it makes a copy of the input tensor and reshapes that. You therefore cannot rely on the result sharing memory with the input.

[This thread](https://discuss.pytorch.org/t/difference-between-view-reshape-and-permute/54157/2) is a good explanation of the difference.

In [48]:
a = torch.arange(80).view(4, 10, 2)
b = a.permute(2, 0, 1)

print(f"{a.is_contiguous()=}")
print(f"{b.is_contiguous()=}")

print(a.view(-1))
print(b.shape)

try:
    print(b.view(1, 80)) # this fails
except Exception as e:
    print(".view() operation failed:", e)

a.is_contiguous()=True
b.is_contiguous()=False
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
        54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
        72, 73, 74, 75, 76, 77, 78, 79])
torch.Size([2, 4, 10])
.view() operation failed: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
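
As the error message suggests, `reshape` works where `view` fails. The sketch below uses the same `a` and `b` as above and compares data pointers to show when `reshape` returns a view and when it returns a copy. The `flat_*` and `shares_*` names are illustrative.

```python
import torch

a = torch.arange(80).view(4, 10, 2)
b = a.permute(2, 0, 1)              # non-contiguous view of a

flat_a = a.reshape(80)              # contiguous input: reshape returns a view
flat_b = b.reshape(80)              # non-contiguous input: reshape has to copy
flat_b2 = b.contiguous().view(80)   # same idea spelled out: copy, then view

shares_a = flat_a.data_ptr() == a.data_ptr()
shares_b = flat_b.data_ptr() == a.data_ptr()
print(shares_a, shares_b)

flat_a[0] = -1   # writes through to a
flat_b[1] = -5   # only changes the copy
print(a[0, 0, 0].item(), a[0, 1, 0].item())
```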


### How are tensors stored in memory?

[Computerphile has a good explanation of how images are stored in memory](https://www.youtube.com/watch?v=06OHflWNCOE).

[And how n dimensional tensors are stored in memory](https://www.youtube.com/watch?v=DfK83xEtJ_k).

You can print the stride and size of a PyTorch tensor to see this in action.

The stride of a dimension is the number of elements (not bytes, unlike NumPy's `strides`) you have to skip in memory to move from one element to the next along that dimension.

In [51]:
a = torch.arange(80).view(4, 10, 2)

print(f"{a.size()=}")
print(f"{a.stride()=}")

a.size()=torch.Size([4, 10, 2])
a.stride()=(20, 2, 1)
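
With strides of `(20, 2, 1)`, the element `a[i, j, k]` sits at position `i*20 + j*2 + k*1` in the underlying flat memory. The sketch below checks this for one arbitrarily chosen index, converts the strides to bytes, and shows that `permute` only reorders the strides without moving any data.

```python
import torch

a = torch.arange(80).view(4, 10, 2)
i, j, k = 2, 7, 1  # arbitrary index for illustration

# Flat offset = sum of index * stride over all dimensions
offset = sum(idx * st for idx, st in zip((i, j, k), a.stride()))
print(offset, a[i, j, k].item(), a.flatten()[offset].item())

# Strides are counted in elements; multiply by the element size for bytes
byte_strides = tuple(st * a.element_size() for st in a.stride())
print(a.element_size(), byte_strides)

# permute reorders the strides and leaves the memory untouched
b = a.permute(2, 0, 1)
print(b.size(), b.stride())
```

This is also why the permuted tensor in the earlier example is not contiguous: its strides are no longer in decreasing order.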
