# Introduction to _PyTorch_ 

_PyTorch_ is an open-source deep learning framework originally developed by Facebook's AI Research lab (now Meta AI). It provides a flexible and efficient platform for building and training neural networks, with support for dynamic computation graphs and GPU acceleration. PyTorch is widely used for research and production in both academia and industry, thanks to its intuitive interface and strong community support.

For more details, visit the [official PyTorch documentation](https://pytorch.org/docs/stable/index.html).

In [3]:
import torch

## Tensors

In machine learning we deal with tensors a lot. As a reminder, a 1-d tensor is a vector (called an array in programming jargon) and a 2-d tensor is a matrix; when the number of dimensions is $k > 2$ we talk about $k^{th}$-order tensors.
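
As a quick illustration of this terminology, the sketch below builds a vector, a matrix and a 3rd-order tensor and inspects their number of axes with `ndim`. The variable names are chosen only for this example.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])  # 1-d tensor (vector)
matrix = torch.zeros(2, 3)              # 2-d tensor (matrix)
cube = torch.zeros(2, 3, 4)             # 3rd-order tensor

# ndim reports the number of axes of each tensor
print(vector.ndim, matrix.ndim, cube.ndim)
```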

In [None]:
x = torch.arange(12, dtype=torch.float32)
print('printing the x vector:', x) 
# note that in jupyter notebooks, the output of the last line is automatically displayed even without a print statement:
x

### Counting elements, shape and reshape

In [None]:
x.numel()

In [None]:
x.shape 

In [None]:
x.reshape(3, 4) # Reshape to 3 rows and 4 columns

In [None]:
print(x.shape)  # note that reshape does not change the original tensor! You need to assign the result to a new variable or overwrite the original one
X = x.reshape(3, 4)  # X holds the reshaped tensor; x itself is still 1-d
print(X.shape)
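
When only one of the two sizes is known, `reshape` can infer the other: passing `-1` for a dimension asks PyTorch to work it out from the number of elements. A minimal sketch (the name `X_auto` is just illustrative):

```python
import torch

x = torch.arange(12, dtype=torch.float32)

# -1 lets PyTorch infer the number of rows: 12 elements / 4 columns = 3 rows
X_auto = x.reshape(-1, 4)

print(X_auto.shape)
print(x.shape)  # the original tensor is still 1-d
```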

### zeros-, ones- and rand-tensors

In [None]:
torch.zeros((2, 3, 4)) # this creates a 3-d tensor of shape (2, 3, 4) filled with zeros


In [None]:
torch.ones((2, 3, 4)) # this creates a 3-d tensor of shape (2, 3, 4) filled with ones

In [None]:
torch.randn(3, 4) # this creates a 2-d tensor of shape (3, 4) filled with random numbers drawn from a standard normal distribution (mean 0, variance 1)

In [None]:
torch.tensor([[2, 1, 4, 3], 
              [1, 2, 3, 4], 
              [4, 3, 2, 1]]).shape # Create a 2D tensor with specific values and return its shape

### Indexing and Slicing

In [None]:
X #we defined X above, so this will show the reshaped tensor

In [None]:
X[0] # Access the first row of the tensor

In [None]:
X[-1] # Access the last row of the tensor


In [None]:
X[1:3] # Access rows 1 and 2 of the tensor. Note that index 3 is not included!

In [None]:
X[1, 2] = 17 # Change the value at row 1, column 2 to 17
X

-------------------- *YOUR TURN*!!! ----------------

Now try to overwrite all values in the first 2 rows of the tensor `X` with 0:

In [None]:
# Write your own code to overwrite all values in the first 2 rows of X with 0


### Operation between tensors

Element-wise operations can be performed:
1) through unary scalar operations
2) through binary scalar operations
3) through broadcasting

In [None]:
torch.exp(x)

-------------------- *YOUR TURN*!!! ----------------

Generate 2 tensors, x and y, of length 5 (i.e. 5 elements each); then try the following operations:
x + y, x - y, x * y, x / y, x ** y

In [None]:
# write your own code here

### Broadcasting

Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism.

In [None]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

Since a and b are 3 × 1 and 1 × 2 matrices, respectively, their shapes do not match up. Broadcasting produces a larger 3 × 2 matrix by replicating matrix a along the columns and matrix b along the rows before adding them elementwise.

In [None]:
a + b
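
To make the replication step visible, the sketch below expands both tensors to the common 3 × 2 shape by hand with `expand` and checks that the result matches the broadcast sum. The names `a_big`, `b_big` and `manual` are only for illustration; in practice you would simply write `a + b`.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# expand repeats the size-1 axis as a view (no data is copied)
a_big = a.expand(3, 2)  # the single column of a is repeated across 2 columns
b_big = b.expand(3, 2)  # the single row of b is repeated across 3 rows

manual = a_big + b_big
print(manual)
print(torch.equal(manual, a + b))  # same values as the broadcast sum
```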

### Concatenate tensors, logical statements and sum-all-elements operation

In [None]:
# this will be very useful when we build Convolutional Neural Networks (CNNs) later in the course
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

X, Y, torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)
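
Checking the shapes is a quick way to see what `dim` does in `torch.cat`: concatenating along `dim=0` stacks the rows, while `dim=1` places the tensors side by side. A small sketch with illustrative variable names:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.ones(3, 4)

rows_stacked = torch.cat((X, Y), dim=0)  # 3 + 3 rows, 4 columns
cols_stacked = torch.cat((X, Y), dim=1)  # 3 rows, 4 + 4 columns

print(rows_stacked.shape, cols_stacked.shape)
```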

In [None]:
X == Y # Element-wise comparison between tensors -- returns a tensor of boolean values

In [None]:
X.sum(), X.sum(dim=0), X.sum(dim=1) # Sum of all elements, sum over the rows (dim=0, one value per column), sum over the columns (dim=1, one value per row)

### Saving Memory

In [None]:
# this is crucial in machine learning, as models can have millions of parameters, and we need to save memory!
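before = id(Y)
Y = Y + X  # allocates new memory for the result and rebinds the name Y to it
id(Y) == before  # False: Y now refers to a different object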

In [None]:
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y  # slice assignment writes the result into the memory already allocated for Z
print('id(Z):', id(Z))
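
If the old value of a tensor is not needed afterwards, the update can also be done in place. The sketch below shows two common ways, the `+=` operator and the `add_` method (PyTorch methods ending in an underscore modify the tensor in place); it uses small placeholder tensors rather than the ones defined above.

```python
import torch

X = torch.ones(3, 4)
Y = torch.zeros(3, 4)

before = id(Y)
Y += X                      # in-place update: Y is still the same Python object
print(id(Y) == before)

ptr = Y.data_ptr()          # address of the underlying data buffer
Y.add_(X)                   # in-place method: the data buffer is reused
print(Y.data_ptr() == ptr)
```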

### Conversion to Other Python Objects

In [None]:
# this is how you convert a tensor to a numpy array and back 
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)
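
One detail worth knowing: for CPU tensors, `numpy()` and `torch.from_numpy` do not copy the data, so the tensor and the array share the same memory. The sketch below illustrates this, and shows `clone()` as one way to get an independent copy (variable names are illustrative).

```python
import torch

X = torch.zeros(3)
A = X.numpy()                 # NumPy array backed by the same CPU memory
A[0] = 1.0                    # modify through the NumPy array...
print(X)                      # ...and the tensor sees the change

A_copy = X.clone().numpy()    # clone first to get an independent copy
A_copy[1] = 5.0
print(X)                      # X is unaffected by changes to the copy
```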

In [None]:
# this is how you convert a tensor to a Python list (less used, but still useful)
X.tolist(), X

In [None]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)


## Linear Algebra

### Scalars

In [4]:
x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x * y, x / y, x**y

(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))

### Vectors

In [None]:
# note that Python uses zero-based indexing, so the first element is at index 0
x = torch.arange(3)
x, x[0], x[1], x[2]

(tensor([0, 1, 2]), tensor(0), tensor(1), tensor(2))

In [9]:
len(x) # Count the number of elements in the tensor


3

In [11]:
x.shape # Get the shape of the tensor. Note that this is a different type than the output of `len(x)`!

torch.Size([3])

### Matrices

In [12]:
A = torch.arange(6).reshape(3, 2)
A

tensor([[0, 1],
        [2, 3],
        [4, 5]])

In [None]:
A.T # Transpose the matrix A. Symmetric matrices are the subset of square matrices that are equal to their own transposes

tensor([[0, 2, 4],
        [1, 3, 5]])
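
The remark about symmetric matrices can be checked directly. In the sketch below, `S` is a hand-written symmetric matrix (an example chosen for illustration), while the 3 × 2 matrix `A` cannot be symmetric because it is not square.

```python
import torch

S = torch.tensor([[1, 2, 3],
                  [2, 0, 4],
                  [3, 4, 5]])
print(torch.equal(S, S.T))    # a symmetric matrix equals its transpose

A = torch.arange(6).reshape(3, 2)
print(A.shape, A.T.shape)     # transposing swaps the two axes
```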

### Tensors and tensor arithmetic

In [15]:
torch.arange(24).reshape(2, 3, 4)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

In [21]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # Assign a copy of A to B by allocating new memory
A, A + B # Element-wise addition


(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 0.,  2.,  4.],
         [ 6.,  8., 10.]]))

### Element-wise multiplication

In [22]:
A * B # Element-wise multiplication

tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.]])

In [23]:
a = 2
X = torch.arange(24).reshape(2, 3, 4) 
a + X, a * X, (a * X).shape # addition and multiplication with a scalar, and the shape of the resulting tensor (unchanged)

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 tensor([[[ 0,  2,  4,  6],
          [ 8, 10, 12, 14],
          [16, 18, 20, 22]],
 
         [[24, 26, 28, 30],
          [32, 34, 36, 38],
          [40, 42, 44, 46]]]),
 torch.Size([2, 3, 4]))

### Sums of elements in a tensor

In [None]:
# Sums of elements in a tensor
A.sum(), A.sum(dim=0), A.sum(dim=1) # Sum of all elements, sum over the rows (one value per column), sum over the columns (one value per row)

(tensor(15.), tensor([3., 5., 7.]), tensor([ 3., 12.]))

In [31]:
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()

tensor(True)

In [32]:
A.mean(), A.sum() / A.numel() # The mean, computed directly and as the sum divided by the number of elements

(tensor(2.5000), tensor(2.5000))

In [33]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0] # The mean along the first axis, computed directly and as the sum divided by the number of rows

(tensor([1.5000, 2.5000, 3.5000]), tensor([1.5000, 2.5000, 3.5000]))

In [41]:
# Sometimes it can be useful to keep the number of axes unchanged when invoking the function for calculating the sum or mean.
# This matters when we want to use the broadcast mechanism.

sum_A = A.sum(axis=1, keepdims=True)
A, A.shape, A.sum(axis=0), A.sum(axis=0).shape, sum_A, sum_A.shape

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 torch.Size([2, 3]),
 tensor([3., 5., 7.]),
 torch.Size([3]),
 tensor([[ 3.],
         [12.]]),
 torch.Size([2, 1]))

In [43]:
# Since sum_A keeps its two axes after summing each row,
# we can divide A by sum_A with broadcasting to create a matrix where each row sums up to 1.
# This is a common technique for normalizing data, especially for classification tasks, in which
# we want to ensure that the sum of probabilities across each row is 1.
A / sum_A 

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
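
The sketch below checks that the normalized rows really sum to 1 and shows what happens if the summed axis is dropped: the row sums then have shape `(2,)`, which cannot be broadcast against `(2, 3)`. It uses `keepdim`, the spelling found in the PyTorch documentation, where the cell above uses `keepdims`; the other names are illustrative.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# With keepdim=True the row sums have shape (2, 1), which broadcasts against (2, 3)
row_sums = A.sum(dim=1, keepdim=True)
normalized = A / row_sums
print(normalized.sum(dim=1))  # each row sums to 1 (up to rounding)

# Without keepdim the row sums have shape (2,); the trailing sizes (3 and 2)
# differ, so broadcasting raises an error
try:
    A / A.sum(dim=1)
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```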

In [None]:
# If we want to calculate the cumulative sum of elements of A along some axis, say axis=0, 
# we can call the cumsum function.
A.cumsum(axis=0)

tensor([[0., 1., 2.],
        [3., 5., 7.]])

### Dot product

In [49]:
# dot product of two vectors
x = torch.arange(3, dtype = torch.float32)
y = torch.ones(3, dtype = torch.float32)
x, y, torch.dot(x, y)  # not displayed: only the last expression in a cell is shown
# or equivalently
torch.sum(x * y) # Element-wise multiplication followed by summation

tensor(3.)

### Matrix-vector multiplication

In [55]:
# Matrix-vector multiplication -- this is a common operation in machine learning, especially in linear layers
A.shape, x.shape, torch.mv(A, x), A@x, (A@x).shape

(torch.Size([2, 3]),
 torch.Size([3]),
 tensor([ 5., 14.]),
 tensor([ 5., 14.]),
 torch.Size([2]))
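
One way to read the matrix-vector product is row by row: each entry of `A @ x` is the dot product of one row of `A` with `x`. The sketch below recomputes the product this way as a check (the name `row_dots` is illustrative; the loop is for explanation only and is much slower than `A @ x`).

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.arange(3, dtype=torch.float32)

# One dot product per row of A, stacked into a vector of length 2
row_dots = torch.stack([torch.dot(row, x) for row in A])

print(row_dots)
print(torch.equal(row_dots, A @ x))
```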

### Matrix-matrix multiplication

In [57]:
# Matrix-matrix multiplication
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 4)
A, B, torch.mm(A, B), A@B

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]),
 tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]))

### Norms

### l2 norm (Euclidean norm)

In [58]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)

In [None]:
(u * u).sum().sqrt()  # the same l2 norm computed by hand: square root of the sum of squares

tensor(5.)

### l1 norm (Manhattan distance)

In [60]:
torch.abs(u).sum()

tensor(7.)

### Frobenius norm (l2 norm for matrices)

In [62]:
D = torch.ones((4, 9))
D, torch.norm(D)

(tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.]]),
 tensor(6.))
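
The same three norms can also be computed with `torch.linalg.norm`, which the PyTorch documentation recommends over `torch.norm` in recent releases. A minimal sketch, with the `ord` argument selecting the norm:

```python
import torch

u = torch.tensor([3.0, -4.0])
D = torch.ones((4, 9))

l2 = torch.linalg.norm(u)              # l2 norm is the default for vectors
l1 = torch.linalg.norm(u, ord=1)       # l1 norm: sum of absolute values
fro = torch.linalg.norm(D, ord='fro')  # Frobenius norm of a matrix

print(l2, l1, fro)
```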

-------------------- *YOUR TURN*!!! ----------------

Define a vector x of values ranging between -5 and 5. Plot $x^2$ and $|x|$.

Looking at the plot, think about the effect of the l2 and l1 norms on different vectors: how do the norms of two vectors compare when both are computed as l2, and when both are computed as l1?

In [63]:
# write your code here. For plotting you can use the Python library matplotlib
import matplotlib.pyplot as plt

# ...code for plotting...