# Introduction to _PyTorch_ 

_*PyTorch*_ is an open-source deep learning framework developed by Facebook's AI Research lab. It provides a flexible and efficient platform for building and training neural networks, supporting dynamic computation graphs and GPU acceleration. PyTorch is widely used in both academia and industry for research and production due to its intuitive interface and strong community support.

For more details, visit the [official PyTorch documentation](https://pytorch.org/docs/stable/index.html).

In [None]:
import torch

## Tensors

In machine learning we deal with tensors a lot. As a reminder, a 1-d tensor is a vector (called an array in programming jargon); a 2-d tensor is a matrix; with k > 2 axes we talk about $k^{th}$-order tensors.

In [None]:
x = torch.arange(12, dtype=torch.float32)
print('printing the x vector:', x) 
# note that in jupyter notebooks, the output of the last line is automatically displayed even without a print statement:
x

### Counting elements, shape and reshape

`numel()` returns the total number of elements in the tensor x, regardless of its shape or dimensions.

In [None]:
x.numel() 

The attribute `x.shape` returns the dimensions (size of each axis) of the tensor x. 

In [None]:
x.shape 

`x.reshape()` is used to change the shape of the tensor x without changing its data, as long as the total number of elements remains the same.

In [None]:
x.reshape(3, 4) # Reshape to 3 rows and 4 columns

In [None]:
print(x.shape) # note that reshape does not change the original tensor! You need to assign the result to a new variable or overwrite the original one

X = x.reshape(3, 4) # Now X holds the reshaped tensor; x itself is unchanged

print(X.shape)
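
As a small sketch of the same idea, one dimension can be left for PyTorch to infer by passing `-1`. The names `v` and `M` are illustrative and not used elsewhere in this notebook.

```python
import torch

v = torch.arange(12, dtype=torch.float32)  # illustrative name, separate from x above

# Passing -1 asks PyTorch to infer that axis from the total number of elements
M = v.reshape(3, -1)
print(M.shape)

# reshape returns a new tensor object; v keeps its original 1-d shape
print(v.shape)
```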

### zeros-, ones- and randn-tensors

`torch.zeros()` creates a tensor filled with zeros, of a specified shape and data type.

In [None]:
torch.zeros((2, 3, 4)) # this creates a 3-d tensor of shape (2, 3, 4) filled with zeros


`torch.ones()` creates a tensor filled with ones, with the specified shape, and data type.

In [None]:
torch.ones((2, 3, 4)) # this creates a 3rd order tensor of shape (2, 3, 4) filled with ones

# Note: an RGB image can be represented in a similar way, as a 3rd order tensor: (channels, height, width), with channels = 3
# In CNNs, the input images are typically represented as 4th order tensors: (batch_size, channels, height, width)

`torch.randn()` creates a tensor filled with random numbers drawn from a standard normal distribution.

In [None]:
torch.randn(3, 4) # this creates a 2nd order tensor (matrix) of shape (3, 4) filled with random numbers from a normal distribution with mean 0 and variance 1

`torch.tensor()` creates a tensor directly from a Python list, tuple, or NumPy array.

In [None]:
torch.tensor([[2, 1, 4, 3], 
              [1, 2, 3, 4], 
              [4, 3, 2, 1]]).shape # Create a 2nd order tensor (matrix) with specific values

### Indexing and Slicing
Indexing and slicing in PyTorch work similarly to NumPy, allowing you to extract, modify, or rearrange parts of a tensor.

In [None]:
X #we defined X above, so this will show the reshaped tensor

In [None]:
X[0] # Access the first row of the tensor

In [None]:
X[-1] # Access the last row of the tensor


In [None]:
X[1:3] # Access rows 1 and 2 of the tensor. Note that index 3 is not included!

In [None]:
X[1, 2] = 17 # Change the value at row 1, column 2 to 17
X
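
Slicing also works along several axes at once. The sketch below uses a fresh matrix `M` (an illustrative name) so that it does not touch `X`; it selects one column and a 2 × 2 sub-block.

```python
import torch

M = torch.arange(12, dtype=torch.float32).reshape(3, 4)  # illustrative matrix

col = M[:, 1]        # every row, column 1
block = M[1:3, 0:2]  # rows 1 and 2, columns 0 and 1

print(col)
print(block)
```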

-------------------- *YOUR TURN*!!! ----------------

Now try to overwrite all values in the first 2 rows of the matrix with 0:

In [None]:
# Write your own code to overwrite all values in the first 2 rows of the matrix to 0


### Operation between tensors

Element-wise operations 
1) through unary scalar operations 
2) through binary scalar operations 
3) through broadcasting

In [None]:
torch.exp(x)

-------------------- *YOUR TURN*!!! ----------------

Generate 2 arrays, x and y, of length 5 (i.e., 5 elements each); then try the following operations:
x + y, x - y, x * y, x / y, x ** y

In [None]:
# write your own code here

### Broadcasting

Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism.

In [None]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

Since a and b are 3 × 1 and 1 × 2 matrices, respectively, their shapes do not match. Broadcasting produces a larger 3 × 2 result by replicating a along the columns and b along the rows before applying the operation (here, multiplication) elementwise.

In [None]:
(a * b).shape
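
To make the replication step concrete, the sketch below expands both operands to the common shape by hand with `expand` and checks that the result matches what broadcasting does implicitly. The names `a_exp` and `b_exp` are illustrative.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Explicitly replicate each operand to the common shape (3, 2)
a_exp = a.expand(3, 2)  # copies the single column of a across 2 columns
b_exp = b.expand(3, 2)  # copies the single row of b across 3 rows

# Broadcasting gives the same result as operating on the expanded tensors
same_sum = torch.equal(a + b, a_exp + b_exp)
same_prod = torch.equal(a * b, a_exp * b_exp)
print(same_sum, same_prod)
```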

### Concatenate tensors, logical statements and sum-all-elements operation

In [None]:
# this will be very useful when we build Convolutional Neural Networks (CNNs) later in the course
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

X, Y, torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)
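
A quick way to see what `dim` does in `torch.cat` is to look at the resulting shapes. This sketch uses illustrative tensors `P` and `Q` of shape (3, 4).

```python
import torch

P = torch.zeros(3, 4)  # illustrative tensors with the same shape
Q = torch.ones(3, 4)

rows_stacked = torch.cat((P, Q), dim=0)  # axis 0 grows: 3 + 3 rows
cols_stacked = torch.cat((P, Q), dim=1)  # axis 1 grows: 4 + 4 columns

print(rows_stacked.shape, cols_stacked.shape)
```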

In [None]:
X == Y # Element-wise comparison between tensors -- returns a tensor of boolean values

In [None]:
X.sum(), X.sum(dim=0), X.sum(dim=1) # Sum of all elements; sum over axis 0 (one value per column); sum over axis 1 (one value per row)

### Saving Memory

In [None]:
# Memory matters in machine learning: models can have millions of parameters.
# Y = Y + X allocates a new tensor for the result and rebinds Y to it, so the check below returns False.
before = id(Y)
Y = Y + X
id(Y) == before

In [None]:
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))
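
In-place operators offer another way to avoid a new allocation. The sketch below, with illustrative tensors `P` and `Q`, checks that `P += Q` keeps the same Python object.

```python
import torch

P = torch.ones(3, 4)  # illustrative tensors
Q = torch.ones(3, 4)

before = id(P)
P += Q  # in-place addition: the result is written into P's existing memory
same_object = id(P) == before
print(same_object)
```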

### Conversion to Other Python Objects

In [None]:
# this is how you convert a tensor to a numpy array and back 
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)
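
One detail worth keeping in mind: for CPU tensors, `numpy()` returns an array that shares memory with the tensor, so changing one changes the other. The sketch below uses illustrative names and shows `clone()` as one way to get an independent copy.

```python
import torch

t = torch.zeros(3)  # illustrative CPU tensor
arr = t.numpy()     # shares the underlying memory with t

t[0] = 1.0          # the change is visible through arr as well
print(arr)

independent = t.clone().numpy()  # clone first to get a separate copy
t[1] = 2.0                       # this change does not reach the copy
print(independent)
```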

In [None]:
# this is how you convert a tensor to a Python list (less used, but still useful)
X.tolist(), X

In [None]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)


## Data Preprocessing

So far, we have been working with synthetic data that arrived in ready-made tensors. 

However, to apply deep learning in the wild we must extract messy data stored in arbitrary formats, and preprocess it to suit our needs. Fortunately, the pandas library can do much of the heavy lifting. 

In [None]:
import os
import pandas as pd

In [None]:
os.makedirs(os.path.join('..', 'data'), exist_ok=True)

data_file = os.path.join('..', 'data', 'house_tiny.csv')

with open(data_file, 'w') as f:
    f.write('''NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000''')


Before proceeding, look for the file that you just created (in the `data` folder that appeared in your repository). Inspect the `house_tiny.csv` file.

In [None]:
data = pd.read_csv(data_file)
print(data)

Our first step in processing the dataset is to separate out the columns corresponding to input versus target values. 
We can select columns either by name or via integer-location based indexing (`iloc`).

In [None]:
inputs, targets = data.iloc[:, 0:2], data.iloc[:, 2]

You might have noticed that pandas replaced all CSV entries with value NA with a special NaN (not a number) value. This can also happen whenever an entry is empty, e.g., “3,,,270000”. These are called missing values and they are the “bed bugs” of data science.

Missing values can be handled either via 
 - __imputation__ : replaces missing values with estimates of their values 
 - __deletion__  : simply discards either those rows or those columns that contain missing values.
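
Deletion can be sketched with pandas' `dropna`. The example below rebuilds the same tiny table from a string (so it does not depend on the file on disk) and drops either rows or columns that contain missing values; the variable names are illustrative.

```python
import io
import pandas as pd

csv_text = """NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000"""

table = pd.read_csv(io.StringIO(csv_text))  # illustrative copy of the tiny dataset

rows_kept = table.dropna(axis=0)  # discard every row with at least one NaN
cols_kept = table.dropna(axis=1)  # discard every column with at least one NaN

print(rows_kept)
print(cols_kept)
```

On a dataset this small, deletion throws away most of the information, which is one reason imputation is often preferred.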

For categorical input fields, we can treat NaN as a category. Since the `RoofType` column takes values `Slate` and `NaN`, pandas can convert this column into two columns `RoofType_Slate` and `RoofType_nan`. 

In [None]:
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)

For missing numerical values, one common heuristic is to replace the NaN entries with the mean value of the corresponding column.

In [None]:
inputs = inputs.fillna(inputs.mean())
print(inputs)

Now that all the entries in inputs and targets are numerical, we can load them into a tensor:

In [None]:
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(targets.to_numpy(dtype=float))
X, y
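
Note that a NumPy array of Python floats is 64-bit, and `torch.tensor` keeps that precision. A minimal sketch, assuming you want 32-bit floats (a common choice for training), could convert the tensor explicitly; the array below is illustrative.

```python
import numpy as np
import torch

arr = np.array([[3.0, 0.0, 1.0], [2.0, 0.0, 1.0]], dtype=float)  # illustrative data

t64 = torch.tensor(arr)       # inherits float64 from the NumPy array
t32 = t64.to(torch.float32)   # explicit conversion to 32-bit floats

print(t64.dtype, t32.dtype)
```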

Note: Data visualization tools such as seaborn, Bokeh, or matplotlib can help you to manually inspect the data and develop intuitions about the type of problems you may need to address.

Try loading datasets, e.g., Abalone from the [UCI Machine Learning Repository](https://archive.ics.uci.edu) and inspect their properties. What fraction of them has missing values? What fraction of the variables is numerical, categorical, or text?

## Linear Algebra

### Scalars

In [None]:
x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x * y, x / y, x**y

### Vectors

In [None]:
# note that Python uses zero-based indexing, so the first element is at index 0
x = torch.arange(3)
x, x[0], x[1], x[2]

In [None]:
len(x) # Count the number of elements in the tensor


In [None]:
x.shape # Get the shape of the tensor. Note that this has a different type than the output of `len(x)`!

### Matrices

In [None]:
A = torch.arange(6).reshape(3, 2)
A

In [None]:
# Transpose the matrix A. Symmetric matrices are the subset of square matrices that are equal to their own transposes.
A.T, A  # A.T returns the transpose; A itself is unchanged
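
The remark about symmetric matrices can be checked directly. In this sketch `S` and `N` are illustrative matrices: `S` is symmetric by construction, `N` is not.

```python
import torch

S = torch.tensor([[1, 2, 3],
                  [2, 0, 4],
                  [3, 4, 5]])          # symmetric: S[i, j] == S[j, i]
N = torch.arange(9).reshape(3, 3)      # square but not symmetric

s_is_symmetric = torch.equal(S, S.T)
n_is_symmetric = torch.equal(N, N.T)
print(s_is_symmetric, n_is_symmetric)
```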

### Tensors and tensor arithmetic 

In [None]:
torch.arange(24).reshape(2, 3, 4)

In [None]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # Assign a copy of A to B by allocating new memory
A, A + B # Element-wise addition


### Element-wise multiplication

In [None]:
A * B # Element-wise multiplication

In [None]:
a = 2
X = torch.arange(24).reshape(2, 3, 4) 
a + X, a * X, (a * X).shape # addition and multiplication with a scalar, and the shape of the resulting tensor (unchanged)

### Sums of elements in a tensor

In [None]:
# Sums of elements in a tensor
A.sum(), A.sum(dim=0), A.sum(dim=1) # Sum of all elements; sum over axis 0 (one value per column); sum over axis 1 (one value per row)

In [None]:
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()

In [None]:
A.mean(), A.sum() / A.numel() # The mean, computed directly and as sum / number of elements

In [None]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0] # The mean along the first axis, computed directly and as sum / axis length

In [None]:
# Sometimes it can be useful to keep the number of axes unchanged when invoking the function for calculating the sum or mean. 
# This matters when we want to use the broadcast mechanism.

sum_A = A.sum(axis=1, keepdims=True)
A, A.shape, A.sum(axis=0), A.sum(axis=0).shape, sum_A, sum_A.shape

In [None]:
# since sum_A keeps its two axes after summing each row, 
# we can divide A by sum_A with broadcasting to create a matrix where each row sums up to 1.
# this is a common technique for normalizing data, especially for classification tasks, in which
# we want to ensure that the sum of probabilities across each row is 1.
A / sum_A 
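
As a sanity check on the normalization above, the sketch below rebuilds a small matrix (illustrative name `M`), divides by its row sums, and confirms that every row of the result sums to 1.

```python
import torch

M = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # illustrative matrix

row_sums = M.sum(dim=1, keepdim=True)  # shape (2, 1), so it broadcasts across columns
normalized = M / row_sums

# Each row of the normalized matrix should now sum to 1
rows_sum_to_one = torch.allclose(normalized.sum(dim=1), torch.ones(2))
print(normalized, rows_sum_to_one)
```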

In [None]:
# If we want to calculate the cumulative sum of elements of A along some axis, say axis=0, 
# we can call the cumsum function.
A.cumsum(axis=0)

### Dot product

In [None]:
# Dot product of two vectors
x = torch.arange(3, dtype=torch.float32)
y = torch.ones(3, dtype=torch.float32)
# torch.dot(x, y) is equivalent to an element-wise multiplication followed by a summation
x, y, torch.dot(x, y), torch.sum(x * y)

### Matrix-vector multiplication

In [None]:
# Matrix-vector multiplication -- this is a common operation in machine learning, especially in linear layers
A.shape, x.shape, torch.mv(A, x), A@x, (A@x).shape
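
One way to read a matrix-vector product is as one dot product per row of the matrix. The sketch below, with illustrative names `M` and `v`, builds the result row by row and compares it with `M @ v`.

```python
import torch

M = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # illustrative matrix
v = torch.arange(3, dtype=torch.float32)                # illustrative vector

# Entry i of M @ v is the dot product of row i of M with v
manual = torch.stack([torch.dot(row, v) for row in M])

matches = torch.equal(manual, M @ v)
print(manual, matches)
```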

### Matrix-matrix multiplication

In [None]:
# Matrix-matrix multiplication
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 4)
A, B, torch.mm(A, B), A@B

### Norms

#### l2 norm (Euclidean norm)

In [None]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

In [None]:
(u * u).sum().sqrt()  # the same value, computed from the definition: square root of the sum of squares

#### l1 norm (Manhattan distance)

In [None]:
torch.abs(u).sum()

#### Frobenius norm (l2 norm for matrices)

In [None]:
D = torch.ones((4, 9))
D, torch.norm(D)
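
The same three norms can also be computed with `torch.linalg.norm`, which takes the order of the norm through its `ord` argument. This is a minimal sketch of an alternative spelling, not a replacement for the cells above; the names are illustrative.

```python
import torch

u = torch.tensor([3.0, -4.0])
D = torch.ones((4, 9))

l2 = torch.linalg.norm(u)          # default for a vector: l2 norm
l1 = torch.linalg.norm(u, ord=1)   # l1 norm: sum of absolute values
fro = torch.linalg.norm(D)         # default for a matrix: Frobenius norm

print(l2, l1, fro)
```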

-------------------- *YOUR TURN*!!! ----------------

Define a vector x of values ranging between -5 and 5. Plot $x^2$ and $|x|$. 

Looking at the plot, think about how the l2 and l1 norms treat different vectors: how do the norms of two vectors compare when both are computed as l2, and when both are computed as l1?  

In [None]:
# write your code here. For plotting you can use the Python library matplotlib
import matplotlib.pyplot as plt

# ...code for plotting...

# Exercises

Use your coding skills to do the following exercises.

### Ex. 1 -- Prove that the transpose of the transpose of a matrix is the matrix itself 

$(A^⊤)^⊤ = A$.

### Ex. 2 -- Given two matrices A and B, show that sum and transposition commute: 

$A^⊤ + B^⊤ = (A + B)^⊤$.

### Ex. 3 -- We defined the tensor X of shape (2, 3, 4) in this section. What is the output of len(X)? Write your answer without implementing any code, then check your answer using code.

### Ex. 4 -- Consider three matrices, say $A, B, C \in \mathbb{R}^{100 \times 200}$. Construct a tensor with three axes by stacking [A, B, C]. What is the dimensionality? Slice out the second coordinate of the third axis to recover B. Check that your answer is correct.

### Ex. 5 -- Consider three large matrices, say $A \in \mathbb{R}^{2^{10} \times 2^{16}}$, $B \in \mathbb{R}^{2^{16} \times 2^{5}}$ and $C \in \mathbb{R}^{2^{5} \times 2^{14}}$, initialized with Gaussian random variables. You want to compute the product $ABC$. Is there any difference in memory footprint and speed, depending on whether you compute $(AB)C$ or $A(BC)$? Why?

In [None]:
#A  = ...
#B  = ...
#C  = ...


import tracemalloc

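# Memory footprint of the product (AB)C
# Note: tracemalloc traces allocations made through Python's memory allocator, so tensor storage allocated natively by PyTorch may not show up in these numbers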
tracemalloc.start()
AB = torch.mm(A, B)
ABC = torch.mm(AB, C)
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current / (1024 * 1024):.5f} MB")
print(f"Peak memory usage: {peak / (1024 * 1024):.5f} MB")
tracemalloc.stop()

# Memory footprint of the product A(BC)
tracemalloc.start()
BC = torch.mm(B, C)
ABC_alt = torch.mm(A, BC)
current2, peak2 = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current2 / (1024 * 1024):.5f} MB")
print(f"Peak memory usage: {peak2 / (1024 * 1024):.5f} MB")
tracemalloc.stop()
