# Introduction to Tensors (PyTorch)
This notebook introduces PyTorch's Tensors.

For further reading about PyTorch's tensors here are some useful resources:
1. https://pytorch.org/docs/stable/tensors.html (retrieved 2022-12-24, [Github](https://github.com/pytorch/tutorials/blob/d5161086e7277f10c68dd44914f8925fda62f399/beginner_source/blitz/tensor_tutorial.py))
2. https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html (retrieved 2022-12-24, [Github](https://github.com/pytorch/tutorials/blob/c2115df8003e6a3aeeb327441ff4d8389576d6f0/beginner_source/introyt/tensors_deeper_tutorial.py))
3. https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html (retrieved from [Github](https://github.com/pytorch/tutorials/blob/d5161086e7277f10c68dd44914f8925fda62f399/beginner_source/blitz/tensor_tutorial.py))
4. Chollet, p.26-47 (for definitions and a Tensorflow implementation)

The images were taken from: https://www.tensorflow.org/guide/tensor (retrieved from [Github](https://github.com/tensorflow/docs/blob/9bfffb91247233025892f2d293aa4d206c0ccad9/site/en/guide/tensor.ipynb)). Presented here according to the Apache 2.0 License.
The matrix multiplication examples were taken from Geron's Linear Algebra appendix ([Github](https://github.com/ageron/handson-ml3/blob/f122f5ac70636214aeea04c8ee3541d8ef59f715/math_linear_algebra.ipynb)). Presented here according to the Apache 2.0 License.

### The Importance of Tensors for Deep Learning

In deep learning, tensors are the basic unit of data. Tensors represent both:
* _Data_ such as the input, output and intermediate representations of a model
* The _Parameters_ of the model, such as weights and biases

### Tensor - a Definition

**Tensors are multi-dimensional arrays with a uniform type (called a `dtype`).**

The most obvious differences between NumPy arrays and PyTorch and Tensorflow tensors are:

1. Tensors can be backed by accelerator memory (like GPU, TPU).
2. Tensors can have non-numerical data types (for example, strings in Tensorflow).
3. Tensors can be ragged (non-rectangular), for example through Tensorflow's ragged tensors.
4. Tensors have a different set of operations (extended, generally speaking)
5. Tensorflow's tensors are immutable, like Python numbers and strings: you can never update the contents of a tensor, only create a new one. PyTorch tensors, on the other hand, are mutable, just like NumPy arrays.


The mathematical definition of a tensor ([Wikipedia](https://en.wikipedia.org/wiki/Tensor)) and the tensor implementations in PyTorch and Tensorflow are similar, but not exactly equivalent. The tensor of the DL frameworks is an _object_ (e.g. in Python) that is based on the mathematical concept, but extends it and can occasionally break it. Notable differences are that tensors in DL frameworks can be ragged (non-rectangular) and support operations such as broadcasting, which are not natively defined as part of the mathematical concept. They also include 'programmatic' attributes and methods (such as device placement) that are obviously not part of the mathematical definition.

### The Shape and Rank of a Tensor

* Rank (dimensionality) - the number of axes in a tensor.
* Shape - a tuple containing the number of elements in each axis (for rectangular tensors).

_Note: The term dimensionality can denote either the number of entries along a specific axis or the number of axes in a tensor, which can be confusing at times._

Examples:

<table>
<tr>
  <th>Rank 0 (a scalar)</th>
  <th>Rank 1 (a vector)</th>
  <th>Rank 2 (a matrix)</th>
</tr>
<tr>
  <td>
   <img src="../images/tensor/scalar.png" alt="Rank 0 (a scalar)" />
  </td>

  <td>
   <img src="../images/tensor/vector.png" alt="Rank 1 (a vector)"/>
  </td>
  <td>
   <img src="../images/tensor/matrix.png" alt="Rank 2 (a matrix)">
  </td>
</tr>
</table>


<table>
<tr>
  <th colspan=3>A rank-3 tensor (represented in three different ways) </th>
</tr>
<tr>
  <td>
   <img src="../images/tensor/3-axis_numpy.png"/>
  </td>
  <td>
   <img src="../images/tensor/3-axis_front.png"/>
  </td>

  <td>
   <img src="../images/tensor/3-axis_block.png"/>
  </td>
</tr>

</table>

<table>
<tr>
  <th colspan=2>A rank-4 tensor, shape: <code>[3, 2, 4, 5]</code></th>
</tr>
<tr>
  <td>
<img src="../images/tensor/shape.png" alt="A tensor shape is like a vector.">
  </td>
  <td>
<img src="../images/tensor/4-axis_block.png" alt="A 4-axis tensor">
  </td>
  </tr>
</table>
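
The rank and shape definitions above can be checked directly on a few tensors. The following is a minimal sketch; the variable names are illustrative only, and `rank4` simply reuses the shape `[3, 2, 4, 5]` from the last figure.

```python
import torch

scalar = torch.tensor(7.0)               # rank 0
vector = torch.tensor([1.0, 2.0, 3.0])   # rank 1
matrix = torch.zeros(2, 3)               # rank 2
rank4 = torch.zeros(3, 2, 4, 5)          # rank 4

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("rank4", rank4)]:
    # dim() is the number of axes, shape holds the size of each axis,
    # numel() is the total number of elements
    print(name, t.dim(), tuple(t.shape), t.numel())
```

The number of elements is the product of the entries of the shape, so `rank4` holds 3 * 2 * 4 * 5 = 120 values.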


While axes are often referred to by their indices, you should always keep track of the meaning of each. Often axes are ordered from global to local: The batch axis first, followed by spatial dimensions, and features for each location last. This way feature vectors are contiguous regions of memory.

<table>
<tr>
<th>Typical axis order</th>
</tr>
<tr>
    <td>
<img src="../images/tensor/shape2.png" alt="Keep track of what each axis is. A 4-axis tensor might be: Batch, Width, Height, Features">
  </td>
</tr>
</table>
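
The claim that feature vectors end up contiguous in memory can be inspected through a tensor's strides. This is a small sketch that assumes a hypothetical batch of 2 images with 3 x 4 locations and 5 features per location; the sizes are made up for illustration.

```python
import torch

# Hypothetical batch: (batch, height, width, features)
batch = torch.zeros(2, 3, 4, 5)

# stride() reports, for each axis, how many elements are skipped in memory
# when moving one step along that axis
print(batch.stride())         # (60, 20, 5, 1)
print(batch.is_contiguous())  # True

# The 5 features of one location are neighbours in memory (stride 1)
features = batch[0, 0, 0]
print(features.shape, features.stride())
```

A stride of 1 on the last axis means the features of a single location sit next to each other. Note that PyTorch's own convolution layers expect the channel axis right after the batch axis by default, so the order shown here is only one convention.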

In [1]:
import sys
import torch
import helpers as h

In [2]:
# check the version of Python and PyTorch
h.print_python_version()
h.print_pytorch_version()

Python version: 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)]
pyTorch version: 2.0.0+cpu


### Initializing Tensors
Let's see how to initialize tensors using PyTorch.

#### Initialize a rank-1 tensor (vector)
We start with an initialization of a tensor based on a Python list.

In [3]:
x = torch.tensor([1., 2.])
h.print_tensor_info(x)

tensor([1., 2.])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    1
Shape        (2,)


A PyTorch tensor is an object. Here it is an instance of the class `torch.Tensor`. Like any Python object, it has attributes and methods associated with it.
* Tip: Look into the function definition of `h.print_tensor_info` to see the Tensor's methods and attributes used to display the values above.

In PyTorch, each tensor has a `device` associated with it. By default (for constructors such as `torch.tensor`) this device is the CPU. It can also be a GPU. To check for the device a tensor is associated with:

In [4]:
x.device.type

'cpu'
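
Moving a tensor to another device might look like the sketch below. It assumes nothing about the machine: a GPU is used only if `torch.cuda.is_available()` reports one, otherwise the tensor stays on the CPU.

```python
import torch

x = torch.tensor([1., 2.])

# Pick a GPU only if one is actually available; otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to() returns a tensor on the requested device; x itself is not moved
x_dev = x.to(device)
print(x.device.type, x_dev.device.type)
```

Tensors that take part in the same operation generally need to live on the same device, so it is common to create `device` once and reuse it.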

Below we define a vector with a single element. Notice the tensor-dimensionality and shape of this tensor:

In [5]:
x = torch.tensor([1.,])
h.print_tensor_info(x)

tensor([1.])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    1
Shape        (1,)


Even though it has a single element, this is still a vector: a vector is defined by having rank 1, that is, a single axis (`tensor.dim() == 1`).

#### Initialize a rank-0 tensor (scalar)

In [6]:
x = torch.tensor(2.5)
h.print_tensor_info(x)

2.5
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    0
Shape        ()


#### Initialize a rank-2 tensor (matrix)

In [7]:
x = torch.tensor([[1., 2.],[3., 4.]])
h.print_tensor_info(x)

tensor([[1., 2.],
        [3., 4.]])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    2
Shape        (2, 2)


#### Initialize a rank-3 tensor

In [8]:
x = torch.tensor([
    [[1.,],[2.,]],
    [[3.,],[4.,]],
    [[5.,],[6.,]],
    ])
h.print_tensor_info(x)

tensor([[[1.],
         [2.]],

        [[3.],
         [4.]],

        [[5.],
         [6.]]])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    3
Shape        (3, 2, 1)


#### Explicitly defining the tensor's data type
When the data consists of Python floats, the constructor `torch.tensor` uses `torch.float32` by default (the dtype is inferred from the data, so integer data becomes `torch.int64`). The dtype is the data type of every element in the tensor; a tensor can have only a single data type.
To set a different dtype for the elements, we can simply add it as an argument when calling the constructor:

In [9]:
x = torch.tensor([1.], dtype=torch.double)
h.print_tensor_info(x)

tensor([1.], dtype=torch.float64)
Type         <class 'torch.Tensor'>
dtype        torch.float64
Dimension    1
Shape        (1,)
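
The way `torch.tensor` infers a dtype from its input can be seen in a short sketch. It assumes PyTorch's default floating-point dtype has not been changed (for example with `torch.set_default_dtype`).

```python
import torch

ints = torch.tensor([1, 2])            # Python ints   -> torch.int64
floats = torch.tensor([1., 2.])        # Python floats -> torch.float32
bools = torch.tensor([True, False])    # Python bools  -> torch.bool

# Converting an existing tensor returns a new tensor with the requested dtype
as_double = floats.to(torch.float64)

print(ints.dtype, floats.dtype, bools.dtype, as_double.dtype)
```

The original `floats` tensor keeps its dtype; only the returned tensor is `torch.float64`.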


#### Note
We have been using the word 'constructor' to describe `torch.tensor`. In addition, you might have seen (or might see) defining a constant tensor by `torch.Tensor` instead of `torch.tensor`. Let's take a deeper look:

In [10]:
x = torch.Tensor([1.])
h.print_tensor_info(x)

tensor([1.])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    1
Shape        (1,)


In [11]:
x = torch.IntTensor([1])
h.print_tensor_info(x)

tensor([1], dtype=torch.int32)
Type         <class 'torch.Tensor'>
dtype        torch.int32
Dimension    1
Shape        (1,)


`torch.tensor` and `torch.Tensor` are not the same. `torch.tensor` is a constructor and `torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`). As can be seen below, the first is a function, while the second is a class.

In [12]:
print(type(torch.tensor))
print(type(torch.Tensor))

<class 'builtin_function_or_method'>
<class 'torch._C._TensorMeta'>
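
One practical consequence of this difference shows up when a single integer is passed. The sketch below is illustrative only: `torch.tensor(3)` treats `3` as data, while `torch.Tensor(3)` treats it as a size.

```python
import torch

a = torch.tensor(3)   # rank-0 integer tensor holding the value 3
b = torch.Tensor(3)   # rank-1 float tensor with 3 uninitialized elements

print(a.shape, a.dtype)
print(b.shape, b.dtype)
```

Because the values in `b` are uninitialized, they can be arbitrary. For this reason `torch.tensor` is usually the safer choice when creating a tensor from existing data.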


A list of all the tensor classes available in PyTorch can be found here: https://pytorch.org/docs/stable/tensors.html


### Tensor Indexing, Slicing and Assignments
A subset of elements can be accessed by indexing and slicing a tensor. In addition, in PyTorch (but not in Tensorflow) it is possible to assign a new value to some elements within the tensor.

PyTorch and Tensorflow follow standard Python indexing rules, similar to [indexing a list or a string in Python](https://docs.python.org/3/tutorial/introduction.html#strings), and the basic rules for NumPy indexing.

* indexes start at `0`
* negative indices count backwards from the end
* colons, `:`, are used for slices: `start:stop:step`

A rank 2 tensor:

In [13]:
x = torch.tensor([
    [1., 2.],
    [3., 4.], 
    [5., 6.],
    ])
h.print_tensor_info(x)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    2
Shape        (3, 2)


Accessing a value is done simply by indicating its index:

In [14]:
# The element in the second row and first column:
x[1,0]

tensor(3.)

A rank 3 tensor:

In [15]:
x = torch.tensor([
    [[1., 2.],[3., 4.]],
    [[5., 6.],[7., 8.]],
    [[9., 10.],[11., 12.]],
    ])
h.print_tensor_info(x)

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])
Type         <class 'torch.Tensor'>
dtype        torch.float32
Dimension    3
Shape        (3, 2, 2)


In [16]:
# All elements at the first index of the first axis:
x[0]

tensor([[1., 2.],
        [3., 4.]])

In [17]:
x[0][0][0]

tensor(1.)

Slicing:

In [18]:
x[0,:,0]

tensor([1., 3.])

In [19]:
x[0:2,0,0]

tensor([1., 5.])
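
Negative indices and the `step` part of a slice follow the rules listed earlier. The sketch below rebuilds a tensor with the same values as the rank-3 example using `torch.arange`, which is a shortcut chosen here rather than the construction used in the cells above.

```python
import torch

# Same values as the rank-3 example: 1..12 arranged in shape (3, 2, 2)
x = torch.arange(1., 13.).reshape(3, 2, 2)

last_block = x[-1]           # last entry along the first axis
corners = x[:, -1, -1]       # last row, last column of every block
every_other = x[::2, 0, 0]   # step of 2 along the first axis

print(last_block)
print(corners)       # tensor([ 4.,  8., 12.])
print(every_other)   # tensor([1., 9.])
```

Unlike Python lists and NumPy arrays, PyTorch slices do not accept a negative step; `torch.flip` can be used to reverse an axis instead.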

Assignment:

In [20]:
x[1,1,1] = 100

In [21]:
x

tensor([[[  1.,   2.],
         [  3.,   4.]],

        [[  5.,   6.],
         [  7., 100.]],

        [[  9.,  10.],
         [ 11.,  12.]]])
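
Assignment also works on slices, and a tensor obtained by basic indexing shares memory with the original. The following sketch illustrates both points; the tensor is rebuilt with `torch.arange` so the example is self-contained.

```python
import torch

x = torch.arange(1., 13.).reshape(3, 2, 2)

# Assigning to a slice updates several elements at once
x[0, :, 0] = 0.

# Basic indexing returns a view: writing to it also changes x
first = x[0]
first[1, 1] = -1.
print(x[0])

# clone() makes an independent copy, so x is left untouched
copy = x[0].clone()
copy[0, 1] = 99.
print(x[0, 0, 1])   # tensor(2.)
```

When an independent copy is needed, calling `clone()` before modifying the result avoids changing the original tensor by accident.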

#### Tensor Operations

There are over 100 tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, and random sampling. They are comprehensively described in the [torch package documentation](https://pytorch.org/docs/stable/torch.html). Check out the list.

#### In-place operations
Operations that have a "`_`" suffix are in-place. For example, `x.copy_(y)` and `x.t_()` both change `x`.

In-place operations save memory by avoiding copying the data, but cause a loss of history when computing derivatives.

Let's look at an in-place addition operation, and compare it to a 'standard' addition (where the result of the operation is stored in a new location in memory):

In [22]:
tensor = torch.ones(2, 2)
print(tensor, "\n")
tensor.add_(5)  # notice the underscore
print(tensor)

tensor([[1., 1.],
        [1., 1.]]) 

tensor([[6., 6.],
        [6., 6.]])


Compare to:

In [23]:
tensor = torch.ones(2, 2)
print(tensor, "\n")
tensor.add(5)  # no underscore
print(tensor)

tensor([[1., 1.],
        [1., 1.]]) 

tensor([[1., 1.],
        [1., 1.]])
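
Both properties of in-place operations mentioned earlier (memory reuse and trouble with derivatives) can be observed in a small sketch. It uses `data_ptr()`, which returns the memory address of a tensor's first element, and a tensor created with `requires_grad=True`; the variable names are illustrative.

```python
import torch

t = torch.ones(2, 2)
address = t.data_ptr()

t.add_(5)                                # in-place: same memory, new values
same_memory = t.data_ptr() == address

u = t.add(5)                             # out-of-place: result is a new tensor
new_memory = u.data_ptr() != t.data_ptr()
print(same_memory, new_memory)

# In-place updates on a leaf tensor that tracks gradients are rejected
w = torch.ones(2, requires_grad=True)
try:
    w.add_(1)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

The out-of-place call returns its result, so it has to be assigned to a name (here `u`) to be kept.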


The operations below are all "not-in-place", and these are usually the operations we will work with.

### Tensor Arithmetic

Arithmetic is the branch of mathematics that deals with the manipulation of numbers and quantities. It includes the study of basic operations such as addition, subtraction, multiplication, and division, as well as more advanced ones.

Let's initialize two rank-1 tensors first:

In [24]:
x = torch.tensor([1., 2.])
y = torch.tensor([3., 4.])
print(f'x: {x}')
print(f'y: {y}')

x: tensor([1., 2.])
y: tensor([3., 4.])


The four basic operations of arithmetic (addition, subtraction, multiplication and division, as defined by operator overloading) are conducted element-wise:

In [25]:
print(f'x + y : {x + y}')

x + y : tensor([4., 6.])


In [26]:
print(f'x - y : {x - y}')

x - y : tensor([-2., -2.])


In [27]:
print(f'x * y : {x * y}')

x * y : tensor([3., 8.])


In [28]:
print(f'x / y : {x / y}')

x / y : tensor([0.3333, 0.5000])
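
Each overloaded operator has a function counterpart in the `torch` namespace, and a Python scalar is applied to every element. A brief sketch, reusing the same two vectors:

```python
import torch

x = torch.tensor([1., 2.])
y = torch.tensor([3., 4.])

# Function forms of the four operators
added = torch.add(x, y)        # same as x + y
subtracted = torch.sub(x, y)   # same as x - y
multiplied = torch.mul(x, y)   # same as x * y
divided = torch.div(x, y)      # same as x / y

# A scalar operand is applied to each element
scaled = x * 10
print(added, subtracted, multiplied, divided, scaled)
```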


Let's look at two foundational operations: the dot product and matrix multiplication.
#### The dot-product

$\textbf{u} \cdot \textbf{v} = \sum_i{\textbf{u}_i \times \textbf{v}_i}$

In [29]:
x.dot(y)

tensor(11.)

The dot product is commutative, $\textbf{u} \cdot \textbf{v} = \textbf{v} \cdot \textbf{u}$:

In [30]:
y.dot(x)

tensor(11.)
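
The sum-of-products formula for the dot product can be written out by hand and compared with `dot`. This is only a sketch to connect the formula to the code; in practice `dot` is the method to use.

```python
import torch

x = torch.tensor([1., 2.])
y = torch.tensor([3., 4.])

# Element-wise products, then a sum: 1*3 + 2*4
manual = (x * y).sum()

# The same sum written as an explicit Python loop
loop = sum(float(a) * float(b) for a, b in zip(x, y))

print(manual, loop, x.dot(y))
```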

### Matrix Multiplication

A 2D Example:

$\begin{bmatrix}
  10 & 20 & 30 \\
  40 & 50 & 60
\end{bmatrix} 
\begin{bmatrix}
  2 & 3 & 5 & 7 \\
  11 & 13 & 17 & 19 \\
  23 & 29 & 31 & 37
\end{bmatrix} = 
\begin{bmatrix}
  930 & 1160 & 1320 & 1560 \\
  2010 & 2510 & 2910 & 3450
\end{bmatrix}$

A matrix $Q$ of size $m \times n$ can be multiplied by a matrix $R$ of size $n \times q$. The result $P$ is an $m \times q$ matrix where each element is computed as a sum of products:

$P_{i,j} = \sum_{k=1}^n{Q_{i,k} \times R_{k,j}}$

Each element $P_{i,j}$ is the dot product of the row vector $Q_{i,*}$ and the column vector $R_{*,j}$ (!) :

$P_{i,j} = Q_{i,*} \cdot R_{*,j}$

###### Questions
1. If we were to define the two matrices in the example above in PyTorch, what would be their shapes?
2. Based on this definition, can you write down the calculation that yields `930` in the example above?

In `PyTorch`, matrix multiplication is implemented using `torch.matmul`. Let's look at an example:

In [31]:
x = torch.tensor([[1., 2.],[3., 4.]])
y = torch.tensor([[5., 6.],[7., 8.]])
print(x)
print(y)

tensor([[1., 2.],
        [3., 4.]])
tensor([[5., 6.],
        [7., 8.]])


In [32]:
print(torch.matmul(x, y))

tensor([[19., 22.],
        [43., 50.]])


In [33]:
x.matmul(y)

tensor([[19., 22.],
        [43., 50.]])

Python 3.5 [introduced](https://docs.python.org/3/whatsnew/3.5.html#pep-465-a-dedicated-infix-operator-for-matrix-multiplication) the `@` ('At-sign') operator for matrix multiplication. In PyTorch (and Tensorflow) `@` is an alias of `matmul`.

In [34]:
x@y

tensor([[19., 22.],
        [43., 50.]])
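
To connect `matmul` back to the definition $P_{i,j} = Q_{i,*} \cdot R_{*,j}$, the sketch below computes the product with two explicit loops. The function `matmul_by_definition` is a hypothetical helper written for this illustration, not part of PyTorch, and it is far slower than `torch.matmul`.

```python
import torch

def matmul_by_definition(q, r):
    """Hypothetical helper: P[i, j] = sum_k Q[i, k] * R[k, j] for rank-2 tensors."""
    m, n = q.shape
    n2, p = r.shape
    assert n == n2, "inner dimensions must match"
    out = torch.zeros(m, p, dtype=q.dtype)
    for i in range(m):
        for j in range(p):
            # dot product of row i of q with column j of r
            out[i, j] = (q[i, :] * r[:, j]).sum()
    return out

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[5., 6.], [7., 8.]])

manual = matmul_by_definition(x, y)
print(manual)
print(torch.equal(manual, x @ y))
```

For instance, the top-left element is the first row of `x` times the first column of `y`: 1 * 5 + 2 * 7 = 19.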

#### `torch.matmul` on tensors with rank!=2
`torch.matmul` extends the 2D matrix multiplication to lower and higher dimensions. Its [docstring](https://pytorch.org/docs/stable/generated/torch.matmul.html) is very useful to see how the 2D definition is extended into lower or higher dimensions. In other words, the mathematical operation depends on the rank of the tensors.

##### 1-D tensors
According to the `torch.matmul` [docstring](https://pytorch.org/docs/stable/generated/torch.matmul.html):
>If both tensors are 1-dimensional, the dot product (scalar) is returned.

Example:

In [35]:
x = torch.tensor([1., 2.])
y = torch.tensor([3., 4.])
print(f'x: {x}')
print(f'y: {y}')

x: tensor([1., 2.])
y: tensor([3., 4.])


In [36]:
torch.matmul(x, y)

tensor(11.)

##### matmul on 1D and 2D ranked tensors
From the docstring:
>If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its _dimension_ for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.

_prepended: add something to the beginning of something else_

An example can be useful:

$\begin{bmatrix}
  1 & 2
\end{bmatrix}
\begin{bmatrix}
  3 & 4 \\
  5 & 6 \\
\end{bmatrix} =
\begin{bmatrix}
  13 & 16
\end{bmatrix}$

In [37]:
x = torch.tensor([1., 2.])
y = torch.tensor([[3., 4.],[5., 6.]])
print(x)
print(y)
print(torch.matmul(x,y))

tensor([1., 2.])
tensor([[3., 4.],
        [5., 6.]])
tensor([13., 16.])
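
The "prepend a 1, multiply, then remove it" description can be reproduced step by step with `unsqueeze` and `squeeze`. This sketch only mirrors the wording of the docstring; it makes no claim about how `torch.matmul` is implemented internally.

```python
import torch

x = torch.tensor([1., 2.])
y = torch.tensor([[3., 4.], [5., 6.]])

row = x.unsqueeze(0)          # shape (2,) -> (1, 2): a 1 is prepended
product = row @ y             # ordinary 2D product, shape (1, 2)
result = product.squeeze(0)   # prepended axis removed, shape (2,)

print(row.shape, product.shape, result.shape)
print(result)
```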


From the docstring:
>If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.

$ \begin{bmatrix}
  3 & 4 \\
  5 & 6 \\
\end{bmatrix} 
\begin{bmatrix}
  1 \\ 2 \\
\end{bmatrix}= 
\begin{bmatrix}
  11 \\ 17 \\
\end{bmatrix}$

In [38]:
x = torch.tensor([1., 2.])
y = torch.tensor([[3., 4.],[5., 6.]])
print(x)
print(y)

tensor([1., 2.])
tensor([[3., 4.],
        [5., 6.]])


In [39]:
torch.matmul(y,x)

tensor([11., 17.])
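
The matrix-vector case corresponds to treating the vector as a single column. The sketch below shows this with `unsqueeze(1)`, and also compares against `torch.mv`, PyTorch's dedicated matrix-vector function.

```python
import torch

x = torch.tensor([1., 2.])
y = torch.tensor([[3., 4.], [5., 6.]])

col = x.unsqueeze(1)                 # shape (2,) -> (2, 1): a column
via_column = (y @ col).squeeze(1)    # shape (2, 1) -> (2,)
via_mv = torch.mv(y, x)

print(via_column, via_mv)

# The order of the arguments matters: x @ y differs from y @ x here
print(torch.matmul(x, y), torch.matmul(y, x))
```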

#### matmul when at least one tensor's rank is > 2
In this case a batched matrix multiply is returned, using [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics). See the matmul [docstring](https://pytorch.org/docs/stable/generated/torch.matmul.html) for examples. We will go over broadcasting in the Tensorflow tensors tutorial.
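
A small sketch of the batched case, with shapes chosen only for illustration: the last two axes are treated as matrices, and the leading axes act as a batch.

```python
import torch

a = torch.arange(24.).reshape(2, 3, 4)   # a batch of 2 matrices, each 3 x 4
b = torch.ones(2, 4, 5)                  # a batch of 2 matrices, each 4 x 5
w = torch.ones(4, 5)                     # a single 4 x 5 matrix

batched = a @ b    # one product per batch entry -> shape (2, 3, 5)
shared = a @ w     # w is broadcast over the batch -> shape (2, 3, 5)

print(batched.shape, shared.shape)

# Each batch entry equals the corresponding ordinary 2D product
print(torch.equal(shared[1], a[1] @ w))
```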