<a href="https://colab.research.google.com/github/victorm0202/temas_selectos_CD-23/blob/main/Intro_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computing Infrastructure for Deep Learning

An ML/DL project: the big picture.

![landscape](https://drive.google.com/uc?id=1fnqj274aECr7z7XbsLoAWTHbnfdix9ir)

(E. Stevens et al., "Deep learning with Pytorch")


## Hardware.

CPU (central processing unit), GPU (graphics processing unit), TPU (tensor processing unit).

![hardware](https://drive.google.com/uc?id=1OnlzBkQrNXC0_7uwLQ5Dia4EIp4d310a)



## Software.

Today there is specialized software for handling and operating on the data structures used by deep neural networks. Some libraries separate the backend from the frontend. Keras, for example, is an API that can run on top of several backends, such as TensorFlow and Theano, among others.

![software](https://drive.google.com/uc?id=1avbztqDEjSEiAaS1ZiMWqWitYS2-CyBc)

See also [Comparison of deep-learning software](https://en.wikipedia.org/wiki/Comparison_of_deep-learning_software#Comparison_of_compatibility_of_machine_learning_models).

## Software frameworks



### Tensorflow

![tf1](https://drive.google.com/uc?id=1mEs8ic5eo0aIjk5h3JgWRXauIVH4NPGU)

![tf2](https://drive.google.com/uc?id=1a3hyNNdWbpjXSeQwjql3wcMEXxNqRTdf)


### Pytorch

![pytorch1](https://drive.google.com/uc?id=1PjsCNf4pvI-WNk9F_Q_e4T8HB4O8Qy91)

- Written in C++ and CUDA
- Python API
- Two main components:
  1.   Tensors
  2.   Autograd






# Introduction to PyTorch

## Tensors

Tensors are a specialized data structure, very similar to arrays and matrices.
In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

![tensors1](https://drive.google.com/uc?id=1SjbnWUbLg7ssJojwlZUnNt-97UHwUcJY)
![tensors2](https://drive.google.com/uc?id=1-_slE61Afr2aqn12GIV9fkQ7UzXaJk0N)
![tensors3](https://drive.google.com/uc?id=1fM3jRALD4IgACg-HOkL8AsCcgY3n5U73)




Numpy tensors:

In [None]:
import numpy as np
x = np.array(12)
x.ndim

0

In [None]:
x = np.array([12, 3, 6, 14])
print('X:',x)
print('ndim:',x.ndim)

X: [12  3  6 14]
ndim: 1


In [None]:
x = np.array([[5, 78, 2, 34, 0],
[6, 79, 3, 35, 1],
[7, 80, 4, 36, 2]])
print('X:',x)
print('ndim:',x.ndim)

X: [[ 5 78  2 34  0]
 [ 6 79  3 35  1]
 [ 7 80  4 36  2]]
ndim: 2


In [None]:
x = np.array([[[5, 78, 2, 34, 0],
                   [5, 79, 3, 35, 1],
                   [5, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [5, 79, 3, 35, 1],
                   [5, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [5, 79, 3, 35, 1],
                   [5, 80, 4, 36, 2]]])
print('X:',x)
print('ndim:',x.ndim)

X: [[[ 5 78  2 34  0]
  [ 5 79  3 35  1]
  [ 5 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 5 79  3 35  1]
  [ 5 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 5 79  3 35  1]
  [ 5 80  4 36  2]]]
ndim: 3


Tensors are similar to [NumPy](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and
NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see the "Bridge with NumPy" section below). Tensors
are also optimized for automatic differentiation.

In [None]:
%matplotlib inline

In [None]:
import torch
import numpy as np

![landscape](https://drive.google.com/uc?id=1I41oqZQK3CoxbF8ZAsowMnpANAapNUWt)

(E. Stevens et al., "Deep learning with Pytorch")


### Initializing a Tensor


Tensors can be initialized in various ways. Take a look at the following examples:

**Directly from data**

Tensors can be created directly from data. The data type is automatically inferred.



In [None]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
x_data

tensor([[1, 2],
        [3, 4]])
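To make the inference rule concrete, the following minimal sketch checks which `dtype` PyTorch picks for integer and floating-point literals, and how to override it. The variable names are illustrative, and the `float32` result assumes the default floating-point type has not been changed.

```python
import torch

# Python integers are inferred as 64-bit integers
int_tensor = torch.tensor([[1, 2], [3, 4]])

# Python floats are inferred as the default floating-point type
# (float32 unless torch.set_default_dtype was called)
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# The dtype can also be requested explicitly
explicit_tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)

print(int_tensor.dtype)       # torch.int64
print(float_tensor.dtype)     # torch.float32
print(explicit_tensor.dtype)  # torch.float64
```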

**From a NumPy array**

Tensors can be created from NumPy arrays (and vice versa; see the "Bridge with NumPy" section below).



In [None]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

**From another tensor:**

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.



In [None]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.8556, 0.0797],
        [0.1023, 0.2043]]) 



**With random or constant values:**

``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.



In [None]:
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.3867, 0.4173, 0.8715],
        [0.4580, 0.9357, 0.8788]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


--------------




### Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.



In [None]:
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


--------------




### Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing,
indexing, slicing), sampling and more are
comprehensively described [here](https://pytorch.org/docs/stable/torch.html).

Each of these operations can be run on the GPU (at typically higher speeds than on a
CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using
the ``.to`` method (after checking for GPU availability). Keep in mind that copying large tensors
across devices can be expensive in terms of time and memory!



In [None]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")

In [None]:
tensor.device

device(type='cuda', index=0)
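A common way to write code that runs with or without a GPU is to choose the device once and reuse it. The sketch below is one possible pattern, not the only one; the names `device`, `a`, `b`, and `c` are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device (avoids a later copy)
a = torch.rand(3, 4, device=device)

# .to() moves a CPU tensor to the chosen device
# (it returns the same tensor if it is already there)
b = torch.ones(3, 4).to(device)

# Operands of an operation must live on the same device
c = a + b
print(c.device)

# Bring the result back to the CPU, e.g. before converting it to NumPy
c_cpu = c.cpu()
print(c_cpu.device)  # cpu
```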

Try out some of the operations from the list.
If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use.




**Standard numpy-like indexing and slicing:**



In [None]:
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)

First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


**Joining tensors** You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension.
See also [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html),
another tensor joining op that is subtly different from ``torch.cat``.



In [None]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
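The difference between ``torch.cat`` and ``torch.stack`` is easiest to see in the resulting shapes: ``cat`` grows an existing dimension, while ``stack`` inserts a new one. A small sketch with illustrative tensors `a` and `b`:

```python
import torch

a = torch.ones(2, 3)
b = torch.zeros(2, 3)

# cat joins along an existing dimension, so that dimension grows
cat_rows = torch.cat([a, b], dim=0)   # shape (4, 3)
cat_cols = torch.cat([a, b], dim=1)   # shape (2, 6)

# stack creates a new dimension; all inputs must have identical shapes
stacked = torch.stack([a, b], dim=0)  # shape (2, 2, 3)

print(cat_rows.shape, cat_cols.shape, stacked.shape)
```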


**Arithmetic operations**



In [None]:
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

In [None]:

# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

**Single-element tensors** If you have a one-element tensor, for example by aggregating all
values of a tensor into one value, you can convert it to a Python
numerical value using ``item()``:



In [None]:
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

12.0 <class 'float'>


**In-place operations**
Operations that store the result into the operand are called in-place. They are denoted by a ``_`` suffix.
For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``.



In [None]:
print(f"{tensor} \n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])


<div class="alert alert-info"><h4>Note</h4><p>In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss
     of history. Hence, their use is discouraged.</p></div>
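As a hedged illustration of the problem described in the note: some operations save their output for the backward pass, and overwriting that output in place makes the gradient computation fail. The sketch below uses `exp()` as one such operation; the variable names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)

# exp() saves its output, because d/da exp(a) = exp(a)
b = a.exp()

# Overwriting b in place invalidates the value saved for the backward pass
b.add_(1)

failed = False
try:
    b.sum().backward()
except RuntimeError as err:
    failed = True
    print("backward failed:", err)

# The out-of-place version leaves the saved value intact
a2 = torch.tensor([1.0, 2.0], requires_grad=True)
b2 = a2.exp() + 1
b2.sum().backward()
print(a2.grad)  # equals exp(a2)
```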



--------------




### Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory
locations, and changing one will change the other.



#### Tensor to NumPy array

In [None]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


A change in the tensor reflects in the NumPy array.



In [None]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


#### NumPy array to Tensor

In [None]:
n = np.ones(5)
t = torch.from_numpy(n)

Changes in the NumPy array are reflected in the tensor.



In [None]:
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")
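Memory sharing applies to ``torch.from_numpy`` (and ``Tensor.numpy()``), whereas ``torch.tensor`` copies its input. The following sketch contrasts the two; the names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

n = np.ones(3)

shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # copies the data into new memory

n += 1  # in-place update of the NumPy array

print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```

Use the copying form when the tensor must stay independent of later changes to the array.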

### Tensors: scenic views of storage

Values of tensors are allocated in contiguous chunks of memory managed by `torch.Storage`.

![landscape](https://drive.google.com/uc?id=1ACNU_409P8QRf7u-jH3AyEgzaYhNPQvJ)

(E. Stevens et al., "Deep learning with Pytorch")
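One way to observe that several tensors can be views over the same storage is to compare their strides and data pointers. The sketch below uses a small illustrative tensor `points`; transposing it changes only the strides, not the underlying memory.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()  # transpose: a view, no data is copied

# Strides say how many elements to skip in storage per step along each dimension
print(points.stride())    # (2, 1)
print(points_t.stride())  # (1, 2)

# Both tensors start at the same memory address
print(points.data_ptr() == points_t.data_ptr())  # True

# The transposed view is not laid out contiguously
print(points_t.is_contiguous())  # False

# Writing through the view changes the original tensor
points_t[0, 1] = 50.0
print(points[1, 0])  # tensor(50.)

# contiguous() makes a copy with a fresh, row-major layout
points_t_cont = points_t.contiguous()
print(points_t_cont.data_ptr() == points.data_ptr())  # False
```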


## Autograd

PyTorch is well known for its automatic differentiation feature. We can call the `backward()` method to ask PyTorch to calculate the gradients, which are then stored in the `grad` attribute.

In [None]:
import torch
import numpy as np

# Import pprint, module we use for making our print statements prettier
import pprint
pp = pprint.PrettyPrinter()

In [None]:
# requires_grad parameter in the constructor, tells PyTorch to store gradients
x = torch.tensor([2.], requires_grad=True)

# Print the gradient if it is calculated
# Currently None since backward() has not been called yet
pp.pprint(x.grad)

None


In [None]:
# Calculating the gradient of y with respect to x
y = x * x * 3 # 3x^2
y.backward()
pp.pprint(x.grad) # d(y)/d(x)

tensor([12.])


**Note**: Calling backward will lead derivatives to accumulate at leaf nodes.
We need to zero the gradient explicitly after using it for parameter updates.

In [None]:
z = x * x * 3 # 3x^2
z.backward()
pp.pprint(x.grad)

tensor([24.])


We can see that the x.grad is updated to be the sum of the gradients calculated so far. When we run backprop in a neural network, we sum up all the gradients for a particular neuron before making an update. This is exactly what is happening here! This is also the reason why we need to run zero_grad() in every training iteration (more on this later). Otherwise our gradients would keep building up from one training iteration to the other, which would cause our updates to be wrong.

```
if params.grad is not None:
    params.grad.zero_()
```
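The effect of zeroing can be checked directly. In this minimal sketch (variable names are illustrative), the gradient of $3x^2$ at $x = 2$ is 12; a second `backward()` call accumulates it to 24, and zeroing before a third call brings it back to 12.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3x^2)/dx = 6x = 12
(3 * x * x).backward()
first = x.grad.clone()

# Second backward pass without zeroing: gradients add up
(3 * x * x).backward()
accumulated = x.grad.clone()

# Zero in place, then run backward again
x.grad.zero_()
(3 * x * x).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
# tensor([12.]) tensor([24.]) tensor([12.])
```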



Let's look at another example.

In [None]:
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                    33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

In [None]:
def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()

In [None]:
params = torch.tensor([1.0, 0.0], requires_grad=True)

In [None]:
params.grad is None

True

In [None]:
loss = loss_fn(model(t_u, *params), t_c)
loss.backward()

params.grad

tensor([4517.2969,   82.6000])
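As a sanity check, the gradient computed by autograd can be compared with the analytic derivatives of the mean squared error: $\partial L/\partial w = \text{mean}(2\,(t_p - t_c)\,t_u)$ and $\partial L/\partial b = \text{mean}(2\,(t_p - t_c))$. The sketch below repeats the data so that it is self-contained; `manual_grad` is an illustrative name.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                    33.9, 21.8, 48.4, 60.4, 68.4])

params = torch.tensor([1.0, 0.0], requires_grad=True)
w, b = params

# Forward pass and autograd gradient
t_p = w * t_u + b
loss = ((t_p - t_c) ** 2).mean()
loss.backward()

# Analytic gradient of the mean squared error
with torch.no_grad():
    diff = t_p - t_c
    grad_w = (2 * diff * t_u).mean()
    grad_b = (2 * diff).mean()
    manual_grad = torch.stack([grad_w, grad_b])

print(params.grad)
print(manual_grad)
```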


![landscape](https://drive.google.com/uc?id=1PesFXbPY1BUuqXcIUPUu5MRSF027txvT)

(E. Stevens et al., "Deep learning with Pytorch")

In [None]:
def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        if params.grad is not None:  # reset gradients accumulated in the previous epoch
            params.grad.zero_()

        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
        loss.backward()

        with torch.no_grad():  # update the parameters without tracking the step in autograd
            params -= learning_rate * params.grad

        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))

    return params

In [None]:
training_loop(
    n_epochs = 5000,
    learning_rate = 1e-2,
    params = torch.tensor([1.0, 0.0], requires_grad=True), # initial w and b, with gradient tracking
    t_u = t_un, # rescaled input (0.1 * t_u)
    t_c = t_c)

Epoch 500, Loss 7.860115
Epoch 1000, Loss 3.828538
Epoch 1500, Loss 3.092191
Epoch 2000, Loss 2.957698
Epoch 2500, Loss 2.933134
Epoch 3000, Loss 2.928648
Epoch 3500, Loss 2.927830
Epoch 4000, Loss 2.927679
Epoch 4500, Loss 2.927652
Epoch 5000, Loss 2.927647


tensor([  5.3671, -17.3012], requires_grad=True)
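The manual zeroing and update steps in the loop above can also be delegated to an optimizer. The following sketch rewrites the same training loop with `torch.optim.SGD`; it is one possible variant rather than the notebook's own method, and under the same settings it should end near the parameters shown above.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                    33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u  # rescaled input, as in the loop above

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([params], lr=1e-2)

for epoch in range(1, 5001):
    # Linear model and mean squared error
    t_p = params[0] * t_un + params[1]
    loss = ((t_p - t_c) ** 2).mean()

    optimizer.zero_grad()  # replaces params.grad.zero_()
    loss.backward()
    optimizer.step()       # replaces the manual update under torch.no_grad()

    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss {loss.item():f}")

print(params)
```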