## Perceptron networks - PyTorch & Scikit-learn

### Defining a perceptron network in Scikit-learn

In [11]:
from sklearn.neural_network import MLPClassifier  # import the class


mlp_classifier_model = MLPClassifier(hidden_layer_sizes=(100, ),
    activation='relu', solver='adam', alpha=0.0001, batch_size='auto',
    learning_rate='constant', learning_rate_init=0.001, power_t=0.5,
    max_iter=200, shuffle=True, random_state=None, tol=0.0001,
    momentum=0.9, early_stopping=False, validation_fraction=0.1,
    n_iter_no_change=10)
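A minimal usage sketch, assuming scikit-learn's small bundled digits dataset (`load_digits`, 8×8 images) as a stand-in for MNIST; it is not part of the original lab. The classifier is trained with `fit` and evaluated with `score`, and because `solver='sgd'` is used, the training loss of each epoch is recorded in `loss_curve_`:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 digit images flattened to 64 features, pixel values in [0, 16]
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale the features to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = MLPClassifier(
    hidden_layer_sizes=(100,),
    activation="relu",
    solver="sgd",
    learning_rate_init=0.01,
    momentum=0.9,
    max_iter=50,
    random_state=0,
)
clf.fit(X_train, y_train)  # may emit a ConvergenceWarning if max_iter is reached

test_accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out split
print(f"Test accuracy: {test_accuracy:.3f}")
print(f"Epochs run: {clf.n_iter_}, final training loss: {clf.loss_curve_[-1]:.4f}")
```

Setting `random_state` makes the weight initialization and the shuffling reproducible; the exact accuracy can still vary between scikit-learn versions.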

Parameters:
- hidden_layer_sizes (tuple, length = n_layers - 2, default=(100,)): the i-th element is the number of neurons in the i-th hidden layer.
- activation ({'identity', 'logistic', 'tanh', 'relu'}, default='relu'): the activation function of the hidden layers.
  - 'identity': $f(x) = x$
  - 'logistic': $f(x) = \frac{1}{1 + e^{-x}}$
  - 'tanh': $f(x) = \tanh(x)$
  - 'relu': $f(x) = \max(0, x)$
- solver ({'lbfgs', 'sgd', 'adam'}, default='adam'): the weight-update (learning) rule.
  - 'sgd': stochastic gradient descent (the only solver we will use).
- batch_size (int, default='auto'): the mini-batch size.
  - 'auto': the training batch size is min(200, n_samples).
- learning_rate_init (double, default=0.001): the (initial) learning rate.
- max_iter (int, default=200): the maximum number of training epochs.
- shuffle (bool, default=True): whether to shuffle the data at each epoch.
- tol (float, default=1e-4):
  - if the loss or the score does not improve by at least tol for n_iter_no_change consecutive epochs (and learning_rate != 'adaptive'), training stops.
- n_iter_no_change (int, optional, default=10, added in scikit-learn 0.20):
  - the maximum number of epochs without an improvement (of the loss or the score) of at least tol.
- alpha (float, default=0.0001): the strength of the L2 regularization term.
- learning_rate ({'constant', 'invscaling', 'adaptive'}, default='constant'): the learning-rate schedule, used only when solver='sgd'.
  - 'constant': the learning rate stays constant and is given by learning_rate_init.
  - 'invscaling': the learning rate is decreased at each step t according to the formula new_learning_rate = learning_rate_init / pow(t, power_t).
  - 'adaptive': keeps the learning rate constant as long as the training loss keeps decreasing. If the loss does not decrease by at least tol, or (only when early_stopping=True) the validation score does not increase by at least tol, the current learning rate is divided by 5.
- power_t (double, default=0.5): the exponent used when learning_rate='invscaling'.
- momentum (float, default=0.9): the momentum value used by gradient descent with momentum (solver='sgd'). Must be between 0 and 1.
- early_stopping (bool, default=False):
  - if True, training stops when the score on the validation set does not improve by at least tol for n_iter_no_change consecutive epochs.
- validation_fraction (float, optional, default=0.1):
  - the fraction of the training set held out for validation (used only when early_stopping=True). Must be between 0 and 1.
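The activation formulas and the 'invscaling' schedule from the list above can be checked with a few lines of plain Python. This is a scalar sketch for illustration only (scikit-learn applies these functions element-wise on NumPy arrays internally), and the helper names are hypothetical:

```python
import math


# The four options accepted by MLPClassifier(activation=...)
def identity(x):
    return x


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


# learning_rate='invscaling': effective learning rate at step t (t >= 1)
def invscaling_rate(t, learning_rate_init=0.001, power_t=0.5):
    return learning_rate_init / math.pow(t, power_t)


for name, f in [("identity", identity), ("logistic", logistic), ("tanh", tanh), ("relu", relu)]:
    print(name, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])

# With the defaults, the rate halves when t goes from 1 to 4
print([invscaling_rate(t) for t in (1, 4, 100)])
```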

For the rest of the lab, we will focus on implementing neural networks with the PyTorch library.

### Installing PyTorch


Go to https://pytorch.org and, in the "Install PyTorch" section, select the options that match your machine. More precisely, if your machine has a CUDA-capable GPU, leave the default selection unchanged; otherwise, select CPU in the "Compute Platform" field.

Example configuration for a machine with a GPU:

![pytorch_gpu.png](./assets/pytorch_gpu.png)

Example configuration for a CPU-only machine:

![pytorch_cpu.png](./assets/pytorch_cpu.png)

To check that the installation succeeded, you can run the following code cell:


In [2]:
import torch
x = torch.rand(5, 3)
print(x)

tensor([[0.1884, 0.4039, 0.3352],
        [0.8100, 0.5215, 0.1551],
        [0.1485, 0.1562, 0.8069],
        [0.4286, 0.6763, 0.8859],
        [0.1566, 0.0437, 0.3820]])


To check whether PyTorch can access the GPU, you can run the following code. If everything is set up correctly, the last line should return True.

In [1]:
import torch
torch.cuda.is_available()

True
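A common follow-up is to choose the device once and move tensors (and, later, models) to it explicitly. A minimal sketch; the printed GPU name depends on your machine:

```python
import torch

# Fall back to the CPU when no CUDA device is visible to PyTorch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

if device.type == "cuda":
    # Index 0 is the first visible GPU
    print("GPU:", torch.cuda.get_device_name(0))

# Tensors are created on the CPU by default and must be moved to the chosen device
x = torch.rand(5, 3).to(device)
print(x.device)
```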

### Defining the neural network

To create a model in PyTorch, we extend the **nn.Module** class: the constructor defines the layers of the network, and the **forward** function uses them to compute the output. Below is an example of a multilayer perceptron with two hidden layers.

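- the **Flatten** layer reshapes each input into a 1-dimensional vector (by default it keeps the first, batch dimension, so a batch of shape (N, 1, 28, 28) becomes (N, 784)).
- the **Linear** layer applies a linear transformation: xW<sup>T</sup>+b. For this layer we specify the input and output sizes, nn.Linear(in_features, out_features), which determine the shape of the weight matrix W: (out_features, in_features).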
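A quick sketch checking both descriptions on random data: Flatten keeps the batch dimension, the weight of `nn.Linear(in_features, out_features)` has shape `(out_features, in_features)`, and the layer output matches `x @ W.T + b` computed by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

flatten = nn.Flatten()             # by default keeps dim 0 (the batch) and flattens the rest
images = torch.rand(5, 1, 28, 28)  # 5 single-channel 28x28 images
flat = flatten(images)
print(flat.shape)                  # torch.Size([5, 784])

layer = nn.Linear(28 * 28, 512)    # in_features=784, out_features=512
print(layer.weight.shape, layer.bias.shape)  # torch.Size([512, 784]) torch.Size([512])

# The layer computes x W^T + b; compare with the manual computation
manual = flat @ layer.weight.T + layer.bias
print(torch.allclose(layer(flat), manual, atol=1e-5))
```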

In [3]:
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.first_layer = nn.Linear(28*28, 512)
        self.second_layer = nn.Linear(512, 512)
        self.output_layer = nn.Linear(512, 10)
    

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.first_layer(x))
        x = F.relu(self.second_layer(x))
        x = self.output_layer(x)
        return x
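The same architecture can also be written more compactly with `nn.Sequential`. This sketch is offered as an alternative and is not used in the rest of the lab; it also counts the trainable parameters:

```python
import torch
import torch.nn as nn

# Same layers as NeuralNetwork: 784 -> 512 -> 512 -> 10, ReLU after each hidden layer
sequential_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),  # raw scores (logits); no softmax here
)

logits = sequential_model(torch.rand(5, 1, 28, 28))
print(logits.shape)  # torch.Size([5, 10])

# Trainable parameters (weights + biases): (784*512 + 512) + (512*512 + 512) + (512*10 + 10)
num_params = sum(p.numel() for p in sequential_model.parameters() if p.requires_grad)
print(num_params)  # 669706
```

`nn.Sequential` is convenient for a straight chain of layers, while subclassing `nn.Module` lets `forward` contain arbitrary logic (branches, skip connections, multiple inputs).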

Passing a batch of inputs through the network above (here, 5 random single-channel 28×28 images) works as follows:

In [4]:
model = NeuralNetwork()
model(torch.rand(5, 1, 28, 28))

tensor([[-0.0728, -0.0668, -0.0012, -0.0073,  0.0726,  0.0236, -0.0149,  0.0356,
         -0.0290,  0.0190],
        [-0.0376, -0.0426,  0.0368, -0.0082,  0.0689,  0.0349, -0.0054,  0.0302,
          0.0019, -0.0152],
        [-0.0588, -0.0473,  0.0076,  0.0155,  0.0640,  0.0223, -0.0632,  0.0367,
          0.0202,  0.0437],
        [-0.0643, -0.0471,  0.0055,  0.0272,  0.0909,  0.0586,  0.0085,  0.0060,
          0.0109,  0.0346],
        [-0.0514, -0.0213,  0.0215,  0.0318,  0.0431,  0.0424, -0.0031,  0.0499,
         -0.0246,  0.0240]], grad_fn=<AddmmBackward0>)
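The output above holds raw, unnormalized scores (logits): one row per image and one column per digit class 0-9, all close to zero because the network is untrained, while `grad_fn` shows the result is tracked by autograd. A sketch of turning such scores into probabilities and predicted classes, using a random tensor as a stand-in for the model output:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)  # stand-in for the (5, 10) output of the network

probs = F.softmax(logits, dim=1)        # one probability distribution per row
predicted_digits = probs.argmax(dim=1)  # most likely class for each image

print(probs.sum(dim=1))   # each row sums to 1 (up to rounding)
print(predicted_digits)
```

Because softmax is monotonic, taking `argmax` directly on the logits gives the same predictions, which is what the evaluation code later in this lab does.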

### Training the network

To train the network we need training data, an optimization algorithm, and a loss function to minimize on the training set.

We will use MNIST to illustrate a training procedure in PyTorch, with stochastic gradient descent (SGD) as the optimization algorithm and cross-entropy as the loss function.
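`nn.CrossEntropyLoss` takes the raw logits and the integer class labels and combines log-softmax with the negative log-likelihood, which is why the network's last layer has no softmax. A small sketch with made-up scores and labels, checking the value against a manual computation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # scores for a batch of 4 examples, 10 classes
labels = torch.tensor([3, 0, 9, 1])  # true class index of each example

loss_function = nn.CrossEntropyLoss()  # default reduction='mean'
loss = loss_function(logits, labels)

# Same value by hand: batch mean of -log(softmax(logits)[true class])
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), labels].mean()

print(loss.item(), manual.item())
```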


Creating the datasets and the dataloaders, which let us iterate over batches during an epoch:

In [6]:
from torchvision import datasets 
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

train_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data\MNIST\raw\train-images-idx3-ubyte.gz



Extracting data\MNIST\raw\train-images-idx3-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data\MNIST\raw\train-labels-idx1-ubyte.gz



Extracting data\MNIST\raw\train-labels-idx1-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data\MNIST\raw\t10k-images-idx3-ubyte.gz



Extracting data\MNIST\raw\t10k-images-idx3-ubyte.gz to data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data\MNIST\raw\t10k-labels-idx1-ubyte.gz



Extracting data\MNIST\raw\t10k-labels-idx1-ubyte.gz to data\MNIST\raw
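A `DataLoader` simply groups the samples of a `Dataset` into mini-batches, and the last batch may be smaller (the default is `drop_last=False`). A sketch on synthetic tensors wrapped in `TensorDataset`, used here as a stand-in for MNIST so that nothing needs to be downloaded:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 150 "images" of shape 1x28x28 with labels 0-9
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(images, labels)

# shuffle=True reshuffles the samples at every epoch (the lab's loaders keep shuffle=False)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = []
for image_batch, labels_batch in loader:
    batch_sizes.append(image_batch.shape[0])
    print(image_batch.shape, labels_batch.shape)

print(len(loader))  # batches per epoch: ceil(150 / 64) = 3
```

With the 60,000 MNIST training images and `batch_size=64`, `len(train_dataloader)` is 938, which is consistent with the batch indices up to 900 in the training log.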



Creating the model and defining the optimization algorithm:

In [7]:
model = NeuralNetwork()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
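With the default `momentum=0`, each `optimizer.step()` of `torch.optim.SGD` applies the update w ← w − lr · ∂L/∂w to every parameter it was given (above, `model.parameters()`). A minimal check on a single toy tensor; the data and the squared-error loss are made up for illustration:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor(2.0)
lr = 1e-2

optimizer = torch.optim.SGD([w], lr=lr)

loss = (w @ x - target) ** 2  # a tiny squared-error loss
optimizer.zero_grad()
loss.backward()               # fills w.grad with d(loss)/dw

expected = w.detach() - lr * w.grad  # plain SGD rule, computed before the step
optimizer.step()                     # updates w in place

print(torch.allclose(w.detach(), expected))
```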

Training the network:

In [9]:
NUM_EPOCHS = 10
device = "cuda" if torch.cuda.is_available() else "cpu"  # choose the device to use: cpu or cuda (GPU)
model = model.to(device)
loss_function = nn.CrossEntropyLoss()  # the function to optimize: cross-entropy

model.train(True)
for i in range(NUM_EPOCHS):
    print(f"=== Epoch {i+1} ===")
    for batch, (image_batch, labels_batch) in enumerate(train_dataloader):  # iterate over the batches
        image_batch = image_batch.to(device)
        labels_batch = labels_batch.to(device)

        pred = model(image_batch)  # pass the images through the network
        loss = loss_function(pred, labels_batch)  # compute the loss from the network outputs
                                                  # and the true labels of the training examples

        # Backpropagation
        optimizer.zero_grad()  # reset the gradients from the previous step
        loss.backward()  # backpropagation: compute the gradients
        optimizer.step()  # update the network parameters

        if batch % 100 == 0:
            print(f"Batch index {batch}, loss: {loss.item():>7f}")

=== Epoch 1 ===
Batch index 0, loss: 2.037801
Batch index 100, loss: 1.764261
Batch index 200, loss: 1.596179
Batch index 300, loss: 1.110204
Batch index 400, loss: 0.928878
Batch index 500, loss: 0.762543
Batch index 600, loss: 0.606124
Batch index 700, loss: 0.733767
Batch index 800, loss: 0.623447
Batch index 900, loss: 0.554930
=== Epoch 2 ===
Batch index 0, loss: 0.595891
Batch index 100, loss: 0.421773
Batch index 200, loss: 0.449756
Batch index 300, loss: 0.473317
Batch index 400, loss: 0.400112
Batch index 500, loss: 0.419373
Batch index 600, loss: 0.277204
Batch index 700, loss: 0.492012
Batch index 800, loss: 0.432486
Batch index 900, loss: 0.462994
=== Epoch 3 ===
Batch index 0, loss: 0.395996
Batch index 100, loss: 0.300016
Batch index 200, loss: 0.303191
Batch index 300, loss: 0.402809
Batch index 400, loss: 0.307577
Batch index 500, loss: 0.363944
Batch index 600, loss: 0.212678
Batch index 700, loss: 0.426307
Batch index 800, loss: 0.376609
Batch index 900, loss: 0.43333
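The loop calls `optimizer.zero_grad()` before `loss.backward()` because PyTorch accumulates gradients: each `backward()` adds to the `.grad` already stored. Note also that the printed loss belongs to a single batch, which is why it fluctuates between consecutive prints. A small sketch of the accumulation on a toy tensor:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)


def compute_loss():
    return (w ** 2).sum()  # gradient d(loss)/dw = 2w = [2., 4.]


compute_loss().backward()
first_grad = w.grad.clone()
print(first_grad)  # tensor([2., 4.])

compute_loss().backward()  # no reset: the new gradient is ADDED to the stored one
accumulated_grad = w.grad.clone()
print(accumulated_grad)  # tensor([4., 8.])

w.grad = None  # what optimizer.zero_grad() does by default since PyTorch 2.0 (set_to_none=True)
compute_loss().backward()
print(w.grad)  # tensor([2., 4.])
```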

Evaluating the performance on the test set:

In [10]:
correct = 0.
test_loss = 0.
size = len(test_dataloader.dataset)
model.to(device)
model.eval()
with torch.no_grad():
    for image_batch, labels_batch in test_dataloader:  # iterate over the test data
        image_batch = image_batch.to(device)
        labels_batch = labels_batch.to(device)
        pred = model(image_batch)  # process the images with the previously trained network
        test_loss += loss_function(pred, labels_batch).item()  # mean loss of this batch
        correct += (pred.argmax(1) == labels_batch).type(torch.float).sum().item()  # count the correctly classified examples


correct /= size
# note: test_loss is a sum of per-batch *means*, so dividing by the number of samples
# gives roughly the mean loss divided by the batch size (64); divide by
# len(test_dataloader) instead to obtain the average cross-entropy per batch
test_loss /= size
print(f"Accuracy: {(100*correct):>0.1f}%, Loss: {test_loss:>8f} \n")

Accuracy: 94.1%, Loss: 0.003190 
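The evaluation steps can be wrapped in a reusable helper. The `evaluate` function below is a hypothetical sketch, not part of the lab; unlike the cell above, it divides the summed loss by the number of batches, so it reports the average of the per-batch mean losses. It is exercised here on synthetic data with a tiny stand-in model:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, dataloader, loss_function, device="cpu"):
    """Return (accuracy, average per-batch loss) of `model` on `dataloader`."""
    model.to(device)
    model.eval()  # evaluation mode (matters for dropout / batch-norm layers)
    correct, total_loss = 0, 0.0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            total_loss += loss_function(outputs, labels).item()  # mean loss of this batch
            correct += (outputs.argmax(dim=1) == labels).sum().item()
    accuracy = correct / len(dataloader.dataset)
    mean_loss = total_loss / len(dataloader)  # divide by the number of batches
    return accuracy, mean_loss


# Quick check on synthetic data with a tiny stand-in model
torch.manual_seed(0)
data = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
loader = DataLoader(data, batch_size=64)
tiny_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
acc, mean_loss = evaluate(tiny_model, loader, nn.CrossEntropyLoss())
print(f"Accuracy: {100 * acc:.1f}%, Loss: {mean_loss:.4f}")
```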



### Exercises

1. Train a perceptron network that classifies the MNIST handwritten digits. The data must be normalized by subtracting the mean and dividing by the standard deviation. Train for 5 epochs and test the following network configurations:

a. Define a network with a single hidden layer containing a single neuron, and use tanh as the activation function. For the optimizer, use a learning rate of 1e-2.

b. Define a network with a single hidden layer of 10 neurons, and use tanh as the activation function. For the optimizer, use a learning rate of 1e-2.

c. Define a network with a single hidden layer of 10 neurons, and use tanh as the activation function. For the optimizer, use a learning rate of 1e-5.

d. Define a network with a single hidden layer of 10 neurons, and use tanh as the activation function. For the optimizer, use a learning rate of 10.

e. Define a network with 2 hidden layers of 10 neurons each, and use tanh as the activation function. For the optimizer, use a learning rate of 1e-2.

f. Define a network with 2 hidden layers of 10 neurons each, and use relu as the activation function. For the optimizer, use a learning rate of 1e-2.

g. Define a network with 2 hidden layers of 100 neurons each, and use relu as the activation function. For the optimizer, use a learning rate of 1e-2.

h. Define a network with 2 hidden layers of 100 neurons each, and use relu as the activation function. For the optimizer, use a learning rate of 1e-2 and momentum=0.9.