<a href="https://colab.research.google.com/github/unicamp-dl/IA025_2022S1/blob/main/ex05/Marcus_Vinicius_Borela_de_Castro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
nome = 'Marcus Vinícius Borela de CAstro'

print(f'My name is {nome}')

My name is Marcus Vinícius Borela de CAstro


This exercise consists of training a two-layer model on MNIST: the first layer is convolutional and the second is a linear classification layer.

We may not use torch.nn.Conv{1,2,3}d.

## Importing the libraries

In [2]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import random
import torch
import torchvision
from torchvision.datasets import MNIST

## Fixing the seeds

In [3]:
def inicializa_seed(num_semente:int=123):
  """
  It is recommended to reset the seeds before initializing the model, so that
  the initial weights are always the same.
  Supporting sources:
      http://nlp.seas.harvard.edu/2018/04/03/attention.html
      https://github.com/CyberZHG/torch-multi-head-attention/blob/master/torch_multi_head_attention/multi_head_attention.py#L15
  """
  random.seed(num_semente)
  np.random.seed(num_semente)
  torch.manual_seed(num_semente)
  #torch.cuda.manual_seed(num_semente)
  #Cuda algorithms
  #torch.backends.cudnn.deterministic = True 

In [4]:
inicializa_seed(123)
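
As a quick sanity check of why the seeds are fixed, the sketch below re-seeds the three generators and draws again. The names `seed_everything` and `draw` are hypothetical helpers introduced only for this illustration; they mirror the three calls made in `inicializa_seed` and do not cover CUDA generators.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 123) -> None:
    # Same three calls used by inicializa_seed (hypothetical stand-in).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def draw():
    # One draw from each generator (hypothetical helper for this demo).
    return random.random(), float(np.random.rand()), torch.rand(3)


seed_everything(123)
first = draw()
seed_everything(123)
second = draw()

# Re-seeding reproduces exactly the same draws.
same = (
    first[0] == second[0]
    and first[1] == second[1]
    and torch.equal(first[2], second[2])
)
print(same)
```

If the second call to `seed_everything` is removed, the two draws should differ, which is why the seed is reset right before the weights are created.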

## Defining the initial weights

In [5]:
in_channels = 1
out_channels = 2
kernel_size = 5
stride = 3

# Input image size
height_in = 28  
width_in = 28

# Image size after the first convolutional layer.
# With no padding and no dilation: (size_in - kernel_size) // stride + 1.
height_out = (height_in - kernel_size) // stride + 1
width_out = (width_in - kernel_size) // stride + 1


initial_conv_weight = torch.FloatTensor(out_channels, in_channels, kernel_size, kernel_size).uniform_(-0.01, 0.01)
initial_conv_bias = torch.FloatTensor(out_channels,).uniform_(-0.01, 0.01)

initial_classification_weight = torch.FloatTensor(10, out_channels * height_out * width_out).uniform_(-0.01, 0.01)
initial_classification_bias = torch.FloatTensor(10,).uniform_(-0.01, 0.01)

In [6]:
print(f" height_out {height_out}, width_out {width_out}")

 height_out 8, width_out 8
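
For a convolution without padding or dilation, the output size along one axis is `(size_in - kernel_size) // stride + 1`. The small sketch below (the function name `conv_output_size` is hypothetical) checks this against the shapes that appear in this notebook: 28 pixels with a 5x5 kernel and stride 3, and the 5x6 toy inputs used in the comparison cells.

```python
def conv_output_size(size_in: int, kernel_size: int, stride: int) -> int:
    # Number of valid kernel positions along one axis (no padding, no dilation).
    return (size_in - kernel_size) // stride + 1


print(conv_output_size(28, 5, 3))  # 8
print(conv_output_size(5, 2, 1))   # 4
print(conv_output_size(6, 2, 1))   # 5
print(conv_output_size(5, 3, 2))   # 2
print(conv_output_size(6, 3, 2))   # 2
```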


## Dataset and dataloader

### Setting the minibatch size

In [7]:
batch_size = 50

### Loading the data and creating the dataset and dataloader

In [8]:
dataset_dir = '../data/'

dataset_train_full = MNIST(dataset_dir, train=True, download=True,
                           transform=torchvision.transforms.ToTensor())
print(dataset_train_full.data.shape)
print(dataset_train_full.targets.shape)

torch.Size([60000, 28, 28])
torch.Size([60000])


### Using only 1000 MNIST samples

In this exercise we will use 1000 training samples.

In [9]:
indices = torch.randperm(len(dataset_train_full))[:1000]
dataset_train = torch.utils.data.Subset(dataset_train_full, indices)
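
To make the sampling step concrete, here is a minimal sketch on a toy dataset. The names `toy_full`, `toy_indices`, and `toy_subset` are hypothetical, and the indices are converted to a plain list for simplicity, which is an assumption of this sketch rather than what the cell above does.

```python
import torch
from torch.utils.data import Subset, TensorDataset

torch.manual_seed(0)

# Toy dataset standing in for MNIST: 10 samples whose label equals their index.
features = torch.arange(10).float().unsqueeze(1)
labels = torch.arange(10)
toy_full = TensorDataset(features, labels)

# Pick 4 distinct indices at random, in the same way 1000 are picked out of 60000.
toy_indices = torch.randperm(len(toy_full))[:4].tolist()
toy_subset = Subset(toy_full, toy_indices)

print(len(toy_subset))
# Item i of the subset is item toy_indices[i] of the full dataset.
x0, y0 = toy_subset[0]
print(int(y0) == toy_indices[0])
```

Because `torch.randperm` returns a permutation, the selected indices never repeat, so the subset contains distinct samples.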

## Creating the dataloader

In [10]:
loader_train = torch.utils.data.DataLoader(dataset_train, batch_size=batch_size, shuffle=False)

In [11]:
print('Number of training minibatches:', len(loader_train))


Number of training minibatches: 20
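
The count of 20 follows directly from 1000 samples split into batches of 50. The sketch below (the helper `num_minibatches` is hypothetical) reproduces the arithmetic a `DataLoader` applies, including the case where the last batch is incomplete.

```python
import math


def num_minibatches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    # A DataLoader yields ceil(n / b) batches, or floor(n / b) with drop_last=True.
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_minibatches(1000, 50))                  # 20
print(num_minibatches(1000, 64))                  # 16
print(num_minibatches(1000, 64, drop_last=True))  # 15
```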


In [12]:
x_train, y_train = next(iter(loader_train))
print("\nDimensions of one minibatch:", x_train.size())
print("Min and max pixel values:   ", torch.min(x_train), torch.max(x_train))
print("Type of the image data:     ", type(x_train))
print("Type of the image labels:   ", type(y_train))


Dimensions of one minibatch: torch.Size([50, 1, 28, 28])
Min and max pixel values:    tensor(0.) tensor(1.)
Type of the image data:      <class 'torch.Tensor'>
Type of the image labels:    <class 'torch.Tensor'>


In [13]:
x_train.shape, y_train.shape

(torch.Size([50, 1, 28, 28]), torch.Size([50]))

In [14]:
x_train[0, 0, 0, ] # first row of the first sample

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.])

In [15]:
x_train[0, 0,] # the 28 rows of the first sample

tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000

In [16]:
x_train.shape[1] # channels

1

## Convolutional layer

In [17]:
torch.zeros((4,1,4,5),  dtype=torch.float, requires_grad=True)

tensor([[[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]],


        [[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]],


        [[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]],


        [[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]], requires_grad=True)

In [18]:
saida = torch.empty((4,1,4,5),  dtype=torch.float, requires_grad=True)

In [19]:
saida.shape

torch.Size([4, 1, 4, 5])

In [20]:
saida[0,0,0]

tensor([1.0943e-35, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
       grad_fn=<SelectBackward0>)

In [21]:
# torch.cat((saida[0,0,0], torch.tensor([[12]])), dim=-1)

In [22]:
# saida

In [57]:
class MyConv2d(torch.nn.Module):
  def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int):
    super(MyConv2d, self).__init__()

    self.in_channels = in_channels
    self.out_channels = out_channels
    self.kernel_size = kernel_size  # The same for height and width.
    self.stride = stride  # The same for height and width.
    self.weight = torch.nn.Parameter(torch.FloatTensor(out_channels, in_channels, kernel_size, kernel_size).uniform_(-0.01, 0.01))
    self.bias = torch.nn.Parameter(torch.FloatTensor(out_channels,).uniform_(-0.01, 0.01))
    print(f"Inicializado MyConv2d")
    print(f"in_channels: {self.in_channels} ")
    print(f"out_channels: {self.out_channels} ")
    print(f"kernel_size: {self.kernel_size} ")
    print(f"stride: {self.stride} ")
    print(f"weight.shape: {self.weight.shape} ")
    print(f"weight: {self.weight} ")
    print(f"bias.shape: {self.bias.shape} ")
    print(f"bias: {self.bias} ")

  def forward(self, x):
    assert x.dim() == 4, f'x must have 4 dimensions, not {x.shape}'
    assert x.shape[1] == 1, f'x must have only 1 channel, not {x.shape[1]}' # The number of channels is always 1 (MNIST, grayscale)

    # print(f"kernel.shape: {self.weight.shape}, kernel: {self.weight}")
    # Write your code here.
    # Version that loops over the dimensions of x
    num_amostras = x.shape[0]
    num_linhas_entrada = x.shape[2]
    num_colunas_entrada = x.shape[3]
    num_linhas_saida = (num_linhas_entrada - self.kernel_size) // self.stride + 1
    num_colunas_saida = (num_colunas_entrada - self.kernel_size) // self.stride + 1
    print(f" num_amostras: {num_amostras}, self.out_channels: {self.out_channels}, num_linhas_entrada: {num_linhas_entrada}, num_colunas_entrada: {num_colunas_entrada}, num_linhas_saida: {num_linhas_saida}, num_colunas_saida: {num_colunas_saida}")
    saida = torch.zeros((num_amostras,self.out_channels,num_linhas_saida,num_colunas_saida), dtype=torch.float, requires_grad=False)        
    print(f"saida.shape: {saida.shape}")
    for ndx_amostra in range(num_amostras):
      print(f"\nndx_amostra: {ndx_amostra}")
      # for ndx_out_channels in range(self.out_channels):
      #   print(f"\nndx_out_channels: {ndx_out_channels}")
      for ndx_in_channels in range(self.in_channels):
        print(f"\nndx_in_channels: {ndx_in_channels}")
        ndx_linhas_entrada = 0
        for ndx_linhas_saida in range(num_linhas_saida):
          ndx_colunas_entrada = 0
          for ndx_colunas_saida in range(num_colunas_saida):
            print(f"\nndx_linhas_saida, ndx_colunas_saida: {ndx_linhas_saida}, {ndx_colunas_saida}")
            print(f" alvo do kernel em x: x[{ndx_amostra},{ndx_in_channels},{ndx_linhas_entrada}:{ndx_linhas_entrada+self.kernel_size}, {ndx_colunas_entrada}:{ndx_colunas_entrada+self.kernel_size}]")
            print(f" \n {x[ndx_amostra, ndx_in_channels, ndx_linhas_entrada:ndx_linhas_entrada+self.kernel_size, ndx_colunas_entrada:ndx_colunas_entrada+self.kernel_size]}")
            produto = torch.mul(x[ndx_amostra, ndx_in_channels, ndx_linhas_entrada:ndx_linhas_entrada+self.kernel_size, ndx_colunas_entrada:ndx_colunas_entrada+self.kernel_size], self.weight)
            print(f" produto: {produto}")
            soma = torch.sum(produto, dim=(2,3), keepdim=True )
            print(f" soma: {soma}")
            valor_soma = soma.squeeze()
            print(f" valor_soma: {valor_soma}")
            # saida = torch.cat((saida, soma))
            if self.out_channels > 1:  # soma has one entry per output channel, as in torch.tensor([[[[ 46.]], [[134.]]]])
              for ndx_out_channels in range(self.out_channels):
                saida[ndx_amostra, ndx_out_channels, ndx_linhas_saida, ndx_colunas_saida] += valor_soma[ndx_out_channels]
                print(f" somado na saída em [{ndx_amostra}, {ndx_out_channels}, {ndx_linhas_saida}, {ndx_colunas_saida}] = {saida[ndx_amostra, ndx_out_channels, ndx_linhas_saida, ndx_colunas_saida]}")
            else: # soma squeezes to a scalar tensor, as in tensor(34.)
                saida[ndx_amostra, 0, ndx_linhas_saida, ndx_colunas_saida] += valor_soma
                print(f" somado na saída em [{ndx_amostra}, 0, {ndx_linhas_saida}, {ndx_colunas_saida}] = {saida[ndx_amostra, 0, ndx_linhas_saida, ndx_colunas_saida]}")
            ndx_colunas_entrada += self.stride
          ndx_linhas_entrada += self.stride
    print(f" saida: {saida}")
    # Add the bias
    for ndx_amostra in range(num_amostras):
      print(f"\nndx_amostra: {ndx_amostra}")
      for ndx_out_channels in range(self.out_channels):
        saida[ndx_amostra, ndx_out_channels] += self.bias[ndx_out_channels]
    print(f" saida apos somar bias: {saida}")
    # Version that loops over the kernel

    return saida
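
The nested loops above visit one output position at a time. As a possible alternative, and not the approach taken in this notebook, the same result can be sketched without `torch.nn.Conv2d` by extracting all patches with `torch.nn.functional.unfold` and doing a single matrix product. The function name `conv2d_unfold` and the `*_demo` tensors are hypothetical; `F.conv2d` appears only as a reference for the check.

```python
import torch
import torch.nn.functional as F


def conv2d_unfold(x, weight, bias, stride):
    # x: (N, C_in, H, W); weight: (C_out, C_in, K, K); bias: (C_out,)
    n, _, h, w = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    # Each column of `patches` is one flattened receptive field: (N, C_in*K*K, L).
    patches = F.unfold(x, kernel_size=k, stride=stride)
    # (C_out, C_in*K*K) @ (N, C_in*K*K, L) -> (N, C_out, L)
    out = weight.reshape(c_out, -1) @ patches
    out = out + bias.reshape(1, c_out, 1)
    return out.reshape(n, c_out, h_out, w_out)


torch.manual_seed(0)
x_demo = torch.rand(2, 1, 28, 28)
w_demo = torch.rand(2, 1, 5, 5)
b_demo = torch.rand(2)

out_demo = conv2d_unfold(x_demo, w_demo, b_demo, stride=3)
ref_demo = F.conv2d(x_demo, w_demo, b_demo, stride=3)
print(out_demo.shape)
print(torch.allclose(out_demo, ref_demo, atol=1e-5))
```

Every operation here is differentiable, so a layer built this way would train like the loop version while avoiding the per-position Python overhead.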

## Check that your implementation matches PyTorch's using a simple example

In [58]:
in_channels_dummy = 1
out_channels_dummy = 1
kernel_size_dummy = 2
stride_dummy = 1
x = torch.arange(30).float().reshape(1, 1, 5, 6)

In [59]:
print(x)

tensor([[[[ 0.,  1.,  2.,  3.,  4.,  5.],
          [ 6.,  7.,  8.,  9., 10., 11.],
          [12., 13., 14., 15., 16., 17.],
          [18., 19., 20., 21., 22., 23.],
          [24., 25., 26., 27., 28., 29.]]]])


In [60]:

conv_layer = MyConv2d( out_channels=out_channels_dummy, in_channels=in_channels_dummy, kernel_size=kernel_size_dummy, stride=stride_dummy)

# Use the same weights for my implementation and PyTorch's
initial_weights_dummy = torch.arange(in_channels_dummy * out_channels_dummy * kernel_size_dummy * kernel_size_dummy).float()
initial_weights_dummy = initial_weights_dummy.reshape(out_channels_dummy, in_channels_dummy, kernel_size_dummy, kernel_size_dummy)
initial_bias_dummy = torch.arange(out_channels_dummy,).float()

conv_layer.weight.data = initial_weights_dummy
conv_layer.bias.data = initial_bias_dummy

out = conv_layer(x)


Inicializado MyConv2d
in_channels: 1 
out_channels: 1 
kernel_size: 2 
stride: 1 
weight.shape: torch.Size([1, 1, 2, 2]) 
weight: Parameter containing:
tensor([[[[-0.0053,  0.0004],
          [ 0.0034,  0.0005]]]], requires_grad=True) 
bias.shape: torch.Size([1]) 
bias: Parameter containing:
tensor([0.0042], requires_grad=True) 
 num_amostras: 1, self.out_channels: 1, num_linhas_entrada: 5, num_colunas_entrada: 6, num_linhas_saida: 4, num_colunas_saida: 5
saida.shape: torch.Size([1, 1, 4, 5])

ndx_amostra: 0

ndx_in_channels: 0

ndx_linhas_saida, ndx_colunas_saida: 0, 0
 alvo do kernel em x: x[0,0,0:2, 0:2]
 
 tensor([[0., 1.],
        [6., 7.]])
 produto: tensor([[[[ 0.,  1.],
          [12., 21.]]]], grad_fn=<MulBackward0>)
 soma: tensor([[[[34.]]]], grad_fn=<SumBackward1>)
 valor_soma: 34.0
 somado na saída em [0, 0, 0, 0] = 34.0

ndx_linhas_saida, ndx_colunas_saida: 0, 1
 alvo do kernel em x: x[0,0,0:2, 1:3]
 
 tensor([[1., 2.],
        [7., 8.]])
 produto: tensor([[[[ 0.,  2.],
  

In [61]:
pytorch_conv_layer = torch.nn.Conv2d(out_channels=out_channels_dummy, in_channels=in_channels_dummy, kernel_size=kernel_size_dummy, stride=stride_dummy, padding=0)
pytorch_conv_layer.load_state_dict(dict(weight=initial_weights_dummy, bias=initial_bias_dummy))
target_out = pytorch_conv_layer(x)

In [62]:
target_out.shape

torch.Size([1, 1, 4, 5])

In [63]:
assert torch.allclose(out, target_out, atol=1e-6)

## Check that your implementation matches PyTorch's using a random example

In [73]:
x = torch.rand(2, in_channels, height_in, width_in)
print(f"x.shape: {x.shape}, x:{x}")

x.shape: torch.Size([2, 1, 28, 28]), x:tensor([[[[8.4142e-01, 3.5439e-01, 6.1172e-02,  ..., 9.2555e-01,
           4.1156e-01, 8.8787e-01],
          [4.8619e-01, 1.7892e-01, 3.3761e-01,  ..., 5.3331e-01,
           3.6692e-01, 9.4743e-01],
          [7.5571e-01, 6.5270e-01, 5.4962e-01,  ..., 7.7583e-01,
           8.8931e-01, 3.1293e-01],
          ...,
          [4.5140e-01, 3.3445e-01, 7.1822e-01,  ..., 8.7009e-01,
           7.1349e-01, 3.1192e-02],
          [8.1161e-01, 9.2543e-01, 9.9241e-02,  ..., 3.7423e-01,
           6.6530e-01, 8.5440e-01],
          [2.9395e-01, 9.4599e-01, 2.9556e-01,  ..., 5.0896e-01,
           6.3630e-02, 1.5010e-02]]],


        [[[3.0296e-01, 8.4186e-01, 5.7012e-01,  ..., 3.3977e-01,
           1.2933e-01, 8.4123e-01],
          [2.3678e-01, 2.8399e-01, 7.9764e-01,  ..., 9.5889e-01,
           8.6574e-01, 5.4478e-02],
          [2.8048e-01, 3.1483e-01, 9.7353e-02,  ..., 6.4659e-01,
           8.4145e-01, 1.3383e-02],
          ...,
          [5.8661e

In [76]:
conv_layer = MyConv2d(out_channels=out_channels, in_channels=in_channels, kernel_size=kernel_size, stride=stride)
conv_layer.weight.data = initial_conv_weight
conv_layer.bias.data = initial_conv_bias
print(f"conv_layer.weight.data.shape: {conv_layer.weight.data.shape}, conv_layer.bias.data.shape: {conv_layer.bias.data.shape}")
print(f"conv_layer.weight.data: {conv_layer.weight.data}, conv_layer.bias.data: {conv_layer.bias.data}")

Inicializado MyConv2d
in_channels: 1 
out_channels: 2 
kernel_size: 5 
stride: 3 
weight.shape: torch.Size([2, 1, 5, 5]) 
weight: Parameter containing:
tensor([[[[-7.6017e-03, -6.1803e-03,  9.0250e-03, -7.5432e-03, -5.4495e-03],
          [-3.7104e-03,  6.3617e-03,  7.5090e-03,  5.0867e-03,  7.0336e-03],
          [-6.1566e-04,  6.0817e-03,  7.7439e-03,  7.2776e-03,  2.3919e-03],
          [-7.1268e-03,  4.6978e-03,  9.3515e-03,  7.4757e-05,  3.2798e-03],
          [-5.5757e-03, -4.2965e-04,  5.5089e-03,  8.3744e-03, -6.4920e-03]]],


        [[[ 3.6035e-03,  8.8664e-04, -4.1927e-03, -5.6483e-03, -7.1350e-03],
          [ 7.4180e-03,  7.4519e-04, -5.4948e-03,  6.1529e-03,  9.1575e-03],
          [-4.1499e-03, -9.4623e-03, -6.7755e-03,  7.9669e-03,  1.8236e-03],
          [-9.1121e-03, -6.2279e-03, -9.6819e-03, -9.7706e-03,  4.3225e-04],
          [-7.6296e-03,  4.2595e-03, -7.7148e-03, -2.7186e-04,  9.5596e-03]]]],
       requires_grad=True) 
bias.shape: torch.Size([2]) 
bias: Paramete

In [77]:
out = conv_layer(x)

 num_amostras: 2, self.out_channels: 2, num_linhas_entrada: 28, num_colunas_entrada: 28, num_linhas_saida: 8, num_colunas_saida: 8
saida.shape: torch.Size([2, 2, 8, 8])

ndx_amostra: 0

ndx_in_channels: 0

ndx_linhas_saida, ndx_colunas_saida: 0, 0
 alvo do kernel em x: x[0,0,0:5, 0:5]
 
 tensor([[0.8414, 0.3544, 0.0612, 0.1211, 0.3592],
        [0.4862, 0.1789, 0.3376, 0.7700, 0.6488],
        [0.7557, 0.6527, 0.5496, 0.0573, 0.4063],
        [0.0170, 0.2940, 0.4368, 0.0756, 0.9769],
        [0.8789, 0.5654, 0.0743, 0.1479, 0.3093]])
 produto: tensor([[[[-3.4311e-03,  1.1739e-04, -3.0382e-04,  4.5678e-04, -3.0602e-03],
          [ 3.5640e-03, -1.3005e-03, -2.6842e-03, -4.8654e-03,  2.9386e-03],
          [-2.7923e-03,  2.4425e-03, -4.6648e-03, -3.4753e-04, -1.4918e-03],
          [-3.3431e-05, -2.2425e-03,  2.8604e-03, -1.7836e-04,  3.1356e-03],
          [ 6.2151e-03,  1.0533e-03,  2.0317e-04,  1.4277e-03, -1.3948e-03]]],


        [[[ 2.6652e-03, -1.5767e-03,  4.3716e-04,  9.6738e-04

In [78]:
out.shape

torch.Size([2, 2, 8, 8])

In [79]:
# Use the same weights for my implementation and PyTorch's
pytorch_conv_layer = torch.nn.Conv2d(out_channels=out_channels, in_channels=in_channels, kernel_size=kernel_size, stride=stride, padding=0)
pytorch_conv_layer.load_state_dict(dict(weight=initial_conv_weight, bias=initial_conv_bias))

target_out = pytorch_conv_layer(x)

In [80]:
target_out.shape

torch.Size([2, 2, 8, 8])

In [81]:
assert torch.allclose(out, target_out, atol=1e-6)

## Model

In [None]:
class Net(torch.nn.Module):
    def __init__(self, height_in: int, width_in: int, in_channels: int, out_channels: int, kernel_size: int, stride: int):
        super(Net, self).__init__()
        self.conv_layer = MyConv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride)
   
        # With no padding and no dilation: (size_in - kernel_size) // stride + 1.
        height_out = (height_in - kernel_size) // stride + 1
        width_out = (width_in - kernel_size) // stride + 1
        self.classification_layer = torch.nn.Linear(out_channels * height_out * width_out, 10)

    def forward(self, x):
        hidden = self.conv_layer(x)
        hidden = torch.nn.functional.relu(hidden)
        hidden = hidden.reshape(x.shape[0], -1)
        logits = self.classification_layer(hidden)
        return logits
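
To follow the shapes through `forward`, the sketch below replaces the convolutional layer with a random tensor of the expected output shape (an assumption made so the example is self-contained) and applies the same ReLU, flatten, and linear steps.

```python
import torch

batch, out_channels, height_out, width_out = 50, 2, 8, 8

# Stand-in for the convolutional layer's output.
hidden = torch.rand(batch, out_channels, height_out, width_out)
hidden = torch.relu(hidden)
# Flatten everything except the batch dimension: 2 * 8 * 8 = 128 features.
flat = hidden.reshape(batch, -1)
classifier = torch.nn.Linear(out_channels * height_out * width_out, 10)
logits = classifier(flat)

print(flat.shape)    # torch.Size([50, 128])
print(logits.shape)  # torch.Size([50, 10])
```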

## Training

### Setting the hyperparameters

In [None]:
n_epochs = 50
lr = 0.1

### Training loop

In [None]:
model = Net(height_in=height_in, width_in=width_in, in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride)

# Use the predefined initial weights
model.classification_layer.load_state_dict(dict(weight=initial_classification_weight, bias=initial_classification_bias))
model.conv_layer.weight.data = initial_conv_weight
model.conv_layer.bias.data = initial_conv_bias

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr)

epochs = []
loss_history = []
loss_epoch_end = []
total_trained_samples = 0
for i in range(n_epochs):
    for x_train, y_train in loader_train:
        # forward pass of the network
        outputs = model(x_train)

        # compute the loss
        loss = criterion(outputs, y_train)

        # zero the gradients, backpropagate, update the parameters by gradient descent
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_trained_samples += x_train.size(0)
        epochs.append(total_trained_samples / len(dataset_train))
        loss_history.append(loss.item())

    loss_epoch_end.append(loss.item())
    print(f'Epoch: {i:d}/{n_epochs - 1:d} Loss: {loss.item()}')
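
The three optimizer calls in the loop are easier to see in isolation. This is a minimal sketch on a toy, linearly separable problem that is unrelated to MNIST; all names prefixed with `toy_` are hypothetical.

```python
import torch

torch.manual_seed(0)

# Toy problem (assumption): the label depends only on the sign of feature 0.
toy_inputs = torch.randn(64, 4)
toy_targets = (toy_inputs[:, 0] > 0).long()

toy_model = torch.nn.Linear(4, 2)
toy_criterion = torch.nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

toy_losses = []
for _ in range(100):
    loss = toy_criterion(toy_model(toy_inputs), toy_targets)
    toy_optimizer.zero_grad()  # clear gradients left from the previous step
    loss.backward()            # compute d(loss)/d(parameter)
    toy_optimizer.step()       # parameter <- parameter - lr * gradient
    toy_losses.append(loss.item())

print(toy_losses[0] > toy_losses[-1])
```

Skipping `zero_grad()` would make gradients accumulate across iterations, since `backward()` adds to the existing `.grad` buffers.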


### Usual visualization of the loss, one point per epoch

In [None]:
n_batches_train = len(loader_train)
plt.plot(epochs[::n_batches_train], loss_history[::n_batches_train])
plt.xlabel('epoch')

In [None]:
loss_epoch_end

In [None]:
# Assert on the loss history
target_loss_epoch_end = np.array([
    2.303267478942871,
    2.227701187133789,
    1.0923893451690674,
    0.5867354869842529,
    0.5144089460372925,
    0.45026642084121704,
    0.4075140357017517,
    0.37713879346847534,
    0.3534485101699829,
    0.3341451585292816,
    0.3181140422821045,
    0.30457887053489685,
    0.29283496737480164,
    0.2827608287334442,
    0.2738332152366638,
    0.2657742500305176,
    0.2583288848400116,
    0.25117507576942444,
    0.24439716339111328,
    0.23789969086647034,
    0.23167723417282104,
    0.22562651336193085,
    0.21984536945819855,
    0.2142913043498993,
    0.20894232392311096,
    0.203872948884964,
    0.19903430342674255,
    0.19439971446990967,
    0.18994088470935822,
    0.18563991785049438,
    0.18147490918636322,
    0.17744913697242737,
    0.17347246408462524,
    0.16947467625141144,
    0.16547319293022156,
    0.16150487959384918,
    0.1574639081954956,
    0.1534043848514557,
    0.14926929771900177,
    0.1452063024044037,
    0.1412365883588791,
    0.13712672889232635,
    0.1331038922071457,
    0.1291467249393463,
    0.1251506358385086,
    0.12116757035255432,
    0.11731722950935364,
    0.11364627629518509,
    0.11001908034086227,
    0.10655981302261353])

assert np.allclose(np.array(loss_epoch_end), target_loss_epoch_end, atol=1e-6)

## Scratch work

In [66]:
in_channels_dummy = 1
out_channels_dummy = 2
kernel_size_dummy = 3
stride_dummy = 2
num_amostras_dummy = 1
x = torch.arange(30).float().reshape(num_amostras_dummy, 1, 5, 6)

In [67]:
print(x)

tensor([[[[ 0.,  1.,  2.,  3.,  4.,  5.],
          [ 6.,  7.,  8.,  9., 10., 11.],
          [12., 13., 14., 15., 16., 17.],
          [18., 19., 20., 21., 22., 23.],
          [24., 25., 26., 27., 28., 29.]]]])


In [None]:
# still missing: stride and handling out_channels > 1

In [69]:

conv_layer = MyConv2d( out_channels=out_channels_dummy, in_channels=in_channels_dummy, kernel_size=kernel_size_dummy, stride=stride_dummy)

# Use the same weights for my implementation and PyTorch's
initial_weights_dummy = torch.arange(in_channels_dummy * out_channels_dummy * kernel_size_dummy * kernel_size_dummy).float()
initial_weights_dummy = initial_weights_dummy.reshape(out_channels_dummy, in_channels_dummy,  kernel_size_dummy, kernel_size_dummy)
initial_bias_dummy = torch.arange(out_channels_dummy,).float()
print(f"initial_bias_dummy.shape {initial_bias_dummy.shape}, initial_weights_dummy.shape {initial_weights_dummy.shape}")

Inicializado MyConv2d
in_channels: 1 
out_channels: 2 
kernel_size: 3 
stride: 2 
weight.shape: torch.Size([2, 1, 3, 3]) 
weight: Parameter containing:
tensor([[[[-0.0082, -0.0041, -0.0035],
          [ 0.0023, -0.0072,  0.0030],
          [-0.0095,  0.0072, -0.0016]]],


        [[[-0.0069, -0.0036,  0.0015],
          [ 0.0053, -0.0033, -0.0039],
          [ 0.0075,  0.0062, -0.0042]]]], requires_grad=True) 
bias.shape: torch.Size([2]) 
bias: Parameter containing:
tensor([ 0.0033, -0.0044], requires_grad=True) 
initial_bias_dummy.shape torch.Size([2]), initial_weights_dummy.shape torch.Size([2, 1, 3, 3])


In [70]:
conv_layer.weight.data = initial_weights_dummy
conv_layer.bias.data = initial_bias_dummy
print(f"conv_layer.weight.data: {conv_layer.weight.data}")
print(f"conv_layer.bias.data: {conv_layer.bias.data}")


conv_layer.weight.data: tensor([[[[ 0.,  1.,  2.],
          [ 3.,  4.,  5.],
          [ 6.,  7.,  8.]]],


        [[[ 9., 10., 11.],
          [12., 13., 14.],
          [15., 16., 17.]]]])
conv_layer.bias.data: tensor([0., 1.])


In [71]:
print(x)

tensor([[[[ 0.,  1.,  2.,  3.,  4.,  5.],
          [ 6.,  7.,  8.,  9., 10., 11.],
          [12., 13., 14., 15., 16., 17.],
          [18., 19., 20., 21., 22., 23.],
          [24., 25., 26., 27., 28., 29.]]]])


In [72]:
out = conv_layer(x)


 num_amostras: 1, self.out_channels: 2, num_linhas_entrada: 5, num_colunas_entrada: 6, num_linhas_saida: 2, num_colunas_saida: 2
saida.shape: torch.Size([1, 2, 2, 2])

ndx_amostra: 0

ndx_in_channels: 0

ndx_linhas_saida, ndx_colunas_saida: 0, 0
 alvo do kernel em x: x[0,0,0:3, 0:3]
 
 tensor([[ 0.,  1.,  2.],
        [ 6.,  7.,  8.],
        [12., 13., 14.]])
 produto: tensor([[[[  0.,   1.,   4.],
          [ 18.,  28.,  40.],
          [ 72.,  91., 112.]]],


        [[[  0.,  10.,  22.],
          [ 72.,  91., 112.],
          [180., 208., 238.]]]], grad_fn=<MulBackward0>)
 soma: tensor([[[[366.]]],


        [[[933.]]]], grad_fn=<SumBackward1>)
 valor_soma: tensor([366., 933.], grad_fn=<SqueezeBackward0>)
 somado na saída em [0, 0, 0, 0] = 366.0
 somado na saída em [0, 1, 0, 0] = 933.0

ndx_linhas_saida, ndx_colunas_saida: 0, 1
 alvo do kernel em x: x[0,0,0:3, 2:5]
 
 tensor([[ 2.,  3.,  4.],
        [ 8.,  9., 10.],
        [14., 15., 16.]])
 produto: tensor([[[[  0.,   3.,   8.]

In [None]:
temp = torch.tensor([[[[34.]]]])

In [None]:
temp.squeeze()

tensor(34.)

In [None]:
temp.squeeze().shape

torch.Size([])

In [None]:
temp.squeeze().reshape(1,)

tensor([34.])

In [None]:
temp.view(1,-1)

tensor([[34.]])

In [None]:
temp = torch.tensor([[[[ 46.]], [[134.]]]])

In [None]:
temp.squeeze()

tensor([ 46., 134.])

In [None]:
temp.squeeze().shape

torch.Size([2])

In [None]:
temp.squeeze().reshape(2,)

tensor([ 46., 134.])

In [None]:
temp.squeeze().shape


torch.Size([2])
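
The experiments above show that `squeeze()` returns a 0-dimensional tensor when there is a single value, which is why the forward pass treats one output channel and several output channels in separate branches. A possible alternative, sketched here as an assumption rather than the notebook's approach, is `reshape(-1)`, which always yields a 1-D tensor that can be indexed by channel.

```python
import torch

one_channel = torch.tensor([[[[34.]]]])
two_channels = torch.tensor([[[[46.]], [[134.]]]])

print(one_channel.squeeze().shape)     # torch.Size([])
print(one_channel.reshape(-1).shape)   # torch.Size([1])
print(two_channels.reshape(-1).shape)  # torch.Size([2])

# With reshape(-1), indexing by channel works for any number of channels.
per_channel = []
for values in (one_channel, two_channels):
    flat_values = values.reshape(-1)
    per_channel.append([flat_values[i].item() for i in range(flat_values.shape[0])])
print(per_channel)  # [[34.0], [46.0, 134.0]]
```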