# Neural Networks #

A typical training procedure for a neural network is as follows:

Define the neural network that has some learnable parameters (or weights)

Iterate over a dataset of inputs

Process input through the network

Compute the loss (how far is the output from being correct)

Propagate gradients back into the network’s parameters

Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient

神经网络的典型训练过程如下：

定义具有某些可学习参数（或权重）的神经网络

对输入数据集进行迭代

通过网络处理输入

计算损失（输出离正确还有多远）

将梯度传回网络参数中

更新网络权重，通常使用简单的更新规则：权重 = 权重 - 学习率 * 梯度

## Instrumentation et évaluation "en continu" du système ##

In [3]:
!pip install torch torchvision
import torch
import torch.nn as nn
import torch.nn.functional as F



In [4]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

cpu


In [5]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

In [6]:
net.to(device)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

In [7]:
# inputs, labels = data[0].to(device), data[1].to(device)

在 CNN 网络的训练部分，列出不同层和子层。

### La couche de convolution : ###
La couche de convolution applique un ensemble de filtres convolutifs aux images en entrée, chacun d'entre eux activant certaines caractéristiques des images.
- `self.conv1` Cette couche de convolution prend en entrée des images avec 3 canaux (RGB) et produit 6 canaux en sortie à l'aide d'un noyau de taille 5x5.
- `self.conv2` Une deuxième couche de convolution qui prend les 6 canaux de sortie de la couche précédente et produit 16 canaux de sortie à l'aide d'un noyau de taille 5x5.

### Les couches entièrement connectées : ###
Chaque couche entièrement connectée effectue une transformation linéaire suivie d'une activation ReLU, qui introduit de la non-linéarité dans le réseau.  Du coup, dans chacune des couches entièrement connectées (`self.fc1`, `self.fc2` et `self.fc3`), il y a une sous-couche linéaire suivie d'une sous-couche non linéaire. Ces couches sont responsables de la combinaison des caractéristiques extraites par les couches de convolution précédentes pour effectuer la tâche de classification finale.

`self.fc1` (Première couche entièrement connectée)
- Sous-couche linéaire : C'est la première étape de la couche. Elle effectue une transformation linéaire des caractéristiques d'entrée.
- Sous-couche non linéaire : Après la transformation linéaire, une activation ReLU est appliquée. Cela introduit de la non-linéarité dans la sortie de la couche.

`self.fc2` (Deuxième couche entièrement connectée)
- Sous-couche linéaire : Comme pour la première couche, cette couche effectue une transformation linéaire des caractéristiques.
- Sous-couche non linéaire : Suite à la transformation linéaire, une activation ReLU est appliquée.

`self.fc3` (Couche de sortie)
- Sous-couche linéaire : Il s'agit de la dernière transformation linéaire qui produit la sortie finale du réseau.

### La couche de pooling : ###
L'opération de pooling consiste à réduire la taille des images, tout en préservant leurs caractéristiques importantes. Cette couche en effectuant une opération de max pooling avec une fenêtre de taille 2x2 et un pas de 2.
- `self.pool` qui est utilisée après chaque couche de convolution

### La couche de correction ReLU ###
La couche de correction ReLU favorise un apprentissage plus rapide et plus efficace en remplaçant les valeurs négatives par des zéros et en conservant les valeurs positives.
- Fonctions d'activation ReLU `F.relu` qui introduit de la non-linéarité dans le réseau.

=================================================================================

在计算过程中给出不同数据张量 Xn 和权重 Wn 的大小。

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np

In [7]:
transform = transforms.Compose( # 创建了一个转换组合，将一系列的数据预处理操作组合在一起。这里使用了两个预处理操作
    [transforms.ToTensor(), # 将图像转换为 PyTorch 张量，并将像素值缩放到 [0, 1] 的范围
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # 对图像进行标准化处理，减去均值（0.5）并除以标准差（0.5）

batch_size = 4 #指定了数据加载器每次加载的批次大小，这里设置为 4。

# 创建了 CIFAR-10 数据集的训练集对象。root 参数指定了数据集存储的根目录，train=True 表示加载训练集，
# download=True 表示如果数据集不存在则自动下载，transform=transform 表示应用之前定义的数据预处理操作。
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

# 创建了一个训练集数据加载器。trainloader 负责从训练集中加载数据，shuffle=True 表示每个 epoch 都会对数据进行洗牌，
# num_workers=2 表示使用两个子进程来加载数据以加快速度。
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

# 这里分别创建了 CIFAR-10 数据集的测试集对象和测试集数据加载器。
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified


In [8]:
for p in net.parameters(): 
    print(p.size())

torch.Size([6, 3, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 400])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])


Convolutional Layer 1 (conv1):
- Taille de l'entrée: X1 = 3×32×32 = 3072
- Poids W1:
    - [6, 3, 5, 5]: Une première couche de convolution prend en entrée des images avec 3 canaux (RGB) et produit 6 canaux en sortie à l'aide d'un noyau de taille 5x5. Poids de convolution: 6×3×5×5=450
    - Bias: 6

Convolutional Layer 2 (conv2):
- Taille de l'entrée: X2 = 6×14×14 = 1176
- Poids W2:
    - [16, 6, 5, 5]: Une deuxième couche de convolution qui prend les 6 canaux d'entrée(6 canaux en entrée provenant de la couche précédente) et produit 16 canaux de sortie à l'aide d'un noyau de taille 5x5. Poids de convolution: 16×6×5×5=2400
    - Bias: 16

Fully Connected Layer 1 (fc1):
- Taille de l'entrée: X3 = 1×400 = 400
- Poids W3:
    - [120, 400]: Il y a 120 neurones dans la couche entièrement connectée, chacun connecté à une entrée de taille 400. Poids de convolution: 120×400 = 48000
    - Bias: 120

Fully Connected Layer 2 (fc2):
- Taille de l'entrée: X4 = 1×120 = 120
- Poids W4:
    - [84, 120]: 84 neurones dans la deuxième couche entièrement connectée, chacun connecté aux 120 neurones de la couche précédente. Poids de convolution: 84×120=10080
    - Bias: 84
  
Fully Connected Layer 3 (fc3):
- Taille de l'entrée: X5 = 1×84 = 84
- Poids W5:
    - [10, 84]: 10 neurones dans la dernière couche entièrement connectée, chacun connecté aux 84 neurones de la couche précédente. Poids de convolution: 10×84=840
    - Bias: 10

=================================================================================

修改程序，在每个历元之后和第一个历元之前进行评估（创建一个专门函数）。删除其他中间显示。

In [9]:
#这行代码定义了损失函数，即交叉熵损失函数。交叉熵损失函数通常用于多类别分类问题
criterion = nn.CrossEntropyLoss()

#这行代码定义了优化器，即随机梯度下降（SGD）优化器。SGD是一种常用的优化算法，用于更新神经网络的权重以最小化损失函数。
#lr=0.001指定了学习率，即每次更新时的步长。momentum=0.9是SGD的一个超参数，用于加速SGD在相关方向上前进，并减小波动。
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

In [10]:
def evaluate_model(net, dataloader, criterion):
    eval_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for data in dataloader:
            images, labels = data
            outputs = net(images)
            loss = criterion(outputs, labels)
            eval_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = correct / total
    eval_loss /= len(dataloader)
    print(f'Evaluation Loss: {eval_loss:.3f}')
    print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

# Évaluation avant la première époque
evaluate_model(net, testloader, criterion)

def train(epoch_num):
    # Boucle pour itérer sur plusieurs époques de l'ensemble de données
    for epoch in range(epoch_num):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
    
            if i % 2000 == 1999:
                print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
                running_loss = 0.0
    
        # Évaluation après chaque époque
        print(f'Epoch {epoch + 1} Evaluation:')
        evaluate_model(net, testloader, criterion)

epoch_num = 2
train(epoch_num)
print('Finished Training')

Evaluation Loss: 2.306
Accuracy of the network on the 10000 test images: 6 %
[1,  2000] loss: 2.219
[1,  4000] loss: 1.863
[1,  6000] loss: 1.671
[1,  8000] loss: 1.577
[1, 10000] loss: 1.527
[1, 12000] loss: 1.473
Epoch 1 Evaluation:
Evaluation Loss: 1.449
Accuracy of the network on the 10000 test images: 47 %
[2,  2000] loss: 1.405
[2,  4000] loss: 1.374
[2,  6000] loss: 1.334
[2,  8000] loss: 1.330
[2, 10000] loss: 1.313
[2, 12000] loss: 1.287
Epoch 2 Evaluation:
Evaluation Loss: 1.259
Accuracy of the network on the 10000 test images: 54 %
Finished Training


=================================================================================

修改函数以计算每个阶段执行的浮点运算次数，分别计算加法、乘法、最大值和总值。

In [20]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def count_operations(self, x):
        conv1_out = self.conv1(x)
        conv1_out_pooled = self.pool(F.relu(conv1_out))  # Apply max pooling after the first convolution
        conv2_out = self.conv2(conv1_out_pooled)
        
        # Count operations for convolutional layer 1
        conv1_ops = self.count_conv_operations(x, conv1_out, self.conv1)
        
        # Count operations for convolutional layer 2
        conv2_ops = self.count_conv_operations(conv1_out_pooled, conv2_out, self.conv2)
        
        return conv1_ops, conv2_ops

    
    def count_conv_operations(self, input, output, conv_layer):
        # batch_size = input.size(0)
        out_channels, in_channels = output.size(1), conv_layer.in_channels
        output_height, output_width = output.size(2), output.size(3)
        filter_size = conv_layer.kernel_size[0]
        stride = conv_layer.stride[0]
        padding = conv_layer.padding[0]
        
        # Compute number of operations for convolution
        print(str(output_height) + " * " + str(output_width) + " * " + str(in_channels) + " * " + str(filter_size**2) + " * " + str(out_channels))
        num_mults = output_height * output_width * in_channels * filter_size ** 2 * out_channels
        num_adds = output_height * output_width * in_channels * filter_size ** 2 * out_channels
        num_maxs = output_height * output_width * out_channels
        # 14 * 14 * 6 * 3?
        # 5 * 5 * 16 * 3?
        print(str(output_height) + " * " + str(output_width) + " * " + str(in_channels))
        total_ops = num_mults + num_adds + num_maxs
        return num_mults, num_adds, num_maxs, total_ops


    def count_fc_operations(self, input, fc_layer):
        # Get the number of input features for the fully connected layer
        in_features = fc_layer.in_features
        
        # Get the number of output features for the fully connected layer
        out_features = fc_layer.out_features
                
        # Compute number of operations for fully connected layer
        print(str(out_features) + " * " + str(in_features))
        num_mults = out_features * in_features
        num_adds = out_features * in_features
        num_maxs = 0
                
        total_ops = num_mults + num_adds
        return num_mults, num_adds, num_maxs, total_ops

    def count_total_operations(self, x):
        conv1_ops, conv2_ops = self.count_operations(x)
        fc1_ops = self.count_fc_operations(x, self.fc1)
        fc2_ops = self.count_fc_operations(x, self.fc2)
        fc3_ops = self.count_fc_operations(x, self.fc3)

        total_ops = sum(op[3] for op in [conv1_ops, conv2_ops, fc1_ops, fc2_ops, fc3_ops])
        return total_ops


net = Net()

In [21]:
# Load the model weights
PATH = './cifar_net.pth'
net.load_state_dict(torch.load(PATH))

<All keys matched successfully>

In [22]:
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0))) 
    plt.show()

In [23]:
# Let's assume you have an iterator `dataiter` that iterates over your test data
dataiter = iter(testloader) 

# Get the next batch of data
images, labels = next(dataiter)

# Print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))

# Compute operation counts for convolutional layer 1 and convolutional layer 2
conv1_ops, conv2_ops = net.count_operations(images)

# Compute operation counts for fully connected layer 1, layer 2, layer 3
fc1_ops = net.count_fc_operations(images, net.fc1)
fc2_ops = net.count_fc_operations(images, net.fc2)
fc3_ops = net.count_fc_operations(images, net.fc3)

total_ops = net.count_total_operations(images)

# Print operation counts for convolutional layer 1
print("Operations for convolutional layer 1:")
print("Multiplications :", conv1_ops[0])
print("Additions :", conv1_ops[1])
print("Maximums :", conv1_ops[2])
print("Total :", conv1_ops[3])
print()

# Print operation counts for convolutional layer 2
print("Operations for convolutional layer 2:")
print("Multiplications :", conv2_ops[0])
print("Additions :", conv2_ops[1])
print("Maximums :", conv2_ops[2])
print("Total :", conv2_ops[3])
print()

# Print operation counts for fully connected layer 1
print("Operations for fully connected layer 1:")
print("Multiplications :", fc1_ops[0])
print("Additions :", fc1_ops[1])
print("Maximums :", fc1_ops[2])
print("Total :", fc1_ops[3])
print()

# Print operation counts for fully connected layer 2
print("Operations for fully connected layer 2:")
print("Multiplications :", fc2_ops[0])
print("Additions :", fc2_ops[1])
print("Maximums :", fc2_ops[2])
print("Total :", fc2_ops[3])
print()

# Print operation counts for fully connected layer 3
print("Operations for fully connected layer 3:")
print("Multiplications :", fc3_ops[0])
print("Additions :", fc3_ops[1])
print("Maximums :", fc3_ops[2])
print("Total :", fc3_ops[3])
print()

# Print operation counts for all layer
print("Total operations:", total_ops)

NameError: name 'testloader' is not defined

=================================================================================

In [20]:
from torchsummary import summary

In [21]:
summary(net, (3, 32, 32))
# Number of Parameters CONVOL1 = out_channels × (in_channels × kernel_size ** 2 + 1) = 6 * (3 * 25 + 1) = 6 * 76 = 456
# Taille de sortie après 1 layer convolution: ((size_input - kernel_size) + 1 // stride) =  (32 - 5) + 1 = 28
# Maxpooling (2 * 2) = Taille de sortie // 2 = 28 // 2 = 14
# MaxPool2d 层不包含可学习的参数 = 0
# Taille de sortie après 2 layer convolution: ((14 - 5) + 1) // 1 = 10
# Number of Parameters CONVOL2 = out_channels × (in_channels × kernel_size ** 2 + 1) = 16 * (6 * 25 + 1) = 2416
# Maxpooling (2 * 2) = Taille de sortie // 2 = 10 // 2 = 5
# Number of Parameters FC1 =(in_channels + 1) × out_channels = ((16 * 5 * 5) + 1) * 120 = 48120
# Number of Parameters FC2 =(in_channels + 1) × out_channels = ((120) + 1) * 84 = 10164
# Number of Parameters FC3 =(in_channels + 1) × out_channels = ((84) + 1) * 10 = 850

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
Total params: 62,006
Trainable params: 62,006
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.06
Params size (MB): 0.24
Estimated Total Size (MB): 0.31
----------------------------------------------------------------


=================================================================================

在运算结束时，除显示总错误率外，还显示已执行的运算次数、执行时间和每秒运算次数。

In [14]:
import time

# print(total_ops)
nops = 1308168
print("New Gflops/image: %.3f"% (nops/1000000000.0))

# Fonction pour calculer le taux d'erreur global
def calculate_error_rate(predictions, labels):
    # Implémentez votre calcul du taux d'erreur ici
    error_rate = 0.0
    return error_rate

print("Gflops/image: %.3f" % (total_ops / 1000000000.0))

tt = 0
error_rate_sum = 0.0
total_ops_iter = 0

# Epoch: 001/040 | Batch 0000/0175 | Loss: 2.3033
# Epoch: 001/040 | Batch 0050/0175 | Loss: 2.0240
# Epoch: 001/040 | Batch 0100/0175 | Loss: 1.9445
# Epoch: 001/040 | Batch 0150/0175 | Loss: 1.8135
# ***Epoch: 001/040 | Train. Acc.: 33.674% | Loss: 1.703
# ***Epoch: 001/040 | Valid. Acc.: 34.880% | Loss: 1.670
# Time elapsed: 1.05 min
# Epoch: 002/040 | Batch 0000/0175 | Loss: 1.7606
# Epoch: 002/040 | Batch 0050/0175 | Loss: 1.5473
# Epoch: 002/040 | Batch 0100/0175 | Loss: 1.5496
# Epoch: 002/040 | Batch 0150/0175 | Loss: 1.5093
# ***Epoch: 002/040 | Train. Acc.: 42.819% | Loss: 1.505
# ***Epoch: 002/040 | Valid. Acc.: 43.840% | Loss: 1.491
# Time elapsed: 2.09 min

for i in range(20): 
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        
        if torch.cuda.is_available(): torch.cuda.synchronize()
        t0 = time.time()
        
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
        if torch.cuda.is_available(): torch.cuda.synchronize()
        dt = time.time() - t0
        print("iter %2d: %.2f ms, %.4f Tflops"% (i, dt*1000, batch_size*total_ops/dt/1000000000000))
    
        print("Epoch: %2d/%2d | Batch %2d/%2d | Loss: %.4f"% (i, 20, batch_size*total_ops/dt/1000000000000) )
        if i % 2000 == 1999:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
    # ***Epoch: 002/040 | Valid. Acc.: 43.840% | Loss: 1.491
    print("***Epoch: %2d/%2d | Batch %2d/%2d | Loss: %.4f"% (i, 20, batch_size*total_ops/dt/1000000000000) )
    # Évaluation après chaque époque
    print(f'Epoch {epoch + 1} Evaluation:')
    evaluate_model(net, testloader, criterion)
    
average_dt = tt / 16
average_error_rate = error_rate_sum / 20
ops_per_second = total_ops_iter / tt

print("Average: %.2f ms, %.4f Tflops"% (average_dt*1000, batch_size*total_ops/average_dt/1000000000000))
print("Global Error Rate: %.4f" % average_error_rate)
print("Total Operations: %d" % total_ops_iter)
print("Operations Per Second: %f" % ops_per_second)


New Gflops/image: 0.001


NameError: name 'total_ops' is not defined

In [15]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlexNet(nn.Module):

    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=3, stride=1),  # 修改步长为1，以便产生有效的输出
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(6, 12, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(12, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # 在这里不需要指定固定的输出大小
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        # 修改全连接层的输入大小
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(16 * 6 * 6, 4096),  # 使用先前层的输出大小
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)  # 使用view展平张量
        logits = self.classifier(x)
        # 不需要在这里进行softmax操作，因为交叉熵损失函数通常会在损失计算中进行
        return logits

net = AlexNet(10)


In [16]:
net.to(device)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(12, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=576, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)

In [67]:
# Évaluation avant la première époque
evaluate_model(net, testloader, criterion)

# Boucle pour itérer sur plusieurs époques de l'ensemble de données
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 2000 == 1999:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

    # Évaluation après chaque époque
    print(f'Epoch {epoch + 1} Evaluation:')
    evaluate_model(net, testloader, criterion)

print('Finished Training')

Evaluation Loss: 2.304
Accuracy of the network on the 10000 test images: 10 %
[1,  2000] loss: 2.304
[1,  4000] loss: 2.304
[1,  6000] loss: 2.304
[1,  8000] loss: 2.304
[1, 10000] loss: 2.303
[1, 12000] loss: 2.304
Epoch 1 Evaluation:
Evaluation Loss: 2.304
Accuracy of the network on the 10000 test images: 10 %
[2,  2000] loss: 2.304
[2,  4000] loss: 2.304
[2,  6000] loss: 2.303
[2,  8000] loss: 2.305
[2, 10000] loss: 2.304
[2, 12000] loss: 2.303
Epoch 2 Evaluation:
Evaluation Loss: 2.304
Accuracy of the network on the 10000 test images: 10 %
Finished Training
