**Table of contents**<a id='toc0_'></a>    
- [Optimizing the model parameters](#toc1_)    
  - [Prerequisite code](#toc1_1_)    
  - [Setting hyperparameters](#toc1_2_)    
  - [Add an optimization loop](#toc1_3_)    
    - [Add a loss function](#toc1_3_1_)    
    - [Optimization pass](#toc1_3_2_)    
  - [Full implementation](#toc1_4_)    
  - [Saving Models](#toc1_5_)    


In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

# <a id='toc1_'></a>[Optimizing the model parameters](#toc0_)

- Training a model is an iterative process. In each iteration, the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous module), and optimizes these parameters using gradient descent.
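As a minimal sketch of that cycle, the snippet below fits a single weight by hand. The toy data, the learning rate of 0.05, and the 50-step budget are assumptions chosen for illustration and are not part of the FashionMNIST setup used later.

```python
import torch

# Hypothetical toy problem: find w such that w * x matches y (true w is 2).
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

w = torch.tensor(0.0, requires_grad=True)
lr = 0.05  # assumed learning rate for this sketch

losses = []
for step in range(50):
    pred = w * x                      # 1. make a guess
    loss = ((pred - y) ** 2).mean()   # 2. measure the error (mean squared error)
    loss.backward()                   # 3. collect d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad              # 4. gradient descent update
    w.grad.zero_()                    # reset the gradient for the next step
    losses.append(loss.item())

print(f"w = {w.item():.4f}, first loss = {losses[0]:.4f}, last loss = {losses[-1]:.6f}")
```

The optimizer objects introduced later in this notebook perform steps 4 and the gradient reset for every parameter of the model at once.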

## <a id='toc1_1_'></a>[Prerequisite code](#toc0_)

In [2]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

## <a id='toc1_2_'></a>[Setting hyperparameters](#toc0_)

We define the following hyperparameters for training:
 - **Number of Epochs** - the number of times the entire training dataset is passed through the network.
 - **Batch Size** - the number of data samples propagated through the network before the parameters are updated. Each epoch iterates over as many batches as are needed to cover the dataset.
 - **Learning Rate** - the size of the step the optimizer takes at each update as it searches for the weights that minimize the loss.
 
 Smaller values mean the model takes longer to find good weights. Larger values may cause the model to step over and miss the best weights, which yields unpredictable behavior during training.
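The effect of the learning rate can be seen on a one-dimensional problem. The sketch below runs plain gradient descent on the hypothetical function f(w) = w², whose minimum is at w = 0. The function, the starting point, and the three learning rates are illustrative assumptions only.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

small = descend(lr=0.01)   # tiny steps: still far from 0 after 20 updates
medium = descend(lr=0.1)   # reasonable steps: close to 0
large = descend(lr=1.1)    # oversized steps: overshoots and moves away from 0

print(f"lr=0.01 -> w={small:.4f}")
print(f"lr=0.1  -> w={medium:.4f}")
print(f"lr=1.1  -> w={large:.4f}")
```

With the largest rate, each update jumps past the minimum by more than the previous distance, so the iterate grows instead of shrinking.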

In [3]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
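To make these values concrete, the arithmetic below works out how many parameter updates they imply. It assumes the 60,000-sample FashionMNIST training set and a `DataLoader` that keeps the final partial batch (the default behavior); the variable names are hypothetical.

```python
import math

n_samples = 60_000   # size of the FashionMNIST training split
n_batch = 64         # batch size from the cell above
n_epochs = 5         # epochs from the cell above

# One parameter update happens per batch.
batches_per_epoch = math.ceil(n_samples / n_batch)
last_batch_size = n_samples - (batches_per_epoch - 1) * n_batch
total_updates = batches_per_epoch * n_epochs

print(batches_per_epoch, last_batch_size, total_updates)
```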

## <a id='toc1_3_'></a>[Add an optimization loop](#toc0_)

- Each iteration of the optimization loop is called an epoch.
- Each epoch consists of two main parts:

    - The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
    - The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.


### <a id='toc1_3_1_'></a>[Add a loss function](#toc0_)

Common loss functions:

- nn.MSELoss (Mean Square Error) used for regression tasks
- nn.NLLLoss (Negative Log Likelihood) used for classification
- nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss
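The relationship in the last bullet can be checked numerically. This sketch uses a hypothetical batch of random logits and labels, and compares `nn.CrossEntropyLoss` against `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical batch: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # hypothetical class labels

# CrossEntropyLoss takes raw logits directly.
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss expects log-probabilities, so apply LogSoftmax first.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))
```

This is why a model trained with `nn.CrossEntropyLoss` should output raw logits rather than softmax probabilities.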


In [4]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

### <a id='toc1_3_2_'></a>[Optimization pass](#toc0_)

Inside the training loop, optimization happens in three steps:

- Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients add up by default; to prevent double-counting, we explicitly zero them at each iteration.
- Back-propagate the prediction loss with a call to ``loss.backward()``.
- Once we have our gradients, call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
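The accumulation behavior that makes the first step necessary can be demonstrated on a single parameter. The tensor `w` and the expression `3 * w` are hypothetical stand-ins for a model parameter and a loss.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zeroing in between: the gradients add up.
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()

# Reset, then run a single backward pass: only the fresh gradient remains.
optimizer.zero_grad()
(3 * w).backward()
fresh = w.grad.item()

print(accumulated, fresh)
```

Since the derivative of `3 * w` is 3, the first value is twice the second.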

In [5]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

## <a id='toc1_4_'></a>[Full implementation](#toc0_)

In [6]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):        
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

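    # loss_fn returns the mean loss of each batch, so dividing the summed
    # batch means by the dataset size reports roughly (mean loss / batch size);
    # divide by len(dataloader) instead for the mean loss per batch.
    test_loss /= size
    correct /= size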
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
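The accuracy expression in `test_loop` can be traced by hand on a tiny example. The logits and labels below are made up for illustration.

```python
import torch

# Hypothetical logits for 3 samples over 4 classes.
pred = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                     [1.5, 0.2, 0.1, 0.0],
                     [0.0, 0.1, 0.2, 3.0]])
y = torch.tensor([1, 2, 3])  # hypothetical true labels

predicted_classes = pred.argmax(1)   # index of the largest logit in each row
n_correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = n_correct / len(y)

print(predicted_classes.tolist(), n_correct, f"{accuracy:.3f}")
```

The second sample is predicted as class 0 while its label is 2, so two of the three predictions count as correct.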

In [7]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300464  [    0/60000]
loss: 2.286695  [ 6400/60000]
loss: 2.269128  [12800/60000]
loss: 2.268317  [19200/60000]
loss: 2.258540  [25600/60000]
loss: 2.232573  [32000/60000]
loss: 2.241280  [38400/60000]
loss: 2.215233  [44800/60000]
loss: 2.221534  [51200/60000]
loss: 2.181303  [57600/60000]
Test Error: 
 Accuracy: 48.5%, Avg loss: 0.034337 

Epoch 2
-------------------------------
loss: 2.206467  [    0/60000]
loss: 2.196480  [ 6400/60000]
loss: 2.132880  [12800/60000]
loss: 2.140418  [19200/60000]
loss: 2.139766  [25600/60000]
loss: 2.075788  [32000/60000]
loss: 2.101022  [38400/60000]
loss: 2.043405  [44800/60000]
loss: 2.072541  [51200/60000]
loss: 1.977665  [57600/60000]
Test Error: 
 Accuracy: 53.7%, Avg loss: 0.031434 

Epoch 3
-------------------------------
loss: 2.050360  [    0/60000]
loss: 2.031549  [ 6400/60000]
loss: 1.892955  [12800/60000]
loss: 1.912312  [19200/60000]
loss: 1.937670  [25600/60000]
loss: 1.818116  [32000/600

## <a id='toc1_5_'></a>[Saving Models](#toc0_)

- PyTorch models store their learned parameters in an internal state dictionary called state_dict.
- These can be persisted with the torch.save method:

In [8]:
torch.save(model.state_dict(), "data/model.pth")

print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
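A saved state dictionary is typically restored by building a model with the same architecture and calling `load_state_dict`. The sketch below is a self-contained round trip that uses a small hypothetical network and a temporary file in place of `NeuralNetwork` and `data/model.pth`.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical small network standing in for NeuralNetwork.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


original = make_model()
path = os.path.join(tempfile.mkdtemp(), "model.pth")  # temporary path for this sketch
torch.save(original.state_dict(), path)

restored = make_model()                      # same architecture, fresh random weights
restored.load_state_dict(torch.load(path))   # copy the saved parameters in
restored.eval()                              # switch to inference mode

x = torch.randn(1, 4)
with torch.no_grad():
    same = torch.allclose(original(x), restored(x))

print(same)
```

Saving only the `state_dict` stores the parameters but not the class definition, so the code that defines the model must be available when loading.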
