# OPTIMIZING MODEL PARAMETERS
Now that we have a model and data, it’s time to train, validate, and test our model by optimizing its parameters on our data.

Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates the error in its guess (the *loss*), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent.
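
The four steps of one iteration (guess, loss, derivatives, update) can be sketched in plain Python on a toy problem. This is a minimal illustration, not the tutorial's model: the data, the one-parameter model `pred = w * x`, the learning rate `lr`, and the step count are all assumptions chosen for the example, and the derivative is written out by hand rather than computed by autograd.

```python
# Toy data generated by y = 3 * x (an assumption for this sketch).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

w = 0.0     # the single model parameter, starting from an arbitrary guess
lr = 0.05   # learning rate (hypothetical value)

for step in range(100):
    # 1. The model makes a guess about the output.
    preds = [w * x for x in xs]
    # 2. Calculate the error of the guess (mean squared error).
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. Derivative of the loss with respect to w, derived by hand.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # 4. Gradient descent: move w against the gradient.
    w -= lr * grad

print(f"learned w = {w:.6f}, last loss = {loss:.3e}")
```

The learned `w` approaches 3, the slope used to generate the toy data.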

For a more detailed walkthrough of this process, check out this video on [backpropagation from 3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8).

### Prerequisite Code
We load the code from the previous sections on Datasets & DataLoaders and Build Model.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

### Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates ([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) about hyperparameter tuning).

We define the following hyperparameters for training:
 - **Number of Epochs** - the number of times to iterate over the dataset.

 - **Batch Size** - the number of data samples propagated through the network before the parameters are updated.

 - **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training, such as a loss that oscillates or diverges.
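
To make the effect of the learning rate concrete, here is a small sketch that runs gradient descent on the one-dimensional function f(w) = w². The function, the helper name `descend`, and the learning-rate values are assumptions picked for illustration; a real network's loss surface is far more complicated.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2 * w      # derivative of w**2
        w -= lr * grad    # gradient descent update
    return w

# Hypothetical learning rates: too small, reasonable, borderline, too large.
for lr in (0.01, 0.4, 1.0, 1.1):
    print(f"lr={lr}: w after 20 steps = {descend(lr):.4g}")
```

With the small rate, `w` has barely moved toward the minimum at 0 after 20 steps; with the moderate rate it is essentially there; at `lr=1.0` it flips sign on every step without making progress; and at `lr=1.1` its magnitude grows with each step.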


In [2]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
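
As a rough sketch of how these values interact, the arithmetic below counts how many parameter updates they imply. The dataset size of 60000 is taken from the sample counter in the training output of this document, and the calculation assumes the `DataLoader` keeps the final, smaller batch (its default behavior).

```python
import math

dataset_size = 60000   # training samples, as shown in the training output
batch_size = 64
epochs = 5

# One parameter update per batch; the last batch may be smaller.
batches_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)
```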

## Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an **epoch**.

Each epoch consists of two main parts:
 - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.

 - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to the Full Implementation section to see the complete optimization loop.

### Loss Function

When presented with some training data, our untrained network is likely not to give the correct answer. The **loss function** measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training. To calculate the loss we make a prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and [nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification, which evaluates probabilistic predictions and expects log-probabilities as input. [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines ``nn.LogSoftmax`` and ``nn.NLLLoss``.

We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error.

In [3]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
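
To see what "normalize the logits and compute the prediction error" amounts to for a single sample, the sketch below reproduces the arithmetic in plain Python: a log-softmax followed by the negative log likelihood of the true class. The helper name `cross_entropy` and the example logits are assumptions for illustration; this is not PyTorch's implementation, which also handles batches and averages over them by default.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]   # log-softmax
    return -log_probs[target]                       # negative log likelihood

# Hypothetical logits: an undecided 10-class model, then a confident 3-class one.
uniform = cross_entropy([0.0] * 10, target=3)
confident_right = cross_entropy([5.0, 0.0, 0.0], target=0)
confident_wrong = cross_entropy([5.0, 0.0, 0.0], target=1)

print(f"{uniform:.4f} {confident_right:.4f} {confident_wrong:.4f}")
```

A model that spreads its probability evenly over 10 classes gets a loss of ln(10), about 2.30, which is close to the first loss value printed in the training output later in this document. A confident correct prediction gives a loss near zero, and a confident wrong one is penalized heavily.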

### Optimizer

Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the ``optimizer`` object; that is, the object bundles its data and behavior together and hides the implementation details from outside code. Here, we use the SGD optimizer; additionally, there are many [different optimizers](https://pytorch.org/docs/stable/optim.html) available in PyTorch, such as Adam and RMSprop, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.

In [4]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
 * Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.

 * Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter.

 * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
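
The three steps, and the reason for zeroing, can be checked on a single parameter. The sketch below is a minimal illustration under assumed values: one tensor `w`, the loss `w**2` (gradient `2w`), and a learning rate of 0.1. It first calls `backward()` twice without zeroing to show the gradients adding up, then performs one clean update.

```python
import torch

# One trainable parameter; the loss is w**2, so dloss/dw = 2 * w.
w = torch.tensor([3.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes WITHOUT zeroing in between: the gradients add up.
(w ** 2).sum().backward()
first = w.grad.clone()          # 2 * 3.0
(w ** 2).sum().backward()
accumulated = w.grad.clone()    # the first gradient plus a second copy

# The three-step pattern for one update.
optimizer.zero_grad()           # 1. clear the stale gradients
loss = (w ** 2).sum()
loss.backward()                 # 2. w.grad now holds dloss/dw for this step only
optimizer.step()                # 3. w <- w - lr * w.grad

print(first.item(), accumulated.item(), w.item())
```

The training loop in this document calls ``optimizer.zero_grad()`` after ``optimizer.step()`` instead of before ``loss.backward()``; the effect is the same as long as the gradients are cleared before the next backward pass.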



### Full Implementation
We define ``train_loop`` that loops over our optimization code, and ``test_loop`` that evaluates the model's performance against our test data.


In [5]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():  # gradients are not needed during evaluation
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
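
The accuracy bookkeeping in ``test_loop`` can be traced on a tiny batch. The logits and labels below are made-up values for illustration (4 samples, 3 classes); the tensor operations are the same ones used in the loop above.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, with their true labels.
pred = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.2, 0.4, 0.1],
])
y = torch.tensor([0, 1, 1, 0])

predicted_class = pred.argmax(1)    # index of the largest logit in each row
matches = predicted_class == y      # boolean tensor, True where the guess is right
correct = matches.type(torch.float).sum().item()   # number of correct guesses
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)
```

In ``test_loop`` the same count is summed over all batches and divided by the dataset size, while the loss is averaged over the number of batches.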

We initialize the loss function and optimizer, and pass them to ``train_loop`` and ``test_loop``. Feel free to increase the number of epochs to track the model’s improving performance.

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.308593  [   64/60000]
loss: 2.293018  [ 6464/60000]
loss: 2.276059  [12864/60000]
loss: 2.266814  [19264/60000]
loss: 2.249116  [25664/60000]
loss: 2.231523  [32064/60000]
loss: 2.230085  [38464/60000]
loss: 2.194862  [44864/60000]
loss: 2.192758  [51264/60000]
loss: 2.168314  [57664/60000]
Test Error: 
 Accuracy: 44.8%, Avg loss: 2.154566 

Epoch 2
-------------------------------
loss: 2.167217  [   64/60000]
loss: 2.151591  [ 6464/60000]
loss: 2.097949  [12864/60000]
loss: 2.113068  [19264/60000]
loss: 2.055542  [25664/60000]
loss: 2.012326  [32064/60000]
loss: 2.025604  [38464/60000]
loss: 1.941904  [44864/60000]
loss: 1.952305  [51264/60000]
loss: 1.890079  [57664/60000]
Test Error: 
 Accuracy: 55.2%, Avg loss: 1.877630 

Epoch 3
-------------------------------
loss: 1.908119  [   64/60000]
loss: 1.875005  [ 6464/60000]
loss: 1.759864  [12864/60000]
loss: 1.804685  [19264/60000]
loss: 1.697069  [25664/60000]
loss: 1.656414  [32064/600