# Homework 13 - Network Compression

Author: Chen-Wei Ke (b08501098@ntu.edu.tw), modified from ML2022-HW13 (Liang-Hsuan Tseng)

If you have any questions, feel free to ask: mlta-2023-spring@googlegroups.com

[**Link to HW13 Slides**](https://docs.google.com/presentation/d/1QAVMbnabmmMNvmugPlHMg_GVKaYrKa6hoTSFeJl9OCs/edit?usp=sharing)

## Outline

* [Packages](#Packages) - intall some required packages.
* [Dataset](#Dataset) - something you need to know about the dataset.
* [Configs](#Configs) - the configs of the experiments, you can change some hyperparameters here.
* [Architecture_Design](#Architecture_Design) - depthwise and pointwise convolution examples and some useful links.
* [Knowledge_Distillation](#Knowledge_Distillation) - KL divergence loss for knowledge distillation and some useful links.
* [Training](#Training) - training loop implementation modified from HW3.
* [Inference](#Inference) - create submission.csv by using the student_best.ckpt from the previous experiment.



### Packages
First, we need to import some useful packages. If the torchsummary package are not intalled, please install it via `pip install torchsummary`

In [1]:
!pip install torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl.metadata (296 bytes)
Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1


In [2]:
# Import some useful packages for this homework
import numpy as np
import pandas as pd
import torch
import os
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import ConcatDataset, DataLoader, Subset, Dataset # "ConcatDataset" and "Subset" are possibly useful
from torchvision.datasets import DatasetFolder, VisionDataset
from torchsummary import summary
from tqdm import tqdm
import random

# !nvidia-smi # list your current GPU

### Configs
In this part, you can specify some variables and hyperparameters as your configs.

In [3]:
cfg = {
    'dataset_root': '/kaggle/input/ml2023spring-hw13/Food-11',
    'save_dir': '/kaggle/working/',
    'exp_name': "simple_baseline",
    'batch_size': 128,
    'lr': 1e-3,
    'seed': 20220013,
    'loss_fn_type': 'KD', # simple baseline: CE, medium baseline: KD. See the Knowledge_Distillation part for more information.
    'weight_decay': 1e-5,
    'grad_norm_max': 10,
    'n_epochs': 600, # train more steps to pass the medium baseline.
    'patience': 60,
}

In [4]:
myseed = cfg['seed']  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
random.seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

save_path = os.path.join(cfg['save_dir'], cfg['exp_name']) # create saving directory
os.makedirs(save_path, exist_ok=True)

# define simple logging functionality
log_fw = open(f"{save_path}/log.txt", 'w') # open log file to save log outputs
def log(text):     # define a logging function to trace the training process
    print(text)
    log_fw.write(str(text)+'\n')
    log_fw.flush()

log(cfg)  # log your configs to the log file

{'dataset_root': '/kaggle/input/ml2023spring-hw13/Food-11', 'save_dir': '/kaggle/working/', 'exp_name': 'simple_baseline', 'batch_size': 128, 'lr': 0.001, 'seed': 20220013, 'loss_fn_type': 'KD', 'weight_decay': 1e-05, 'grad_norm_max': 10, 'n_epochs': 600, 'patience': 60}


### Dataset
We use Food11 dataset for this homework, which is similar to homework3. But remember, Please DO NOT utilize the dataset of HW3. We've modified the dataset, so you should only access the dataset by loading it in this kaggle notebook or through the links provided in the HW13 colab notebooks.

In [5]:
# # fetch and download the dataset from github (about 1.12G)
# !wget -O Food-11.tar.gz https://www.dropbox.com/s/v97fi9xrwp9b964/food11-hw13.tar.gz?dl=0

In [6]:
# # extract the data
# !tar -xzf ./Food-11.tar.gz # Could take some time
# # !tar -xzvf ./Food-11.tar.gz # use this command if you want to checkout the whole process.

In [7]:
for dirname, _, filenames in os.walk('/kaggle/input/ml2023spring-hw13/Food-11'):
    if len(filenames) > 0:
        print(f"{dirname}: {len(filenames)} files.") # Show the file amounts in each split.

/kaggle/input/ml2023spring-hw13/Food-11: 1 files.
/kaggle/input/ml2023spring-hw13/Food-11/validation: 4432 files.
/kaggle/input/ml2023spring-hw13/Food-11/training: 9993 files.
/kaggle/input/ml2023spring-hw13/Food-11/evaluation: 2218 files.


Next, specify train/test transform for image data augmentation.
Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms. You can also apply the knowledge or experience you learned in HW3.

In [8]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# define training/testing transforms
test_tfm = transforms.Compose([
    # It is not encouraged to modify this part if you are using the provided teacher model. This transform is stardard and good enough for testing.
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

train_tfm = transforms.Compose([
    # add some useful transform or augmentation here, according to your experience in HW3.
#     transforms.Resize(256),  # You can change this
#     transforms.CenterCrop(224), # You can change this, but be aware of that the given teacher model's input size is 224.
#     # The training input size of the provided teacher model is (3, 224, 224).
#     # Thus, Input size other then 224 might hurt the performance. please be careful.
#     transforms.RandomHorizontalFlip(), # You can change this.
    transforms.RandomResizedCrop((224, 224), scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomVerticalFlip(0.5),
    transforms.RandomRotation(180),
    transforms.RandomAffine(30),
    transforms.ToTensor(),
    normalize,
])

In [9]:
class FoodDataset(Dataset):
    def __init__(self, path, tfm=test_tfm, files = None):
        super().__init__()
        self.path = path
        self.files = sorted([os.path.join(path,x) for x in os.listdir(path) if x.endswith(".jpg")])
        if files != None:
            self.files = files
        print(f"One {path} sample",self.files[0])
        self.transform = tfm
  
    def __len__(self):
        return len(self.files)
  
    def __getitem__(self,idx):
        fname = self.files[idx]
        im = Image.open(fname)
        im = self.transform(im)
        try:
            label = int(fname.split("/")[-1].split("_")[0])
        except:
            label = -1 # test has no label
        return im,label

In [10]:
# Form train/valid dataloaders
train_set = FoodDataset(os.path.join(cfg['dataset_root'],"training"), tfm=train_tfm)
train_loader = DataLoader(train_set, batch_size=cfg['batch_size'], shuffle=True, num_workers=4, pin_memory=True)

valid_set = FoodDataset(os.path.join(cfg['dataset_root'], "validation"), tfm=test_tfm)
valid_loader = DataLoader(valid_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=4, pin_memory=True)

One /kaggle/input/ml2023spring-hw13/Food-11/training sample /kaggle/input/ml2023spring-hw13/Food-11/training/0_0.jpg
One /kaggle/input/ml2023spring-hw13/Food-11/validation sample /kaggle/input/ml2023spring-hw13/Food-11/validation/0_0.jpg


### Architecture_Design

In this homework, you have to design a smaller network and make it perform well. Apparently, a well-designed architecture is crucial for such task. Here, we introduce the depthwise and pointwise convolution. These variants of convolution are some common techniques for architecture design when it comes to network compression.

<img src="https://i.imgur.com/LFDKHOp.png" width=400px>

* explanation of depthwise and pointwise convolutions:
    * [prof. Hung-yi Lee's slides(p.24~p.30, especially p.28)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/tiny_v7.pdf)

* other useful techniques
    * [group convolution](https://www.researchgate.net/figure/The-transformations-within-a-layer-in-DenseNets-left-and-CondenseNets-at-training-time_fig2_321325862) (Actually, depthwise convolution is a specific type of group convolution)
    * [SqueezeNet](!https://arxiv.org/abs/1602.07360)
    * [MobileNet](!https://arxiv.org/abs/1704.04861)
    * [ShuffleNet](!https://arxiv.org/abs/1707.01083)
    * [Xception](!https://arxiv.org/abs/1610.02357)
    * [GhostNet](!https://arxiv.org/abs/1911.11907)


After introducing depthwise and pointwise convolutions, let's define the **student network architecture**. Here, we have a very simple network formed by some regular convolution layers and pooling layers. You can replace the regular convolution layers with the depthwise and pointwise convolutions. In this way, you can further increase the depth or the width of your network architecture.

In [11]:
# Define your student network here. You have to copy-paste this code block to HW13 GradeScope before deadline.
# We will use your student network definition to evaluate your results(including the total parameter amount).

# Example implementation of Depthwise and Pointwise Convolution 
# def dwpw_conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
#     return nn.Sequential(
#         nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, groups=in_channels), #depthwise convolution
#         nn.Conv2d(in_channels, out_channels, 1), # pointwise convolution
#     )
def dwpw_conv(ic, oc, kernel_size=3, stride=2, padding=1):
    return nn.Sequential(
        nn.Conv2d(ic, ic, kernel_size, stride=stride, padding=padding, groups=ic), #depthwise convolution
        nn.BatchNorm2d(ic),
        nn.LeakyReLU(0.01, inplace=True),
        nn.Conv2d(ic, oc, 1), # pointwise convolution
        nn.BatchNorm2d(oc),
        nn.LeakyReLU(0.01, inplace=True)
    )

# class StudentNet(nn.Module):
#     def __init__(self):
#       super().__init__()

#       # ---------- TODO ----------
#       # Modify your model architecture

#       self.cnn = nn.Sequential(
#         nn.Conv2d(3, 4, 3), 
#         nn.BatchNorm2d(4),
#         nn.ReLU(),    

#         nn.Conv2d(4, 16, 3), 
#         nn.BatchNorm2d(16),
#         nn.ReLU(),
#         nn.MaxPool2d(2, 2, 0),
        
#         nn.Conv2d(16, 64, 3), 
#         nn.BatchNorm2d(64),
#         nn.ReLU(),
#         nn.MaxPool2d(2, 2, 0),
        
#         nn.Conv2d(64, 84, 3), 
#         nn.BatchNorm2d(84),
#         nn.ReLU(),
#         nn.MaxPool2d(2, 2, 0),
        
#         # Here we adopt Global Average Pooling for various input size.
#         nn.AdaptiveAvgPool2d((1, 1)),
#       )
#       self.fc = nn.Sequential(
#         nn.Linear(84, 11),
#       )
      
#     def forward(self, x):
#       out = self.cnn(x)
#       out = out.view(out.size()[0], -1)
#       return self.fc(out)

class StudentNet(nn.Module):
    def __init__(self):
        super().__init__()

          # ---------- TODO ----------
       # Modify your model architecture
       # 224 --> 112
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
     
        self.layer1 = dwpw_conv(64, 64, stride=1) 
        self.layer2 = dwpw_conv(64, 128)
        self.layer3 = dwpw_conv(128, 256) 
        self.layer4 = dwpw_conv(256, 140) 
        # Here we adopt Global Average Pooling for various input size.
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(140, 11)
      
    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.maxpool(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = out.flatten(1)
        out = self.fc(out)
        return out

def get_student_model(): # This function should have no arguments so that we can get your student network by directly calling it.
    # you can modify or do anything here, just remember to return an nn.Module as your student network.  
    return StudentNet() 

# End of definition of your student model and the get_student_model API
# Please copy-paste the whole code block, including the get_student_model function.

After specifying the student network architecture, please use `torchsummary` package to get information about the network and verify the total number of parameters. Note that the total params of your student network should not exceed the limit (`Total params` in `torchsummary` ≤ 60,000). 

In [12]:
# DO NOT modify this block and please make sure that this block can run sucessfully. 
student_model = get_student_model()
summary(student_model, (3, 224, 224), device='cpu')
# You have to copy&paste the results of this block to HW13 GradeScope. 

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]             640
       BatchNorm2d-6           [-1, 64, 56, 56]             128
         LeakyReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]           4,160
       BatchNorm2d-9           [-1, 64, 56, 56]             128
        LeakyReLU-10           [-1, 64, 56, 56]               0
           Conv2d-11           [-1, 64, 28, 28]             640
      BatchNorm2d-12           [-1, 64, 28, 28]             128
        LeakyReLU-13           [-1, 64, 28, 28]               0
           Conv2d-14          [-1, 128,

In [13]:
# Load provided teacher model (model architecture: resnet18, num_classes=11, test-acc ~= 89.9%)
teacher_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=False, num_classes=11)
# load state dict
teacher_ckpt_path = os.path.join(cfg['dataset_root'], "resnet18_teacher.ckpt")
teacher_model.load_state_dict(torch.load(teacher_ckpt_path, map_location='cpu'))
# Now you already know the teacher model's architecture. You can take advantage of it if you want to pass the strong or boss baseline. 
# Source code of resnet in pytorch: (https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py)
# You can also see the summary of teacher model. There are 11,182,155 parameters totally in the teacher model
summary(teacher_model, (3, 224, 224), device='cpu')

Downloading: "https://github.com/pytorch/vision/zipball/v0.10.0" to /root/.cache/torch/hub/v0.10.0.zip


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

### Knowledge_Distillation

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="400px">

Since we have a learned big model, let it teach the other small model. In implementation, let the training target be the prediction of big model instead of the ground truth.

**Why it works?**
* If the data is not clean, then the prediction of big model could ignore the noise of the data with wrong labeled.
* There might have some relations between classes, so soft labels from teacher model might be useful. For example, Number 8 is more similar to 6, 9, 0 than 1, 7.


**How to implement?**
* $Loss = \alpha T^2 \times KL(p || q) + (1-\alpha)(\text{Original Cross Entropy Loss}), \text{where } p=softmax(\frac{\text{student's logits}}{T}), \text{and } q=softmax(\frac{\text{teacher's logits}}{T})$
* very useful link: [pytorch docs of KLDivLoss with examples](!https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html)
* original paper: [Distilling the Knowledge in a Neural Network](!https://arxiv.org/abs/1503.02531)

**Please be sure to carefully check each function's parameter requirements.**

In [14]:
# Implement the loss function with KL divergence loss for knowledge distillation.
# You also have to copy-paste this whole block to HW13 GradeScope. 
CE = nn.CrossEntropyLoss()
def loss_fn_kd(student_logits, labels, teacher_logits, alpha=0.5, temperature=20.0):
    # ------------TODO-------------
    # Refer to the above formula and finish the loss function for knowkedge distillation using KL divergence loss and CE loss.
    # If you have no idea, please take a look at the provided useful link above.
    loss_ce = F.cross_entropy(student_logits, labels)
    p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    loss_kl = F.kl_div(p, q, reduction='batchmean', log_target=False)
    loss = alpha * temperature * temperature * loss_kl + (1 - alpha) * loss_ce
    return loss
#     student_T = (student_logits/temperature).softmax(dim=-1)
#     teacher_T = (teacher_logits/temperature).softmax(dim=-1)
#     kl_loss = (teacher_T*(teacher_T.log() - student_T.log())).sum(1).mean() 
#     ce_loss = CE(student_logits, labels)
#     return alpha*(temperature**2)*kl_loss + (1 - alpha)*ce_loss

In [15]:
def use_pretrain():
    student_model.conv1.weight = teacher_model.conv1.weight
    student_model.bn1.weight = teacher_model.bn1.weight
    student_model.bn1.bias = teacher_model.bn1.bias
    student_model.bn1.running_mean = teacher_model.bn1.running_mean
    student_model.bn1.running_var = teacher_model.bn1.running_var
    student_model.conv1.weight.requires_grad = False
    student_model.bn1.weight.requires_grad = False
    student_model.bn1.bias.requires_grad = False
use_pretrain()


class HookTool:
    def __init__(self):
        self.fea = None
    def hook_fun(self, module, fea_in, fea_out):
        self.fea = fea_out
        
def get_feas_by_hook(model, names=['layer1', 'layer2', 'layer3']):
    fea_hooks = []
    for name, module in model.named_modules():
        if name in names:
            cur_hook = HookTool()
            module.register_forward_hook(cur_hook.hook_fun)
            fea_hooks.append(cur_hook)
    return fea_hooks
fea_hooks_teacher = get_feas_by_hook(teacher_model)
fea_hooks_student = get_feas_by_hook(student_model)

def loss_fea_layers(student, teacher):
    loss  = 0
    for i in range(len(student)):
        #loss += (len(student) - i)* (student[i].fea - teacher[i].fea).norm(2, [1, 2, 3]).mean()
        loss += (len(student) - i) * F.smooth_l1_loss(student[i].fea, teacher[i].fea)
    return loss

In [16]:
# choose the loss function by the config
if cfg['loss_fn_type'] == 'CE':
    # For the classification task, we use cross-entropy as the default loss function.
    loss_fn = nn.CrossEntropyLoss() # loss function for simple baseline.

if cfg['loss_fn_type'] == 'KD': # KD stands for knowledge distillation
    loss_fn = loss_fn_kd # implement loss_fn_kd for the report question and the medium baseline.

# You can also adopt other types of knowledge distillation techniques for strong and boss baseline, but use function name other than `loss_fn_kd`
# For example:
# def loss_fn_custom_kd():
#     pass
# if cfg['loss_fn_type'] == 'custom_kd':
#     loss_fn = loss_fn_custom_kd

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"
log(f"device: {device}")

# The number of training epochs and patience.
n_epochs = cfg['n_epochs']
patience = cfg['patience'] # If no improvement in 'patience' epochs, early stop

device: cuda


### Training
implement training loop for simple baseline, feel free to modify it.

In [17]:
# Initialize a model, and put it on the device specified.
student_model.to(device)
teacher_model.to(device) # MEDIUM BASELINE

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_model.parameters(), lr=cfg['lr'], weight_decay=cfg['weight_decay']) 

# Initialize trackers, these are not parameters and should not be changed
stale = 0
best_acc = 0.0

teacher_model.eval()  # MEDIUM BASELINE
for epoch in range(n_epochs):

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []
    train_lens = []
    percent = (1+epoch)/n_epochs
    
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)
        #imgs = imgs.half()
        #print(imgs.shape,labels.shape)

        # Forward the data. (Make sure data and model are on the same device.)
        with torch.no_grad():  # MEDIUM BASELINE
            teacher_logits = teacher_model(imgs)  # MEDIUM BASELINE
        
        logits = student_model(imgs)

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss_logits = loss_fn(logits, labels, teacher_logits, alpha=1 - percent*percent) # MEDIUM BASELINE
        loss_fea = loss_fea_layers(fea_hooks_student, fea_hooks_teacher)
        loss = (10*percent*percent) * loss_logits + loss_fea
#         loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
#         loss = loss_fn(logits, labels) # SIMPLE BASELINE
        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_model.parameters(), max_norm=cfg['grad_norm_max'])

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        train_batch_len = len(imgs)
        train_loss.append(loss.item() * train_batch_len)
        train_accs.append(acc)
        train_lens.append(train_batch_len)
        
    train_loss = sum(train_loss) / sum(train_lens)
    train_acc = sum(train_accs) / sum(train_lens)

    # Print the information.
    log(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    student_model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []
    valid_lens = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = student_model(imgs)
            teacher_logits = teacher_model(imgs) # MEDIUM BASELINE

        # We can still compute the loss (but not the gradient).
        loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
#         loss = loss_fn(logits, labels) # SIMPLE BASELINE

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        batch_len = len(imgs)
        valid_loss.append(loss.item() * batch_len)
        valid_accs.append(acc)
        valid_lens.append(batch_len)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / sum(valid_lens)
    valid_acc = sum(valid_accs) / sum(valid_lens)

    # update logs
    
    if valid_acc > best_acc:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f} -> best")
    else:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")


    # save models
    if valid_acc > best_acc:
        log(f"Best model found at epoch {epoch+1}, saving model")
        torch.save(student_model.state_dict(), f"{save_path}/student_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            log(f"No improvment {patience} consecutive epochs, early stopping")
            break
log("Finish training")
log_fw.close()

100%|██████████| 79/79 [00:37<00:00,  2.10it/s]


[ Train | 001/600 ] loss = 3.52468, acc = 0.21795


100%|██████████| 35/35 [00:17<00:00,  2.06it/s]


[ Valid | 001/600 ] loss = 19.43637, acc = 0.26376 -> best
Best model found at epoch 1, saving model


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 002/600 ] loss = 2.90746, acc = 0.30161


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 002/600 ] loss = 18.23230, acc = 0.31498 -> best
Best model found at epoch 2, saving model


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 003/600 ] loss = 2.54642, acc = 0.33463


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 003/600 ] loss = 16.89385, acc = 0.37049 -> best
Best model found at epoch 3, saving model


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 004/600 ] loss = 2.29297, acc = 0.39137


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 004/600 ] loss = 15.87172, acc = 0.44111 -> best
Best model found at epoch 4, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 005/600 ] loss = 2.10573, acc = 0.43611


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 005/600 ] loss = 15.56465, acc = 0.43208


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 006/600 ] loss = 1.96740, acc = 0.45952


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 006/600 ] loss = 14.07244, acc = 0.49774 -> best
Best model found at epoch 6, saving model


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 007/600 ] loss = 1.86108, acc = 0.49054


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 007/600 ] loss = 13.84191, acc = 0.49481


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 008/600 ] loss = 1.78168, acc = 0.50635


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 008/600 ] loss = 13.13252, acc = 0.50271 -> best
Best model found at epoch 8, saving model


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 009/600 ] loss = 1.71869, acc = 0.52507


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 009/600 ] loss = 12.55227, acc = 0.55077 -> best
Best model found at epoch 9, saving model


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 010/600 ] loss = 1.66822, acc = 0.54308


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 010/600 ] loss = 12.06280, acc = 0.56453 -> best
Best model found at epoch 10, saving model


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 011/600 ] loss = 1.62965, acc = 0.55159


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 011/600 ] loss = 12.57214, acc = 0.57243 -> best
Best model found at epoch 11, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 012/600 ] loss = 1.60153, acc = 0.55959


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 012/600 ] loss = 12.88229, acc = 0.51828


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 013/600 ] loss = 1.58161, acc = 0.57500


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 013/600 ] loss = 11.83595, acc = 0.56250


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 014/600 ] loss = 1.56674, acc = 0.58181


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 014/600 ] loss = 11.40850, acc = 0.59928 -> best
Best model found at epoch 14, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 015/600 ] loss = 1.55639, acc = 0.59842


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 015/600 ] loss = 11.39962, acc = 0.61236 -> best
Best model found at epoch 15, saving model


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 016/600 ] loss = 1.55198, acc = 0.59902


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 016/600 ] loss = 11.55053, acc = 0.59567


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 017/600 ] loss = 1.54911, acc = 0.60332


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 017/600 ] loss = 11.57994, acc = 0.59905


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 018/600 ] loss = 1.54920, acc = 0.60723


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 018/600 ] loss = 11.14382, acc = 0.61958 -> best
Best model found at epoch 18, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 019/600 ] loss = 1.55277, acc = 0.61073


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 019/600 ] loss = 12.08732, acc = 0.56656


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 020/600 ] loss = 1.55491, acc = 0.62324


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 020/600 ] loss = 11.88467, acc = 0.55122


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 021/600 ] loss = 1.55825, acc = 0.62764


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 021/600 ] loss = 10.80978, acc = 0.58055


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 022/600 ] loss = 1.56253, acc = 0.63114


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 022/600 ] loss = 10.66114, acc = 0.62613 -> best
Best model found at epoch 22, saving model


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 023/600 ] loss = 1.56725, acc = 0.63775


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 023/600 ] loss = 10.57180, acc = 0.61079


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 024/600 ] loss = 1.57667, acc = 0.63364


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 024/600 ] loss = 10.27811, acc = 0.61530


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 025/600 ] loss = 1.58190, acc = 0.64385


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 025/600 ] loss = 10.66985, acc = 0.62049


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 026/600 ] loss = 1.58882, acc = 0.65056


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 026/600 ] loss = 11.15198, acc = 0.60379


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 027/600 ] loss = 1.59820, acc = 0.65076


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 027/600 ] loss = 10.44444, acc = 0.60830


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 028/600 ] loss = 1.60958, acc = 0.65186


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 028/600 ] loss = 9.70886, acc = 0.65772 -> best
Best model found at epoch 28, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 029/600 ] loss = 1.61668, acc = 0.65596


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 029/600 ] loss = 10.42271, acc = 0.63267


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 030/600 ] loss = 1.62511, acc = 0.65566


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 030/600 ] loss = 10.92695, acc = 0.59070


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 031/600 ] loss = 1.63804, acc = 0.66306


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 031/600 ] loss = 11.28215, acc = 0.57536


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 032/600 ] loss = 1.65219, acc = 0.66457


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 032/600 ] loss = 9.74956, acc = 0.61868


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 033/600 ] loss = 1.65540, acc = 0.66947


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 033/600 ] loss = 9.82592, acc = 0.65794 -> best
Best model found at epoch 33, saving model


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 034/600 ] loss = 1.66945, acc = 0.66166


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 034/600 ] loss = 9.49477, acc = 0.66223 -> best
Best model found at epoch 34, saving model


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 035/600 ] loss = 1.68387, acc = 0.66977


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 035/600 ] loss = 10.61790, acc = 0.60356


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 036/600 ] loss = 1.69901, acc = 0.67167


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 036/600 ] loss = 9.36487, acc = 0.64959


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 037/600 ] loss = 1.70174, acc = 0.67948


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 037/600 ] loss = 8.86945, acc = 0.67464 -> best
Best model found at epoch 37, saving model


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 038/600 ] loss = 1.71481, acc = 0.68208


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 038/600 ] loss = 8.83928, acc = 0.68908 -> best
Best model found at epoch 38, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 039/600 ] loss = 1.72940, acc = 0.68168


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 039/600 ] loss = 9.34838, acc = 0.65501


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 040/600 ] loss = 1.74198, acc = 0.67978


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 040/600 ] loss = 9.14595, acc = 0.67080


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 041/600 ] loss = 1.75988, acc = 0.68468


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 041/600 ] loss = 9.28577, acc = 0.66674


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 042/600 ] loss = 1.77210, acc = 0.68908


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 042/600 ] loss = 9.06265, acc = 0.68705


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 043/600 ] loss = 1.79142, acc = 0.68448


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 043/600 ] loss = 8.94878, acc = 0.65185


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 044/600 ] loss = 1.80046, acc = 0.69439


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 044/600 ] loss = 9.65070, acc = 0.60988


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 045/600 ] loss = 1.81499, acc = 0.68948


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 045/600 ] loss = 8.84697, acc = 0.68389


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 046/600 ] loss = 1.83709, acc = 0.68778


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 046/600 ] loss = 9.31504, acc = 0.66178


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 047/600 ] loss = 1.84212, acc = 0.69899


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 047/600 ] loss = 8.98552, acc = 0.66471


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 048/600 ] loss = 1.85615, acc = 0.69839


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 048/600 ] loss = 8.89069, acc = 0.68096


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 049/600 ] loss = 1.87494, acc = 0.69609


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 049/600 ] loss = 8.57177, acc = 0.69833 -> best
Best model found at epoch 49, saving model


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 050/600 ] loss = 1.89499, acc = 0.69949


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 050/600 ] loss = 8.75354, acc = 0.69562


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 051/600 ] loss = 1.90478, acc = 0.70659


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 051/600 ] loss = 9.30534, acc = 0.65343


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 052/600 ] loss = 1.91697, acc = 0.70810


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 052/600 ] loss = 8.87448, acc = 0.67532


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 053/600 ] loss = 1.94764, acc = 0.69749


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 053/600 ] loss = 8.07035, acc = 0.71029 -> best
Best model found at epoch 53, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 054/600 ] loss = 1.96978, acc = 0.69899


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 054/600 ] loss = 8.44554, acc = 0.68818


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 055/600 ] loss = 1.97330, acc = 0.70730


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 055/600 ] loss = 8.63415, acc = 0.69269


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 056/600 ] loss = 1.98711, acc = 0.70709


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 056/600 ] loss = 7.91422, acc = 0.70916


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 057/600 ] loss = 2.00144, acc = 0.71110


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 057/600 ] loss = 8.78330, acc = 0.66358


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 058/600 ] loss = 2.03022, acc = 0.70720


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 058/600 ] loss = 8.02413, acc = 0.70690


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 059/600 ] loss = 2.04020, acc = 0.71550


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 059/600 ] loss = 8.24742, acc = 0.70352


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 060/600 ] loss = 2.06893, acc = 0.71380


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 060/600 ] loss = 8.47070, acc = 0.68976


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 061/600 ] loss = 2.07981, acc = 0.71450


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 061/600 ] loss = 8.07100, acc = 0.71345 -> best
Best model found at epoch 61, saving model


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 062/600 ] loss = 2.08735, acc = 0.71880


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 062/600 ] loss = 8.54513, acc = 0.69134


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 063/600 ] loss = 2.12127, acc = 0.71710


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 063/600 ] loss = 8.14338, acc = 0.69810


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 064/600 ] loss = 2.12511, acc = 0.72090


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 064/600 ] loss = 8.52066, acc = 0.69337


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 065/600 ] loss = 2.14477, acc = 0.72010


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 065/600 ] loss = 7.64034, acc = 0.72495 -> best
Best model found at epoch 65, saving model


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 066/600 ] loss = 2.16638, acc = 0.71950


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 066/600 ] loss = 7.99176, acc = 0.70690


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 067/600 ] loss = 2.17143, acc = 0.72301


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 067/600 ] loss = 8.49115, acc = 0.70736


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 068/600 ] loss = 2.22553, acc = 0.71790


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 068/600 ] loss = 7.98075, acc = 0.70307


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 069/600 ] loss = 2.22376, acc = 0.72641


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 069/600 ] loss = 8.06408, acc = 0.71661


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 070/600 ] loss = 2.23863, acc = 0.72751


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 070/600 ] loss = 7.60839, acc = 0.72902 -> best
Best model found at epoch 70, saving model


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 071/600 ] loss = 2.25149, acc = 0.73071


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 071/600 ] loss = 7.64477, acc = 0.71706


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 072/600 ] loss = 2.28790, acc = 0.72331


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 072/600 ] loss = 7.80280, acc = 0.70916


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 073/600 ] loss = 2.30148, acc = 0.73271


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 073/600 ] loss = 8.19869, acc = 0.70442


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 074/600 ] loss = 2.31278, acc = 0.72501


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 074/600 ] loss = 7.75424, acc = 0.71209


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 075/600 ] loss = 2.34047, acc = 0.73041


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 075/600 ] loss = 7.41274, acc = 0.74188 -> best
Best model found at epoch 75, saving model


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 076/600 ] loss = 2.36972, acc = 0.73161


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 076/600 ] loss = 7.67886, acc = 0.70758


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 077/600 ] loss = 2.39049, acc = 0.72731


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 077/600 ] loss = 7.37527, acc = 0.73308


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 078/600 ] loss = 2.40929, acc = 0.72921


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 078/600 ] loss = 7.60998, acc = 0.73037


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 079/600 ] loss = 2.43228, acc = 0.73271


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 079/600 ] loss = 7.60989, acc = 0.71977


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 080/600 ] loss = 2.44253, acc = 0.73912


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 080/600 ] loss = 7.75509, acc = 0.71232


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 081/600 ] loss = 2.46395, acc = 0.73982


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 081/600 ] loss = 7.90125, acc = 0.70939


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 082/600 ] loss = 2.50921, acc = 0.73722


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 082/600 ] loss = 7.79317, acc = 0.70668


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 083/600 ] loss = 2.50434, acc = 0.73822


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 083/600 ] loss = 7.41343, acc = 0.73353


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 084/600 ] loss = 2.54499, acc = 0.73481


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 084/600 ] loss = 7.23878, acc = 0.73601


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 085/600 ] loss = 2.53883, acc = 0.73792


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 085/600 ] loss = 7.65930, acc = 0.71300


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 086/600 ] loss = 2.59421, acc = 0.73582


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 086/600 ] loss = 7.18683, acc = 0.74255 -> best
Best model found at epoch 86, saving model


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 087/600 ] loss = 2.61270, acc = 0.74112


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 087/600 ] loss = 7.53974, acc = 0.73263


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 088/600 ] loss = 2.64284, acc = 0.73561


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 088/600 ] loss = 7.16470, acc = 0.74255


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 089/600 ] loss = 2.64400, acc = 0.74182


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 089/600 ] loss = 7.63687, acc = 0.72608


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 090/600 ] loss = 2.67582, acc = 0.74292


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 090/600 ] loss = 7.16326, acc = 0.74346 -> best
Best model found at epoch 90, saving model


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 091/600 ] loss = 2.68165, acc = 0.74572


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 091/600 ] loss = 7.19709, acc = 0.74030


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 092/600 ] loss = 2.72729, acc = 0.74432


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 092/600 ] loss = 7.51232, acc = 0.71142


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 093/600 ] loss = 2.71959, acc = 0.74262


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 093/600 ] loss = 7.19963, acc = 0.73330


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 094/600 ] loss = 2.76024, acc = 0.74702


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 094/600 ] loss = 7.08284, acc = 0.74639 -> best
Best model found at epoch 94, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 095/600 ] loss = 2.80441, acc = 0.74482


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 095/600 ] loss = 7.83775, acc = 0.72225


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 096/600 ] loss = 2.82525, acc = 0.74822


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 096/600 ] loss = 7.09043, acc = 0.73940


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 097/600 ] loss = 2.84547, acc = 0.74982


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 097/600 ] loss = 7.60487, acc = 0.69630


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 098/600 ] loss = 2.86068, acc = 0.74732


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 098/600 ] loss = 7.11912, acc = 0.74368


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 099/600 ] loss = 2.88113, acc = 0.75013


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 099/600 ] loss = 6.94063, acc = 0.74910 -> best
Best model found at epoch 99, saving model


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 100/600 ] loss = 2.93015, acc = 0.75073


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 100/600 ] loss = 7.33893, acc = 0.72744


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 101/600 ] loss = 2.95110, acc = 0.75053


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 101/600 ] loss = 7.25645, acc = 0.73150


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 102/600 ] loss = 2.96813, acc = 0.75193


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 102/600 ] loss = 7.42134, acc = 0.71119


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 103/600 ] loss = 2.98829, acc = 0.75403


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 103/600 ] loss = 6.98994, acc = 0.74030


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 104/600 ] loss = 3.00859, acc = 0.75473


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 104/600 ] loss = 7.23371, acc = 0.72902


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 105/600 ] loss = 3.06106, acc = 0.75043


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 105/600 ] loss = 7.14111, acc = 0.73172


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 106/600 ] loss = 3.03796, acc = 0.75513


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 106/600 ] loss = 7.12593, acc = 0.73375


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 107/600 ] loss = 3.08993, acc = 0.74962


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 107/600 ] loss = 7.01002, acc = 0.74391


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 108/600 ] loss = 3.09132, acc = 0.75133


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 108/600 ] loss = 7.25348, acc = 0.74052


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 109/600 ] loss = 3.14164, acc = 0.75703


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 109/600 ] loss = 6.97390, acc = 0.74413


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 110/600 ] loss = 3.16092, acc = 0.75873


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 110/600 ] loss = 6.93202, acc = 0.74346


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 111/600 ] loss = 3.17281, acc = 0.76103


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 111/600 ] loss = 7.07251, acc = 0.73218


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 112/600 ] loss = 3.20773, acc = 0.76193


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 112/600 ] loss = 6.81036, acc = 0.75338 -> best
Best model found at epoch 112, saving model


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 113/600 ] loss = 3.21514, acc = 0.76153


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 113/600 ] loss = 6.92622, acc = 0.75045


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 114/600 ] loss = 3.25008, acc = 0.75813


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 114/600 ] loss = 6.82138, acc = 0.75316


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 115/600 ] loss = 3.27856, acc = 0.76353


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 115/600 ] loss = 7.11306, acc = 0.71751


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 116/600 ] loss = 3.33333, acc = 0.76003


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 116/600 ] loss = 6.68492, acc = 0.76421 -> best
Best model found at epoch 116, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 117/600 ] loss = 3.33650, acc = 0.76213


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 117/600 ] loss = 6.82225, acc = 0.75880


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 118/600 ] loss = 3.34776, acc = 0.76444


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 118/600 ] loss = 6.94660, acc = 0.75068


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 119/600 ] loss = 3.42086, acc = 0.76173


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 119/600 ] loss = 6.65925, acc = 0.76579 -> best
Best model found at epoch 119, saving model


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 120/600 ] loss = 3.44237, acc = 0.76544


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 120/600 ] loss = 6.98350, acc = 0.75699


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 121/600 ] loss = 3.45704, acc = 0.75863


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 121/600 ] loss = 6.81520, acc = 0.76782 -> best
Best model found at epoch 121, saving model


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 122/600 ] loss = 3.47714, acc = 0.76413


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 122/600 ] loss = 6.85939, acc = 0.73579


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 123/600 ] loss = 3.51481, acc = 0.76273


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 123/600 ] loss = 7.01371, acc = 0.73962


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 124/600 ] loss = 3.54260, acc = 0.76003


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 124/600 ] loss = 7.28000, acc = 0.73443


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 125/600 ] loss = 3.58167, acc = 0.76454


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 125/600 ] loss = 6.95868, acc = 0.74458


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 126/600 ] loss = 3.59108, acc = 0.76504


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 126/600 ] loss = 7.09576, acc = 0.73488


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 127/600 ] loss = 3.62451, acc = 0.76894


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 127/600 ] loss = 6.53269, acc = 0.75812


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 128/600 ] loss = 3.62951, acc = 0.76674


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 128/600 ] loss = 6.96748, acc = 0.74910


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 129/600 ] loss = 3.66169, acc = 0.76854


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 129/600 ] loss = 6.47658, acc = 0.75993


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 130/600 ] loss = 3.70439, acc = 0.76654


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 130/600 ] loss = 6.53045, acc = 0.75451


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 131/600 ] loss = 3.73418, acc = 0.76764


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 131/600 ] loss = 6.50411, acc = 0.75835


100%|██████████| 79/79 [00:33<00:00,  2.39it/s]


[ Train | 132/600 ] loss = 3.76670, acc = 0.77064


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 132/600 ] loss = 6.88377, acc = 0.73353


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 133/600 ] loss = 3.79917, acc = 0.77204


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 133/600 ] loss = 7.00608, acc = 0.74323


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 134/600 ] loss = 3.80625, acc = 0.76754


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 134/600 ] loss = 6.44165, acc = 0.76805 -> best
Best model found at epoch 134, saving model


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 135/600 ] loss = 3.84321, acc = 0.76864


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 135/600 ] loss = 6.69632, acc = 0.74052


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 136/600 ] loss = 3.90372, acc = 0.77214


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 136/600 ] loss = 6.58241, acc = 0.75880


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 137/600 ] loss = 3.92424, acc = 0.76804


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 137/600 ] loss = 6.44082, acc = 0.78023 -> best
Best model found at epoch 137, saving model


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 138/600 ] loss = 3.91703, acc = 0.77084


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 138/600 ] loss = 6.99985, acc = 0.74865


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 139/600 ] loss = 3.97655, acc = 0.77464


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 139/600 ] loss = 7.03928, acc = 0.72383


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 140/600 ] loss = 4.01780, acc = 0.77674


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 140/600 ] loss = 6.85997, acc = 0.73353


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 141/600 ] loss = 3.98751, acc = 0.77334


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 141/600 ] loss = 6.53207, acc = 0.76715


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 142/600 ] loss = 4.01312, acc = 0.77354


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 142/600 ] loss = 6.43034, acc = 0.76760


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 143/600 ] loss = 4.06328, acc = 0.77234


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 143/600 ] loss = 6.26887, acc = 0.77978


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 144/600 ] loss = 4.12807, acc = 0.76974


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 144/600 ] loss = 6.66068, acc = 0.76060


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 145/600 ] loss = 4.14042, acc = 0.76964


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 145/600 ] loss = 6.52613, acc = 0.77098


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 146/600 ] loss = 4.15154, acc = 0.77424


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 146/600 ] loss = 6.42725, acc = 0.77392


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 147/600 ] loss = 4.16293, acc = 0.77804


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 147/600 ] loss = 6.49088, acc = 0.75609


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 148/600 ] loss = 4.20383, acc = 0.78005


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 148/600 ] loss = 6.76446, acc = 0.76264


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 149/600 ] loss = 4.25558, acc = 0.77614


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 149/600 ] loss = 6.38721, acc = 0.77798


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 150/600 ] loss = 4.30351, acc = 0.77334


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 150/600 ] loss = 6.66542, acc = 0.75925


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 151/600 ] loss = 4.34461, acc = 0.77404


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 151/600 ] loss = 6.44642, acc = 0.76625


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 152/600 ] loss = 4.35322, acc = 0.78115


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 152/600 ] loss = 6.49561, acc = 0.76828


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 153/600 ] loss = 4.40997, acc = 0.77664


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 153/600 ] loss = 6.67794, acc = 0.76331


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 154/600 ] loss = 4.41732, acc = 0.77404


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 154/600 ] loss = 6.58456, acc = 0.76331


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 155/600 ] loss = 4.43881, acc = 0.77885


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 155/600 ] loss = 6.39398, acc = 0.76737


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 156/600 ] loss = 4.45384, acc = 0.77564


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 156/600 ] loss = 6.45103, acc = 0.76850


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 157/600 ] loss = 4.49974, acc = 0.77915


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 157/600 ] loss = 6.41185, acc = 0.75451


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 158/600 ] loss = 4.56014, acc = 0.77985


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 158/600 ] loss = 6.78168, acc = 0.74707


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 159/600 ] loss = 4.54145, acc = 0.78265


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 159/600 ] loss = 6.54692, acc = 0.76354


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 160/600 ] loss = 4.63431, acc = 0.77975


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 160/600 ] loss = 6.53494, acc = 0.76647


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 161/600 ] loss = 4.62919, acc = 0.77895


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 161/600 ] loss = 6.54540, acc = 0.74865


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 162/600 ] loss = 4.62121, acc = 0.77975


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 162/600 ] loss = 6.44482, acc = 0.76986


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 163/600 ] loss = 4.70083, acc = 0.78205


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 163/600 ] loss = 6.55313, acc = 0.75045


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 164/600 ] loss = 4.72518, acc = 0.78015


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 164/600 ] loss = 6.61391, acc = 0.73804


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 165/600 ] loss = 4.74819, acc = 0.78455


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 165/600 ] loss = 6.50961, acc = 0.76873


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 166/600 ] loss = 4.78128, acc = 0.78085


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 166/600 ] loss = 6.38280, acc = 0.76354


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 167/600 ] loss = 4.84775, acc = 0.78605


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 167/600 ] loss = 6.50903, acc = 0.75068


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 168/600 ] loss = 4.84907, acc = 0.78425


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 168/600 ] loss = 6.15868, acc = 0.78001


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 169/600 ] loss = 4.87706, acc = 0.78645


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 169/600 ] loss = 6.77788, acc = 0.74075


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 170/600 ] loss = 4.92091, acc = 0.78755


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 170/600 ] loss = 6.56288, acc = 0.77189


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 171/600 ] loss = 4.91390, acc = 0.78675


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 171/600 ] loss = 6.16652, acc = 0.77933


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 172/600 ] loss = 4.97289, acc = 0.78685


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 172/600 ] loss = 6.41835, acc = 0.77324


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 173/600 ] loss = 5.02101, acc = 0.78585


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 173/600 ] loss = 6.54977, acc = 0.75654


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 174/600 ] loss = 5.08743, acc = 0.78455


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 174/600 ] loss = 6.23147, acc = 0.77708


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 175/600 ] loss = 5.08227, acc = 0.78445


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 175/600 ] loss = 6.22425, acc = 0.78046 -> best
Best model found at epoch 175, saving model


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 176/600 ] loss = 5.08637, acc = 0.78575


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 176/600 ] loss = 6.52753, acc = 0.75654


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 177/600 ] loss = 5.15385, acc = 0.78315


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 177/600 ] loss = 6.31534, acc = 0.78294 -> best
Best model found at epoch 177, saving model


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 178/600 ] loss = 5.14888, acc = 0.78675


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 178/600 ] loss = 6.06999, acc = 0.77866


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 179/600 ] loss = 5.16858, acc = 0.78745


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 179/600 ] loss = 6.05812, acc = 0.77730


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 180/600 ] loss = 5.15222, acc = 0.79316


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 180/600 ] loss = 6.34736, acc = 0.76647


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 181/600 ] loss = 5.26039, acc = 0.78665


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 181/600 ] loss = 6.16378, acc = 0.78858 -> best
Best model found at epoch 181, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 182/600 ] loss = 5.30501, acc = 0.79145


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 182/600 ] loss = 6.29692, acc = 0.76940


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 183/600 ] loss = 5.37278, acc = 0.79105


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 183/600 ] loss = 6.19761, acc = 0.77482


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 184/600 ] loss = 5.36056, acc = 0.79306


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 184/600 ] loss = 6.07690, acc = 0.77550


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 185/600 ] loss = 5.36014, acc = 0.78895


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 185/600 ] loss = 6.03429, acc = 0.78700


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 186/600 ] loss = 5.39946, acc = 0.79075


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 186/600 ] loss = 6.54533, acc = 0.77211


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 187/600 ] loss = 5.46838, acc = 0.78955


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 187/600 ] loss = 6.11546, acc = 0.77956


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 188/600 ] loss = 5.52893, acc = 0.78995


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 188/600 ] loss = 6.12425, acc = 0.76670


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 189/600 ] loss = 5.51941, acc = 0.79786


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 189/600 ] loss = 5.99593, acc = 0.77572


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 190/600 ] loss = 5.57581, acc = 0.78385


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 190/600 ] loss = 6.11921, acc = 0.79197 -> best
Best model found at epoch 190, saving model


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 191/600 ] loss = 5.62111, acc = 0.79035


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 191/600 ] loss = 6.15782, acc = 0.76579


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 192/600 ] loss = 5.61452, acc = 0.79596


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 192/600 ] loss = 6.32179, acc = 0.77437


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 193/600 ] loss = 5.62504, acc = 0.79556


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 193/600 ] loss = 6.32095, acc = 0.76986


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 194/600 ] loss = 5.68719, acc = 0.80006


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 194/600 ] loss = 6.00062, acc = 0.78700


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 195/600 ] loss = 5.73706, acc = 0.79506


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 195/600 ] loss = 6.20101, acc = 0.76895


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 196/600 ] loss = 5.77413, acc = 0.79666


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 196/600 ] loss = 6.20420, acc = 0.77866


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 197/600 ] loss = 5.72544, acc = 0.80176


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 197/600 ] loss = 6.13598, acc = 0.77640


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 198/600 ] loss = 5.83259, acc = 0.79606


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 198/600 ] loss = 6.05148, acc = 0.78046


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 199/600 ] loss = 5.88929, acc = 0.79576


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 199/600 ] loss = 6.21961, acc = 0.76760


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 200/600 ] loss = 5.90469, acc = 0.79626


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 200/600 ] loss = 6.10109, acc = 0.78294


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 201/600 ] loss = 5.92927, acc = 0.79616


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 201/600 ] loss = 6.35782, acc = 0.75745


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 202/600 ] loss = 5.97574, acc = 0.79546


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 202/600 ] loss = 6.16800, acc = 0.78949


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 203/600 ] loss = 6.01343, acc = 0.79566


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 203/600 ] loss = 5.93493, acc = 0.78475


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 204/600 ] loss = 6.08813, acc = 0.79406


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 204/600 ] loss = 6.20056, acc = 0.76489


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 205/600 ] loss = 6.03010, acc = 0.79966


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 205/600 ] loss = 6.05461, acc = 0.78272


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 206/600 ] loss = 6.09584, acc = 0.79746


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 206/600 ] loss = 6.04987, acc = 0.77820


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 207/600 ] loss = 6.12981, acc = 0.79626


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 207/600 ] loss = 6.19525, acc = 0.77708


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 208/600 ] loss = 6.19280, acc = 0.79816


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 208/600 ] loss = 6.11566, acc = 0.76805


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 209/600 ] loss = 6.17753, acc = 0.79926


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 209/600 ] loss = 5.86835, acc = 0.78339


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 210/600 ] loss = 6.22517, acc = 0.80146


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 210/600 ] loss = 5.87768, acc = 0.78994


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 211/600 ] loss = 6.29873, acc = 0.79886


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 211/600 ] loss = 6.21919, acc = 0.77166


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 212/600 ] loss = 6.29193, acc = 0.79726


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 212/600 ] loss = 5.98030, acc = 0.77392


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 213/600 ] loss = 6.36368, acc = 0.80146


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 213/600 ] loss = 5.98390, acc = 0.78949


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 214/600 ] loss = 6.43197, acc = 0.80616


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 214/600 ] loss = 5.83721, acc = 0.78520


100%|██████████| 79/79 [00:32<00:00,  2.46it/s]


[ Train | 215/600 ] loss = 6.39416, acc = 0.80286


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 215/600 ] loss = 5.97085, acc = 0.78204


100%|██████████| 79/79 [00:32<00:00,  2.46it/s]


[ Train | 216/600 ] loss = 6.48310, acc = 0.80116


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 216/600 ] loss = 6.04931, acc = 0.79535 -> best
Best model found at epoch 216, saving model


100%|██████████| 79/79 [00:32<00:00,  2.40it/s]


[ Train | 217/600 ] loss = 6.49234, acc = 0.80336


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 217/600 ] loss = 5.83336, acc = 0.78430


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 218/600 ] loss = 6.48836, acc = 0.80827


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 218/600 ] loss = 5.99222, acc = 0.78700


100%|██████████| 79/79 [00:31<00:00,  2.49it/s]


[ Train | 219/600 ] loss = 6.51237, acc = 0.80566


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 219/600 ] loss = 6.11771, acc = 0.77866


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 220/600 ] loss = 6.60987, acc = 0.79996


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 220/600 ] loss = 6.13419, acc = 0.77324


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 221/600 ] loss = 6.63384, acc = 0.80066


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 221/600 ] loss = 6.06963, acc = 0.79016


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 222/600 ] loss = 6.61811, acc = 0.80126


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 222/600 ] loss = 6.23868, acc = 0.79129


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 223/600 ] loss = 6.71473, acc = 0.79956


100%|██████████| 35/35 [00:14<00:00,  2.49it/s]


[ Valid | 223/600 ] loss = 5.92166, acc = 0.78768


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 224/600 ] loss = 6.77573, acc = 0.80656


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 224/600 ] loss = 5.90067, acc = 0.79174


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 225/600 ] loss = 6.77653, acc = 0.80216


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 225/600 ] loss = 6.18632, acc = 0.78023


100%|██████████| 79/79 [00:32<00:00,  2.42it/s]


[ Train | 226/600 ] loss = 6.81264, acc = 0.80466


100%|██████████| 35/35 [00:13<00:00,  2.52it/s]


[ Valid | 226/600 ] loss = 5.85937, acc = 0.77978


100%|██████████| 79/79 [00:31<00:00,  2.49it/s]


[ Train | 227/600 ] loss = 6.80146, acc = 0.80706


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 227/600 ] loss = 5.69673, acc = 0.79671 -> best
Best model found at epoch 227, saving model


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 228/600 ] loss = 6.92466, acc = 0.81047


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 228/600 ] loss = 5.87306, acc = 0.79716 -> best
Best model found at epoch 228, saving model


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 229/600 ] loss = 6.88854, acc = 0.80496


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 229/600 ] loss = 5.86592, acc = 0.78700


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 230/600 ] loss = 7.01172, acc = 0.80236


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 230/600 ] loss = 6.14289, acc = 0.78136


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 231/600 ] loss = 6.94665, acc = 0.80296


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 231/600 ] loss = 6.00239, acc = 0.78069


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 232/600 ] loss = 7.07322, acc = 0.80987


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 232/600 ] loss = 5.97914, acc = 0.78926


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 233/600 ] loss = 7.07730, acc = 0.80486


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 233/600 ] loss = 5.86268, acc = 0.78204


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 234/600 ] loss = 7.07188, acc = 0.80416


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 234/600 ] loss = 5.99284, acc = 0.77076


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 235/600 ] loss = 7.11251, acc = 0.80626


100%|██████████| 35/35 [00:14<00:00,  2.49it/s]


[ Valid | 235/600 ] loss = 5.95337, acc = 0.79648


100%|██████████| 79/79 [00:31<00:00,  2.49it/s]


[ Train | 236/600 ] loss = 7.11464, acc = 0.80967


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 236/600 ] loss = 5.81422, acc = 0.79219


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 237/600 ] loss = 7.13586, acc = 0.80727


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 237/600 ] loss = 5.81490, acc = 0.78949


100%|██████████| 79/79 [00:31<00:00,  2.48it/s]


[ Train | 238/600 ] loss = 7.18176, acc = 0.80897


100%|██████████| 35/35 [00:13<00:00,  2.52it/s]


[ Valid | 238/600 ] loss = 6.14197, acc = 0.79242


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 239/600 ] loss = 7.26804, acc = 0.80827


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 239/600 ] loss = 5.87036, acc = 0.78249


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 240/600 ] loss = 7.22534, acc = 0.80797


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 240/600 ] loss = 5.80227, acc = 0.78858


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 241/600 ] loss = 7.32835, acc = 0.80606


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 241/600 ] loss = 5.92882, acc = 0.79716


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 242/600 ] loss = 7.35300, acc = 0.80847


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 242/600 ] loss = 6.03223, acc = 0.77347


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 243/600 ] loss = 7.36165, acc = 0.81287


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 243/600 ] loss = 5.76014, acc = 0.78881


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 244/600 ] loss = 7.43460, acc = 0.81157


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 244/600 ] loss = 6.08019, acc = 0.79806 -> best
Best model found at epoch 244, saving model


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 245/600 ] loss = 7.47962, acc = 0.80717


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 245/600 ] loss = 6.07439, acc = 0.77098


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 246/600 ] loss = 7.50846, acc = 0.81467


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 246/600 ] loss = 6.15717, acc = 0.78452


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 247/600 ] loss = 7.57714, acc = 0.81107


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 247/600 ] loss = 5.88413, acc = 0.79084


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 248/600 ] loss = 7.65031, acc = 0.80917


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 248/600 ] loss = 5.79004, acc = 0.79490


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 249/600 ] loss = 7.56612, acc = 0.81157


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 249/600 ] loss = 5.88821, acc = 0.79445


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 250/600 ] loss = 7.65697, acc = 0.81177


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 250/600 ] loss = 5.89118, acc = 0.79377


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 251/600 ] loss = 7.65595, acc = 0.80837


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 251/600 ] loss = 5.67195, acc = 0.79986 -> best
Best model found at epoch 251, saving model


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 252/600 ] loss = 7.59627, acc = 0.81267


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 252/600 ] loss = 5.71251, acc = 0.79513


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 253/600 ] loss = 7.72259, acc = 0.81447


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 253/600 ] loss = 5.61116, acc = 0.79558


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 254/600 ] loss = 7.76990, acc = 0.81457


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 254/600 ] loss = 5.84322, acc = 0.79513


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 255/600 ] loss = 7.76248, acc = 0.81327


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 255/600 ] loss = 5.72555, acc = 0.79806


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 256/600 ] loss = 7.78411, acc = 0.80997


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 256/600 ] loss = 5.82533, acc = 0.78407


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 257/600 ] loss = 7.89924, acc = 0.81187


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 257/600 ] loss = 5.99859, acc = 0.78294


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 258/600 ] loss = 7.86580, acc = 0.81117


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 258/600 ] loss = 5.69847, acc = 0.78949


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 259/600 ] loss = 7.89657, acc = 0.81107


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 259/600 ] loss = 5.82237, acc = 0.78475


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 260/600 ] loss = 7.95026, acc = 0.81527


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 260/600 ] loss = 5.67843, acc = 0.79964


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 261/600 ] loss = 8.03745, acc = 0.81237


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 261/600 ] loss = 5.68263, acc = 0.79129


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 262/600 ] loss = 7.98781, acc = 0.81767


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 262/600 ] loss = 5.97544, acc = 0.77708


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 263/600 ] loss = 8.08848, acc = 0.81487


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 263/600 ] loss = 5.67852, acc = 0.80483 -> best
Best model found at epoch 263, saving model


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 264/600 ] loss = 8.00518, acc = 0.81927


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 264/600 ] loss = 5.95036, acc = 0.77911


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 265/600 ] loss = 8.14137, acc = 0.81707


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 265/600 ] loss = 5.83742, acc = 0.78881


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 266/600 ] loss = 8.20322, acc = 0.82178


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 266/600 ] loss = 5.60144, acc = 0.80257


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 267/600 ] loss = 8.15635, acc = 0.81057


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 267/600 ] loss = 5.75097, acc = 0.79490


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 268/600 ] loss = 8.21224, acc = 0.81807


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 268/600 ] loss = 5.63695, acc = 0.79671


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 269/600 ] loss = 8.31623, acc = 0.82087


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 269/600 ] loss = 5.81163, acc = 0.79400


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 270/600 ] loss = 8.29946, acc = 0.81617


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 270/600 ] loss = 5.50842, acc = 0.80144


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 271/600 ] loss = 8.33156, acc = 0.81807


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 271/600 ] loss = 5.69538, acc = 0.78362


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 272/600 ] loss = 8.34848, acc = 0.82107


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 272/600 ] loss = 5.73634, acc = 0.79422


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 273/600 ] loss = 8.39551, acc = 0.81757


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 273/600 ] loss = 5.84019, acc = 0.78836


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 274/600 ] loss = 8.38647, acc = 0.81987


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 274/600 ] loss = 5.61594, acc = 0.79919


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 275/600 ] loss = 8.40588, acc = 0.81817


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 275/600 ] loss = 5.55506, acc = 0.80302


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 276/600 ] loss = 8.48797, acc = 0.81617


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 276/600 ] loss = 5.72325, acc = 0.80280


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 277/600 ] loss = 8.60973, acc = 0.81537


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 277/600 ] loss = 5.84723, acc = 0.77234


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 278/600 ] loss = 8.59634, acc = 0.81957


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 278/600 ] loss = 5.63044, acc = 0.80551 -> best
Best model found at epoch 278, saving model


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 279/600 ] loss = 8.65724, acc = 0.81587


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 279/600 ] loss = 5.69977, acc = 0.79648


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 280/600 ] loss = 8.58692, acc = 0.82057


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 280/600 ] loss = 5.54287, acc = 0.80054


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 281/600 ] loss = 8.68823, acc = 0.81857


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 281/600 ] loss = 5.91360, acc = 0.79986


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 282/600 ] loss = 8.64696, acc = 0.81887


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 282/600 ] loss = 5.62051, acc = 0.80663 -> best
Best model found at epoch 282, saving model


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 283/600 ] loss = 8.69329, acc = 0.81597


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 283/600 ] loss = 5.80284, acc = 0.79783


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 284/600 ] loss = 8.79162, acc = 0.81717


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 284/600 ] loss = 5.85943, acc = 0.80190


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 285/600 ] loss = 8.79165, acc = 0.82017


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 285/600 ] loss = 5.61974, acc = 0.79648


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 286/600 ] loss = 8.85493, acc = 0.81827


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 286/600 ] loss = 5.64904, acc = 0.79513


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 287/600 ] loss = 8.88420, acc = 0.82027


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 287/600 ] loss = 5.56523, acc = 0.79693


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 288/600 ] loss = 8.90906, acc = 0.82308


100%|██████████| 35/35 [00:14<00:00,  2.49it/s]


[ Valid | 288/600 ] loss = 5.78895, acc = 0.79287


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 289/600 ] loss = 8.93806, acc = 0.81817


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 289/600 ] loss = 5.44969, acc = 0.80370


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 290/600 ] loss = 8.96135, acc = 0.82308


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 290/600 ] loss = 5.55178, acc = 0.80618


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 291/600 ] loss = 8.96448, acc = 0.82268


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 291/600 ] loss = 5.63294, acc = 0.79671


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 292/600 ] loss = 9.13790, acc = 0.81917


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 292/600 ] loss = 5.54112, acc = 0.80821 -> best
Best model found at epoch 292, saving model


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 293/600 ] loss = 8.93587, acc = 0.81677


100%|██████████| 35/35 [00:12<00:00,  2.81it/s]


[ Valid | 293/600 ] loss = 5.60772, acc = 0.80460


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 294/600 ] loss = 9.15930, acc = 0.82488


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 294/600 ] loss = 5.57650, acc = 0.79693


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 295/600 ] loss = 9.09760, acc = 0.82077


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 295/600 ] loss = 5.64016, acc = 0.80979 -> best
Best model found at epoch 295, saving model


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 296/600 ] loss = 9.20347, acc = 0.82438


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 296/600 ] loss = 5.46211, acc = 0.80505


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 297/600 ] loss = 9.20639, acc = 0.82208


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 297/600 ] loss = 5.77593, acc = 0.79625


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 298/600 ] loss = 9.24725, acc = 0.82518


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 298/600 ] loss = 5.46505, acc = 0.80754


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 299/600 ] loss = 9.27962, acc = 0.82228


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 299/600 ] loss = 5.49166, acc = 0.81611 -> best
Best model found at epoch 299, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 300/600 ] loss = 9.26626, acc = 0.82388


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 300/600 ] loss = 5.52601, acc = 0.81092


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 301/600 ] loss = 9.35353, acc = 0.82468


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 301/600 ] loss = 5.65152, acc = 0.80054


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 302/600 ] loss = 9.31484, acc = 0.82378


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 302/600 ] loss = 5.57539, acc = 0.81543


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 303/600 ] loss = 9.32704, acc = 0.83138


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 303/600 ] loss = 5.50173, acc = 0.80054


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 304/600 ] loss = 9.40816, acc = 0.82648


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 304/600 ] loss = 5.64786, acc = 0.80889


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 305/600 ] loss = 9.40017, acc = 0.82918


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 305/600 ] loss = 5.61538, acc = 0.79851


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 306/600 ] loss = 9.40556, acc = 0.83028


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 306/600 ] loss = 5.56884, acc = 0.79716


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 307/600 ] loss = 9.56940, acc = 0.83038


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 307/600 ] loss = 5.82307, acc = 0.79197


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 308/600 ] loss = 9.56522, acc = 0.82638


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 308/600 ] loss = 5.55683, acc = 0.80912


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 309/600 ] loss = 9.57350, acc = 0.82828


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 309/600 ] loss = 5.67821, acc = 0.80618


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 310/600 ] loss = 9.54156, acc = 0.83258


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 310/600 ] loss = 5.54103, acc = 0.79377


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 311/600 ] loss = 9.65646, acc = 0.82778


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 311/600 ] loss = 5.60538, acc = 0.80979


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 312/600 ] loss = 9.63550, acc = 0.82688


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 312/600 ] loss = 5.61735, acc = 0.81431


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 313/600 ] loss = 9.58352, acc = 0.82738


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 313/600 ] loss = 5.57011, acc = 0.80370


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 314/600 ] loss = 9.74409, acc = 0.82948


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 314/600 ] loss = 5.31364, acc = 0.81182


100%|██████████| 79/79 [00:29<00:00,  2.63it/s]


[ Train | 315/600 ] loss = 9.77938, acc = 0.82688


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 315/600 ] loss = 5.65257, acc = 0.80212


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 316/600 ] loss = 9.79599, acc = 0.83148


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 316/600 ] loss = 5.69978, acc = 0.80144


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 317/600 ] loss = 9.81564, acc = 0.82698


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 317/600 ] loss = 5.61030, acc = 0.80054


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 318/600 ] loss = 9.85179, acc = 0.82938


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 318/600 ] loss = 5.44536, acc = 0.80280


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 319/600 ] loss = 9.83623, acc = 0.83008


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 319/600 ] loss = 5.66066, acc = 0.80212


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 320/600 ] loss = 9.88620, acc = 0.83248


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 320/600 ] loss = 5.58562, acc = 0.80280


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 321/600 ] loss = 9.88387, acc = 0.82998


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 321/600 ] loss = 5.67263, acc = 0.79671


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 322/600 ] loss = 10.02261, acc = 0.82718


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 322/600 ] loss = 5.51548, acc = 0.80686


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 323/600 ] loss = 9.99883, acc = 0.82888


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 323/600 ] loss = 5.50623, acc = 0.80641


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 324/600 ] loss = 9.99436, acc = 0.83328


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 324/600 ] loss = 5.77160, acc = 0.80438


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 325/600 ] loss = 10.00405, acc = 0.83288


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 325/600 ] loss = 5.51664, acc = 0.80460


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 326/600 ] loss = 10.02848, acc = 0.83108


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 326/600 ] loss = 5.56646, acc = 0.80054


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 327/600 ] loss = 10.21612, acc = 0.82988


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 327/600 ] loss = 5.43545, acc = 0.80190


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 328/600 ] loss = 10.16599, acc = 0.82958


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 328/600 ] loss = 5.76207, acc = 0.79648


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 329/600 ] loss = 10.20044, acc = 0.82578


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 329/600 ] loss = 5.53063, acc = 0.80280


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 330/600 ] loss = 10.25394, acc = 0.83168


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 330/600 ] loss = 5.70139, acc = 0.80077


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 331/600 ] loss = 10.19545, acc = 0.83168


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 331/600 ] loss = 5.38192, acc = 0.81273


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 332/600 ] loss = 10.12838, acc = 0.83579


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 332/600 ] loss = 5.62451, acc = 0.80483


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 333/600 ] loss = 10.18535, acc = 0.83118


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 333/600 ] loss = 5.71204, acc = 0.79896


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 334/600 ] loss = 10.26673, acc = 0.83298


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 334/600 ] loss = 5.45282, acc = 0.81521


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 335/600 ] loss = 10.37877, acc = 0.83268


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 335/600 ] loss = 5.49634, acc = 0.80799


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 336/600 ] loss = 10.33977, acc = 0.83358


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 336/600 ] loss = 5.46757, acc = 0.80889


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 337/600 ] loss = 10.39089, acc = 0.83198


100%|██████████| 35/35 [00:14<00:00,  2.49it/s]


[ Valid | 337/600 ] loss = 5.52992, acc = 0.80370


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 338/600 ] loss = 10.39681, acc = 0.82968


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 338/600 ] loss = 5.64247, acc = 0.80280


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 339/600 ] loss = 10.54824, acc = 0.83518


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 339/600 ] loss = 5.56820, acc = 0.80596


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 340/600 ] loss = 10.43213, acc = 0.83829


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 340/600 ] loss = 5.45652, acc = 0.80912


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 341/600 ] loss = 10.50331, acc = 0.83428


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 341/600 ] loss = 5.52103, acc = 0.81769 -> best
Best model found at epoch 341, saving model


100%|██████████| 79/79 [00:31<00:00,  2.47it/s]


[ Train | 342/600 ] loss = 10.43375, acc = 0.83218


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 342/600 ] loss = 5.44043, acc = 0.80776


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 343/600 ] loss = 10.66613, acc = 0.83328


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 343/600 ] loss = 5.46195, acc = 0.80754


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 344/600 ] loss = 10.61589, acc = 0.83548


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 344/600 ] loss = 5.36664, acc = 0.80889


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 345/600 ] loss = 10.72925, acc = 0.84059


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 345/600 ] loss = 5.45151, acc = 0.80708


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 346/600 ] loss = 10.59554, acc = 0.83999


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 346/600 ] loss = 5.39194, acc = 0.81453


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 347/600 ] loss = 10.64830, acc = 0.83609


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 347/600 ] loss = 5.43075, acc = 0.81227


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 348/600 ] loss = 10.72649, acc = 0.83829


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 348/600 ] loss = 5.36816, acc = 0.81092


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 349/600 ] loss = 10.69413, acc = 0.83518


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 349/600 ] loss = 5.44983, acc = 0.80799


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 350/600 ] loss = 10.75897, acc = 0.83108


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 350/600 ] loss = 5.48985, acc = 0.80551


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 351/600 ] loss = 10.72464, acc = 0.83839


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 351/600 ] loss = 5.40348, acc = 0.81115


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 352/600 ] loss = 10.74129, acc = 0.84129


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 352/600 ] loss = 5.40268, acc = 0.80460


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 353/600 ] loss = 10.93420, acc = 0.83729


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 353/600 ] loss = 5.65511, acc = 0.79648


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 354/600 ] loss = 10.85214, acc = 0.84049


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 354/600 ] loss = 5.57291, acc = 0.81273


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 355/600 ] loss = 10.88485, acc = 0.83789


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 355/600 ] loss = 5.43065, acc = 0.80190


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 356/600 ] loss = 10.83888, acc = 0.83859


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 356/600 ] loss = 5.53216, acc = 0.79625


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 357/600 ] loss = 10.96670, acc = 0.83819


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 357/600 ] loss = 5.49174, acc = 0.81227


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 358/600 ] loss = 10.89216, acc = 0.84029


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 358/600 ] loss = 5.42029, acc = 0.81521


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 359/600 ] loss = 10.99160, acc = 0.83438


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 359/600 ] loss = 5.53664, acc = 0.81521


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 360/600 ] loss = 10.89891, acc = 0.84039


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 360/600 ] loss = 5.45640, acc = 0.81995 -> best
Best model found at epoch 360, saving model


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 361/600 ] loss = 10.94084, acc = 0.83969


100%|██████████| 35/35 [00:14<00:00,  2.50it/s]


[ Valid | 361/600 ] loss = 5.53864, acc = 0.79738


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 362/600 ] loss = 11.12520, acc = 0.83859


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 362/600 ] loss = 5.47320, acc = 0.81160


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 363/600 ] loss = 11.02797, acc = 0.84009


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 363/600 ] loss = 5.58790, acc = 0.80325


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 364/600 ] loss = 11.00976, acc = 0.83839


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 364/600 ] loss = 5.46748, acc = 0.80979


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 365/600 ] loss = 11.04943, acc = 0.84569


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 365/600 ] loss = 5.41629, acc = 0.80663


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 366/600 ] loss = 11.10467, acc = 0.84529


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 366/600 ] loss = 5.34366, acc = 0.81498


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 367/600 ] loss = 11.22419, acc = 0.83589


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 367/600 ] loss = 5.42115, acc = 0.80934


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 368/600 ] loss = 11.20054, acc = 0.83969


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 368/600 ] loss = 5.44634, acc = 0.79671


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 369/600 ] loss = 11.23663, acc = 0.84339


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 369/600 ] loss = 5.32865, acc = 0.80708


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 370/600 ] loss = 11.22963, acc = 0.84239


100%|██████████| 35/35 [00:14<00:00,  2.46it/s]


[ Valid | 370/600 ] loss = 5.33663, acc = 0.81385


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 371/600 ] loss = 11.19198, acc = 0.83929


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 371/600 ] loss = 5.47874, acc = 0.81995


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 372/600 ] loss = 11.31437, acc = 0.84109


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 372/600 ] loss = 5.35200, acc = 0.82085 -> best
Best model found at epoch 372, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 373/600 ] loss = 11.28989, acc = 0.84699


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 373/600 ] loss = 5.37566, acc = 0.82153 -> best
Best model found at epoch 373, saving model


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 374/600 ] loss = 11.40747, acc = 0.84189


100%|██████████| 35/35 [00:13<00:00,  2.57it/s]


[ Valid | 374/600 ] loss = 5.38945, acc = 0.81340


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 375/600 ] loss = 11.32838, acc = 0.84429


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 375/600 ] loss = 5.38506, acc = 0.81634


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 376/600 ] loss = 11.31174, acc = 0.84559


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 376/600 ] loss = 5.48467, acc = 0.81882


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 377/600 ] loss = 11.37422, acc = 0.84239


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 377/600 ] loss = 5.76855, acc = 0.79129


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 378/600 ] loss = 11.32890, acc = 0.84409


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 378/600 ] loss = 5.54587, acc = 0.81453


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 379/600 ] loss = 11.43057, acc = 0.84789


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 379/600 ] loss = 5.30583, acc = 0.81363


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 380/600 ] loss = 11.54371, acc = 0.84629


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 380/600 ] loss = 5.29034, acc = 0.80821


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 381/600 ] loss = 11.56448, acc = 0.84719


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 381/600 ] loss = 5.41281, acc = 0.81318


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 382/600 ] loss = 11.53470, acc = 0.84759


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 382/600 ] loss = 5.49404, acc = 0.81498


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 383/600 ] loss = 11.33400, acc = 0.84429


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 383/600 ] loss = 5.41732, acc = 0.81498


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 384/600 ] loss = 11.47732, acc = 0.84549


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 384/600 ] loss = 5.41724, acc = 0.80957


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 385/600 ] loss = 11.49710, acc = 0.84559


100%|██████████| 35/35 [00:12<00:00,  2.82it/s]


[ Valid | 385/600 ] loss = 5.54561, acc = 0.80957


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 386/600 ] loss = 11.46503, acc = 0.84739


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 386/600 ] loss = 5.52575, acc = 0.81746


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 387/600 ] loss = 11.55963, acc = 0.85100


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 387/600 ] loss = 5.51403, acc = 0.81656


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 388/600 ] loss = 11.51983, acc = 0.84919


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 388/600 ] loss = 5.35155, acc = 0.81904


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 389/600 ] loss = 11.54086, acc = 0.85010


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 389/600 ] loss = 5.49897, acc = 0.81295


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 390/600 ] loss = 11.67344, acc = 0.84649


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 390/600 ] loss = 5.41944, acc = 0.81679


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 391/600 ] loss = 11.65530, acc = 0.85270


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 391/600 ] loss = 5.26969, acc = 0.81882


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 392/600 ] loss = 11.65580, acc = 0.84829


100%|██████████| 35/35 [00:14<00:00,  2.44it/s]


[ Valid | 392/600 ] loss = 5.33821, acc = 0.81340


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 393/600 ] loss = 11.70876, acc = 0.85110


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 393/600 ] loss = 5.39212, acc = 0.80866


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 394/600 ] loss = 11.55155, acc = 0.85560


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 394/600 ] loss = 5.26788, acc = 0.81927


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 395/600 ] loss = 11.74230, acc = 0.84829


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 395/600 ] loss = 5.28314, acc = 0.81634


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 396/600 ] loss = 11.70615, acc = 0.85050


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 396/600 ] loss = 5.41854, acc = 0.80979


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 397/600 ] loss = 11.56365, acc = 0.84789


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 397/600 ] loss = 5.39986, acc = 0.81250


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 398/600 ] loss = 11.66306, acc = 0.85030


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 398/600 ] loss = 5.40976, acc = 0.81002


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 399/600 ] loss = 11.71543, acc = 0.85340


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 399/600 ] loss = 5.52014, acc = 0.81611


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 400/600 ] loss = 11.67639, acc = 0.85200


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 400/600 ] loss = 5.44326, acc = 0.81431


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 401/600 ] loss = 11.73315, acc = 0.85240


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 401/600 ] loss = 5.40066, acc = 0.82062


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 402/600 ] loss = 11.73749, acc = 0.85190


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 402/600 ] loss = 5.35433, acc = 0.81792


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 403/600 ] loss = 11.81936, acc = 0.85070


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 403/600 ] loss = 5.44538, acc = 0.80663


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 404/600 ] loss = 11.80165, acc = 0.85020


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 404/600 ] loss = 5.28997, acc = 0.82107


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 405/600 ] loss = 11.83539, acc = 0.85070


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 405/600 ] loss = 5.27806, acc = 0.81656


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 406/600 ] loss = 11.76788, acc = 0.85410


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 406/600 ] loss = 5.34875, acc = 0.81543


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 407/600 ] loss = 11.95631, acc = 0.85250


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 407/600 ] loss = 5.28417, acc = 0.82153


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 408/600 ] loss = 11.77425, acc = 0.85710


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 408/600 ] loss = 5.33275, acc = 0.81295


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 409/600 ] loss = 11.85645, acc = 0.85500


100%|██████████| 35/35 [00:13<00:00,  2.66it/s]


[ Valid | 409/600 ] loss = 5.33648, acc = 0.81859


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 410/600 ] loss = 11.93652, acc = 0.85800


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 410/600 ] loss = 5.51542, acc = 0.81634


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 411/600 ] loss = 11.92722, acc = 0.85450


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 411/600 ] loss = 5.25570, acc = 0.81882


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 412/600 ] loss = 11.85973, acc = 0.85560


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 412/600 ] loss = 5.53678, acc = 0.82310 -> best
Best model found at epoch 412, saving model


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 413/600 ] loss = 11.76263, acc = 0.85950


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 413/600 ] loss = 5.41317, acc = 0.81792


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 414/600 ] loss = 11.83545, acc = 0.85720


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 414/600 ] loss = 5.36282, acc = 0.82514 -> best
Best model found at epoch 414, saving model


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 415/600 ] loss = 11.88931, acc = 0.85720


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 415/600 ] loss = 5.36685, acc = 0.80776


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 416/600 ] loss = 11.99486, acc = 0.85820


100%|██████████| 35/35 [00:13<00:00,  2.56it/s]


[ Valid | 416/600 ] loss = 5.57371, acc = 0.81408


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 417/600 ] loss = 11.94105, acc = 0.85750


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 417/600 ] loss = 5.25953, acc = 0.82220


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 418/600 ] loss = 11.96343, acc = 0.85490


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 418/600 ] loss = 5.26070, acc = 0.82288


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 419/600 ] loss = 11.97745, acc = 0.85750


100%|██████████| 35/35 [00:12<00:00,  2.81it/s]


[ Valid | 419/600 ] loss = 5.34199, acc = 0.81679


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 420/600 ] loss = 11.88568, acc = 0.86000


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 420/600 ] loss = 5.35210, acc = 0.81656


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 421/600 ] loss = 11.95736, acc = 0.85650


100%|██████████| 35/35 [00:14<00:00,  2.42it/s]


[ Valid | 421/600 ] loss = 5.27262, acc = 0.82536 -> best
Best model found at epoch 421, saving model


100%|██████████| 79/79 [00:32<00:00,  2.47it/s]


[ Train | 422/600 ] loss = 11.95412, acc = 0.85970


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 422/600 ] loss = 5.28697, acc = 0.82017


100%|██████████| 79/79 [00:32<00:00,  2.46it/s]


[ Train | 423/600 ] loss = 11.97418, acc = 0.85600


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 423/600 ] loss = 5.26573, acc = 0.81588


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 424/600 ] loss = 11.90480, acc = 0.85600


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 424/600 ] loss = 5.28998, acc = 0.82356


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 425/600 ] loss = 12.00343, acc = 0.86080


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 425/600 ] loss = 5.35874, acc = 0.82198


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 426/600 ] loss = 12.02959, acc = 0.85870


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 426/600 ] loss = 5.42059, acc = 0.81814


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 427/600 ] loss = 11.97725, acc = 0.85790


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 427/600 ] loss = 5.42887, acc = 0.81227


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 428/600 ] loss = 12.03799, acc = 0.86431


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 428/600 ] loss = 5.35511, acc = 0.81769


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 429/600 ] loss = 12.02523, acc = 0.86010


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 429/600 ] loss = 5.29773, acc = 0.81453


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 430/600 ] loss = 11.94221, acc = 0.85880


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 430/600 ] loss = 5.30325, acc = 0.82220


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 431/600 ] loss = 11.90887, acc = 0.86641


100%|██████████| 35/35 [00:14<00:00,  2.46it/s]


[ Valid | 431/600 ] loss = 5.19016, acc = 0.82671 -> best
Best model found at epoch 431, saving model


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 432/600 ] loss = 11.99521, acc = 0.86270


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 432/600 ] loss = 5.48187, acc = 0.81679


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 433/600 ] loss = 12.13646, acc = 0.85840


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 433/600 ] loss = 5.34997, acc = 0.81566


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 434/600 ] loss = 12.02727, acc = 0.86180


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 434/600 ] loss = 5.42762, acc = 0.82175


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 435/600 ] loss = 12.15800, acc = 0.85790


100%|██████████| 35/35 [00:13<00:00,  2.52it/s]


[ Valid | 435/600 ] loss = 5.35853, acc = 0.82085


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 436/600 ] loss = 12.04735, acc = 0.85940


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 436/600 ] loss = 5.42932, acc = 0.80686


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 437/600 ] loss = 11.98318, acc = 0.86090


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 437/600 ] loss = 5.38278, acc = 0.81949


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 438/600 ] loss = 12.17293, acc = 0.86611


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 438/600 ] loss = 5.53910, acc = 0.81995


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 439/600 ] loss = 11.97906, acc = 0.86711


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 439/600 ] loss = 5.41070, acc = 0.82897 -> best
Best model found at epoch 439, saving model


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 440/600 ] loss = 11.98424, acc = 0.86551


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 440/600 ] loss = 5.44877, acc = 0.82288


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 441/600 ] loss = 12.03944, acc = 0.86621


100%|██████████| 35/35 [00:14<00:00,  2.45it/s]


[ Valid | 441/600 ] loss = 5.36888, acc = 0.80799


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 442/600 ] loss = 12.07098, acc = 0.86410


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 442/600 ] loss = 5.32456, acc = 0.82536


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 443/600 ] loss = 12.07560, acc = 0.86360


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 443/600 ] loss = 5.39920, acc = 0.82153


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 444/600 ] loss = 12.05593, acc = 0.86110


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 444/600 ] loss = 5.27785, acc = 0.82987 -> best
Best model found at epoch 444, saving model


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 445/600 ] loss = 12.14082, acc = 0.86631


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 445/600 ] loss = 5.25690, acc = 0.82807


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 446/600 ] loss = 12.09955, acc = 0.86481


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 446/600 ] loss = 5.25412, acc = 0.83055 -> best
Best model found at epoch 446, saving model


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 447/600 ] loss = 12.00171, acc = 0.86801


100%|██████████| 35/35 [00:13<00:00,  2.50it/s]


[ Valid | 447/600 ] loss = 5.32037, acc = 0.82807


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 448/600 ] loss = 11.98712, acc = 0.86831


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 448/600 ] loss = 5.37187, acc = 0.82671


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 449/600 ] loss = 12.02124, acc = 0.86871


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 449/600 ] loss = 5.20737, acc = 0.82491


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 450/600 ] loss = 12.12550, acc = 0.86170


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 450/600 ] loss = 5.40694, acc = 0.81634


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 451/600 ] loss = 11.97954, acc = 0.86721


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 451/600 ] loss = 5.20924, acc = 0.82581


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 452/600 ] loss = 11.94042, acc = 0.86701


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 452/600 ] loss = 5.35404, acc = 0.82040


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 453/600 ] loss = 12.04405, acc = 0.86851


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 453/600 ] loss = 5.46318, acc = 0.81047


100%|██████████| 79/79 [00:31<00:00,  2.55it/s]


[ Train | 454/600 ] loss = 12.05407, acc = 0.86901


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 454/600 ] loss = 5.28945, acc = 0.82243


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 455/600 ] loss = 11.96737, acc = 0.86861


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 455/600 ] loss = 5.34438, acc = 0.81746


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 456/600 ] loss = 12.00848, acc = 0.86601


100%|██████████| 35/35 [00:14<00:00,  2.50it/s]


[ Valid | 456/600 ] loss = 5.23951, acc = 0.82514


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 457/600 ] loss = 11.88204, acc = 0.87311


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 457/600 ] loss = 5.30560, acc = 0.83213 -> best
Best model found at epoch 457, saving model


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 458/600 ] loss = 12.08257, acc = 0.86711


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 458/600 ] loss = 5.33502, acc = 0.82491


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 459/600 ] loss = 11.98399, acc = 0.86931


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 459/600 ] loss = 5.32546, acc = 0.81566


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 460/600 ] loss = 11.92410, acc = 0.87191


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 460/600 ] loss = 5.24115, acc = 0.82356


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 461/600 ] loss = 11.80315, acc = 0.87511


100%|██████████| 35/35 [00:14<00:00,  2.43it/s]


[ Valid | 461/600 ] loss = 5.37649, acc = 0.82062


100%|██████████| 79/79 [00:32<00:00,  2.45it/s]


[ Train | 462/600 ] loss = 11.98121, acc = 0.86941


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 462/600 ] loss = 5.29189, acc = 0.82265


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 463/600 ] loss = 11.91542, acc = 0.87221


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 463/600 ] loss = 5.38349, acc = 0.81746


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 464/600 ] loss = 11.89305, acc = 0.87581


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 464/600 ] loss = 5.23972, acc = 0.81882


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 465/600 ] loss = 11.88370, acc = 0.86941


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 465/600 ] loss = 5.28975, acc = 0.81927


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 466/600 ] loss = 11.77042, acc = 0.87321


100%|██████████| 35/35 [00:14<00:00,  2.43it/s]


[ Valid | 466/600 ] loss = 5.36168, acc = 0.82536


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 467/600 ] loss = 11.83353, acc = 0.87231


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 467/600 ] loss = 5.35588, acc = 0.82852


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 468/600 ] loss = 11.76419, acc = 0.87511


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 468/600 ] loss = 5.25425, acc = 0.82987


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 469/600 ] loss = 11.73756, acc = 0.87591


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 469/600 ] loss = 5.45156, acc = 0.81476


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 470/600 ] loss = 11.91882, acc = 0.87501


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 470/600 ] loss = 5.47161, acc = 0.81408


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 471/600 ] loss = 11.79591, acc = 0.88072


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 471/600 ] loss = 5.22259, acc = 0.81611


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 472/600 ] loss = 11.84105, acc = 0.87571


100%|██████████| 35/35 [00:14<00:00,  2.45it/s]


[ Valid | 472/600 ] loss = 5.40626, acc = 0.82378


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 473/600 ] loss = 11.79987, acc = 0.87701


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 473/600 ] loss = 5.32309, acc = 0.82356


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 474/600 ] loss = 11.80247, acc = 0.87942


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 474/600 ] loss = 5.23841, acc = 0.82423


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 475/600 ] loss = 11.87813, acc = 0.87301


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 475/600 ] loss = 5.29889, acc = 0.81995


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 476/600 ] loss = 11.69752, acc = 0.88352


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 476/600 ] loss = 5.35035, acc = 0.81814


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 477/600 ] loss = 11.71320, acc = 0.88172


100%|██████████| 35/35 [00:14<00:00,  2.43it/s]


[ Valid | 477/600 ] loss = 5.35724, acc = 0.82536


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 478/600 ] loss = 11.61157, acc = 0.87711


100%|██████████| 35/35 [00:13<00:00,  2.55it/s]


[ Valid | 478/600 ] loss = 5.29482, acc = 0.83123


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 479/600 ] loss = 11.63842, acc = 0.87851


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 479/600 ] loss = 5.28671, acc = 0.82649


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 480/600 ] loss = 11.53659, acc = 0.88182


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 480/600 ] loss = 5.30664, acc = 0.82897


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 481/600 ] loss = 11.67851, acc = 0.88002


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 481/600 ] loss = 5.49038, acc = 0.82040


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 482/600 ] loss = 11.57983, acc = 0.88262


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 482/600 ] loss = 5.27881, acc = 0.82717


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 483/600 ] loss = 11.48702, acc = 0.88452


100%|██████████| 35/35 [00:14<00:00,  2.42it/s]


[ Valid | 483/600 ] loss = 5.40401, acc = 0.82085


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 484/600 ] loss = 11.69248, acc = 0.87912


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 484/600 ] loss = 5.41097, acc = 0.80551


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 485/600 ] loss = 11.52340, acc = 0.88212


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 485/600 ] loss = 5.21962, acc = 0.82965


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 486/600 ] loss = 11.46703, acc = 0.88142


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 486/600 ] loss = 5.30656, acc = 0.82446


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 487/600 ] loss = 11.45325, acc = 0.88772


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 487/600 ] loss = 5.29842, acc = 0.82626


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 488/600 ] loss = 11.45825, acc = 0.88242


100%|██████████| 35/35 [00:14<00:00,  2.40it/s]


[ Valid | 488/600 ] loss = 5.40764, acc = 0.82920


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 489/600 ] loss = 11.45731, acc = 0.88142


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 489/600 ] loss = 5.41162, acc = 0.82040


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 490/600 ] loss = 11.44613, acc = 0.88382


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 490/600 ] loss = 5.29010, acc = 0.82333


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 491/600 ] loss = 11.48917, acc = 0.88192


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 491/600 ] loss = 5.17178, acc = 0.83461 -> best
Best model found at epoch 491, saving model


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 492/600 ] loss = 11.36682, acc = 0.88842


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 492/600 ] loss = 5.37164, acc = 0.82468


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 493/600 ] loss = 11.32050, acc = 0.88832


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 493/600 ] loss = 5.37808, acc = 0.82920


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 494/600 ] loss = 11.40408, acc = 0.88592


100%|██████████| 35/35 [00:14<00:00,  2.44it/s]


[ Valid | 494/600 ] loss = 5.41076, acc = 0.82356


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 495/600 ] loss = 11.19467, acc = 0.88992


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 495/600 ] loss = 5.49526, acc = 0.82310


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 496/600 ] loss = 11.38335, acc = 0.88502


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 496/600 ] loss = 5.37938, acc = 0.82649


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 497/600 ] loss = 11.23371, acc = 0.88932


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 497/600 ] loss = 5.36012, acc = 0.82153


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 498/600 ] loss = 11.30690, acc = 0.88562


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 498/600 ] loss = 5.37295, acc = 0.81746


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 499/600 ] loss = 11.16260, acc = 0.88692


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 499/600 ] loss = 5.45536, acc = 0.82333


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 500/600 ] loss = 11.17362, acc = 0.88522


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 500/600 ] loss = 5.28678, acc = 0.82017


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 501/600 ] loss = 11.06990, acc = 0.88592


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 501/600 ] loss = 5.48677, acc = 0.81792


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 502/600 ] loss = 11.06806, acc = 0.88362


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 502/600 ] loss = 5.40009, acc = 0.82491


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 503/600 ] loss = 10.99157, acc = 0.88672


100%|██████████| 35/35 [00:13<00:00,  2.53it/s]


[ Valid | 503/600 ] loss = 5.22706, acc = 0.82829


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 504/600 ] loss = 11.02374, acc = 0.89062


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 504/600 ] loss = 5.35163, acc = 0.82897


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 505/600 ] loss = 11.02592, acc = 0.89242


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 505/600 ] loss = 5.39868, acc = 0.82468


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 506/600 ] loss = 10.95078, acc = 0.89403


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 506/600 ] loss = 5.20933, acc = 0.82446


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 507/600 ] loss = 10.97075, acc = 0.89272


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 507/600 ] loss = 5.39059, acc = 0.83078


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 508/600 ] loss = 10.82429, acc = 0.89473


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 508/600 ] loss = 5.49222, acc = 0.82829


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 509/600 ] loss = 10.90565, acc = 0.88942


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 509/600 ] loss = 5.46669, acc = 0.81837


100%|██████████| 79/79 [00:31<00:00,  2.54it/s]


[ Train | 510/600 ] loss = 10.81105, acc = 0.89142


100%|██████████| 35/35 [00:12<00:00,  2.81it/s]


[ Valid | 510/600 ] loss = 5.36101, acc = 0.82671


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 511/600 ] loss = 10.64888, acc = 0.89323


100%|██████████| 35/35 [00:13<00:00,  2.63it/s]


[ Valid | 511/600 ] loss = 5.59493, acc = 0.81656


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 512/600 ] loss = 10.79641, acc = 0.89413


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 512/600 ] loss = 5.26114, acc = 0.82739


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 513/600 ] loss = 10.67119, acc = 0.89373


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 513/600 ] loss = 5.47045, acc = 0.82220


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 514/600 ] loss = 10.73420, acc = 0.89443


100%|██████████| 35/35 [00:14<00:00,  2.44it/s]


[ Valid | 514/600 ] loss = 5.39901, acc = 0.82559


100%|██████████| 79/79 [00:29<00:00,  2.72it/s]


[ Train | 515/600 ] loss = 10.54104, acc = 0.89843


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 515/600 ] loss = 5.55777, acc = 0.82265


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 516/600 ] loss = 10.67537, acc = 0.89573


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 516/600 ] loss = 5.40995, acc = 0.83010


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 517/600 ] loss = 10.70838, acc = 0.89413


100%|██████████| 35/35 [00:13<00:00,  2.58it/s]


[ Valid | 517/600 ] loss = 5.40378, acc = 0.83078


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 518/600 ] loss = 10.49437, acc = 0.89603


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 518/600 ] loss = 5.44021, acc = 0.82897


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 519/600 ] loss = 10.60979, acc = 0.89212


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 519/600 ] loss = 5.47799, acc = 0.82920


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 520/600 ] loss = 10.52240, acc = 0.89363


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 520/600 ] loss = 5.38415, acc = 0.83439


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 521/600 ] loss = 10.45524, acc = 0.89453


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 521/600 ] loss = 5.29801, acc = 0.83461


100%|██████████| 79/79 [00:30<00:00,  2.61it/s]


[ Train | 522/600 ] loss = 10.41629, acc = 0.89753


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 522/600 ] loss = 5.44381, acc = 0.82062


100%|██████████| 79/79 [00:29<00:00,  2.72it/s]


[ Train | 523/600 ] loss = 10.19320, acc = 0.90203


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 523/600 ] loss = 5.45876, acc = 0.83100


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 524/600 ] loss = 10.27193, acc = 0.89763


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 524/600 ] loss = 5.43545, acc = 0.81701


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 525/600 ] loss = 10.32058, acc = 0.89353


100%|██████████| 35/35 [00:14<00:00,  2.49it/s]


[ Valid | 525/600 ] loss = 5.50772, acc = 0.81792


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 526/600 ] loss = 10.12890, acc = 0.89923


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 526/600 ] loss = 5.60424, acc = 0.81904


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 527/600 ] loss = 10.05307, acc = 0.90163


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 527/600 ] loss = 5.56781, acc = 0.82491


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 528/600 ] loss = 10.03118, acc = 0.90083


100%|██████████| 35/35 [00:13<00:00,  2.59it/s]


[ Valid | 528/600 ] loss = 5.51194, acc = 0.82423


100%|██████████| 79/79 [00:29<00:00,  2.65it/s]


[ Train | 529/600 ] loss = 9.99814, acc = 0.90013


100%|██████████| 35/35 [00:13<00:00,  2.68it/s]


[ Valid | 529/600 ] loss = 5.53421, acc = 0.81453


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 530/600 ] loss = 9.84179, acc = 0.90153


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 530/600 ] loss = 5.45830, acc = 0.81927


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 531/600 ] loss = 9.91543, acc = 0.90073


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 531/600 ] loss = 5.39107, acc = 0.82897


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 532/600 ] loss = 9.85578, acc = 0.90283


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 532/600 ] loss = 5.40401, acc = 0.82491


100%|██████████| 79/79 [00:30<00:00,  2.63it/s]


[ Train | 533/600 ] loss = 9.86831, acc = 0.90353


100%|██████████| 35/35 [00:13<00:00,  2.60it/s]


[ Valid | 533/600 ] loss = 5.57106, acc = 0.82356


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 534/600 ] loss = 9.69987, acc = 0.90693


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 534/600 ] loss = 5.73456, acc = 0.81476


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 535/600 ] loss = 9.67377, acc = 0.90453


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 535/600 ] loss = 5.55353, acc = 0.82175


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 536/600 ] loss = 9.61630, acc = 0.90353


100%|██████████| 35/35 [00:13<00:00,  2.50it/s]


[ Valid | 536/600 ] loss = 5.59334, acc = 0.81859


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 537/600 ] loss = 9.43137, acc = 0.90503


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 537/600 ] loss = 5.44752, acc = 0.83619 -> best
Best model found at epoch 537, saving model


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 538/600 ] loss = 9.61716, acc = 0.90363


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 538/600 ] loss = 5.47796, acc = 0.82401


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 539/600 ] loss = 9.45466, acc = 0.90683


100%|██████████| 35/35 [00:13<00:00,  2.61it/s]


[ Valid | 539/600 ] loss = 5.34823, acc = 0.83100


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 540/600 ] loss = 9.48133, acc = 0.90283


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 540/600 ] loss = 5.45394, acc = 0.83123


100%|██████████| 79/79 [00:30<00:00,  2.59it/s]


[ Train | 541/600 ] loss = 9.28536, acc = 0.90563


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 541/600 ] loss = 5.76943, acc = 0.82085


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 542/600 ] loss = 9.17677, acc = 0.90623


100%|██████████| 35/35 [00:12<00:00,  2.82it/s]


[ Valid | 542/600 ] loss = 5.46617, acc = 0.82536


100%|██████████| 79/79 [00:31<00:00,  2.51it/s]


[ Train | 543/600 ] loss = 9.06473, acc = 0.90914


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 543/600 ] loss = 5.49794, acc = 0.82852


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 544/600 ] loss = 9.19853, acc = 0.90844


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]


[ Valid | 544/600 ] loss = 5.50026, acc = 0.83574


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 545/600 ] loss = 9.06797, acc = 0.91044


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 545/600 ] loss = 5.42058, acc = 0.82626


100%|██████████| 79/79 [00:31<00:00,  2.52it/s]


[ Train | 546/600 ] loss = 8.99941, acc = 0.90974


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 546/600 ] loss = 5.43882, acc = 0.82423


100%|██████████| 79/79 [00:28<00:00,  2.74it/s]


[ Train | 547/600 ] loss = 8.93198, acc = 0.90974


100%|██████████| 35/35 [00:14<00:00,  2.44it/s]


[ Valid | 547/600 ] loss = 5.48303, acc = 0.80799


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 548/600 ] loss = 8.90276, acc = 0.90854


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 548/600 ] loss = 5.62185, acc = 0.82829


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 549/600 ] loss = 8.81748, acc = 0.91274


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 549/600 ] loss = 5.91974, acc = 0.81205


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 550/600 ] loss = 8.73957, acc = 0.91274


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 550/600 ] loss = 5.59875, acc = 0.82807


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 551/600 ] loss = 8.61495, acc = 0.91364


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 551/600 ] loss = 5.55954, acc = 0.82536


100%|██████████| 79/79 [00:29<00:00,  2.64it/s]


[ Train | 552/600 ] loss = 8.55933, acc = 0.91454


100%|██████████| 35/35 [00:13<00:00,  2.62it/s]


[ Valid | 552/600 ] loss = 5.52641, acc = 0.82062


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 553/600 ] loss = 8.59240, acc = 0.91184


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 553/600 ] loss = 5.68812, acc = 0.82626


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 554/600 ] loss = 8.46496, acc = 0.91024


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 554/600 ] loss = 5.55932, acc = 0.82671


100%|██████████| 79/79 [00:28<00:00,  2.74it/s]


[ Train | 555/600 ] loss = 8.40065, acc = 0.91574


100%|██████████| 35/35 [00:14<00:00,  2.40it/s]


[ Valid | 555/600 ] loss = 5.64277, acc = 0.82198


100%|██████████| 79/79 [00:30<00:00,  2.62it/s]


[ Train | 556/600 ] loss = 8.32300, acc = 0.91704


100%|██████████| 35/35 [00:12<00:00,  2.82it/s]


[ Valid | 556/600 ] loss = 5.54308, acc = 0.82762


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 557/600 ] loss = 8.25520, acc = 0.91534


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 557/600 ] loss = 5.72973, acc = 0.80641


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 558/600 ] loss = 8.13207, acc = 0.91344


100%|██████████| 35/35 [00:13<00:00,  2.51it/s]


[ Valid | 558/600 ] loss = 5.50484, acc = 0.82717


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 559/600 ] loss = 8.05774, acc = 0.91424


100%|██████████| 35/35 [00:12<00:00,  2.81it/s]


[ Valid | 559/600 ] loss = 5.50347, acc = 0.82784


100%|██████████| 79/79 [00:31<00:00,  2.50it/s]


[ Train | 560/600 ] loss = 8.00690, acc = 0.91604


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 560/600 ] loss = 5.55032, acc = 0.82942


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 561/600 ] loss = 7.86593, acc = 0.91904


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 561/600 ] loss = 5.62062, acc = 0.82175


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 562/600 ] loss = 7.80975, acc = 0.91774


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 562/600 ] loss = 5.71071, acc = 0.81679


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 563/600 ] loss = 7.75871, acc = 0.91534


100%|██████████| 35/35 [00:13<00:00,  2.64it/s]


[ Valid | 563/600 ] loss = 5.59366, acc = 0.82581


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 564/600 ] loss = 7.69357, acc = 0.92104


100%|██████████| 35/35 [00:13<00:00,  2.67it/s]


[ Valid | 564/600 ] loss = 5.66181, acc = 0.82717


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 565/600 ] loss = 7.59677, acc = 0.91844


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 565/600 ] loss = 5.68276, acc = 0.83461


100%|██████████| 79/79 [00:28<00:00,  2.74it/s]


[ Train | 566/600 ] loss = 7.45408, acc = 0.92074


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 566/600 ] loss = 5.62202, acc = 0.82987


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 567/600 ] loss = 7.38842, acc = 0.91954


100%|██████████| 35/35 [00:12<00:00,  2.69it/s]


[ Valid | 567/600 ] loss = 5.55033, acc = 0.82920


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 568/600 ] loss = 7.26341, acc = 0.92415


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 568/600 ] loss = 5.86212, acc = 0.81588


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 569/600 ] loss = 7.19645, acc = 0.92275


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 569/600 ] loss = 5.49380, acc = 0.82897


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 570/600 ] loss = 7.16345, acc = 0.92255


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 570/600 ] loss = 5.60213, acc = 0.82446


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 571/600 ] loss = 7.03979, acc = 0.91984


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 571/600 ] loss = 5.69426, acc = 0.83168


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 572/600 ] loss = 6.89367, acc = 0.92615


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 572/600 ] loss = 5.76171, acc = 0.82536


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 573/600 ] loss = 6.91360, acc = 0.92074


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 573/600 ] loss = 5.79399, acc = 0.80844


100%|██████████| 79/79 [00:29<00:00,  2.66it/s]


[ Train | 574/600 ] loss = 6.71304, acc = 0.92685


100%|██████████| 35/35 [00:14<00:00,  2.48it/s]


[ Valid | 574/600 ] loss = 5.66666, acc = 0.82920


100%|██████████| 79/79 [00:29<00:00,  2.69it/s]


[ Train | 575/600 ] loss = 6.63265, acc = 0.92515


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 575/600 ] loss = 5.63570, acc = 0.82378


100%|██████████| 79/79 [00:30<00:00,  2.58it/s]


[ Train | 576/600 ] loss = 6.56954, acc = 0.92415


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


[ Valid | 576/600 ] loss = 5.74824, acc = 0.82559


100%|██████████| 79/79 [00:29<00:00,  2.72it/s]


[ Train | 577/600 ] loss = 6.44662, acc = 0.92555


100%|██████████| 35/35 [00:13<00:00,  2.65it/s]


[ Valid | 577/600 ] loss = 5.64998, acc = 0.82987


100%|██████████| 79/79 [00:29<00:00,  2.68it/s]


[ Train | 578/600 ] loss = 6.25139, acc = 0.92965


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 578/600 ] loss = 5.95621, acc = 0.81972


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 579/600 ] loss = 6.25678, acc = 0.92835


100%|██████████| 35/35 [00:12<00:00,  2.70it/s]


[ Valid | 579/600 ] loss = 5.88881, acc = 0.82265


100%|██████████| 79/79 [00:29<00:00,  2.70it/s]


[ Train | 580/600 ] loss = 6.02133, acc = 0.93025


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 580/600 ] loss = 5.80988, acc = 0.82198


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 581/600 ] loss = 5.95820, acc = 0.93175


100%|██████████| 35/35 [00:12<00:00,  2.78it/s]


[ Valid | 581/600 ] loss = 5.96208, acc = 0.80844


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 582/600 ] loss = 5.99405, acc = 0.92705


100%|██████████| 35/35 [00:14<00:00,  2.42it/s]


[ Valid | 582/600 ] loss = 5.84133, acc = 0.82942


100%|██████████| 79/79 [00:28<00:00,  2.74it/s]


[ Train | 583/600 ] loss = 5.78256, acc = 0.93265


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 583/600 ] loss = 5.66672, acc = 0.82175


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 584/600 ] loss = 5.68854, acc = 0.93105


100%|██████████| 35/35 [00:12<00:00,  2.80it/s]


[ Valid | 584/600 ] loss = 5.88066, acc = 0.82356


100%|██████████| 79/79 [00:28<00:00,  2.75it/s]


[ Train | 585/600 ] loss = 5.47307, acc = 0.93385


100%|██████████| 35/35 [00:14<00:00,  2.45it/s]


[ Valid | 585/600 ] loss = 5.86414, acc = 0.83190


100%|██████████| 79/79 [00:29<00:00,  2.72it/s]


[ Train | 586/600 ] loss = 5.46950, acc = 0.93195


100%|██████████| 35/35 [00:12<00:00,  2.73it/s]


[ Valid | 586/600 ] loss = 6.04603, acc = 0.82220


100%|██████████| 79/79 [00:30<00:00,  2.56it/s]


[ Train | 587/600 ] loss = 5.24764, acc = 0.93325


100%|██████████| 35/35 [00:12<00:00,  2.75it/s]


[ Valid | 587/600 ] loss = 6.01641, acc = 0.82175


100%|██████████| 79/79 [00:28<00:00,  2.75it/s]


[ Train | 588/600 ] loss = 5.17426, acc = 0.93756


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 588/600 ] loss = 5.84848, acc = 0.82062


100%|██████████| 79/79 [00:30<00:00,  2.57it/s]


[ Train | 589/600 ] loss = 5.10518, acc = 0.93295


100%|██████████| 35/35 [00:12<00:00,  2.77it/s]


[ Valid | 589/600 ] loss = 5.91623, acc = 0.81047


100%|██████████| 79/79 [00:29<00:00,  2.67it/s]


[ Train | 590/600 ] loss = 4.98414, acc = 0.93666


100%|██████████| 35/35 [00:14<00:00,  2.47it/s]


[ Valid | 590/600 ] loss = 6.17922, acc = 0.81656


100%|██████████| 79/79 [00:28<00:00,  2.77it/s]


[ Train | 591/600 ] loss = 4.88781, acc = 0.93325


100%|██████████| 35/35 [00:12<00:00,  2.74it/s]


[ Valid | 591/600 ] loss = 6.15091, acc = 0.81588


100%|██████████| 79/79 [00:31<00:00,  2.53it/s]


[ Train | 592/600 ] loss = 4.74135, acc = 0.93435


100%|██████████| 35/35 [00:12<00:00,  2.76it/s]


[ Valid | 592/600 ] loss = 6.14075, acc = 0.81746


100%|██████████| 79/79 [00:28<00:00,  2.74it/s]


[ Train | 593/600 ] loss = 4.58590, acc = 0.93796


100%|██████████| 35/35 [00:14<00:00,  2.43it/s]


[ Valid | 593/600 ] loss = 6.02517, acc = 0.82175


100%|██████████| 79/79 [00:28<00:00,  2.73it/s]


[ Train | 594/600 ] loss = 4.43352, acc = 0.93696


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 594/600 ] loss = 6.17408, acc = 0.81611


100%|██████████| 79/79 [00:30<00:00,  2.55it/s]


[ Train | 595/600 ] loss = 4.34000, acc = 0.93465


100%|██████████| 35/35 [00:12<00:00,  2.79it/s]


[ Valid | 595/600 ] loss = 6.35062, acc = 0.81927


100%|██████████| 79/79 [00:29<00:00,  2.72it/s]


[ Train | 596/600 ] loss = 4.15978, acc = 0.93726


100%|██████████| 35/35 [00:12<00:00,  2.72it/s]


[ Valid | 596/600 ] loss = 6.28613, acc = 0.81092


100%|██████████| 79/79 [00:30<00:00,  2.60it/s]


[ Train | 597/600 ] loss = 4.02867, acc = 0.93465


100%|██████████| 35/35 [00:12<00:00,  2.71it/s]


[ Valid | 597/600 ] loss = 6.38374, acc = 0.81363


100%|██████████| 79/79 [00:29<00:00,  2.71it/s]


[ Train | 598/600 ] loss = 3.89842, acc = 0.93686


100%|██████████| 35/35 [00:13<00:00,  2.54it/s]

[ Valid | 598/600 ] loss = 6.41302, acc = 0.81679
No improvment 60 consecutive epochs, early stopping
Finish training





### Inference
load the best model of the experiment and generate submission.csv

In [18]:
# create dataloader for evaluation
eval_set = FoodDataset(os.path.join(cfg['dataset_root'], "evaluation"), tfm=test_tfm)
eval_loader = DataLoader(eval_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=4, pin_memory=True)

One /kaggle/input/ml2023spring-hw13/Food-11/evaluation sample /kaggle/input/ml2023spring-hw13/Food-11/evaluation/0000.jpg


In [19]:
# Load model from {exp_name}/student_best.ckpt
student_model_best = get_student_model() # get a new student model to avoid reference before assignment.
ckpt_path = f"{save_path}/student_best.ckpt" # the ckpt path of the best student model.
student_model_best.load_state_dict(torch.load(ckpt_path, map_location='cpu')) # load the state dict and set it to the student model
student_model_best.to(device) # set the student model to device

# Start evaluate
student_model_best.eval()
eval_preds = [] # storing predictions of the evaluation dataset

# Iterate the validation set by batches.
for batch in tqdm(eval_loader):
    # A batch consists of image data and corresponding labels.
    imgs, _ = batch
    # We don't need gradient in evaluation.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = student_model_best(imgs.to(device))
        preds = list(logits.argmax(dim=-1).squeeze().cpu().numpy())
    # loss and acc can not be calculated because we do not have the true labels of the evaluation set.
    eval_preds += preds

def pad4(i):
    return "0"*(4-len(str(i))) + str(i)

# Save prediction results
ids = [pad4(i) for i in range(0,len(eval_set))]
categories = eval_preds

df = pd.DataFrame()
df['Id'] = ids
df['Category'] = categories
df.to_csv(f"{save_path}/submission.csv", index=False) # now you can download the submission.csv and upload it to the kaggle competition.

100%|██████████| 18/18 [00:09<00:00,  1.93it/s]


> Don't forget to answer the report questions on GradeScope! 