# Homework 13 - Network Compression

Author: Chen-Wei Ke (b08501098@ntu.edu.tw), modified from ML2022-HW13 (Liang-Hsuan Tseng)

If you have any questions, feel free to ask: mlta-2023-spring@googlegroups.com

[**Link to HW13 Slides**](https://docs.google.com/presentation/d/1QAVMbnabmmMNvmugPlHMg_GVKaYrKa6hoTSFeJl9OCs/edit?usp=sharing)

## Outline

* [Packages](#Packages) - install some required packages.
* [Dataset](#Dataset) - something you need to know about the dataset.
* [Configs](#Configs) - the configs of the experiments, you can change some hyperparameters here.
* [Architecture_Design](#Architecture_Design) - depthwise and pointwise convolution examples and some useful links.
* [Knowledge_Distillation](#Knowledge_Distillation) - KL divergence loss for knowledge distillation and some useful links.
* [Training](#Training) - training loop implementation modified from HW3.
* [Inference](#Inference) - create submission.csv by using the student_best.ckpt from the previous experiment.



### Packages
First, we import some useful packages. If the `torchsummary` package is not installed, install it via `pip install torchsummary`.

In [1]:
# Import some useful packages for this homework
import numpy as np
import pandas as pd
import torch
import os
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import ConcatDataset, DataLoader, Subset, Dataset # "ConcatDataset" and "Subset" are possibly useful
from torchvision.datasets import DatasetFolder, VisionDataset
from torchsummary import summary
from tqdm.auto import tqdm
import random

# !nvidia-smi # list your current GPU



### Configs
In this part, you can specify some variables and hyperparameters as your configs.

In [2]:
cfg = {
    'dataset_root': './Food-11',
    'save_dir': './outputs',
    'exp_name': "medium_baseline",
    'batch_size': 256,
    'lr': 3e-4,
    'seed': 20220013,
    'loss_fn_type': 'KD', # simple baseline: CE, medium baseline: KD. See the Knowledge_Distillation part for more information.
    'weight_decay': 1e-5,
    'grad_norm_max': 10,
    'n_epochs': 600, # train more steps to pass the medium baseline.
    'patience': 300,
}

In [3]:
myseed = cfg['seed']  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
random.seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

save_path = os.path.join(cfg['save_dir'], cfg['exp_name']) # create saving directory
os.makedirs(save_path, exist_ok=True)

# define simple logging functionality
log_fw = open(f"{save_path}/log.txt", 'w') # open log file to save log outputs
def log(text):     # define a logging function to trace the training process
    print(text)
    log_fw.write(str(text)+'\n')
    log_fw.flush()

log(cfg)  # log your configs to the log file

{'dataset_root': './Food-11', 'save_dir': './outputs', 'exp_name': 'medium_baseline', 'batch_size': 256, 'lr': 0.0003, 'seed': 20220013, 'loss_fn_type': 'KD', 'weight_decay': 1e-05, 'grad_norm_max': 10, 'n_epochs': 600, 'patience': 300}


### Dataset
We use the Food-11 dataset for this homework, which is similar to the one in HW3. However, please DO NOT use the HW3 dataset. We've modified the dataset, so you should only access it by loading it in this Kaggle notebook or through the links provided in the HW13 Colab notebooks.

In [4]:
# fetch and download the dataset (about 1.12 GB)
#import gdown
#!wget -O Food-11.tar.gz https://www.dropbox.com/s/v97fi9xrwp9b964/food11-hw13.tar.gz?dl=0
#gdown.download('https://drive.google.com/uc?id=1fTMLOeQ0-131Cq6ZLUndiwlTwE2CinYP', 'Food-11.tar.gz')

In [5]:
# extract the data
#!tar -xzf ./Food-11.tar.gz # Could take some time
# !tar -xzvf ./Food-11.tar.gz # use this command if you want to checkout the whole process.

In [6]:
for dirname, _, filenames in os.walk('./Food-11'):
    if len(filenames) > 0:
        print(f"{dirname}: {len(filenames)} files.") # Show the file amounts in each split.

./Food-11: 1 files.
./Food-11/training: 9993 files.
./Food-11/evaluation: 2218 files.
./Food-11/validation: 4432 files.


Next, specify the train/test transforms for image preprocessing and data augmentation.
Torchvision provides many useful utilities for image preprocessing, data wrapping, and data augmentation.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms. You can also apply the knowledge or experience you learned in HW3.

In [7]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# define training/testing transforms
test_tfm = transforms.Compose([
    # It is not encouraged to modify this part if you are using the provided teacher model. This transform is standard and good enough for testing.
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

train_tfm = transforms.Compose([
    # add some useful transform or augmentation here, according to your experience in HW3.
    transforms.Resize(256),  # You can change this
    transforms.CenterCrop(224), # You can change this, but be aware that the given teacher model's input size is 224.
    # The training input size of the provided teacher model is (3, 224, 224).
    # Thus, input sizes other than 224 might hurt the performance. Please be careful.
    
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),

    transforms.ToTensor(),
    normalize,
])

In [8]:
class FoodDataset(Dataset):
    def __init__(self, path, tfm=test_tfm, files = None):
        super().__init__()
        self.path = path
        self.files = sorted([os.path.join(path,x) for x in os.listdir(path) if x.endswith(".jpg")])
        if files is not None:
            self.files = files
        print(f"One {path} sample",self.files[0])
        self.transform = tfm

    def __len__(self):
        return len(self.files)

    def __getitem__(self,idx):
        fname = self.files[idx]
        im = Image.open(fname)
        im = self.transform(im)
        try:
            label = int(os.path.basename(fname).split("_")[0])
        except ValueError:
            label = -1 # test has no label
        return im,label
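`__getitem__` derives the label from the file name: the prefix before the first underscore is the class index, and a file whose prefix is not an integer gets `-1`. Below is a minimal, torch-free sketch of that rule. The example file names are hypothetical, and using `os.path.basename` instead of splitting on `"/"` is an assumption made so the rule also works with Windows paths.

```python
import os

def parse_label(fname):
    """Return the class index encoded as '<label>_<id>.jpg', or -1 for unlabeled files."""
    stem = os.path.basename(fname)
    try:
        return int(stem.split("_")[0])
    except ValueError:
        # e.g. '0001.jpg' -> int('0001.jpg') fails, so the file is treated as unlabeled
        return -1

print(parse_label("./Food-11/training/3_120.jpg"))   # 3
print(parse_label("./Food-11/evaluation/0001.jpg"))  # -1 (hypothetical unlabeled name)
```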

In [9]:
# Form train/valid dataloaders
train_set = FoodDataset(os.path.join(cfg['dataset_root'],"training"), tfm=train_tfm)
train_loader = DataLoader(train_set, batch_size=cfg['batch_size'], shuffle=True, num_workers=0, pin_memory=True)

valid_set = FoodDataset(os.path.join(cfg['dataset_root'], "validation"), tfm=test_tfm)
valid_loader = DataLoader(valid_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=0, pin_memory=True)

One ./Food-11/training sample ./Food-11/training/0_0.jpg
One ./Food-11/validation sample ./Food-11/validation/0_0.jpg


### Architecture_Design

In this homework, you have to design a smaller network and make it perform well. A well-designed architecture is crucial for such a task. Here, we introduce depthwise and pointwise convolutions, which are common techniques for architecture design in network compression.

<img src="https://i.imgur.com/LFDKHOp.png" width=400px>

* explanation of depthwise and pointwise convolutions:
    * [Prof. Hung-yi Lee's slides (p.24~p.30, especially p.28)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/tiny_v7.pdf)

* other useful techniques
    * [group convolution](https://www.researchgate.net/figure/The-transformations-within-a-layer-in-DenseNets-left-and-CondenseNets-at-training-time_fig2_321325862) (Actually, depthwise convolution is a specific type of group convolution)
    * [SqueezeNet](https://arxiv.org/abs/1602.07360)
    * [MobileNet](https://arxiv.org/abs/1704.04861)
    * [ShuffleNet](https://arxiv.org/abs/1707.01083)
    * [Xception](https://arxiv.org/abs/1610.02357)
    * [GhostNet](https://arxiv.org/abs/1911.11907)
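Group convolution splits the input channels into `groups` independent slices, which divides the weight count of a regular convolution by roughly the number of groups. Depthwise convolution is the extreme case `groups = in_channels`. ShuffleNet pairs grouped convolutions with a channel shuffle so that information can flow between groups. This is a minimal sketch: `nn.Conv2d(..., groups=...)` is the real PyTorch argument, while the `channel_shuffle` helper is our own illustration of the idea, not an API from PyTorch or from the ShuffleNet authors' code.

```python
import torch
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

regular = nn.Conv2d(64, 84, 3)               # 84 * 64 * 3 * 3 + 84 = 48,468
grouped = nn.Conv2d(64, 84, 3, groups=4)     # 84 * (64 / 4) * 3 * 3 + 84 = 12,180
depthwise = nn.Conv2d(64, 64, 3, groups=64)  # 64 * 1 * 3 * 3 + 64 = 640

def channel_shuffle(x, groups):
    """Interleave channels across groups (hypothetical helper illustrating ShuffleNet's shuffle)."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # (N, g, C/g, H, W) -> swap the group and per-group axes -> flatten back to (N, C, H, W)
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

x = torch.arange(6).float().view(1, 6, 1, 1)  # channels [0, 1, 2, 3, 4, 5]
print(channel_shuffle(x, groups=2).flatten().tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
print(n_params(regular), n_params(grouped), n_params(depthwise))  # 48468 12180 640
```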


After introducing depthwise and pointwise convolutions, let's define the **student network architecture**. Here, we have a very simple network formed by regular convolution layers and pooling layers. You can replace the regular convolution layers with depthwise and pointwise convolutions. Because this combination needs far fewer parameters, it lets you further increase the depth or width of your network within the same parameter budget.
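To see how much budget the swap frees up, compare a regular convolution with a depthwise + pointwise pair for the last convolution of the baseline student (`Conv2d(64, 84, 3)`). This is a minimal sketch: `dwpw_conv` is copied from the example in the next code cell, and the input shape assumes the 54×54 feature map that enters that layer in the baseline.

```python
import torch
import torch.nn as nn

def dwpw_conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    # Same structure as the example in the student-network cell.
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, groups=in_channels),  # depthwise
        nn.Conv2d(in_channels, out_channels, 1),  # pointwise
    )

def n_params(module):
    return sum(p.numel() for p in module.parameters())

regular = nn.Conv2d(64, 84, 3)
separable = dwpw_conv(64, 84, 3)

x = torch.randn(1, 64, 54, 54)  # feature map entering the baseline's last conv layer
print(regular(x).shape, separable(x).shape)    # torch.Size([1, 84, 52, 52]) torch.Size([1, 84, 52, 52])
print(n_params(regular), n_params(separable))  # 48468 6100
```

Both layers produce the same output shape, but the depthwise + pointwise version uses roughly 8× fewer parameters (640 for the depthwise part plus 5,460 for the pointwise part).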

In [10]:
# Define your student network here. You have to copy-paste this code block to HW13 GradeScope before deadline.
# We will use your student network definition to evaluate your results(including the total parameter amount).

# Example implementation of Depthwise and Pointwise Convolution
def dwpw_conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, groups=in_channels), #depthwise convolution
        nn.Conv2d(in_channels, out_channels, 1), # pointwise convolution
    )

class StudentNet(nn.Module):
    def __init__(self):
      super().__init__()

      # ---------- TODO ----------
      # Modify your model architecture

      self.cnn = nn.Sequential(
        nn.Conv2d(3, 4, 3),
        nn.BatchNorm2d(4),
        nn.ReLU(),

        nn.Conv2d(4, 16, 3),
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),

        nn.Conv2d(16, 64, 3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),

        nn.Conv2d(64, 84, 3),
        nn.BatchNorm2d(84),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),

        # Here we adopt Global Average Pooling for various input size.
        nn.AdaptiveAvgPool2d((1, 1)),
      )
      self.fc = nn.Sequential(
        nn.Linear(84, 11),
      )

    def forward(self, x):
      out = self.cnn(x)
      out = out.view(out.size()[0], -1)
      return self.fc(out)

def get_student_model(): # This function should have no arguments so that we can get your student network by directly calling it.
    # you can modify or do anything here, just remember to return an nn.Module as your student network.
    return StudentNet()

# End of definition of your student model and the get_student_model API
# Please copy-paste the whole code block, including the get_student_model function.

After specifying the student network architecture, use the `torchsummary` package to get information about the network and verify the total number of parameters. Note that the total number of parameters of your student network must not exceed the limit (`Total params` in `torchsummary` ≤ 60,000).
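Besides `torchsummary`, you can count trainable parameters directly with `sum(p.numel() for p in model.parameters())`. This is a minimal sketch that rebuilds the baseline student layer for layer. It assumes the architecture exactly as written in the cell above, so if you change `StudentNet`, count your own model instead. By our arithmetic the baseline comes to 59,723 parameters, just under the limit, and this figure should agree with the `Total params` line from `torchsummary`.

```python
import torch.nn as nn

def count_params(model):
    # Trainable weights and biases only; BatchNorm running statistics are buffers, not parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Layer-for-layer copy of the baseline StudentNet (CNN + FC).
baseline = nn.Sequential(
    nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU(),
    nn.Conv2d(4, 16, 3), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2, 2, 0),
    nn.Conv2d(16, 64, 3), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2, 0),
    nn.Conv2d(64, 84, 3), nn.BatchNorm2d(84), nn.ReLU(), nn.MaxPool2d(2, 2, 0),
    nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
    nn.Linear(84, 11),
)

total = count_params(baseline)
print(total, total <= 60_000)  # 59723 True
```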

In [11]:
# DO NOT modify this block and please make sure that this block can run successfully.
student_model = get_student_model()
summary(student_model, (3, 224, 224), device='cpu')
# You have to copy&paste the results of this block to HW13 GradeScope.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 4, 222, 222]             112
       BatchNorm2d-2          [-1, 4, 222, 222]               8
              ReLU-3          [-1, 4, 222, 222]               0
            Conv2d-4         [-1, 16, 220, 220]             592
       BatchNorm2d-5         [-1, 16, 220, 220]              32
              ReLU-6         [-1, 16, 220, 220]               0
         MaxPool2d-7         [-1, 16, 110, 110]               0
            Conv2d-8         [-1, 64, 108, 108]           9,280
       BatchNorm2d-9         [-1, 64, 108, 108]             128
             ReLU-10         [-1, 64, 108, 108]               0
        MaxPool2d-11           [-1, 64, 54, 54]               0
           Conv2d-12           [-1, 84, 52, 52]          48,468
      BatchNorm2d-13           [-1, 84, 52, 52]             168
             ReLU-14           [-1, 84,

In [12]:
# Load provided teacher model (model architecture: resnet18, num_classes=11, test-acc ~= 89.9%)
teacher_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=False, num_classes=11)
# load state dict
teacher_ckpt_path = os.path.join(cfg['dataset_root'], "resnet18_teacher.ckpt")
teacher_model.load_state_dict(torch.load(teacher_ckpt_path, map_location='cpu'))
# Now you already know the teacher model's architecture. You can take advantage of it if you want to pass the strong or boss baseline.
# Source code of resnet in pytorch: (https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py)
# You can also see the summary of teacher model. There are 11,182,155 parameters totally in the teacher model
# summary(teacher_model, (3, 224, 224), device='cpu')

Using cache found in /home/yuehpo/.cache/torch/hub/pytorch_vision_v0.10.0


<All keys matched successfully>

### Knowledge_Distillation

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="400px">

Since we already have a trained large model (the teacher), we can let it teach a smaller model (the student). In the implementation, the student is trained to match the teacher's predictions in addition to the ground-truth labels.

**Why does it work?**
* If the data is noisy, the teacher's predictions can smooth over mislabeled examples instead of reproducing the wrong labels.
* Classes may be related to each other, so the teacher's soft labels carry extra information. For example, the digit 8 is more similar to 6, 9, and 0 than to 1 and 7.
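The soft labels come from a temperature-scaled softmax, $softmax(z / T)$: a larger $T$ flattens the distribution and exposes how the teacher ranks the non-target classes. This is a minimal sketch with made-up teacher logits for four classes; the numbers are purely illustrative.

```python
import torch
import torch.nn.functional as F

# Hypothetical teacher logits for one image over 4 classes.
teacher_logits = torch.tensor([[6.0, 3.0, 2.5, -1.0]])

# Softened class probabilities at two temperatures.
soft = {T: F.softmax(teacher_logits / T, dim=1) for T in (1.0, 5.0)}
for T, probs in soft.items():
    print(f"T={T}: {[round(p, 3) for p in probs[0].tolist()]}")
```

At $T=1$ almost all of the mass sits on the top class; at $T=5$ the ranking is unchanged, but the runner-up classes receive noticeably more probability, which is the "dark knowledge" the student learns from.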


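**How to implement it?**
* $Loss = \alpha T^2 \times KL(q \,\|\, p) + (1-\alpha)(\text{Original Cross Entropy Loss})$, where $p=softmax(\frac{\text{student's logits}}{T})$ and $q=softmax(\frac{\text{teacher's logits}}{T})$. The teacher distribution $q$ is the target of the KL term, which is what `nn.KLDivLoss` computes when given the student's log-probabilities as `input` and $q$ as `target`.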
* very useful link: [PyTorch docs of KLDivLoss with examples](https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html)
* original paper: [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)

**Please be sure to carefully check each function's parameter requirements.**
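`nn.KLDivLoss` expects its `input` to be log-probabilities and (by default) its `target` to be probabilities. It computes $\sum_c \text{target}_c(\log \text{target}_c - \text{input}_c)$, i.e. $KL(\text{target} \,\|\, \cdot)$, and `reduction="batchmean"` divides the sum by the batch size, which gives the per-sample KL. This is a minimal sketch on random placeholder logits that checks the built-in loss against the formula written out by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
T = 5.0
student_logits = torch.randn(8, 11)  # placeholder batch: 8 samples, 11 classes
teacher_logits = torch.randn(8, 11)

log_p = F.log_softmax(student_logits / T, dim=1)  # student, as log-probabilities (input)
q = F.softmax(teacher_logits / T, dim=1)          # teacher, as probabilities (target)

kl_builtin = nn.KLDivLoss(reduction="batchmean")(log_p, q)
# Manual KL(q || p), summed over classes and averaged over the batch.
kl_manual = (q * (q.log() - log_p)).sum(dim=1).mean()

print(kl_builtin.item(), kl_manual.item())
```

Passing plain probabilities (instead of `log_softmax`) as `input`, or using the default `reduction="mean"` (which also averages over classes), are common ways to silently get the wrong loss scale.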

In [13]:
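# Implement the loss function with KL divergence loss for knowledge distillation.
# You also have to copy-paste this whole block to HW13 GradeScope.
def loss_fn_kd(student_logits, labels, teacher_logits, alpha=0.8, temperature=5.0):
    # ------------TODO-------------
    # Refer to the above formula and finish the loss function for knowledge distillation using KL divergence loss and CE loss.
    # If you have no idea, please take a look at the provided useful link above.

    # Soft-target term: KLDivLoss takes the student's log-probabilities as input and the
    # teacher's probabilities as target, i.e. KL(teacher || student) on softened distributions.
    soft_loss = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
    )
    # Hard-target term: ordinary cross-entropy with the ground-truth labels.
    hard_loss = nn.CrossEntropyLoss()(student_logits, labels)
    # Multiply the soft term by T^2 to keep its gradient scale comparable across temperatures.
    loss = alpha * temperature * temperature * soft_loss + (1. - alpha) * hard_loss
    return loss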
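Two quick sanity checks apply to a KD loss of this form: when the student's logits equal the teacher's, the KL term vanishes and the loss reduces to $(1-\alpha)$ times the cross-entropy; with $\alpha = 0$ it is exactly the cross-entropy. This is a minimal sketch; `kd_loss` restates the formula above so the snippet runs on its own, and the random logits and labels are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, labels, teacher_logits, alpha=0.8, temperature=5.0):
    # alpha * T^2 * KL(teacher || student) on softened distributions + (1 - alpha) * CE
    soft = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
    )
    hard = F.cross_entropy(student_logits, labels)
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard

torch.manual_seed(0)
logits = torch.randn(4, 11)
labels = torch.randint(0, 11, (4,))
ce = F.cross_entropy(logits, labels)

same = kd_loss(logits, labels, logits.clone())                     # KL term is ~0
pure_ce = kd_loss(logits, labels, torch.randn(4, 11), alpha=0.0)  # soft term switched off
print(same.item(), 0.2 * ce.item(), pure_ce.item(), ce.item())
```

Running the same checks against your own `loss_fn_kd` is a cheap way to catch swapped arguments before a long training run.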

In [14]:
# choose the loss function by the config
if cfg['loss_fn_type'] == 'CE':
    # For the classification task, we use cross-entropy as the default loss function.
    loss_fn = nn.CrossEntropyLoss() # loss function for simple baseline.

if cfg['loss_fn_type'] == 'KD': # KD stands for knowledge distillation
    loss_fn = loss_fn_kd # implement loss_fn_kd for the report question and the medium baseline.

# You can also adopt other types of knowledge distillation techniques for the strong and boss baselines, but use a function name other than `loss_fn_kd`
# For example:
# def loss_fn_custom_kd():
#     pass
# if cfg['loss_fn_type'] == 'custom_kd':
#     loss_fn = loss_fn_custom_kd

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"
log(f"device: {device}")

# The number of training epochs and patience.
n_epochs = cfg['n_epochs']
patience = cfg['patience'] # If no improvement in 'patience' epochs, early stop

device: cuda


### Training
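Implement the training loop (modified from HW3); feel free to modify it. Lines tagged `# MEDIUM BASELINE` use the teacher for knowledge distillation. For the simple baseline, set `loss_fn_type` to `'CE'` and use the `# SIMPLE BASELINE` lines instead.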
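The core of each KD training iteration is: run the frozen teacher in `eval()` mode under `torch.no_grad()`, run the student normally, combine both outputs in the loss, and update only the student. This is a minimal sketch of one such step using tiny stand-in models and random tensors; both models, the batch, and the `kd_step` helper are hypothetical placeholders for `teacher_model`, `student_model`, and the Food-11 loader.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, labels, teacher_logits, alpha=0.8, temperature=5.0):
    soft = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
    )
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * F.cross_entropy(student_logits, labels)

def kd_step(student, teacher, optimizer, imgs, labels, grad_norm_max=10.0):
    student.train()
    teacher.eval()                      # teacher stays frozen (BatchNorm/dropout in eval mode)
    with torch.no_grad():               # no autograd graph is built for the teacher
        teacher_logits = teacher(imgs)
    logits = student(imgs)
    loss = kd_loss(logits, labels, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(student.parameters(), max_norm=grad_norm_max)
    optimizer.step()                    # only the student's parameters are in the optimizer
    return loss.item()

torch.manual_seed(0)
# Stand-in models: flatten a tiny 3x8x8 "image" and map it to 11 classes.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 11))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 11))
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4, weight_decay=1e-5)

imgs, labels = torch.randn(16, 3, 8, 8), torch.randint(0, 11, (16,))
teacher_before = [p.clone() for p in teacher.parameters()]
student_before = [p.clone() for p in student.parameters()]
loss = kd_step(student, teacher, optimizer, imgs, labels)
print(loss)
```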

In [15]:
# Initialize a model, and put it on the device specified.
student_model.to(device)
teacher_model.to(device) # MEDIUM BASELINE

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_model.parameters(), lr=cfg['lr'], weight_decay=cfg['weight_decay'])

# Initialize trackers, these are not parameters and should not be changed
stale = 0
best_acc = 0.0

teacher_model.eval()  # MEDIUM BASELINE
for epoch in range(n_epochs):

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []
    train_lens = []

    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)
        #imgs = imgs.half()
        #print(imgs.shape,labels.shape)

        # Forward the data. (Make sure data and model are on the same device.)
        with torch.no_grad():  # MEDIUM BASELINE
            teacher_logits = teacher_model(imgs)  # MEDIUM BASELINE

        logits = student_model(imgs)

        # Calculate the loss (the KD loss for the medium baseline, plain cross-entropy for the simple baseline).
        # Both take raw logits; softmax / log-softmax is applied inside the loss, so don't apply it here.
        loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
        # loss = loss_fn(logits, labels) # SIMPLE BASELINE
        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_model.parameters(), max_norm=cfg['grad_norm_max'])

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        train_batch_len = len(imgs)
        train_loss.append(loss.item() * train_batch_len)
        train_accs.append(acc)
        train_lens.append(train_batch_len)

    train_loss = sum(train_loss) / sum(train_lens)
    train_acc = sum(train_accs) / sum(train_lens)

    # Print the information.
    log(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that modules like dropout are disabled and BatchNorm uses its running statistics.
    student_model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []
    valid_lens = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = student_model(imgs)
            teacher_logits = teacher_model(imgs) # MEDIUM BASELINE

        # We can still compute the loss (but not the gradient).
        loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
        # loss = loss_fn(logits, labels) # SIMPLE BASELINE

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        batch_len = len(imgs)
        valid_loss.append(loss.item() * batch_len)
        valid_accs.append(acc)
        valid_lens.append(batch_len)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / sum(valid_lens)
    valid_acc = sum(valid_accs) / sum(valid_lens)

    # update logs

    if valid_acc > best_acc:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f} -> best")
    else:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")


    # save models
    if valid_acc > best_acc:
        log(f"Best model found at epoch {epoch+1}, saving model")
        torch.save(student_model.state_dict(), f"{save_path}/student_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            log(f"No improvement in more than {patience} consecutive epochs, early stopping")
            break
log("Finish training")
log_fw.close()

100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 001/600 ] loss = 7.18326, acc = 0.14600


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 001/600 ] loss = 25.59803, acc = 0.24526 -> best
Best model found at epoch 1, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 002/600 ] loss = 6.72797, acc = 0.16702


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 002/600 ] loss = 24.93359, acc = 0.26557 -> best
Best model found at epoch 2, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 003/600 ] loss = 6.66422, acc = 0.17052


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 003/600 ] loss = 24.67761, acc = 0.28317 -> best
Best model found at epoch 3, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 004/600 ] loss = 6.36221, acc = 0.17342


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 004/600 ] loss = 24.34791, acc = 0.27595


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 005/600 ] loss = 6.42483, acc = 0.17452


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 005/600 ] loss = 24.12571, acc = 0.27324


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 006/600 ] loss = 6.41617, acc = 0.17542


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 006/600 ] loss = 23.92019, acc = 0.28317


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 007/600 ] loss = 6.25719, acc = 0.17722


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 007/600 ] loss = 23.66627, acc = 0.28159


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 008/600 ] loss = 6.18463, acc = 0.18403


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 008/600 ] loss = 23.59371, acc = 0.27685


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 009/600 ] loss = 6.15947, acc = 0.18823


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 009/600 ] loss = 23.36805, acc = 0.29986 -> best
Best model found at epoch 9, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 010/600 ] loss = 6.16365, acc = 0.18583


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 010/600 ] loss = 23.16211, acc = 0.30957 -> best
Best model found at epoch 10, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 011/600 ] loss = 6.14123, acc = 0.18753


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 011/600 ] loss = 23.27576, acc = 0.26918


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 012/600 ] loss = 6.17218, acc = 0.19364


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 012/600 ] loss = 23.06712, acc = 0.28407


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 013/600 ] loss = 6.01541, acc = 0.19504


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 013/600 ] loss = 22.97628, acc = 0.32175 -> best
Best model found at epoch 13, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 014/600 ] loss = 5.98150, acc = 0.19974


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 014/600 ] loss = 22.99492, acc = 0.29310


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 015/600 ] loss = 5.93095, acc = 0.19724


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 015/600 ] loss = 22.76989, acc = 0.30235


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 016/600 ] loss = 5.88085, acc = 0.19404


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 016/600 ] loss = 23.00211, acc = 0.27189


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 017/600 ] loss = 5.84322, acc = 0.20404


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 017/600 ] loss = 22.45044, acc = 0.31092


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 018/600 ] loss = 5.97061, acc = 0.20134


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 018/600 ] loss = 22.54809, acc = 0.29558


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 019/600 ] loss = 6.01558, acc = 0.20544


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 019/600 ] loss = 22.38498, acc = 0.32107


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 020/600 ] loss = 5.91949, acc = 0.20935


100%|██████████| 18/18 [00:12<00:00,  1.38it/s]


[ Valid | 020/600 ] loss = 22.40176, acc = 0.34950 -> best
Best model found at epoch 20, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 021/600 ] loss = 5.90218, acc = 0.20725


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 021/600 ] loss = 22.43109, acc = 0.35086 -> best
Best model found at epoch 21, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 022/600 ] loss = 5.90831, acc = 0.20945


100%|██████████| 18/18 [00:13<00:00,  1.30it/s]


[ Valid | 022/600 ] loss = 22.11442, acc = 0.33912


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 023/600 ] loss = 5.76155, acc = 0.21155


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 023/600 ] loss = 22.11134, acc = 0.32649


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 024/600 ] loss = 5.92463, acc = 0.21175


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 024/600 ] loss = 21.84442, acc = 0.34702


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 025/600 ] loss = 5.83400, acc = 0.20715


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 025/600 ] loss = 21.97010, acc = 0.33281


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 026/600 ] loss = 5.74151, acc = 0.21665


100%|██████████| 18/18 [00:12<00:00,  1.38it/s]


[ Valid | 026/600 ] loss = 21.77918, acc = 0.35469 -> best
Best model found at epoch 26, saving model


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 027/600 ] loss = 5.80573, acc = 0.21665


100%|██████████| 18/18 [00:13<00:00,  1.30it/s]


[ Valid | 027/600 ] loss = 21.75338, acc = 0.34251


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 028/600 ] loss = 5.82274, acc = 0.21465


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 028/600 ] loss = 22.10351, acc = 0.35041


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 029/600 ] loss = 5.88083, acc = 0.22206


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 029/600 ] loss = 21.70455, acc = 0.33281


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 030/600 ] loss = 5.76971, acc = 0.21925


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 030/600 ] loss = 21.76080, acc = 0.36282 -> best
Best model found at epoch 30, saving model


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 031/600 ] loss = 5.68239, acc = 0.21365


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 031/600 ] loss = 21.52242, acc = 0.36755 -> best
Best model found at epoch 31, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 032/600 ] loss = 5.82797, acc = 0.21685


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 032/600 ] loss = 21.62448, acc = 0.35988


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 033/600 ] loss = 5.72879, acc = 0.22296


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 033/600 ] loss = 21.55832, acc = 0.37319 -> best
Best model found at epoch 33, saving model


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 034/600 ] loss = 5.72600, acc = 0.22316


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 034/600 ] loss = 21.39219, acc = 0.38267 -> best
Best model found at epoch 34, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 035/600 ] loss = 5.77377, acc = 0.22326


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 035/600 ] loss = 21.27380, acc = 0.37071


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 036/600 ] loss = 5.76424, acc = 0.22105


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 036/600 ] loss = 21.55057, acc = 0.35334


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 037/600 ] loss = 5.65187, acc = 0.22446


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 037/600 ] loss = 21.52371, acc = 0.36079


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 038/600 ] loss = 5.77596, acc = 0.22316


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 038/600 ] loss = 21.19291, acc = 0.35672


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 039/600 ] loss = 5.66101, acc = 0.22206


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 039/600 ] loss = 21.44761, acc = 0.35560


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 040/600 ] loss = 5.66448, acc = 0.22396


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 040/600 ] loss = 21.14591, acc = 0.36282


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 041/600 ] loss = 5.72651, acc = 0.22796


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 041/600 ] loss = 21.32381, acc = 0.33529


100%|██████████| 40/40 [00:58<00:00,  1.47s/it]


[ Train | 042/600 ] loss = 5.55474, acc = 0.22676


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 042/600 ] loss = 21.07823, acc = 0.37523


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 043/600 ] loss = 5.63874, acc = 0.22796


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 043/600 ] loss = 21.29688, acc = 0.31701


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 044/600 ] loss = 5.57993, acc = 0.22706


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 044/600 ] loss = 21.16449, acc = 0.37681


100%|██████████| 40/40 [00:58<00:00,  1.47s/it]


[ Train | 045/600 ] loss = 5.57711, acc = 0.22636


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 045/600 ] loss = 20.95639, acc = 0.36011


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 046/600 ] loss = 5.68991, acc = 0.23086


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 046/600 ] loss = 21.00888, acc = 0.37139


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 047/600 ] loss = 5.60971, acc = 0.22636


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 047/600 ] loss = 20.82571, acc = 0.38606 -> best
Best model found at epoch 47, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 048/600 ] loss = 5.74146, acc = 0.22536


100%|██████████| 18/18 [00:13<00:00,  1.30it/s]


[ Valid | 048/600 ] loss = 20.91001, acc = 0.37274


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 049/600 ] loss = 5.65952, acc = 0.23426


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 049/600 ] loss = 21.17535, acc = 0.32017


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 050/600 ] loss = 5.65608, acc = 0.23016


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 050/600 ] loss = 21.10249, acc = 0.36801


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 051/600 ] loss = 5.58024, acc = 0.23156


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 051/600 ] loss = 20.71891, acc = 0.38357


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 052/600 ] loss = 5.70352, acc = 0.22866


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 052/600 ] loss = 20.77548, acc = 0.38560


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 053/600 ] loss = 5.60001, acc = 0.22936


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 053/600 ] loss = 20.95149, acc = 0.36394


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 054/600 ] loss = 5.51051, acc = 0.22736


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 054/600 ] loss = 20.84893, acc = 0.39170 -> best
Best model found at epoch 54, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 055/600 ] loss = 5.53451, acc = 0.23046


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 055/600 ] loss = 20.49724, acc = 0.37207


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 056/600 ] loss = 5.49195, acc = 0.22876


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 056/600 ] loss = 20.68555, acc = 0.38335


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 057/600 ] loss = 5.54484, acc = 0.23156


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 057/600 ] loss = 20.45768, acc = 0.36575


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 058/600 ] loss = 5.57244, acc = 0.23757


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 058/600 ] loss = 20.68064, acc = 0.36033


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 059/600 ] loss = 5.60633, acc = 0.22876


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 059/600 ] loss = 20.45508, acc = 0.38470


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 060/600 ] loss = 5.47005, acc = 0.22746


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 060/600 ] loss = 20.28408, acc = 0.38786


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 061/600 ] loss = 5.62507, acc = 0.23747


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 061/600 ] loss = 20.69713, acc = 0.36304


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 062/600 ] loss = 5.58287, acc = 0.23396


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 062/600 ] loss = 20.17930, acc = 0.39869 -> best
Best model found at epoch 62, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 063/600 ] loss = 5.49912, acc = 0.23777


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 063/600 ] loss = 20.38863, acc = 0.36124


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 064/600 ] loss = 5.55277, acc = 0.23466


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 064/600 ] loss = 20.39675, acc = 0.37590


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 065/600 ] loss = 5.59053, acc = 0.22346


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 065/600 ] loss = 20.74358, acc = 0.38809


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 066/600 ] loss = 5.50686, acc = 0.23516


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 066/600 ] loss = 20.02201, acc = 0.40162 -> best
Best model found at epoch 66, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 067/600 ] loss = 5.49275, acc = 0.23546


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 067/600 ] loss = 20.22771, acc = 0.39463


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 068/600 ] loss = 5.50217, acc = 0.23476


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 068/600 ] loss = 20.43129, acc = 0.40569 -> best
Best model found at epoch 68, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 069/600 ] loss = 5.48109, acc = 0.23577


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 069/600 ] loss = 20.61866, acc = 0.36688


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 070/600 ] loss = 5.50756, acc = 0.23406


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 070/600 ] loss = 20.38154, acc = 0.38425


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 071/600 ] loss = 5.43305, acc = 0.23887


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 071/600 ] loss = 20.26491, acc = 0.36146


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 072/600 ] loss = 5.49589, acc = 0.23817


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 072/600 ] loss = 20.27564, acc = 0.36327


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 073/600 ] loss = 5.50286, acc = 0.24407


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 073/600 ] loss = 20.19286, acc = 0.39463


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 074/600 ] loss = 5.40954, acc = 0.24047


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 074/600 ] loss = 20.36979, acc = 0.38944


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 075/600 ] loss = 5.46185, acc = 0.23767


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 075/600 ] loss = 20.23318, acc = 0.39666


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 076/600 ] loss = 5.51324, acc = 0.24467


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 076/600 ] loss = 20.18461, acc = 0.39734


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 077/600 ] loss = 5.42522, acc = 0.23947


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 077/600 ] loss = 20.13105, acc = 0.39102


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 078/600 ] loss = 5.50541, acc = 0.24187


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 078/600 ] loss = 20.02597, acc = 0.40388


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 079/600 ] loss = 5.45617, acc = 0.24017


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 079/600 ] loss = 19.87317, acc = 0.40162


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 080/600 ] loss = 5.47543, acc = 0.24647


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 080/600 ] loss = 20.53792, acc = 0.37004


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 081/600 ] loss = 5.48503, acc = 0.24177


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 081/600 ] loss = 19.82420, acc = 0.41065 -> best
Best model found at epoch 81, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 082/600 ] loss = 5.35693, acc = 0.23577


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 082/600 ] loss = 19.81583, acc = 0.39824


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 083/600 ] loss = 5.33837, acc = 0.23727


100%|██████████| 18/18 [00:12<00:00,  1.38it/s]


[ Valid | 083/600 ] loss = 20.20692, acc = 0.41539 -> best
Best model found at epoch 83, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 084/600 ] loss = 5.40571, acc = 0.24687


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 084/600 ] loss = 19.73455, acc = 0.41291


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 085/600 ] loss = 5.34046, acc = 0.24547


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 085/600 ] loss = 19.86797, acc = 0.39779


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 086/600 ] loss = 5.42574, acc = 0.24247


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 086/600 ] loss = 19.79594, acc = 0.41020


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 087/600 ] loss = 5.44256, acc = 0.23987


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 087/600 ] loss = 19.78433, acc = 0.41810 -> best
Best model found at epoch 87, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 088/600 ] loss = 5.46833, acc = 0.23877


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 088/600 ] loss = 19.89059, acc = 0.39012


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 089/600 ] loss = 5.38603, acc = 0.24237


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 089/600 ] loss = 19.92810, acc = 0.42351 -> best
Best model found at epoch 89, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 090/600 ] loss = 5.33837, acc = 0.24107


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 090/600 ] loss = 19.82118, acc = 0.39328


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 091/600 ] loss = 5.30411, acc = 0.24177


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 091/600 ] loss = 19.88790, acc = 0.41764


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 092/600 ] loss = 5.33821, acc = 0.24647


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 092/600 ] loss = 19.73038, acc = 0.40839


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 093/600 ] loss = 5.42223, acc = 0.24847


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 093/600 ] loss = 19.82554, acc = 0.40027


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 094/600 ] loss = 5.34485, acc = 0.24037


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 094/600 ] loss = 19.66010, acc = 0.40523


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 095/600 ] loss = 5.39172, acc = 0.24587


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 095/600 ] loss = 19.59261, acc = 0.40275


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 096/600 ] loss = 5.45396, acc = 0.24587


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 096/600 ] loss = 19.62462, acc = 0.40253


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 097/600 ] loss = 5.38397, acc = 0.24777


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 097/600 ] loss = 19.90439, acc = 0.41268


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 098/600 ] loss = 5.37542, acc = 0.24427


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 098/600 ] loss = 19.62836, acc = 0.42486 -> best
Best model found at epoch 98, saving model


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 099/600 ] loss = 5.30326, acc = 0.24407


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 099/600 ] loss = 19.65341, acc = 0.38899


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 100/600 ] loss = 5.38438, acc = 0.24537


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 100/600 ] loss = 19.63780, acc = 0.41268


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 101/600 ] loss = 5.36849, acc = 0.25078


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 101/600 ] loss = 19.64327, acc = 0.41088


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 102/600 ] loss = 5.35808, acc = 0.25268


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 102/600 ] loss = 19.46228, acc = 0.41449


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 103/600 ] loss = 5.38255, acc = 0.24947


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 103/600 ] loss = 19.57961, acc = 0.38696


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 104/600 ] loss = 5.31791, acc = 0.24437


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 104/600 ] loss = 19.65382, acc = 0.40208


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 105/600 ] loss = 5.28394, acc = 0.24467


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 105/600 ] loss = 19.58906, acc = 0.41042


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 106/600 ] loss = 5.34206, acc = 0.24907


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 106/600 ] loss = 19.70540, acc = 0.39869


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 107/600 ] loss = 5.33013, acc = 0.25068


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 107/600 ] loss = 19.60434, acc = 0.41403


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 108/600 ] loss = 5.26191, acc = 0.24787


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 108/600 ] loss = 19.68873, acc = 0.42780 -> best
Best model found at epoch 108, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 109/600 ] loss = 5.28388, acc = 0.25238


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 109/600 ] loss = 19.96834, acc = 0.38448


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 110/600 ] loss = 5.28592, acc = 0.24867


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 110/600 ] loss = 19.29649, acc = 0.39734


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 111/600 ] loss = 5.25985, acc = 0.25218


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 111/600 ] loss = 19.40360, acc = 0.41764


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 112/600 ] loss = 5.32445, acc = 0.25228


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 112/600 ] loss = 19.72759, acc = 0.41697


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 113/600 ] loss = 5.36348, acc = 0.24627


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 113/600 ] loss = 19.30972, acc = 0.41855


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 114/600 ] loss = 5.35730, acc = 0.25598


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 114/600 ] loss = 19.43687, acc = 0.41110


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 115/600 ] loss = 5.36320, acc = 0.25268


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 115/600 ] loss = 19.55190, acc = 0.42938 -> best
Best model found at epoch 115, saving model


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 116/600 ] loss = 5.27150, acc = 0.24987


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 116/600 ] loss = 19.33592, acc = 0.41810


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 117/600 ] loss = 5.18813, acc = 0.24957


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 117/600 ] loss = 19.59258, acc = 0.39666


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 118/600 ] loss = 5.30597, acc = 0.24977


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 118/600 ] loss = 19.22132, acc = 0.41900


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 119/600 ] loss = 5.21092, acc = 0.25418


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 119/600 ] loss = 19.41300, acc = 0.42261


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 120/600 ] loss = 5.51482, acc = 0.25458


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 120/600 ] loss = 19.34598, acc = 0.40343


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 121/600 ] loss = 5.27133, acc = 0.25538


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 121/600 ] loss = 19.45892, acc = 0.42554


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 122/600 ] loss = 5.31677, acc = 0.25398


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 122/600 ] loss = 19.67819, acc = 0.37387


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 123/600 ] loss = 5.32048, acc = 0.25008


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 123/600 ] loss = 18.96821, acc = 0.43592 -> best
Best model found at epoch 123, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 124/600 ] loss = 5.26203, acc = 0.24897


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 124/600 ] loss = 19.19794, acc = 0.43366


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 125/600 ] loss = 5.26523, acc = 0.25398


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 125/600 ] loss = 19.35626, acc = 0.42261


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 126/600 ] loss = 5.22975, acc = 0.25038


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 126/600 ] loss = 19.18406, acc = 0.42238


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 127/600 ] loss = 5.38530, acc = 0.25568


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 127/600 ] loss = 19.18691, acc = 0.42554


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 128/600 ] loss = 5.29810, acc = 0.25568


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 128/600 ] loss = 19.09886, acc = 0.42712


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 129/600 ] loss = 5.27827, acc = 0.25678


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 129/600 ] loss = 18.95805, acc = 0.42351


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 130/600 ] loss = 5.26600, acc = 0.24797


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 130/600 ] loss = 19.28378, acc = 0.39621


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 131/600 ] loss = 5.31753, acc = 0.24787


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 131/600 ] loss = 19.01801, acc = 0.42893


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 132/600 ] loss = 5.20773, acc = 0.25738


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 132/600 ] loss = 19.30103, acc = 0.42893


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 133/600 ] loss = 5.34383, acc = 0.25038


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 133/600 ] loss = 19.19870, acc = 0.43773 -> best
Best model found at epoch 133, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 134/600 ] loss = 5.22061, acc = 0.26038


100%|██████████| 18/18 [00:12<00:00,  1.38it/s]


[ Valid | 134/600 ] loss = 19.02771, acc = 0.42915


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 135/600 ] loss = 5.29723, acc = 0.25728


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 135/600 ] loss = 19.09713, acc = 0.41697


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 136/600 ] loss = 5.20189, acc = 0.25598


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 136/600 ] loss = 18.85173, acc = 0.43434


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 137/600 ] loss = 5.24300, acc = 0.25958


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 137/600 ] loss = 19.04370, acc = 0.40095


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 138/600 ] loss = 5.27936, acc = 0.26158


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 138/600 ] loss = 19.58645, acc = 0.43885 -> best
Best model found at epoch 138, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 139/600 ] loss = 5.18239, acc = 0.25738


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 139/600 ] loss = 18.99672, acc = 0.44810 -> best
Best model found at epoch 139, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 140/600 ] loss = 5.20470, acc = 0.25578


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 140/600 ] loss = 19.10081, acc = 0.43118


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 141/600 ] loss = 5.14807, acc = 0.26128


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 141/600 ] loss = 19.06901, acc = 0.44856 -> best
Best model found at epoch 141, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 142/600 ] loss = 5.11694, acc = 0.25788


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 142/600 ] loss = 18.89020, acc = 0.43412


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 143/600 ] loss = 5.20374, acc = 0.26158


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 143/600 ] loss = 19.26243, acc = 0.43637


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 144/600 ] loss = 5.10168, acc = 0.26008


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 144/600 ] loss = 18.90416, acc = 0.43998


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 145/600 ] loss = 5.11434, acc = 0.25958


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 145/600 ] loss = 18.92952, acc = 0.43276


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 146/600 ] loss = 5.17670, acc = 0.25978


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 146/600 ] loss = 18.95914, acc = 0.41471


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 147/600 ] loss = 5.22234, acc = 0.26088


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 147/600 ] loss = 19.46552, acc = 0.43321


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 148/600 ] loss = 5.21522, acc = 0.25898


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 148/600 ] loss = 18.71795, acc = 0.44134


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 149/600 ] loss = 5.16712, acc = 0.26418


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 149/600 ] loss = 19.08997, acc = 0.42622


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 150/600 ] loss = 5.15148, acc = 0.26298


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 150/600 ] loss = 18.52796, acc = 0.45171 -> best
Best model found at epoch 150, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 151/600 ] loss = 5.12236, acc = 0.26388


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 151/600 ] loss = 18.73429, acc = 0.43727


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 152/600 ] loss = 5.15833, acc = 0.25768


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 152/600 ] loss = 19.10255, acc = 0.41629


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 153/600 ] loss = 5.22634, acc = 0.25738


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 153/600 ] loss = 18.92540, acc = 0.42441


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 154/600 ] loss = 5.19196, acc = 0.26719


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 154/600 ] loss = 18.87881, acc = 0.42735


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 155/600 ] loss = 5.14133, acc = 0.25858


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 155/600 ] loss = 18.75781, acc = 0.41561


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 156/600 ] loss = 5.14103, acc = 0.26348


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 156/600 ] loss = 18.94368, acc = 0.40614


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 157/600 ] loss = 5.15062, acc = 0.26589


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 157/600 ] loss = 19.00297, acc = 0.45329 -> best
Best model found at epoch 157, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 158/600 ] loss = 5.20452, acc = 0.26238


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 158/600 ] loss = 18.56174, acc = 0.44472


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 159/600 ] loss = 5.08978, acc = 0.27119


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 159/600 ] loss = 18.72293, acc = 0.43795


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 160/600 ] loss = 5.17624, acc = 0.26348


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 160/600 ] loss = 18.82969, acc = 0.42419


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 161/600 ] loss = 5.20560, acc = 0.26078


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 161/600 ] loss = 18.79819, acc = 0.41945


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 162/600 ] loss = 5.13675, acc = 0.26358


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 162/600 ] loss = 18.89920, acc = 0.45104


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 163/600 ] loss = 5.22294, acc = 0.27169


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 163/600 ] loss = 18.86627, acc = 0.42622


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 164/600 ] loss = 5.11549, acc = 0.26869


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 164/600 ] loss = 18.67944, acc = 0.45736 -> best
Best model found at epoch 164, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 165/600 ] loss = 5.17657, acc = 0.26198


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 165/600 ] loss = 18.68395, acc = 0.45171


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 166/600 ] loss = 5.09265, acc = 0.26408


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 166/600 ] loss = 18.62889, acc = 0.43457


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 167/600 ] loss = 5.07728, acc = 0.26539


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 167/600 ] loss = 19.08167, acc = 0.40185


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 168/600 ] loss = 5.13908, acc = 0.26739


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 168/600 ] loss = 18.65984, acc = 0.44788


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 169/600 ] loss = 5.09159, acc = 0.26569


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 169/600 ] loss = 18.67756, acc = 0.45871 -> best
Best model found at epoch 169, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 170/600 ] loss = 5.22525, acc = 0.26829


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 170/600 ] loss = 18.58091, acc = 0.43795


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 171/600 ] loss = 5.18819, acc = 0.26348


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 171/600 ] loss = 18.52273, acc = 0.45284


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 172/600 ] loss = 5.12067, acc = 0.26549


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 172/600 ] loss = 18.72351, acc = 0.46006 -> best
Best model found at epoch 172, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 173/600 ] loss = 5.14189, acc = 0.27429


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 173/600 ] loss = 18.42603, acc = 0.44585


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 174/600 ] loss = 5.16884, acc = 0.26569


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 174/600 ] loss = 18.75267, acc = 0.40523


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 175/600 ] loss = 5.05152, acc = 0.26929


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 175/600 ] loss = 18.60249, acc = 0.41561


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 176/600 ] loss = 5.11737, acc = 0.26999


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 176/600 ] loss = 18.38545, acc = 0.46322 -> best
Best model found at epoch 176, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 177/600 ] loss = 5.10071, acc = 0.27359


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 177/600 ] loss = 18.38692, acc = 0.46142


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 178/600 ] loss = 5.12177, acc = 0.27289


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 178/600 ] loss = 18.68723, acc = 0.42148


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 179/600 ] loss = 5.09396, acc = 0.27119


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 179/600 ] loss = 18.34535, acc = 0.44607


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 180/600 ] loss = 5.12442, acc = 0.27299


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 180/600 ] loss = 18.58103, acc = 0.44021


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 181/600 ] loss = 5.12538, acc = 0.27099


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 181/600 ] loss = 18.78161, acc = 0.42532


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 182/600 ] loss = 5.16991, acc = 0.27169


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 182/600 ] loss = 18.77444, acc = 0.45397


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 183/600 ] loss = 5.11847, acc = 0.27219


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 183/600 ] loss = 18.51272, acc = 0.43592


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 184/600 ] loss = 5.12553, acc = 0.26769


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 184/600 ] loss = 18.37944, acc = 0.42667


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 185/600 ] loss = 4.99059, acc = 0.26509


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 185/600 ] loss = 18.76694, acc = 0.43186


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 186/600 ] loss = 5.12222, acc = 0.26869


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 186/600 ] loss = 18.39991, acc = 0.44878


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 187/600 ] loss = 5.04871, acc = 0.27759


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 187/600 ] loss = 18.24391, acc = 0.47563 -> best
Best model found at epoch 187, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 188/600 ] loss = 5.14634, acc = 0.26869


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 188/600 ] loss = 18.50717, acc = 0.44337


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 189/600 ] loss = 5.08374, acc = 0.27389


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 189/600 ] loss = 18.60483, acc = 0.43750


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 190/600 ] loss = 5.03449, acc = 0.27599


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 190/600 ] loss = 18.26496, acc = 0.45465


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 191/600 ] loss = 5.08929, acc = 0.26999


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 191/600 ] loss = 18.69600, acc = 0.40478


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 192/600 ] loss = 5.01166, acc = 0.26879


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 192/600 ] loss = 18.39632, acc = 0.44856


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 193/600 ] loss = 5.12976, acc = 0.27369


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 193/600 ] loss = 18.24664, acc = 0.48218 -> best
Best model found at epoch 193, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 194/600 ] loss = 5.03250, acc = 0.28280


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 194/600 ] loss = 18.24774, acc = 0.46503


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 195/600 ] loss = 5.06907, acc = 0.27029


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 195/600 ] loss = 18.25154, acc = 0.47383


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 196/600 ] loss = 5.05993, acc = 0.27609


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 196/600 ] loss = 18.50650, acc = 0.45059


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 197/600 ] loss = 5.03140, acc = 0.27609


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 197/600 ] loss = 18.19510, acc = 0.44111


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 198/600 ] loss = 5.02403, acc = 0.27399


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 198/600 ] loss = 18.29111, acc = 0.43840


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 199/600 ] loss = 5.05446, acc = 0.27699


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 199/600 ] loss = 18.37241, acc = 0.45600


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 200/600 ] loss = 5.10108, acc = 0.27389


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 200/600 ] loss = 18.70981, acc = 0.40456


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 201/600 ] loss = 5.10915, acc = 0.27059


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 201/600 ] loss = 19.56036, acc = 0.38290


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 202/600 ] loss = 4.98151, acc = 0.27459


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 202/600 ] loss = 18.03428, acc = 0.44810


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 203/600 ] loss = 5.03297, acc = 0.28260


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 203/600 ] loss = 18.04241, acc = 0.47022


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 204/600 ] loss = 5.09071, acc = 0.27039


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 204/600 ] loss = 18.20893, acc = 0.44495


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 205/600 ] loss = 5.05580, acc = 0.27539


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 205/600 ] loss = 18.21757, acc = 0.46931


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 206/600 ] loss = 5.03204, acc = 0.28040


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 206/600 ] loss = 18.02303, acc = 0.44111


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 207/600 ] loss = 5.00191, acc = 0.27980


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 207/600 ] loss = 18.04444, acc = 0.44878


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 208/600 ] loss = 4.99637, acc = 0.27669


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 208/600 ] loss = 18.06961, acc = 0.44968


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 209/600 ] loss = 4.95089, acc = 0.27870


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 209/600 ] loss = 18.25412, acc = 0.44901


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 210/600 ] loss = 5.03612, acc = 0.27779


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 210/600 ] loss = 18.02926, acc = 0.46390


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 211/600 ] loss = 5.04186, acc = 0.27860


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 211/600 ] loss = 18.01564, acc = 0.42441


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 212/600 ] loss = 5.07408, acc = 0.27469


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 212/600 ] loss = 18.14156, acc = 0.44968


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 213/600 ] loss = 4.98326, acc = 0.28190


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 213/600 ] loss = 17.91959, acc = 0.48646 -> best
Best model found at epoch 213, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 214/600 ] loss = 5.04119, acc = 0.28040


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 214/600 ] loss = 18.24833, acc = 0.47134


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 215/600 ] loss = 5.08393, acc = 0.27799


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 215/600 ] loss = 17.98117, acc = 0.48127


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 216/600 ] loss = 5.13993, acc = 0.28250


100%|██████████| 18/18 [00:12<00:00,  1.41it/s]


[ Valid | 216/600 ] loss = 17.70429, acc = 0.46367


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 217/600 ] loss = 4.96351, acc = 0.27880


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 217/600 ] loss = 18.13693, acc = 0.48263


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 218/600 ] loss = 5.03513, acc = 0.28260


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 218/600 ] loss = 17.85509, acc = 0.46029


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 219/600 ] loss = 4.95470, acc = 0.27970


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 219/600 ] loss = 17.97849, acc = 0.46616


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 220/600 ] loss = 5.04444, acc = 0.27839


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 220/600 ] loss = 17.87712, acc = 0.46209


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 221/600 ] loss = 5.07508, acc = 0.28120


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 221/600 ] loss = 18.07330, acc = 0.45217


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 222/600 ] loss = 4.99911, acc = 0.28110


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 222/600 ] loss = 17.93554, acc = 0.47856


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 223/600 ] loss = 4.96989, acc = 0.27189


100%|██████████| 18/18 [00:13<00:00,  1.31it/s]


[ Valid | 223/600 ] loss = 18.39255, acc = 0.42802


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 224/600 ] loss = 5.01014, acc = 0.27749


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 224/600 ] loss = 17.79581, acc = 0.47744


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 225/600 ] loss = 4.97751, acc = 0.27729


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 225/600 ] loss = 18.33696, acc = 0.44043


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 226/600 ] loss = 5.00183, acc = 0.27849


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 226/600 ] loss = 18.15389, acc = 0.44517


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 227/600 ] loss = 5.04459, acc = 0.28400


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 227/600 ] loss = 17.74708, acc = 0.47811


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 228/600 ] loss = 4.89850, acc = 0.27970


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 228/600 ] loss = 18.00314, acc = 0.45510


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 229/600 ] loss = 4.91074, acc = 0.27679


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 229/600 ] loss = 18.39497, acc = 0.41584


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 230/600 ] loss = 5.08675, acc = 0.27809


100%|██████████| 18/18 [00:13<00:00,  1.30it/s]


[ Valid | 230/600 ] loss = 18.37426, acc = 0.41674


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 231/600 ] loss = 5.00322, acc = 0.27659


100%|██████████| 18/18 [00:13<00:00,  1.30it/s]


[ Valid | 231/600 ] loss = 17.92244, acc = 0.47856


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 232/600 ] loss = 4.95941, acc = 0.28790


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 232/600 ] loss = 17.93337, acc = 0.46909


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 233/600 ] loss = 5.05661, acc = 0.28530


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 233/600 ] loss = 19.17051, acc = 0.31995


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 234/600 ] loss = 4.99143, acc = 0.27990


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 234/600 ] loss = 17.58550, acc = 0.48218


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 235/600 ] loss = 4.98868, acc = 0.28210


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 235/600 ] loss = 18.05673, acc = 0.46390


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 236/600 ] loss = 4.94910, acc = 0.28650


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 236/600 ] loss = 18.02744, acc = 0.46548


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 237/600 ] loss = 5.04520, acc = 0.28090


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 237/600 ] loss = 17.90813, acc = 0.44856


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 238/600 ] loss = 4.84622, acc = 0.28620


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 238/600 ] loss = 17.68519, acc = 0.47292


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 239/600 ] loss = 4.89583, acc = 0.28570


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 239/600 ] loss = 17.94597, acc = 0.44201


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 240/600 ] loss = 4.95441, acc = 0.28300


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 240/600 ] loss = 18.30555, acc = 0.44382


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 241/600 ] loss = 4.95619, acc = 0.29040


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 241/600 ] loss = 17.60373, acc = 0.48579


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 242/600 ] loss = 5.02402, acc = 0.28140


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 242/600 ] loss = 18.06941, acc = 0.48759 -> best
Best model found at epoch 242, saving model


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 243/600 ] loss = 4.89978, acc = 0.28220


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 243/600 ] loss = 18.50894, acc = 0.41042


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 244/600 ] loss = 4.98766, acc = 0.28580


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 244/600 ] loss = 17.90467, acc = 0.49007 -> best
Best model found at epoch 244, saving model


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 245/600 ] loss = 5.02668, acc = 0.28780


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 245/600 ] loss = 17.91110, acc = 0.47157


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 246/600 ] loss = 4.94751, acc = 0.28080


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 246/600 ] loss = 17.75587, acc = 0.48150


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 247/600 ] loss = 4.91885, acc = 0.29170


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 247/600 ] loss = 17.92533, acc = 0.45713


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 248/600 ] loss = 4.93475, acc = 0.29311


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 248/600 ] loss = 17.45600, acc = 0.50361 -> best
Best model found at epoch 248, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 249/600 ] loss = 4.86020, acc = 0.28240


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 249/600 ] loss = 17.68190, acc = 0.47315


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 250/600 ] loss = 4.94897, acc = 0.28390


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 250/600 ] loss = 17.80288, acc = 0.44359


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 251/600 ] loss = 4.98329, acc = 0.28390


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 251/600 ] loss = 17.54833, acc = 0.48285


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 252/600 ] loss = 4.87381, acc = 0.28850


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 252/600 ] loss = 17.53658, acc = 0.47969


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 253/600 ] loss = 4.86137, acc = 0.28870


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 253/600 ] loss = 17.97535, acc = 0.45194


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 254/600 ] loss = 4.93531, acc = 0.28550


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 254/600 ] loss = 17.65304, acc = 0.48127


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 255/600 ] loss = 4.89061, acc = 0.28870


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 255/600 ] loss = 17.62857, acc = 0.49188


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 256/600 ] loss = 4.86370, acc = 0.28560


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 256/600 ] loss = 18.23888, acc = 0.42148


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 257/600 ] loss = 4.85890, acc = 0.29260


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 257/600 ] loss = 17.63815, acc = 0.46683


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 258/600 ] loss = 4.86178, acc = 0.28560


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 258/600 ] loss = 17.80612, acc = 0.48714


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 259/600 ] loss = 4.87127, acc = 0.29601


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 259/600 ] loss = 17.85988, acc = 0.44968


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 260/600 ] loss = 4.84394, acc = 0.29010


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 260/600 ] loss = 17.60653, acc = 0.48872


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 261/600 ] loss = 4.86589, acc = 0.29301


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 261/600 ] loss = 17.68728, acc = 0.50677 -> best
Best model found at epoch 261, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 262/600 ] loss = 4.95320, acc = 0.28760


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 262/600 ] loss = 17.85504, acc = 0.41832


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 263/600 ] loss = 4.90220, acc = 0.28980


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 263/600 ] loss = 17.48756, acc = 0.49594


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 264/600 ] loss = 4.89111, acc = 0.28570


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 264/600 ] loss = 17.71159, acc = 0.49255


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 265/600 ] loss = 4.90502, acc = 0.29180


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 265/600 ] loss = 17.87740, acc = 0.43479


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 266/600 ] loss = 4.95969, acc = 0.28220


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 266/600 ] loss = 17.65009, acc = 0.49413


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 267/600 ] loss = 4.82740, acc = 0.28550


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 267/600 ] loss = 17.33410, acc = 0.47586


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 268/600 ] loss = 4.94550, acc = 0.29220


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 268/600 ] loss = 17.39872, acc = 0.46119


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 269/600 ] loss = 4.93100, acc = 0.29601


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 269/600 ] loss = 17.42700, acc = 0.47202


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 270/600 ] loss = 4.83972, acc = 0.28910


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 270/600 ] loss = 17.66102, acc = 0.50023


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 271/600 ] loss = 4.92981, acc = 0.29521


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 271/600 ] loss = 17.51237, acc = 0.46954


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 272/600 ] loss = 4.90283, acc = 0.29240


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 272/600 ] loss = 17.47319, acc = 0.48759


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 273/600 ] loss = 4.96412, acc = 0.29621


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 273/600 ] loss = 17.42944, acc = 0.47969


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 274/600 ] loss = 4.88665, acc = 0.29180


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 274/600 ] loss = 17.52668, acc = 0.47676


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 275/600 ] loss = 4.90890, acc = 0.29471


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 275/600 ] loss = 17.75274, acc = 0.45826


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 276/600 ] loss = 4.92297, acc = 0.29791


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 276/600 ] loss = 17.36066, acc = 0.48714


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 277/600 ] loss = 4.86482, acc = 0.29311


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 277/600 ] loss = 17.45687, acc = 0.45645


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 278/600 ] loss = 4.89720, acc = 0.28520


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 278/600 ] loss = 17.27125, acc = 0.50158


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 279/600 ] loss = 4.88203, acc = 0.29230


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 279/600 ] loss = 17.71589, acc = 0.48849


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 280/600 ] loss = 4.78783, acc = 0.29070


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 280/600 ] loss = 17.34560, acc = 0.46255


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 281/600 ] loss = 4.77911, acc = 0.29020


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 281/600 ] loss = 17.34157, acc = 0.50000


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 282/600 ] loss = 4.91375, acc = 0.29491


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 282/600 ] loss = 17.42007, acc = 0.46864


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 283/600 ] loss = 4.83833, acc = 0.29901


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 283/600 ] loss = 17.35246, acc = 0.46773


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 284/600 ] loss = 4.83731, acc = 0.29541


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 284/600 ] loss = 17.54269, acc = 0.43931


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 285/600 ] loss = 4.82558, acc = 0.29471


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 285/600 ] loss = 17.17787, acc = 0.48579


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 286/600 ] loss = 4.88856, acc = 0.29941


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 286/600 ] loss = 17.21385, acc = 0.48579


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 287/600 ] loss = 4.86351, acc = 0.29751


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 287/600 ] loss = 17.01251, acc = 0.52234 -> best
Best model found at epoch 287, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 288/600 ] loss = 4.86725, acc = 0.29110


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 288/600 ] loss = 17.03397, acc = 0.48466


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 289/600 ] loss = 4.89584, acc = 0.29811


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 289/600 ] loss = 17.56794, acc = 0.48894


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 290/600 ] loss = 4.88905, acc = 0.29361


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 290/600 ] loss = 17.14331, acc = 0.47022


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 291/600 ] loss = 4.95327, acc = 0.29791


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 291/600 ] loss = 17.02114, acc = 0.50023


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 292/600 ] loss = 4.82037, acc = 0.29571


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 292/600 ] loss = 17.24063, acc = 0.47270


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 293/600 ] loss = 4.83137, acc = 0.29391


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 293/600 ] loss = 17.43732, acc = 0.46977


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 294/600 ] loss = 4.80594, acc = 0.29611


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 294/600 ] loss = 17.09741, acc = 0.51173


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 295/600 ] loss = 4.85542, acc = 0.29661


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 295/600 ] loss = 17.78429, acc = 0.42577


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 296/600 ] loss = 4.86898, acc = 0.29441


100%|██████████| 18/18 [00:13<00:00,  1.33it/s]


[ Valid | 296/600 ] loss = 17.63056, acc = 0.50293


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 297/600 ] loss = 4.81513, acc = 0.29321


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 297/600 ] loss = 17.53203, acc = 0.46006


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 298/600 ] loss = 4.84190, acc = 0.30061


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 298/600 ] loss = 17.20496, acc = 0.47315


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 299/600 ] loss = 4.89784, acc = 0.29190


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 299/600 ] loss = 16.99444, acc = 0.47947


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 300/600 ] loss = 4.84476, acc = 0.30461


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 300/600 ] loss = 17.40612, acc = 0.47315


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 301/600 ] loss = 4.80183, acc = 0.29401


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 301/600 ] loss = 17.41965, acc = 0.46458


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 302/600 ] loss = 4.81605, acc = 0.29611


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 302/600 ] loss = 17.40445, acc = 0.46097


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 303/600 ] loss = 4.87088, acc = 0.29431


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 303/600 ] loss = 17.10599, acc = 0.47225


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 304/600 ] loss = 4.78647, acc = 0.29511


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 304/600 ] loss = 17.25424, acc = 0.49323


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 305/600 ] loss = 4.75831, acc = 0.29471


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 305/600 ] loss = 17.04996, acc = 0.50271


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 306/600 ] loss = 4.92561, acc = 0.29721


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 306/600 ] loss = 17.74932, acc = 0.48466


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 307/600 ] loss = 4.86756, acc = 0.29721


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 307/600 ] loss = 17.72285, acc = 0.48872


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 308/600 ] loss = 4.85582, acc = 0.29851


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 308/600 ] loss = 18.10579, acc = 0.47586


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 309/600 ] loss = 4.78542, acc = 0.29671


100%|██████████| 18/18 [00:13<00:00,  1.31it/s]


[ Valid | 309/600 ] loss = 17.06533, acc = 0.50361


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 310/600 ] loss = 4.72950, acc = 0.29501


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 310/600 ] loss = 17.13635, acc = 0.49052


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 311/600 ] loss = 4.82518, acc = 0.30101


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 311/600 ] loss = 17.51022, acc = 0.45871


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 312/600 ] loss = 4.84607, acc = 0.29270


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 312/600 ] loss = 16.78893, acc = 0.51241


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 313/600 ] loss = 4.74862, acc = 0.30962


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 313/600 ] loss = 17.20371, acc = 0.48172


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 314/600 ] loss = 4.82996, acc = 0.29811


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 314/600 ] loss = 17.05866, acc = 0.48150


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 315/600 ] loss = 4.84558, acc = 0.28970


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 315/600 ] loss = 16.84704, acc = 0.52076


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 316/600 ] loss = 4.80352, acc = 0.30271


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 316/600 ] loss = 17.63669, acc = 0.44698


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 317/600 ] loss = 4.89737, acc = 0.29501


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 317/600 ] loss = 17.01603, acc = 0.48127


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 318/600 ] loss = 4.85199, acc = 0.29791


100%|██████████| 18/18 [00:14<00:00,  1.28it/s]


[ Valid | 318/600 ] loss = 17.99590, acc = 0.41764


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 319/600 ] loss = 4.80832, acc = 0.30071


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 319/600 ] loss = 17.00649, acc = 0.50609


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 320/600 ] loss = 4.72133, acc = 0.29891


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 320/600 ] loss = 17.37387, acc = 0.47608


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 321/600 ] loss = 4.79507, acc = 0.29971


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 321/600 ] loss = 17.63389, acc = 0.45442


100%|██████████| 40/40 [00:58<00:00,  1.46s/it]


[ Train | 322/600 ] loss = 4.81493, acc = 0.30151


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 322/600 ] loss = 17.06039, acc = 0.49368


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 323/600 ] loss = 4.83309, acc = 0.30101


100%|██████████| 18/18 [00:12<00:00,  1.41it/s]


[ Valid | 323/600 ] loss = 16.99385, acc = 0.51670


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 324/600 ] loss = 4.77388, acc = 0.30051


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 324/600 ] loss = 16.70923, acc = 0.51918


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 325/600 ] loss = 4.76855, acc = 0.30551


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 325/600 ] loss = 16.93075, acc = 0.51218


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 326/600 ] loss = 4.83272, acc = 0.30061


100%|██████████| 18/18 [00:12<00:00,  1.41it/s]


[ Valid | 326/600 ] loss = 17.61699, acc = 0.42825


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 327/600 ] loss = 4.75561, acc = 0.29921


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 327/600 ] loss = 17.37287, acc = 0.49571


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 328/600 ] loss = 4.81326, acc = 0.30151


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 328/600 ] loss = 18.02699, acc = 0.41042


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 329/600 ] loss = 4.89044, acc = 0.29921


100%|██████████| 18/18 [00:13<00:00,  1.36it/s]


[ Valid | 329/600 ] loss = 16.92414, acc = 0.50587


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 330/600 ] loss = 4.79507, acc = 0.30451


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 330/600 ] loss = 16.95887, acc = 0.45758


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 331/600 ] loss = 4.77470, acc = 0.29581


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 331/600 ] loss = 17.07108, acc = 0.49413


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 332/600 ] loss = 4.72009, acc = 0.30521


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 332/600 ] loss = 16.79489, acc = 0.51737


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 333/600 ] loss = 4.76815, acc = 0.29911


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 333/600 ] loss = 16.83858, acc = 0.53655 -> best
Best model found at epoch 333, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 334/600 ] loss = 4.69754, acc = 0.30221


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 334/600 ] loss = 17.06664, acc = 0.50406


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 335/600 ] loss = 4.68678, acc = 0.30171


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 335/600 ] loss = 16.98807, acc = 0.47586


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 336/600 ] loss = 4.68580, acc = 0.29911


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 336/600 ] loss = 16.82974, acc = 0.50587


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 337/600 ] loss = 4.74134, acc = 0.30371


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 337/600 ] loss = 16.75078, acc = 0.52166


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 338/600 ] loss = 4.75727, acc = 0.31032


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 338/600 ] loss = 17.33433, acc = 0.43863


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 339/600 ] loss = 4.73651, acc = 0.29521


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 339/600 ] loss = 17.55438, acc = 0.46999


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 340/600 ] loss = 4.68688, acc = 0.30401


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 340/600 ] loss = 16.71020, acc = 0.50113


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 341/600 ] loss = 4.68683, acc = 0.29421


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 341/600 ] loss = 17.06050, acc = 0.48082


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 342/600 ] loss = 4.75561, acc = 0.30842


100%|██████████| 18/18 [00:13<00:00,  1.37it/s]


[ Valid | 342/600 ] loss = 16.68989, acc = 0.53881 -> best
Best model found at epoch 342, saving model


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 343/600 ] loss = 4.71717, acc = 0.30331


100%|██████████| 18/18 [00:13<00:00,  1.38it/s]


[ Valid | 343/600 ] loss = 17.12356, acc = 0.44607


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 344/600 ] loss = 4.79037, acc = 0.30431


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 344/600 ] loss = 16.89009, acc = 0.49571


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 345/600 ] loss = 4.68553, acc = 0.30701


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 345/600 ] loss = 16.66613, acc = 0.51038


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 346/600 ] loss = 4.81069, acc = 0.30952


100%|██████████| 18/18 [00:12<00:00,  1.39it/s]


[ Valid | 346/600 ] loss = 16.89369, acc = 0.50677


100%|██████████| 40/40 [00:58<00:00,  1.45s/it]


[ Train | 347/600 ] loss = 4.73619, acc = 0.30161


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 347/600 ] loss = 16.81174, acc = 0.49526


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 348/600 ] loss = 4.69966, acc = 0.30832


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 348/600 ] loss = 16.68742, acc = 0.49030


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 349/600 ] loss = 4.76984, acc = 0.30521


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 349/600 ] loss = 16.82012, acc = 0.51264


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 350/600 ] loss = 4.74280, acc = 0.29771


100%|██████████| 18/18 [00:13<00:00,  1.32it/s]


[ Valid | 350/600 ] loss = 16.61191, acc = 0.51782


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 351/600 ] loss = 4.76123, acc = 0.30752


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 351/600 ] loss = 16.48841, acc = 0.51986


100%|██████████| 40/40 [00:57<00:00,  1.44s/it]


[ Train | 352/600 ] loss = 4.65819, acc = 0.30381


100%|██████████| 18/18 [00:13<00:00,  1.34it/s]


[ Valid | 352/600 ] loss = 16.40971, acc = 0.51760


100%|██████████| 40/40 [00:57<00:00,  1.43s/it]


[ Train | 353/600 ] loss = 4.69662, acc = 0.30882


100%|██████████| 18/18 [00:12<00:00,  1.40it/s]


[ Valid | 353/600 ] loss = 17.15963, acc = 0.44314


100%|██████████| 40/40 [00:57<00:00,  1.45s/it]


[ Train | 354/600 ] loss = 4.74514, acc = 0.29891


100%|██████████| 18/18 [00:13<00:00,  1.35it/s]


[ Valid | 354/600 ] loss = 16.54823, acc = 0.51106


 85%|████████▌ | 34/40 [00:50<00:08,  1.50s/it]


KeyboardInterrupt: 
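The run above was interrupted during epoch 355. That means `student_best.ckpt` holds the weights from the last epoch tagged `-> best`, which is epoch 342 with validation accuracy 0.53881. When the printed log is long, a small parser can find this for you. The sketch below is a hypothetical helper, not part of the homework code. It assumes the log text has been copied into a string and keeps the exact `[ Valid | e/N ] loss = ..., acc = ...` format printed here.

```python
import re

# Matches validation lines such as:
# "[ Valid | 342/600 ] loss = 16.68989, acc = 0.53881 -> best"
VALID_RE = re.compile(
    r"\[ Valid \| (\d+)/(\d+) \] loss = ([\d.]+), acc = ([\d.]+)( -> best)?"
)

def summarize_log(log_text):
    """Return (last_epoch, best_epoch, best_acc) parsed from the printed training log.

    Hypothetical helper: it only understands the "[ Valid | e/N ] ..." format shown above
    and ignores every other line (train lines, progress bars, "saving model" messages).
    """
    last_epoch, best_epoch, best_acc = None, None, None
    for line in log_text.splitlines():
        m = VALID_RE.search(line)
        if m is None:
            continue
        epoch, acc = int(m.group(1)), float(m.group(4))
        last_epoch = epoch
        if m.group(5):  # the training loop tagged this epoch as a new best
            best_epoch, best_acc = epoch, acc
    return last_epoch, best_epoch, best_acc

sample = """[ Valid | 341/600 ] loss = 17.06050, acc = 0.48082
[ Valid | 342/600 ] loss = 16.68989, acc = 0.53881 -> best
Best model found at epoch 342, saving model
[ Train | 343/600 ] loss = 4.71717, acc = 0.30331
[ Valid | 343/600 ] loss = 17.12356, acc = 0.44607"""

print(summarize_log(sample))  # (343, 342, 0.53881)
```

The parser relies on the `-> best` tag instead of recomputing the maximum accuracy. This way it reports the same epoch the training loop actually saved.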

### Inference
Load the best model from the experiment and generate `submission.csv`.

In [None]:
# create dataloader for evaluation
eval_set = FoodDataset(os.path.join(cfg['dataset_root'], "evaluation"), tfm=test_tfm)
eval_loader = DataLoader(eval_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=0, pin_memory=True)

One ./Food-11/evaluation sample ./Food-11/evaluation/0000.jpg


In [None]:
# Load model from {exp_name}/student_best.ckpt
student_model_best = get_student_model() # get a new student model to avoid reference before assignment.
ckpt_path = f"{save_path}/student_best.ckpt" # the ckpt path of the best student model.
student_model_best.load_state_dict(torch.load(ckpt_path, map_location='cpu')) # load the state dict and set it to the student model
student_model_best.to(device) # set the student model to device

# Start evaluation
student_model_best.eval()
eval_preds = [] # stores the predictions for the evaluation set

# Iterate over the evaluation set in batches.
for batch in tqdm(eval_loader):
    # A batch consists of image data and labels; the evaluation set has no
    # ground-truth labels, so the second element is ignored.
    imgs, _ = batch
    # Gradients are not needed for evaluation;
    # torch.no_grad() skips building the autograd graph and speeds up the forward pass.
    with torch.no_grad():
        logits = student_model_best(imgs.to(device))
        # argmax over the class dimension already has shape (batch,), so no squeeze is needed;
        # .tolist() also works when the last batch contains a single image.
        preds = logits.argmax(dim=-1).cpu().tolist()
    # Loss and accuracy cannot be computed because the evaluation set has no true labels.
    eval_preds += preds

def pad4(i):
    # Zero-pad an index to 4 digits, e.g. 7 -> "0007"
    return str(i).zfill(4)

# Save prediction results
ids = [pad4(i) for i in range(len(eval_set))]
categories = eval_preds

df = pd.DataFrame()
df['Id'] = ids
df['Category'] = categories
df.to_csv(f"{save_path}/submission.csv", index=False) # now you can download submission.csv and upload it to the Kaggle competition.

100%|██████████| 35/35 [00:06<00:00,  5.48it/s]
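Calling `.squeeze()` on the argmax output before `list()` works for full batches, but it fails if a batch contains a single image. Squeezing a shape-`(1,)` array produces a 0-d array, and iterating over a 0-d array raises `TypeError`. Because `argmax` over the class axis already returns one prediction per image, `.tolist()` alone is enough.

The NumPy sketch below reproduces both behaviors without a model. The function names are hypothetical, and the toy logits are made up for illustration.

```python
import numpy as np

def preds_with_squeeze(logits):
    # Mirrors the argmax -> squeeze -> list() pattern
    return list(logits.argmax(axis=-1).squeeze())

def preds_without_squeeze(logits):
    # argmax over the class axis already yields shape (batch,); tolist() gives plain ints
    return logits.argmax(axis=-1).tolist()

single = np.array([[0.1, 0.9, 0.0]])                    # a batch with one sample, 3 classes
pair = np.array([[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]])     # a batch with two samples

print(preds_without_squeeze(single))  # [1]
print(preds_without_squeeze(pair))    # [1, 0]
try:
    preds_with_squeeze(single)        # squeeze turns shape (1,) into a 0-d array
except TypeError as err:
    print("squeeze version failed:", err)
```

The same reasoning applies to the PyTorch tensor in the inference loop. The shape-`(batch,)` result of `logits.argmax(dim=-1)` converts directly with `.cpu().tolist()`.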


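Before uploading, it can help to sanity-check the generated file. The sketch below is a hypothetical checker built only from what the inference cell writes: a header of `Id,Category`, 4-digit zero-padded Ids starting at `0000`, and one integer label per row. The label range assumes the 11 Food-11 classes are numbered 0 to 10. This is an assumption, not something stated in the competition rules.

```python
import csv

def check_rows(lines, num_classes=11):
    """Validate submission rows; `lines` is any iterable of CSV text lines.

    Assumed format (inferred from the inference cell, not an official spec):
    header "Id,Category", Ids zero-padded to 4 digits and consecutive from 0000,
    and integer labels in [0, num_classes).
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != ["Id", "Category"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    count = 0
    for expected_id, row in enumerate(reader):
        if row["Id"] != f"{expected_id:04d}":
            raise ValueError(f"row {expected_id}: expected Id {expected_id:04d}, got {row['Id']!r}")
        label = int(row["Category"])
        if not 0 <= label < num_classes:
            raise ValueError(f"row {expected_id}: label {label} outside [0, {num_classes})")
        count += 1
    return count

def validate_submission(path, num_classes=11):
    """Open a submission file and return its number of data rows if every check passes."""
    with open(path, newline="") as f:
        return check_rows(f, num_classes)
```

In the notebook, you could call `validate_submission(f"{save_path}/submission.csv")` and compare the returned count with `len(eval_set)`.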
> Don't forget to answer the report questions on Gradescope!