# Homework 13 - Network Compression

Author: Chen-Wei Ke (b08501098@ntu.edu.tw), modified from ML2022-HW13 (Liang-Hsuan Tseng)

If you have any questions, feel free to ask: mlta-2023-spring@googlegroups.com

[**Link to HW13 Slides**](https://docs.google.com/presentation/d/1QAVMbnabmmMNvmugPlHMg_GVKaYrKa6hoTSFeJl9OCs/edit?usp=sharing)

## Outline

* [Packages](#Packages) - intall some required packages.
* [Dataset](#Dataset) - something you need to know about the dataset.
* [Configs](#Configs) - the configs of the experiments, you can change some hyperparameters here.
* [Architecture_Design](#Architecture_Design) - depthwise and pointwise convolution examples and some useful links.
* [Knowledge_Distillation](#Knowledge_Distillation) - KL divergence loss for knowledge distillation and some useful links.
* [Training](#Training) - training loop implementation modified from HW3.
* [Inference](#Inference) - create submission.csv by using the student_best.ckpt from the previous experiment.



### Packages
First, we need to import some useful packages. If the torchsummary package are not intalled, please install it via `pip install torchsummary`

In [1]:
!pip install torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl.metadata (296 bytes)
Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1


In [2]:
# Import some useful packages for this homework
import numpy as np
import pandas as pd
import torch
import os
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import ConcatDataset, DataLoader, Subset, Dataset # "ConcatDataset" and "Subset" are possibly useful
from torchvision.datasets import DatasetFolder, VisionDataset
from torchsummary import summary
from tqdm import tqdm
import random

# !nvidia-smi # list your current GPU

### Configs
In this part, you can specify some variables and hyperparameters as your configs.

In [3]:
cfg = {
    'dataset_root': '/kaggle/input/ml2023spring-hw13/Food-11',
    'save_dir': '/kaggle/working/',
    'exp_name': "simple_baseline",
    'batch_size': 64,
    'lr': 3e-4,
    'seed': 20220013,
    'loss_fn_type': 'KD', # simple baseline: CE, medium baseline: KD. See the Knowledge_Distillation part for more information.
    'weight_decay': 1e-5,
    'grad_norm_max': 10,
    'n_epochs': 700, # train more steps to pass the medium baseline.
    'patience': 300,
}

In [4]:
myseed = cfg['seed']  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
random.seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

save_path = os.path.join(cfg['save_dir'], cfg['exp_name']) # create saving directory
os.makedirs(save_path, exist_ok=True)

# define simple logging functionality
log_fw = open(f"{save_path}/log.txt", 'w') # open log file to save log outputs
def log(text):     # define a logging function to trace the training process
    print(text)
    log_fw.write(str(text)+'\n')
    log_fw.flush()

log(cfg)  # log your configs to the log file

{'dataset_root': '/kaggle/input/ml2023spring-hw13/Food-11', 'save_dir': '/kaggle/working/', 'exp_name': 'simple_baseline', 'batch_size': 64, 'lr': 0.0003, 'seed': 20220013, 'loss_fn_type': 'KD', 'weight_decay': 1e-05, 'grad_norm_max': 10, 'n_epochs': 700, 'patience': 300}


### Dataset
We use Food11 dataset for this homework, which is similar to homework3. But remember, Please DO NOT utilize the dataset of HW3. We've modified the dataset, so you should only access the dataset by loading it in this kaggle notebook or through the links provided in the HW13 colab notebooks.

In [5]:
# # fetch and download the dataset from github (about 1.12G)
# !wget -O Food-11.tar.gz https://www.dropbox.com/s/v97fi9xrwp9b964/food11-hw13.tar.gz?dl=0

In [6]:
# # extract the data
# !tar -xzf ./Food-11.tar.gz # Could take some time
# # !tar -xzvf ./Food-11.tar.gz # use this command if you want to checkout the whole process.

In [7]:
for dirname, _, filenames in os.walk('/kaggle/input/ml2023spring-hw13/Food-11'):
    if len(filenames) > 0:
        print(f"{dirname}: {len(filenames)} files.") # Show the file amounts in each split.

/kaggle/input/ml2023spring-hw13/Food-11: 1 files.
/kaggle/input/ml2023spring-hw13/Food-11/validation: 4432 files.
/kaggle/input/ml2023spring-hw13/Food-11/training: 9993 files.
/kaggle/input/ml2023spring-hw13/Food-11/evaluation: 2218 files.


Next, specify train/test transform for image data augmentation.
Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms. You can also apply the knowledge or experience you learned in HW3.

In [8]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# define training/testing transforms
test_tfm = transforms.Compose([
    # It is not encouraged to modify this part if you are using the provided teacher model. This transform is stardard and good enough for testing.
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

train_tfm = transforms.Compose([
    # add some useful transform or augmentation here, according to your experience in HW3.
    transforms.Resize(256),  # You can change this
    transforms.CenterCrop(224), # You can change this, but be aware of that the given teacher model's input size is 224.
    # The training input size of the provided teacher model is (3, 224, 224).
    # Thus, Input size other then 224 might hurt the performance. please be careful.
    transforms.RandomHorizontalFlip(), # You can change this.
    transforms.ToTensor(),
    normalize,
])

In [9]:
class FoodDataset(Dataset):
    def __init__(self, path, tfm=test_tfm, files = None):
        super().__init__()
        self.path = path
        self.files = sorted([os.path.join(path,x) for x in os.listdir(path) if x.endswith(".jpg")])
        if files != None:
            self.files = files
        print(f"One {path} sample",self.files[0])
        self.transform = tfm
  
    def __len__(self):
        return len(self.files)
  
    def __getitem__(self,idx):
        fname = self.files[idx]
        im = Image.open(fname)
        im = self.transform(im)
        try:
            label = int(fname.split("/")[-1].split("_")[0])
        except:
            label = -1 # test has no label
        return im,label

In [10]:
# Form train/valid dataloaders
train_set = FoodDataset(os.path.join(cfg['dataset_root'],"training"), tfm=train_tfm)
train_loader = DataLoader(train_set, batch_size=cfg['batch_size'], shuffle=True, num_workers=4, pin_memory=True)

valid_set = FoodDataset(os.path.join(cfg['dataset_root'], "validation"), tfm=test_tfm)
valid_loader = DataLoader(valid_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=4, pin_memory=True)

One /kaggle/input/ml2023spring-hw13/Food-11/training sample /kaggle/input/ml2023spring-hw13/Food-11/training/0_0.jpg
One /kaggle/input/ml2023spring-hw13/Food-11/validation sample /kaggle/input/ml2023spring-hw13/Food-11/validation/0_0.jpg


### Architecture_Design

In this homework, you have to design a smaller network and make it perform well. Apparently, a well-designed architecture is crucial for such task. Here, we introduce the depthwise and pointwise convolution. These variants of convolution are some common techniques for architecture design when it comes to network compression.

<img src="https://i.imgur.com/LFDKHOp.png" width=400px>

* explanation of depthwise and pointwise convolutions:
    * [prof. Hung-yi Lee's slides(p.24~p.30, especially p.28)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/tiny_v7.pdf)

* other useful techniques
    * [group convolution](https://www.researchgate.net/figure/The-transformations-within-a-layer-in-DenseNets-left-and-CondenseNets-at-training-time_fig2_321325862) (Actually, depthwise convolution is a specific type of group convolution)
    * [SqueezeNet](!https://arxiv.org/abs/1602.07360)
    * [MobileNet](!https://arxiv.org/abs/1704.04861)
    * [ShuffleNet](!https://arxiv.org/abs/1707.01083)
    * [Xception](!https://arxiv.org/abs/1610.02357)
    * [GhostNet](!https://arxiv.org/abs/1911.11907)


After introducing depthwise and pointwise convolutions, let's define the **student network architecture**. Here, we have a very simple network formed by some regular convolution layers and pooling layers. You can replace the regular convolution layers with the depthwise and pointwise convolutions. In this way, you can further increase the depth or the width of your network architecture.

In [11]:
# Define your student network here. You have to copy-paste this code block to HW13 GradeScope before deadline.
# We will use your student network definition to evaluate your results(including the total parameter amount).

# Example implementation of Depthwise and Pointwise Convolution 
def dwpw_conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, groups=in_channels), #depthwise convolution
        nn.Conv2d(in_channels, out_channels, 1), # pointwise convolution
    )

class StudentNet(nn.Module):
    def __init__(self):
      super().__init__()

      # ---------- TODO ----------
      # Modify your model architecture

      self.cnn = nn.Sequential(
        nn.Conv2d(3, 4, 3), 
        nn.BatchNorm2d(4),
        nn.ReLU(),    

        nn.Conv2d(4, 16, 3), 
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),
        
        nn.Conv2d(16, 64, 3), 
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),
        
        nn.Conv2d(64, 84, 3), 
        nn.BatchNorm2d(84),
        nn.ReLU(),
        nn.MaxPool2d(2, 2, 0),
        
        # Here we adopt Global Average Pooling for various input size.
        nn.AdaptiveAvgPool2d((1, 1)),
      )
      self.fc = nn.Sequential(
        nn.Linear(84, 11),
      )
      
    def forward(self, x):
      out = self.cnn(x)
      out = out.view(out.size()[0], -1)
      return self.fc(out)

def get_student_model(): # This function should have no arguments so that we can get your student network by directly calling it.
    # you can modify or do anything here, just remember to return an nn.Module as your student network.  
    return StudentNet() 

# End of definition of your student model and the get_student_model API
# Please copy-paste the whole code block, including the get_student_model function.

After specifying the student network architecture, please use `torchsummary` package to get information about the network and verify the total number of parameters. Note that the total params of your student network should not exceed the limit (`Total params` in `torchsummary` ≤ 60,000). 

In [12]:
# DO NOT modify this block and please make sure that this block can run sucessfully. 
student_model = get_student_model()
summary(student_model, (3, 224, 224), device='cpu')
# You have to copy&paste the results of this block to HW13 GradeScope. 

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 4, 222, 222]             112
       BatchNorm2d-2          [-1, 4, 222, 222]               8
              ReLU-3          [-1, 4, 222, 222]               0
            Conv2d-4         [-1, 16, 220, 220]             592
       BatchNorm2d-5         [-1, 16, 220, 220]              32
              ReLU-6         [-1, 16, 220, 220]               0
         MaxPool2d-7         [-1, 16, 110, 110]               0
            Conv2d-8         [-1, 64, 108, 108]           9,280
       BatchNorm2d-9         [-1, 64, 108, 108]             128
             ReLU-10         [-1, 64, 108, 108]               0
        MaxPool2d-11           [-1, 64, 54, 54]               0
           Conv2d-12           [-1, 84, 52, 52]          48,468
      BatchNorm2d-13           [-1, 84, 52, 52]             168
             ReLU-14           [-1, 84,

In [13]:
# Load provided teacher model (model architecture: resnet18, num_classes=11, test-acc ~= 89.9%)
teacher_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=False, num_classes=11)
# load state dict
teacher_ckpt_path = os.path.join(cfg['dataset_root'], "resnet18_teacher.ckpt")
teacher_model.load_state_dict(torch.load(teacher_ckpt_path, map_location='cpu'))
# Now you already know the teacher model's architecture. You can take advantage of it if you want to pass the strong or boss baseline. 
# Source code of resnet in pytorch: (https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py)
# You can also see the summary of teacher model. There are 11,182,155 parameters totally in the teacher model
summary(teacher_model, (3, 224, 224), device='cpu')

Downloading: "https://github.com/pytorch/vision/zipball/v0.10.0" to /root/.cache/torch/hub/v0.10.0.zip


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

### Knowledge_Distillation

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="400px">

Since we have a learned big model, let it teach the other small model. In implementation, let the training target be the prediction of big model instead of the ground truth.

**Why it works?**
* If the data is not clean, then the prediction of big model could ignore the noise of the data with wrong labeled.
* There might have some relations between classes, so soft labels from teacher model might be useful. For example, Number 8 is more similar to 6, 9, 0 than 1, 7.


**How to implement?**
* $Loss = \alpha T^2 \times KL(p || q) + (1-\alpha)(\text{Original Cross Entropy Loss}), \text{where } p=softmax(\frac{\text{student's logits}}{T}), \text{and } q=softmax(\frac{\text{teacher's logits}}{T})$
* very useful link: [pytorch docs of KLDivLoss with examples](!https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html)
* original paper: [Distilling the Knowledge in a Neural Network](!https://arxiv.org/abs/1503.02531)

**Please be sure to carefully check each function's parameter requirements.**

In [14]:
# Implement the loss function with KL divergence loss for knowledge distillation.
# You also have to copy-paste this whole block to HW13 GradeScope. 
CE = nn.CrossEntropyLoss()
def loss_fn_kd(student_logits, labels, teacher_logits, alpha=0.5, temperature=20.0):
    # ------------TODO-------------
    # Refer to the above formula and finish the loss function for knowkedge distillation using KL divergence loss and CE loss.
    # If you have no idea, please take a look at the provided useful link above.
    loss_ce = F.cross_entropy(student_logits, labels)
    p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    loss_kl = F.kl_div(p, q, reduction='batchmean', log_target=False)
    loss = alpha * temperature * temperature * loss_kl + (1 - alpha) * loss_ce
    return loss
#     student_T = (student_logits/temperature).softmax(dim=-1)
#     teacher_T = (teacher_logits/temperature).softmax(dim=-1)
#     kl_loss = (teacher_T*(teacher_T.log() - student_T.log())).sum(1).mean() 
#     ce_loss = CE(student_logits, labels)
#     return alpha*(temperature**2)*kl_loss + (1 - alpha)*ce_loss

In [15]:
# choose the loss function by the config
if cfg['loss_fn_type'] == 'CE':
    # For the classification task, we use cross-entropy as the default loss function.
    loss_fn = nn.CrossEntropyLoss() # loss function for simple baseline.

if cfg['loss_fn_type'] == 'KD': # KD stands for knowledge distillation
    loss_fn = loss_fn_kd # implement loss_fn_kd for the report question and the medium baseline.

# You can also adopt other types of knowledge distillation techniques for strong and boss baseline, but use function name other than `loss_fn_kd`
# For example:
# def loss_fn_custom_kd():
#     pass
# if cfg['loss_fn_type'] == 'custom_kd':
#     loss_fn = loss_fn_custom_kd

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"
log(f"device: {device}")

# The number of training epochs and patience.
n_epochs = cfg['n_epochs']
patience = cfg['patience'] # If no improvement in 'patience' epochs, early stop

device: cuda


### Training
implement training loop for simple baseline, feel free to modify it.

In [16]:
# Initialize a model, and put it on the device specified.
student_model.to(device)
teacher_model.to(device) # MEDIUM BASELINE

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_model.parameters(), lr=cfg['lr'], weight_decay=cfg['weight_decay']) 

# Initialize trackers, these are not parameters and should not be changed
stale = 0
best_acc = 0.0

# teacher_model.eval()  # MEDIUM BASELINE
for epoch in range(n_epochs):

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []
    train_lens = []
    
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)
        #imgs = imgs.half()
        #print(imgs.shape,labels.shape)

        # Forward the data. (Make sure data and model are on the same device.)
        with torch.no_grad():  # MEDIUM BASELINE
            teacher_logits = teacher_model(imgs)  # MEDIUM BASELINE
        
        logits = student_model(imgs)

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
#         loss = loss_fn(logits, labels) # SIMPLE BASELINE
        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_model.parameters(), max_norm=cfg['grad_norm_max'])

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        train_batch_len = len(imgs)
        train_loss.append(loss.item() * train_batch_len)
        train_accs.append(acc)
        train_lens.append(train_batch_len)
        
    train_loss = sum(train_loss) / sum(train_lens)
    train_acc = sum(train_accs) / sum(train_lens)

    # Print the information.
    log(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    student_model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []
    valid_lens = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)
        labels = labels.to(device)

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = student_model(imgs)
            teacher_logits = teacher_model(imgs) # MEDIUM BASELINE

        # We can still compute the loss (but not the gradient).
        loss = loss_fn(logits, labels, teacher_logits) # MEDIUM BASELINE
#         loss = loss_fn(logits, labels) # SIMPLE BASELINE

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().sum()

        # Record the loss and accuracy.
        batch_len = len(imgs)
        valid_loss.append(loss.item() * batch_len)
        valid_accs.append(acc)
        valid_lens.append(batch_len)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / sum(valid_lens)
    valid_acc = sum(valid_accs) / sum(valid_lens)

    # update logs
    
    if valid_acc > best_acc:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f} -> best")
    else:
        log(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")


    # save models
    if valid_acc > best_acc:
        log(f"Best model found at epoch {epoch+1}, saving model")
        torch.save(student_model.state_dict(), f"{save_path}/student_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            log(f"No improvment {patience} consecutive epochs, early stopping")
            break
log("Finish training")
log_fw.close()

100%|██████████| 157/157 [00:38<00:00,  4.08it/s]


[ Train | 001/700 ] loss = 11.85965, acc = 0.27209


100%|██████████| 70/70 [00:16<00:00,  4.29it/s]


[ Valid | 001/700 ] loss = 5.14781, acc = 0.31160 -> best
Best model found at epoch 1, saving model


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 002/700 ] loss = 10.95292, acc = 0.33373


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 002/700 ] loss = 5.03112, acc = 0.33845 -> best
Best model found at epoch 2, saving model


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 003/700 ] loss = 10.54689, acc = 0.35905


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 003/700 ] loss = 5.14751, acc = 0.36394 -> best
Best model found at epoch 3, saving model


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 004/700 ] loss = 10.22555, acc = 0.38277


100%|██████████| 70/70 [00:13<00:00,  5.27it/s]


[ Valid | 004/700 ] loss = 5.18786, acc = 0.37477 -> best
Best model found at epoch 4, saving model


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 005/700 ] loss = 10.04481, acc = 0.39578


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 005/700 ] loss = 5.25837, acc = 0.38606 -> best
Best model found at epoch 5, saving model


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 006/700 ] loss = 9.91435, acc = 0.40468


100%|██████████| 70/70 [00:13<00:00,  5.19it/s]


[ Valid | 006/700 ] loss = 5.34212, acc = 0.39486 -> best
Best model found at epoch 6, saving model


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 007/700 ] loss = 9.78740, acc = 0.41629


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 007/700 ] loss = 5.44239, acc = 0.38470


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 008/700 ] loss = 9.65490, acc = 0.42690


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 008/700 ] loss = 5.52766, acc = 0.42757 -> best
Best model found at epoch 8, saving model


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 009/700 ] loss = 9.60146, acc = 0.43290


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 009/700 ] loss = 5.51012, acc = 0.40591


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 010/700 ] loss = 9.48064, acc = 0.43550


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 010/700 ] loss = 5.56252, acc = 0.42757


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 011/700 ] loss = 9.37123, acc = 0.44511


100%|██████████| 70/70 [00:13<00:00,  5.10it/s]


[ Valid | 011/700 ] loss = 5.60873, acc = 0.41336


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 012/700 ] loss = 9.34504, acc = 0.44901


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 012/700 ] loss = 5.51747, acc = 0.44201 -> best
Best model found at epoch 12, saving model


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 013/700 ] loss = 9.28155, acc = 0.45232


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 013/700 ] loss = 5.73765, acc = 0.40794


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 014/700 ] loss = 9.19385, acc = 0.45862


100%|██████████| 70/70 [00:16<00:00,  4.34it/s]


[ Valid | 014/700 ] loss = 6.12718, acc = 0.41945


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 015/700 ] loss = 9.13228, acc = 0.46402


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 015/700 ] loss = 5.71652, acc = 0.46390 -> best
Best model found at epoch 15, saving model


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 016/700 ] loss = 9.02385, acc = 0.47493


100%|██████████| 70/70 [00:13<00:00,  5.18it/s]


[ Valid | 016/700 ] loss = 5.86164, acc = 0.43953


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 017/700 ] loss = 8.97097, acc = 0.47143


100%|██████████| 70/70 [00:13<00:00,  5.16it/s]


[ Valid | 017/700 ] loss = 5.99054, acc = 0.42825


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 018/700 ] loss = 8.93101, acc = 0.48154


100%|██████████| 70/70 [00:13<00:00,  5.11it/s]


[ Valid | 018/700 ] loss = 5.80976, acc = 0.45623


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 019/700 ] loss = 8.85166, acc = 0.48044


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 019/700 ] loss = 6.70702, acc = 0.39237


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 020/700 ] loss = 8.79221, acc = 0.48364


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 020/700 ] loss = 5.80963, acc = 0.46458 -> best
Best model found at epoch 20, saving model


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 021/700 ] loss = 8.74390, acc = 0.49114


100%|██████████| 70/70 [00:13<00:00,  5.22it/s]


[ Valid | 021/700 ] loss = 6.01438, acc = 0.45578


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 022/700 ] loss = 8.70573, acc = 0.49585


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 022/700 ] loss = 5.98803, acc = 0.48308 -> best
Best model found at epoch 22, saving model


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 023/700 ] loss = 8.64772, acc = 0.50085


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 023/700 ] loss = 5.81451, acc = 0.46525


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 024/700 ] loss = 8.58854, acc = 0.49895


100%|██████████| 70/70 [00:13<00:00,  5.21it/s]


[ Valid | 024/700 ] loss = 5.77233, acc = 0.50361 -> best
Best model found at epoch 24, saving model


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 025/700 ] loss = 8.54521, acc = 0.50926


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 025/700 ] loss = 6.03257, acc = 0.46616


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 026/700 ] loss = 8.48910, acc = 0.50886


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 026/700 ] loss = 6.10998, acc = 0.47315


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 027/700 ] loss = 8.39938, acc = 0.51746


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 027/700 ] loss = 6.25252, acc = 0.48285


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 028/700 ] loss = 8.39599, acc = 0.51936


100%|██████████| 70/70 [00:14<00:00,  4.70it/s]


[ Valid | 028/700 ] loss = 6.81665, acc = 0.44585


100%|██████████| 157/157 [00:34<00:00,  4.49it/s]


[ Train | 029/700 ] loss = 8.37230, acc = 0.52477


100%|██████████| 70/70 [00:13<00:00,  5.01it/s]


[ Valid | 029/700 ] loss = 6.53447, acc = 0.50384 -> best
Best model found at epoch 29, saving model


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 030/700 ] loss = 8.27213, acc = 0.52357


100%|██████████| 70/70 [00:13<00:00,  5.22it/s]


[ Valid | 030/700 ] loss = 6.11855, acc = 0.51467 -> best
Best model found at epoch 30, saving model


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 031/700 ] loss = 8.25319, acc = 0.52237


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 031/700 ] loss = 6.61090, acc = 0.46593


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 032/700 ] loss = 8.19168, acc = 0.52957


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 032/700 ] loss = 6.54742, acc = 0.45668


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 033/700 ] loss = 8.17828, acc = 0.53147


100%|██████████| 70/70 [00:13<00:00,  5.22it/s]


[ Valid | 033/700 ] loss = 6.21547, acc = 0.52572 -> best
Best model found at epoch 33, saving model


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 034/700 ] loss = 8.11347, acc = 0.53818


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 034/700 ] loss = 6.09901, acc = 0.47653


100%|██████████| 157/157 [00:36<00:00,  4.32it/s]


[ Train | 035/700 ] loss = 8.07213, acc = 0.53778


100%|██████████| 70/70 [00:14<00:00,  4.77it/s]


[ Valid | 035/700 ] loss = 6.49249, acc = 0.50181


100%|██████████| 157/157 [00:35<00:00,  4.49it/s]


[ Train | 036/700 ] loss = 8.02846, acc = 0.54448


100%|██████████| 70/70 [00:15<00:00,  4.56it/s]


[ Valid | 036/700 ] loss = 6.02605, acc = 0.54039 -> best
Best model found at epoch 36, saving model


100%|██████████| 157/157 [00:35<00:00,  4.41it/s]


[ Train | 037/700 ] loss = 8.05056, acc = 0.53938


100%|██████████| 70/70 [00:16<00:00,  4.27it/s]


[ Valid | 037/700 ] loss = 5.96330, acc = 0.52550


100%|██████████| 157/157 [00:36<00:00,  4.32it/s]


[ Train | 038/700 ] loss = 7.97443, acc = 0.54468


100%|██████████| 70/70 [00:16<00:00,  4.24it/s]


[ Valid | 038/700 ] loss = 6.74323, acc = 0.42847


100%|██████████| 157/157 [00:33<00:00,  4.64it/s]


[ Train | 039/700 ] loss = 7.91660, acc = 0.54678


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 039/700 ] loss = 6.59494, acc = 0.50745


100%|██████████| 157/157 [00:33<00:00,  4.70it/s]


[ Train | 040/700 ] loss = 7.93415, acc = 0.54958


100%|██████████| 70/70 [00:14<00:00,  4.68it/s]


[ Valid | 040/700 ] loss = 6.32455, acc = 0.51376


100%|██████████| 157/157 [00:33<00:00,  4.66it/s]


[ Train | 041/700 ] loss = 7.87683, acc = 0.55319


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 041/700 ] loss = 6.28283, acc = 0.52617


100%|██████████| 157/157 [00:34<00:00,  4.59it/s]


[ Train | 042/700 ] loss = 7.85395, acc = 0.55399


100%|██████████| 70/70 [00:14<00:00,  4.82it/s]


[ Valid | 042/700 ] loss = 6.81127, acc = 0.49323


100%|██████████| 157/157 [00:34<00:00,  4.58it/s]


[ Train | 043/700 ] loss = 7.82492, acc = 0.55989


100%|██████████| 70/70 [00:13<00:00,  5.10it/s]


[ Valid | 043/700 ] loss = 6.15632, acc = 0.54310 -> best
Best model found at epoch 43, saving model


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 044/700 ] loss = 7.77965, acc = 0.56149


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 044/700 ] loss = 6.16900, acc = 0.54422 -> best
Best model found at epoch 44, saving model


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 045/700 ] loss = 7.70904, acc = 0.55969


100%|██████████| 70/70 [00:13<00:00,  5.05it/s]


[ Valid | 045/700 ] loss = 7.20847, acc = 0.44495


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 046/700 ] loss = 7.72062, acc = 0.56359


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 046/700 ] loss = 8.15688, acc = 0.46435


100%|██████████| 157/157 [00:33<00:00,  4.63it/s]


[ Train | 047/700 ] loss = 7.67818, acc = 0.56179


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 047/700 ] loss = 6.51830, acc = 0.51331


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 048/700 ] loss = 7.63339, acc = 0.56610


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 048/700 ] loss = 6.46705, acc = 0.53588


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 049/700 ] loss = 7.60544, acc = 0.57110


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 049/700 ] loss = 6.44225, acc = 0.54129


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 050/700 ] loss = 7.58784, acc = 0.57340


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 050/700 ] loss = 6.41097, acc = 0.55009 -> best
Best model found at epoch 50, saving model


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 051/700 ] loss = 7.54340, acc = 0.57690


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 051/700 ] loss = 6.23920, acc = 0.54513


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 052/700 ] loss = 7.53120, acc = 0.57730


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 052/700 ] loss = 6.35596, acc = 0.55528 -> best
Best model found at epoch 52, saving model


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 053/700 ] loss = 7.53659, acc = 0.57550


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 053/700 ] loss = 6.39272, acc = 0.54874


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 054/700 ] loss = 7.48714, acc = 0.57570


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 054/700 ] loss = 6.52084, acc = 0.54355


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 055/700 ] loss = 7.38809, acc = 0.58101


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 055/700 ] loss = 6.71054, acc = 0.51715


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 056/700 ] loss = 7.37702, acc = 0.58141


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 056/700 ] loss = 6.48556, acc = 0.57875 -> best
Best model found at epoch 56, saving model


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 057/700 ] loss = 7.41209, acc = 0.58651


100%|██████████| 70/70 [00:15<00:00,  4.64it/s]


[ Valid | 057/700 ] loss = 6.78925, acc = 0.56634


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 058/700 ] loss = 7.35160, acc = 0.58691


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 058/700 ] loss = 7.64955, acc = 0.54219


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 059/700 ] loss = 7.32068, acc = 0.59522


100%|██████████| 70/70 [00:15<00:00,  4.56it/s]


[ Valid | 059/700 ] loss = 6.70176, acc = 0.53678


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 060/700 ] loss = 7.29366, acc = 0.59532


100%|██████████| 70/70 [00:13<00:00,  5.15it/s]


[ Valid | 060/700 ] loss = 6.77588, acc = 0.53023


100%|██████████| 157/157 [00:34<00:00,  4.52it/s]


[ Train | 061/700 ] loss = 7.25115, acc = 0.59171


100%|██████████| 70/70 [00:14<00:00,  4.75it/s]


[ Valid | 061/700 ] loss = 7.76188, acc = 0.49549


100%|██████████| 157/157 [00:35<00:00,  4.38it/s]


[ Train | 062/700 ] loss = 7.24283, acc = 0.59822


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 062/700 ] loss = 6.63720, acc = 0.56205


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 063/700 ] loss = 7.21349, acc = 0.59231


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 063/700 ] loss = 6.87223, acc = 0.52121


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 064/700 ] loss = 7.18009, acc = 0.60452


100%|██████████| 70/70 [00:13<00:00,  5.27it/s]


[ Valid | 064/700 ] loss = 7.41977, acc = 0.53700


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 065/700 ] loss = 7.17227, acc = 0.60182


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 065/700 ] loss = 6.79302, acc = 0.55731


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 066/700 ] loss = 7.13397, acc = 0.60202


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 066/700 ] loss = 6.86611, acc = 0.56250


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 067/700 ] loss = 7.11445, acc = 0.60012


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 067/700 ] loss = 7.10928, acc = 0.53633


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 068/700 ] loss = 7.11469, acc = 0.60032


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 068/700 ] loss = 6.61286, acc = 0.56882


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 069/700 ] loss = 7.10358, acc = 0.59962


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 069/700 ] loss = 7.46392, acc = 0.50880


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 070/700 ] loss = 7.05448, acc = 0.60502


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 070/700 ] loss = 6.36693, acc = 0.58348 -> best
Best model found at epoch 70, saving model


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 071/700 ] loss = 7.03303, acc = 0.60552


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 071/700 ] loss = 6.74856, acc = 0.59747 -> best
Best model found at epoch 71, saving model


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 072/700 ] loss = 7.03719, acc = 0.61303


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 072/700 ] loss = 6.76433, acc = 0.56611


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 073/700 ] loss = 6.98592, acc = 0.60963


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 073/700 ] loss = 7.11187, acc = 0.55190


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 074/700 ] loss = 6.98894, acc = 0.60592


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 074/700 ] loss = 7.29348, acc = 0.54648


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 075/700 ] loss = 6.97371, acc = 0.60863


100%|██████████| 70/70 [00:13<00:00,  5.04it/s]


[ Valid | 075/700 ] loss = 6.60554, acc = 0.58551


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 076/700 ] loss = 6.91259, acc = 0.61783


100%|██████████| 70/70 [00:13<00:00,  5.27it/s]


[ Valid | 076/700 ] loss = 6.66454, acc = 0.57829


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 077/700 ] loss = 6.93731, acc = 0.61353


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 077/700 ] loss = 7.12070, acc = 0.58439


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 078/700 ] loss = 6.89025, acc = 0.61703


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 078/700 ] loss = 7.11992, acc = 0.56137


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 079/700 ] loss = 6.88647, acc = 0.62264


100%|██████████| 70/70 [00:13<00:00,  5.21it/s]


[ Valid | 079/700 ] loss = 6.70206, acc = 0.59161


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 080/700 ] loss = 6.87111, acc = 0.61863


100%|██████████| 70/70 [00:13<00:00,  5.18it/s]


[ Valid | 080/700 ] loss = 7.41609, acc = 0.53678


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 081/700 ] loss = 6.82487, acc = 0.61993


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 081/700 ] loss = 6.77615, acc = 0.58416


100%|██████████| 157/157 [00:36<00:00,  4.30it/s]


[ Train | 082/700 ] loss = 6.80812, acc = 0.62574


100%|██████████| 70/70 [00:17<00:00,  4.05it/s]


[ Valid | 082/700 ] loss = 6.82289, acc = 0.60266 -> best
Best model found at epoch 82, saving model


100%|██████████| 157/157 [00:36<00:00,  4.30it/s]


[ Train | 083/700 ] loss = 6.76859, acc = 0.62864


100%|██████████| 70/70 [00:17<00:00,  4.12it/s]


[ Valid | 083/700 ] loss = 7.52082, acc = 0.50654


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 084/700 ] loss = 6.79854, acc = 0.62874


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 084/700 ] loss = 6.64179, acc = 0.59860


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 085/700 ] loss = 6.73901, acc = 0.62914


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 085/700 ] loss = 6.80407, acc = 0.57897


100%|██████████| 157/157 [00:35<00:00,  4.45it/s]


[ Train | 086/700 ] loss = 6.73853, acc = 0.62774


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 086/700 ] loss = 6.85393, acc = 0.60018


100%|██████████| 157/157 [00:34<00:00,  4.58it/s]


[ Train | 087/700 ] loss = 6.74252, acc = 0.62794


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 087/700 ] loss = 6.79125, acc = 0.57987


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 088/700 ] loss = 6.68762, acc = 0.62994


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 088/700 ] loss = 6.66250, acc = 0.60514 -> best
Best model found at epoch 88, saving model


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 089/700 ] loss = 6.69966, acc = 0.63324


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 089/700 ] loss = 7.04205, acc = 0.59589


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 090/700 ] loss = 6.64959, acc = 0.62994


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 090/700 ] loss = 7.49647, acc = 0.55054


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 091/700 ] loss = 6.65223, acc = 0.63504


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 091/700 ] loss = 9.99604, acc = 0.48714


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 092/700 ] loss = 6.64439, acc = 0.63444


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 092/700 ] loss = 7.79295, acc = 0.54219


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 093/700 ] loss = 6.63838, acc = 0.63424


100%|██████████| 70/70 [00:13<00:00,  5.10it/s]


[ Valid | 093/700 ] loss = 6.55230, acc = 0.61868 -> best
Best model found at epoch 93, saving model


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 094/700 ] loss = 6.62013, acc = 0.63735


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 094/700 ] loss = 6.86477, acc = 0.61755


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 095/700 ] loss = 6.59694, acc = 0.63434


100%|██████████| 70/70 [00:15<00:00,  4.62it/s]


[ Valid | 095/700 ] loss = 7.13305, acc = 0.56340


100%|██████████| 157/157 [00:34<00:00,  4.54it/s]


[ Train | 096/700 ] loss = 6.55521, acc = 0.63745


100%|██████████| 70/70 [00:14<00:00,  4.98it/s]


[ Valid | 096/700 ] loss = 7.52994, acc = 0.54603


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 097/700 ] loss = 6.57447, acc = 0.64285


100%|██████████| 70/70 [00:14<00:00,  5.00it/s]


[ Valid | 097/700 ] loss = 6.34306, acc = 0.62387 -> best
Best model found at epoch 97, saving model


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 098/700 ] loss = 6.53346, acc = 0.64185


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 098/700 ] loss = 7.16369, acc = 0.56995


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 099/700 ] loss = 6.51450, acc = 0.64265


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 099/700 ] loss = 6.62441, acc = 0.63448 -> best
Best model found at epoch 99, saving model


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 100/700 ] loss = 6.51280, acc = 0.64675


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 100/700 ] loss = 6.55098, acc = 0.61214


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 101/700 ] loss = 6.50569, acc = 0.64705


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 101/700 ] loss = 6.46175, acc = 0.62635


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 102/700 ] loss = 6.45794, acc = 0.64895


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 102/700 ] loss = 6.88135, acc = 0.62996


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 103/700 ] loss = 6.47644, acc = 0.64985


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 103/700 ] loss = 7.22890, acc = 0.59973


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 104/700 ] loss = 6.47432, acc = 0.64955


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 104/700 ] loss = 6.89006, acc = 0.63831 -> best
Best model found at epoch 104, saving model


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 105/700 ] loss = 6.45309, acc = 0.65006


100%|██████████| 70/70 [00:13<00:00,  5.11it/s]


[ Valid | 105/700 ] loss = 7.64606, acc = 0.55212


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 106/700 ] loss = 6.44006, acc = 0.65096


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 106/700 ] loss = 6.78712, acc = 0.63019


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 107/700 ] loss = 6.42081, acc = 0.64655


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 107/700 ] loss = 6.79811, acc = 0.63222


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 108/700 ] loss = 6.40522, acc = 0.65146


100%|██████████| 70/70 [00:13<00:00,  5.15it/s]


[ Valid | 108/700 ] loss = 6.88856, acc = 0.57784


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 109/700 ] loss = 6.37625, acc = 0.65226


100%|██████████| 70/70 [00:13<00:00,  5.18it/s]


[ Valid | 109/700 ] loss = 7.48183, acc = 0.59770


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 110/700 ] loss = 6.35705, acc = 0.66056


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 110/700 ] loss = 7.92188, acc = 0.51151


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 111/700 ] loss = 6.35413, acc = 0.65386


100%|██████████| 70/70 [00:13<00:00,  5.15it/s]


[ Valid | 111/700 ] loss = 7.10165, acc = 0.63245


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 112/700 ] loss = 6.36415, acc = 0.65606


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 112/700 ] loss = 8.53903, acc = 0.57198


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 113/700 ] loss = 6.33680, acc = 0.65926


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 113/700 ] loss = 6.96361, acc = 0.62455


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 114/700 ] loss = 6.31293, acc = 0.65646


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 114/700 ] loss = 7.72937, acc = 0.60131


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 115/700 ] loss = 6.28646, acc = 0.65836


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 115/700 ] loss = 7.30853, acc = 0.64102 -> best
Best model found at epoch 115, saving model


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 116/700 ] loss = 6.28825, acc = 0.66036


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 116/700 ] loss = 7.77155, acc = 0.53272


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 117/700 ] loss = 6.25225, acc = 0.65196


100%|██████████| 70/70 [00:13<00:00,  5.02it/s]


[ Valid | 117/700 ] loss = 7.00689, acc = 0.60605


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 118/700 ] loss = 6.26576, acc = 0.66066


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 118/700 ] loss = 7.01107, acc = 0.62748


100%|██████████| 157/157 [00:34<00:00,  4.57it/s]


[ Train | 119/700 ] loss = 6.26886, acc = 0.65896


100%|██████████| 70/70 [00:13<00:00,  5.02it/s]


[ Valid | 119/700 ] loss = 6.96190, acc = 0.62229


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 120/700 ] loss = 6.24624, acc = 0.66086


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 120/700 ] loss = 7.28387, acc = 0.63899


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 121/700 ] loss = 6.19767, acc = 0.66447


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 121/700 ] loss = 6.93787, acc = 0.64102


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 122/700 ] loss = 6.22991, acc = 0.66597


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 122/700 ] loss = 6.90283, acc = 0.62116


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 123/700 ] loss = 6.21796, acc = 0.66587


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 123/700 ] loss = 7.20543, acc = 0.62184


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 124/700 ] loss = 6.22316, acc = 0.66857


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 124/700 ] loss = 7.04572, acc = 0.62071


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 125/700 ] loss = 6.18981, acc = 0.66527


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 125/700 ] loss = 7.12884, acc = 0.62274


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 126/700 ] loss = 6.15080, acc = 0.66917


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 126/700 ] loss = 7.43555, acc = 0.58845


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 127/700 ] loss = 6.16909, acc = 0.66396


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 127/700 ] loss = 7.32285, acc = 0.60785


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 128/700 ] loss = 6.14610, acc = 0.66747


100%|██████████| 70/70 [00:13<00:00,  5.02it/s]


[ Valid | 128/700 ] loss = 7.13182, acc = 0.62365


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 129/700 ] loss = 6.13254, acc = 0.67197


100%|██████████| 70/70 [00:13<00:00,  5.19it/s]


[ Valid | 129/700 ] loss = 6.92203, acc = 0.63673


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 130/700 ] loss = 6.10921, acc = 0.66947


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 130/700 ] loss = 7.36289, acc = 0.55190


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 131/700 ] loss = 6.12365, acc = 0.66797


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 131/700 ] loss = 7.35855, acc = 0.60605


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 132/700 ] loss = 6.07605, acc = 0.67177


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 132/700 ] loss = 7.66154, acc = 0.58958


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 133/700 ] loss = 6.10512, acc = 0.67087


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 133/700 ] loss = 7.04347, acc = 0.61440


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 134/700 ] loss = 6.10498, acc = 0.67297


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 134/700 ] loss = 7.38402, acc = 0.63583


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 135/700 ] loss = 6.10009, acc = 0.67487


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 135/700 ] loss = 6.91536, acc = 0.65095 -> best
Best model found at epoch 135, saving model


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 136/700 ] loss = 6.08277, acc = 0.67487


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 136/700 ] loss = 7.07934, acc = 0.60379


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 137/700 ] loss = 6.05980, acc = 0.67497


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 137/700 ] loss = 7.49994, acc = 0.59522


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 138/700 ] loss = 6.08096, acc = 0.67898


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 138/700 ] loss = 7.34486, acc = 0.63696


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 139/700 ] loss = 6.03896, acc = 0.67888


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 139/700 ] loss = 7.34593, acc = 0.59770


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 140/700 ] loss = 6.02496, acc = 0.67988


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 140/700 ] loss = 8.42088, acc = 0.53542


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 141/700 ] loss = 6.00782, acc = 0.67677


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 141/700 ] loss = 7.92191, acc = 0.58100


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 142/700 ] loss = 6.02990, acc = 0.68078


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 142/700 ] loss = 6.64905, acc = 0.64598


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 143/700 ] loss = 5.99259, acc = 0.67978


100%|██████████| 70/70 [00:13<00:00,  5.00it/s]


[ Valid | 143/700 ] loss = 6.85392, acc = 0.65478 -> best
Best model found at epoch 143, saving model


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 144/700 ] loss = 5.97646, acc = 0.68198


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 144/700 ] loss = 7.27009, acc = 0.62635


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 145/700 ] loss = 5.99363, acc = 0.68018


100%|██████████| 70/70 [00:14<00:00,  4.99it/s]


[ Valid | 145/700 ] loss = 6.84060, acc = 0.62974


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 146/700 ] loss = 5.98295, acc = 0.68268


100%|██████████| 70/70 [00:14<00:00,  5.00it/s]


[ Valid | 146/700 ] loss = 6.95900, acc = 0.65411


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 147/700 ] loss = 5.96811, acc = 0.68138


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 147/700 ] loss = 8.43854, acc = 0.55934


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 148/700 ] loss = 5.93352, acc = 0.67817


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 148/700 ] loss = 7.49279, acc = 0.58935


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 149/700 ] loss = 5.95366, acc = 0.68238


100%|██████████| 70/70 [00:13<00:00,  5.09it/s]


[ Valid | 149/700 ] loss = 7.56319, acc = 0.61710


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 150/700 ] loss = 5.90324, acc = 0.68448


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 150/700 ] loss = 7.26785, acc = 0.64779


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 151/700 ] loss = 5.89730, acc = 0.68538


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 151/700 ] loss = 7.75354, acc = 0.58416


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 152/700 ] loss = 5.91000, acc = 0.68628


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 152/700 ] loss = 7.08578, acc = 0.62568


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 153/700 ] loss = 5.92700, acc = 0.68478


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 153/700 ] loss = 7.01750, acc = 0.65140


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 154/700 ] loss = 5.91838, acc = 0.68578


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 154/700 ] loss = 7.36226, acc = 0.65952 -> best
Best model found at epoch 154, saving model


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 155/700 ] loss = 5.85497, acc = 0.68988


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 155/700 ] loss = 7.49132, acc = 0.60153


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 156/700 ] loss = 5.84960, acc = 0.69068


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 156/700 ] loss = 7.65863, acc = 0.59634


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 157/700 ] loss = 5.89851, acc = 0.68758


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 157/700 ] loss = 7.09972, acc = 0.65253


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 158/700 ] loss = 5.85351, acc = 0.68828


100%|██████████| 70/70 [00:14<00:00,  4.94it/s]


[ Valid | 158/700 ] loss = 7.64915, acc = 0.66110 -> best
Best model found at epoch 158, saving model


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 159/700 ] loss = 5.84258, acc = 0.68798


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 159/700 ] loss = 7.09272, acc = 0.64373


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 160/700 ] loss = 5.83223, acc = 0.69168


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 160/700 ] loss = 7.66301, acc = 0.66245 -> best
Best model found at epoch 160, saving model


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 161/700 ] loss = 5.84473, acc = 0.68838


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 161/700 ] loss = 6.93095, acc = 0.63786


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 162/700 ] loss = 5.82163, acc = 0.69519


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 162/700 ] loss = 6.62529, acc = 0.67261 -> best
Best model found at epoch 162, saving model


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 163/700 ] loss = 5.82971, acc = 0.68928


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 163/700 ] loss = 6.96321, acc = 0.62523


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 164/700 ] loss = 5.81282, acc = 0.68938


100%|██████████| 70/70 [00:13<00:00,  5.09it/s]


[ Valid | 164/700 ] loss = 7.19815, acc = 0.66313


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 165/700 ] loss = 5.77342, acc = 0.69839


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 165/700 ] loss = 7.45400, acc = 0.60514


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 166/700 ] loss = 5.78281, acc = 0.69028


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 166/700 ] loss = 7.04609, acc = 0.61868


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 167/700 ] loss = 5.77469, acc = 0.69479


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 167/700 ] loss = 7.74803, acc = 0.64215


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 168/700 ] loss = 5.74915, acc = 0.69499


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 168/700 ] loss = 7.10101, acc = 0.67035


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 169/700 ] loss = 5.77466, acc = 0.69278


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 169/700 ] loss = 7.64908, acc = 0.64328


100%|██████████| 157/157 [00:36<00:00,  4.35it/s]


[ Train | 170/700 ] loss = 5.78169, acc = 0.69719


100%|██████████| 70/70 [00:16<00:00,  4.33it/s]


[ Valid | 170/700 ] loss = 7.98062, acc = 0.60041


100%|██████████| 157/157 [00:36<00:00,  4.35it/s]


[ Train | 171/700 ] loss = 5.73957, acc = 0.69739


100%|██████████| 70/70 [00:14<00:00,  4.68it/s]


[ Valid | 171/700 ] loss = 7.27589, acc = 0.65997


100%|██████████| 157/157 [00:35<00:00,  4.41it/s]


[ Train | 172/700 ] loss = 5.73393, acc = 0.70059


100%|██████████| 70/70 [00:15<00:00,  4.40it/s]


[ Valid | 172/700 ] loss = 7.73326, acc = 0.65456


100%|██████████| 157/157 [00:37<00:00,  4.15it/s]


[ Train | 173/700 ] loss = 5.76065, acc = 0.69489


100%|██████████| 70/70 [00:15<00:00,  4.50it/s]


[ Valid | 173/700 ] loss = 7.14528, acc = 0.65208


100%|██████████| 157/157 [00:36<00:00,  4.35it/s]


[ Train | 174/700 ] loss = 5.71063, acc = 0.69709


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 174/700 ] loss = 8.43222, acc = 0.58213


100%|██████████| 157/157 [00:36<00:00,  4.32it/s]


[ Train | 175/700 ] loss = 5.72885, acc = 0.70349


100%|██████████| 70/70 [00:14<00:00,  4.90it/s]


[ Valid | 175/700 ] loss = 7.17076, acc = 0.65907


100%|██████████| 157/157 [00:33<00:00,  4.62it/s]


[ Train | 176/700 ] loss = 5.72615, acc = 0.69979


100%|██████████| 70/70 [00:13<00:00,  5.03it/s]


[ Valid | 176/700 ] loss = 7.13651, acc = 0.63876


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 177/700 ] loss = 5.70977, acc = 0.69709


100%|██████████| 70/70 [00:13<00:00,  5.03it/s]


[ Valid | 177/700 ] loss = 7.02427, acc = 0.65704


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 178/700 ] loss = 5.72563, acc = 0.70059


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 178/700 ] loss = 7.00503, acc = 0.64644


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 179/700 ] loss = 5.68568, acc = 0.70189


100%|██████████| 70/70 [00:13<00:00,  5.27it/s]


[ Valid | 179/700 ] loss = 7.48579, acc = 0.65208


100%|██████████| 157/157 [00:34<00:00,  4.50it/s]


[ Train | 180/700 ] loss = 5.69148, acc = 0.69979


100%|██████████| 70/70 [00:19<00:00,  3.51it/s]


[ Valid | 180/700 ] loss = 7.45007, acc = 0.59815


100%|██████████| 157/157 [00:43<00:00,  3.61it/s]


[ Train | 181/700 ] loss = 5.66520, acc = 0.70659


100%|██████████| 70/70 [00:18<00:00,  3.84it/s]


[ Valid | 181/700 ] loss = 7.35730, acc = 0.64034


100%|██████████| 157/157 [00:37<00:00,  4.16it/s]


[ Train | 182/700 ] loss = 5.67890, acc = 0.70169


100%|██████████| 70/70 [00:15<00:00,  4.54it/s]


[ Valid | 182/700 ] loss = 7.60721, acc = 0.62410


100%|██████████| 157/157 [00:38<00:00,  4.07it/s]


[ Train | 183/700 ] loss = 5.67643, acc = 0.70449


100%|██████████| 70/70 [00:16<00:00,  4.37it/s]


[ Valid | 183/700 ] loss = 7.66598, acc = 0.60875


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 184/700 ] loss = 5.65269, acc = 0.70139


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 184/700 ] loss = 7.39453, acc = 0.64215


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 185/700 ] loss = 5.67683, acc = 0.70589


100%|██████████| 70/70 [00:13<00:00,  5.10it/s]


[ Valid | 185/700 ] loss = 6.97064, acc = 0.68524 -> best
Best model found at epoch 185, saving model


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 186/700 ] loss = 5.61074, acc = 0.70790


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 186/700 ] loss = 8.46297, acc = 0.62432


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 187/700 ] loss = 5.64065, acc = 0.70019


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 187/700 ] loss = 7.34779, acc = 0.66336


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 188/700 ] loss = 5.60954, acc = 0.70369


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 188/700 ] loss = 7.13314, acc = 0.66877


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 189/700 ] loss = 5.62361, acc = 0.70730


100%|██████████| 70/70 [00:13<00:00,  5.01it/s]


[ Valid | 189/700 ] loss = 7.32189, acc = 0.64869


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 190/700 ] loss = 5.58677, acc = 0.70599


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 190/700 ] loss = 7.45843, acc = 0.64057


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 191/700 ] loss = 5.65064, acc = 0.69959


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 191/700 ] loss = 6.94137, acc = 0.66200


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 192/700 ] loss = 5.58328, acc = 0.71100


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 192/700 ] loss = 7.88536, acc = 0.60469


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 193/700 ] loss = 5.58723, acc = 0.70950


100%|██████████| 70/70 [00:14<00:00,  4.75it/s]


[ Valid | 193/700 ] loss = 7.24783, acc = 0.65817


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 194/700 ] loss = 5.57230, acc = 0.71190


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 194/700 ] loss = 7.25816, acc = 0.62274


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 195/700 ] loss = 5.56516, acc = 0.70589


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 195/700 ] loss = 7.43144, acc = 0.65884


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 196/700 ] loss = 5.57713, acc = 0.70669


100%|██████████| 70/70 [00:14<00:00,  4.98it/s]


[ Valid | 196/700 ] loss = 7.62037, acc = 0.62049


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 197/700 ] loss = 5.56985, acc = 0.71100


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 197/700 ] loss = 7.05951, acc = 0.64711


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 198/700 ] loss = 5.55517, acc = 0.71100


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 198/700 ] loss = 7.76303, acc = 0.66268


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 199/700 ] loss = 5.52795, acc = 0.70750


100%|██████████| 70/70 [00:14<00:00,  4.96it/s]


[ Valid | 199/700 ] loss = 8.08892, acc = 0.60966


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 200/700 ] loss = 5.55126, acc = 0.70850


100%|██████████| 70/70 [00:13<00:00,  5.01it/s]


[ Valid | 200/700 ] loss = 7.15514, acc = 0.66990


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 201/700 ] loss = 5.55362, acc = 0.70770


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 201/700 ] loss = 8.32944, acc = 0.56972


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 202/700 ] loss = 5.53580, acc = 0.70960


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 202/700 ] loss = 7.27031, acc = 0.67532


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 203/700 ] loss = 5.51937, acc = 0.71360


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 203/700 ] loss = 7.37985, acc = 0.64215


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 204/700 ] loss = 5.52874, acc = 0.71090


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 204/700 ] loss = 6.97661, acc = 0.67509


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 205/700 ] loss = 5.49396, acc = 0.71240


100%|██████████| 70/70 [00:14<00:00,  4.90it/s]


[ Valid | 205/700 ] loss = 8.03379, acc = 0.59973


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 206/700 ] loss = 5.49109, acc = 0.71550


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 206/700 ] loss = 7.24877, acc = 0.67509


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 207/700 ] loss = 5.50365, acc = 0.71410


100%|██████████| 70/70 [00:18<00:00,  3.71it/s]


[ Valid | 207/700 ] loss = 6.87996, acc = 0.69585 -> best
Best model found at epoch 207, saving model


100%|██████████| 157/157 [00:39<00:00,  4.02it/s]


[ Train | 208/700 ] loss = 5.50870, acc = 0.70920


100%|██████████| 70/70 [00:16<00:00,  4.24it/s]


[ Valid | 208/700 ] loss = 7.23183, acc = 0.69111


100%|██████████| 157/157 [00:35<00:00,  4.46it/s]


[ Train | 209/700 ] loss = 5.47108, acc = 0.71660


100%|██████████| 70/70 [00:15<00:00,  4.47it/s]


[ Valid | 209/700 ] loss = 7.69474, acc = 0.63087


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 210/700 ] loss = 5.48672, acc = 0.71470


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 210/700 ] loss = 6.94100, acc = 0.67757


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 211/700 ] loss = 5.47561, acc = 0.71660


100%|██████████| 70/70 [00:14<00:00,  4.85it/s]


[ Valid | 211/700 ] loss = 7.24380, acc = 0.66133


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 212/700 ] loss = 5.47870, acc = 0.71130


100%|██████████| 70/70 [00:13<00:00,  5.11it/s]


[ Valid | 212/700 ] loss = 7.37580, acc = 0.66471


100%|██████████| 157/157 [00:33<00:00,  4.76it/s]


[ Train | 213/700 ] loss = 5.46011, acc = 0.71760


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 213/700 ] loss = 7.02764, acc = 0.67058


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 214/700 ] loss = 5.49918, acc = 0.71370


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 214/700 ] loss = 7.40810, acc = 0.62568


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 215/700 ] loss = 5.43712, acc = 0.71720


100%|██████████| 70/70 [00:13<00:00,  5.00it/s]


[ Valid | 215/700 ] loss = 8.32635, acc = 0.61665


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 216/700 ] loss = 5.44619, acc = 0.71940


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 216/700 ] loss = 7.32340, acc = 0.66403


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 217/700 ] loss = 5.40864, acc = 0.71960


100%|██████████| 70/70 [00:14<00:00,  4.93it/s]


[ Valid | 217/700 ] loss = 7.68744, acc = 0.62884


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 218/700 ] loss = 5.44752, acc = 0.71700


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 218/700 ] loss = 7.31361, acc = 0.64598


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 219/700 ] loss = 5.40712, acc = 0.71990


100%|██████████| 70/70 [00:13<00:00,  5.01it/s]


[ Valid | 219/700 ] loss = 7.65097, acc = 0.65366


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 220/700 ] loss = 5.40810, acc = 0.72020


100%|██████████| 70/70 [00:14<00:00,  4.85it/s]


[ Valid | 220/700 ] loss = 7.49381, acc = 0.66358


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 221/700 ] loss = 5.40881, acc = 0.72000


100%|██████████| 70/70 [00:14<00:00,  5.00it/s]


[ Valid | 221/700 ] loss = 7.22246, acc = 0.66697


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 222/700 ] loss = 5.42746, acc = 0.72201


100%|██████████| 70/70 [00:13<00:00,  5.02it/s]


[ Valid | 222/700 ] loss = 7.40950, acc = 0.67125


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 223/700 ] loss = 5.40592, acc = 0.72080


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 223/700 ] loss = 7.54952, acc = 0.63876


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 224/700 ] loss = 5.38760, acc = 0.72201


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 224/700 ] loss = 7.34637, acc = 0.65704


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 225/700 ] loss = 5.38804, acc = 0.71870


100%|██████████| 70/70 [00:14<00:00,  4.76it/s]


[ Valid | 225/700 ] loss = 7.75422, acc = 0.63764


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 226/700 ] loss = 5.38697, acc = 0.72201


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 226/700 ] loss = 6.97771, acc = 0.66922


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 227/700 ] loss = 5.42609, acc = 0.71670


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 227/700 ] loss = 7.14859, acc = 0.67915


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 228/700 ] loss = 5.37693, acc = 0.71990


100%|██████████| 70/70 [00:13<00:00,  5.22it/s]


[ Valid | 228/700 ] loss = 7.71042, acc = 0.63087


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 229/700 ] loss = 5.36206, acc = 0.72701


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 229/700 ] loss = 7.03992, acc = 0.68795


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 230/700 ] loss = 5.36906, acc = 0.71890


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 230/700 ] loss = 6.89190, acc = 0.68931


100%|██████████| 157/157 [00:34<00:00,  4.49it/s]


[ Train | 231/700 ] loss = 5.38039, acc = 0.71990


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 231/700 ] loss = 7.27467, acc = 0.66471


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 232/700 ] loss = 5.37729, acc = 0.72311


100%|██████████| 70/70 [00:14<00:00,  4.89it/s]


[ Valid | 232/700 ] loss = 7.25716, acc = 0.67554


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 233/700 ] loss = 5.35722, acc = 0.72000


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 233/700 ] loss = 7.37335, acc = 0.67171


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 234/700 ] loss = 5.33569, acc = 0.72621


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 234/700 ] loss = 8.36765, acc = 0.60605


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 235/700 ] loss = 5.32937, acc = 0.72331


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 235/700 ] loss = 7.34408, acc = 0.68299


100%|██████████| 157/157 [00:34<00:00,  4.61it/s]


[ Train | 236/700 ] loss = 5.29976, acc = 0.72981


100%|██████████| 70/70 [00:17<00:00,  4.09it/s]


[ Valid | 236/700 ] loss = 7.05712, acc = 0.67757


100%|██████████| 157/157 [00:34<00:00,  4.56it/s]


[ Train | 237/700 ] loss = 5.32387, acc = 0.72070


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 237/700 ] loss = 8.02505, acc = 0.66945


100%|██████████| 157/157 [00:34<00:00,  4.50it/s]


[ Train | 238/700 ] loss = 5.31948, acc = 0.72531


100%|██████████| 70/70 [00:14<00:00,  4.73it/s]


[ Valid | 238/700 ] loss = 7.42878, acc = 0.63651


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 239/700 ] loss = 5.30946, acc = 0.72591


100%|██████████| 70/70 [00:13<00:00,  5.16it/s]


[ Valid | 239/700 ] loss = 7.71586, acc = 0.66922


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 240/700 ] loss = 5.32140, acc = 0.72231


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 240/700 ] loss = 8.00738, acc = 0.65704


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 241/700 ] loss = 5.33595, acc = 0.72761


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 241/700 ] loss = 7.77865, acc = 0.62319


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 242/700 ] loss = 5.32605, acc = 0.72541


100%|██████████| 70/70 [00:14<00:00,  4.73it/s]


[ Valid | 242/700 ] loss = 7.96352, acc = 0.64373


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 243/700 ] loss = 5.31379, acc = 0.72471


100%|██████████| 70/70 [00:13<00:00,  5.09it/s]


[ Valid | 243/700 ] loss = 8.17848, acc = 0.64508


100%|██████████| 157/157 [00:35<00:00,  4.45it/s]


[ Train | 244/700 ] loss = 5.28519, acc = 0.72771


100%|██████████| 70/70 [00:13<00:00,  5.11it/s]


[ Valid | 244/700 ] loss = 8.33873, acc = 0.61597


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 245/700 ] loss = 5.30204, acc = 0.73061


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 245/700 ] loss = 7.78594, acc = 0.65749


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 246/700 ] loss = 5.27873, acc = 0.72681


100%|██████████| 70/70 [00:14<00:00,  4.78it/s]


[ Valid | 246/700 ] loss = 7.35433, acc = 0.69269


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 247/700 ] loss = 5.27149, acc = 0.73001


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 247/700 ] loss = 7.55024, acc = 0.67419


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 248/700 ] loss = 5.25207, acc = 0.72681


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 248/700 ] loss = 7.00762, acc = 0.68840


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 249/700 ] loss = 5.24294, acc = 0.73021


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 249/700 ] loss = 7.64478, acc = 0.68118


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 250/700 ] loss = 5.26319, acc = 0.72891


100%|██████████| 70/70 [00:14<00:00,  4.96it/s]


[ Valid | 250/700 ] loss = 7.77816, acc = 0.64914


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 251/700 ] loss = 5.26691, acc = 0.72781


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 251/700 ] loss = 6.99271, acc = 0.70036 -> best
Best model found at epoch 251, saving model


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 252/700 ] loss = 5.26900, acc = 0.73141


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 252/700 ] loss = 8.29821, acc = 0.62455


100%|██████████| 157/157 [00:35<00:00,  4.47it/s]


[ Train | 253/700 ] loss = 5.24065, acc = 0.73041


100%|██████████| 70/70 [00:14<00:00,  4.90it/s]


[ Valid | 253/700 ] loss = 7.51227, acc = 0.68366


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 254/700 ] loss = 5.24592, acc = 0.73491


100%|██████████| 70/70 [00:14<00:00,  4.98it/s]


[ Valid | 254/700 ] loss = 7.20832, acc = 0.67351


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 255/700 ] loss = 5.23588, acc = 0.73031


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 255/700 ] loss = 8.22167, acc = 0.62477


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 256/700 ] loss = 5.24948, acc = 0.72851


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 256/700 ] loss = 7.36247, acc = 0.69901


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 257/700 ] loss = 5.22196, acc = 0.73091


100%|██████████| 70/70 [00:14<00:00,  4.86it/s]


[ Valid | 257/700 ] loss = 7.05394, acc = 0.68231


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 258/700 ] loss = 5.22888, acc = 0.73121


100%|██████████| 70/70 [00:14<00:00,  4.98it/s]


[ Valid | 258/700 ] loss = 7.47739, acc = 0.69472


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 259/700 ] loss = 5.21383, acc = 0.73391


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 259/700 ] loss = 6.75711, acc = 0.69968


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 260/700 ] loss = 5.20070, acc = 0.73431


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 260/700 ] loss = 6.96368, acc = 0.68299


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 261/700 ] loss = 5.18094, acc = 0.73031


100%|██████████| 70/70 [00:13<00:00,  5.04it/s]


[ Valid | 261/700 ] loss = 7.23750, acc = 0.69991


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 262/700 ] loss = 5.17434, acc = 0.73331


100%|██████████| 70/70 [00:14<00:00,  4.86it/s]


[ Valid | 262/700 ] loss = 7.41072, acc = 0.69562


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 263/700 ] loss = 5.23048, acc = 0.73381


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 263/700 ] loss = 7.01451, acc = 0.69765


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 264/700 ] loss = 5.19999, acc = 0.73471


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 264/700 ] loss = 7.39751, acc = 0.64598


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 265/700 ] loss = 5.20084, acc = 0.72991


100%|██████████| 70/70 [00:14<00:00,  4.97it/s]


[ Valid | 265/700 ] loss = 7.15158, acc = 0.66110


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 266/700 ] loss = 5.21523, acc = 0.73431


100%|██████████| 70/70 [00:14<00:00,  4.96it/s]


[ Valid | 266/700 ] loss = 7.22146, acc = 0.68073


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 267/700 ] loss = 5.16682, acc = 0.73772


100%|██████████| 70/70 [00:13<00:00,  5.19it/s]


[ Valid | 267/700 ] loss = 7.64555, acc = 0.69337


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 268/700 ] loss = 5.15656, acc = 0.73561


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 268/700 ] loss = 7.11136, acc = 0.67374


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 269/700 ] loss = 5.17187, acc = 0.73592


100%|██████████| 70/70 [00:14<00:00,  4.71it/s]


[ Valid | 269/700 ] loss = 7.19415, acc = 0.67915


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 270/700 ] loss = 5.16738, acc = 0.73712


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 270/700 ] loss = 7.49611, acc = 0.62929


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 271/700 ] loss = 5.17538, acc = 0.72811


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 271/700 ] loss = 7.09209, acc = 0.68276


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 272/700 ] loss = 5.15897, acc = 0.73561


100%|██████████| 70/70 [00:14<00:00,  4.82it/s]


[ Valid | 272/700 ] loss = 7.67252, acc = 0.66832


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 273/700 ] loss = 5.14412, acc = 0.73782


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 273/700 ] loss = 8.06085, acc = 0.62568


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 274/700 ] loss = 5.15154, acc = 0.74002


100%|██████████| 70/70 [00:13<00:00,  5.22it/s]


[ Valid | 274/700 ] loss = 8.69855, acc = 0.57265


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 275/700 ] loss = 5.16350, acc = 0.73281


100%|██████████| 70/70 [00:14<00:00,  4.89it/s]


[ Valid | 275/700 ] loss = 7.37871, acc = 0.68908


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 276/700 ] loss = 5.16202, acc = 0.73582


100%|██████████| 70/70 [00:13<00:00,  5.16it/s]


[ Valid | 276/700 ] loss = 7.03242, acc = 0.67374


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 277/700 ] loss = 5.12583, acc = 0.73391


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 277/700 ] loss = 7.73881, acc = 0.67577


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 278/700 ] loss = 5.11597, acc = 0.74222


100%|██████████| 70/70 [00:14<00:00,  4.86it/s]


[ Valid | 278/700 ] loss = 7.47512, acc = 0.66652


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 279/700 ] loss = 5.12340, acc = 0.73521


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 279/700 ] loss = 7.63329, acc = 0.66471


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 280/700 ] loss = 5.12183, acc = 0.73942


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 280/700 ] loss = 7.16687, acc = 0.66313


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 281/700 ] loss = 5.10296, acc = 0.73792


100%|██████████| 70/70 [00:14<00:00,  4.67it/s]


[ Valid | 281/700 ] loss = 7.77846, acc = 0.64305


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 282/700 ] loss = 5.10138, acc = 0.74002


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 282/700 ] loss = 7.04699, acc = 0.69337


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 283/700 ] loss = 5.10622, acc = 0.74192


100%|██████████| 70/70 [00:13<00:00,  5.27it/s]


[ Valid | 283/700 ] loss = 6.90198, acc = 0.71255 -> best
Best model found at epoch 283, saving model


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 284/700 ] loss = 5.09369, acc = 0.74282


100%|██████████| 70/70 [00:15<00:00,  4.63it/s]


[ Valid | 284/700 ] loss = 8.09174, acc = 0.64215


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 285/700 ] loss = 5.10443, acc = 0.73992


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 285/700 ] loss = 8.20998, acc = 0.63335


100%|██████████| 157/157 [00:34<00:00,  4.59it/s]


[ Train | 286/700 ] loss = 5.08014, acc = 0.74102


100%|██████████| 70/70 [00:14<00:00,  4.98it/s]


[ Valid | 286/700 ] loss = 7.64747, acc = 0.69066


100%|██████████| 157/157 [00:34<00:00,  4.60it/s]


[ Train | 287/700 ] loss = 5.10547, acc = 0.74062


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 287/700 ] loss = 7.52091, acc = 0.68186


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 288/700 ] loss = 5.07668, acc = 0.74152


100%|██████████| 70/70 [00:14<00:00,  4.75it/s]


[ Valid | 288/700 ] loss = 8.36827, acc = 0.64418


100%|██████████| 157/157 [00:34<00:00,  4.56it/s]


[ Train | 289/700 ] loss = 5.09180, acc = 0.74262


100%|██████████| 70/70 [00:14<00:00,  4.79it/s]


[ Valid | 289/700 ] loss = 7.38538, acc = 0.65681


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 290/700 ] loss = 5.08190, acc = 0.73792


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 290/700 ] loss = 8.27109, acc = 0.63786


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 291/700 ] loss = 5.09882, acc = 0.74002


100%|██████████| 70/70 [00:14<00:00,  4.71it/s]


[ Valid | 291/700 ] loss = 7.35503, acc = 0.67667


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 292/700 ] loss = 5.07765, acc = 0.74032


100%|██████████| 70/70 [00:15<00:00,  4.46it/s]


[ Valid | 292/700 ] loss = 8.05637, acc = 0.67374


100%|██████████| 157/157 [00:33<00:00,  4.64it/s]


[ Train | 293/700 ] loss = 5.05339, acc = 0.74212


100%|██████████| 70/70 [00:15<00:00,  4.43it/s]


[ Valid | 293/700 ] loss = 7.78513, acc = 0.68231


100%|██████████| 157/157 [00:36<00:00,  4.33it/s]


[ Train | 294/700 ] loss = 5.06929, acc = 0.74472


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 294/700 ] loss = 7.61636, acc = 0.66652


100%|██████████| 157/157 [00:33<00:00,  4.72it/s]


[ Train | 295/700 ] loss = 5.09523, acc = 0.74082


100%|██████████| 70/70 [00:15<00:00,  4.50it/s]


[ Valid | 295/700 ] loss = 7.81771, acc = 0.65704


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 296/700 ] loss = 5.04414, acc = 0.74532


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 296/700 ] loss = 7.52225, acc = 0.67577


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 297/700 ] loss = 5.05556, acc = 0.74312


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 297/700 ] loss = 8.86180, acc = 0.59860


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 298/700 ] loss = 5.03531, acc = 0.74762


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 298/700 ] loss = 7.70513, acc = 0.62838


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 299/700 ] loss = 5.09557, acc = 0.74262


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 299/700 ] loss = 8.00544, acc = 0.65320


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 300/700 ] loss = 5.04147, acc = 0.74752


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 300/700 ] loss = 7.71420, acc = 0.68976


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 301/700 ] loss = 5.03308, acc = 0.74372


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 301/700 ] loss = 7.09852, acc = 0.69743


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 302/700 ] loss = 5.04198, acc = 0.74482


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 302/700 ] loss = 7.87892, acc = 0.67915


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 303/700 ] loss = 4.99334, acc = 0.74492


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 303/700 ] loss = 7.48825, acc = 0.66606


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 304/700 ] loss = 5.01733, acc = 0.74452


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 304/700 ] loss = 7.53182, acc = 0.69043


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 305/700 ] loss = 5.04841, acc = 0.74662


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 305/700 ] loss = 7.41504, acc = 0.66358


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 306/700 ] loss = 5.01139, acc = 0.74812


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 306/700 ] loss = 7.44017, acc = 0.68524


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 307/700 ] loss = 5.00249, acc = 0.74792


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 307/700 ] loss = 7.82434, acc = 0.68208


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 308/700 ] loss = 5.01941, acc = 0.74552


100%|██████████| 70/70 [00:16<00:00,  4.26it/s]


[ Valid | 308/700 ] loss = 7.10023, acc = 0.67306


100%|██████████| 157/157 [00:36<00:00,  4.29it/s]


[ Train | 309/700 ] loss = 5.01167, acc = 0.74582


100%|██████████| 70/70 [00:16<00:00,  4.35it/s]


[ Valid | 309/700 ] loss = 7.87120, acc = 0.66968


100%|██████████| 157/157 [00:37<00:00,  4.18it/s]


[ Train | 310/700 ] loss = 4.98118, acc = 0.74712


100%|██████████| 70/70 [00:14<00:00,  4.73it/s]


[ Valid | 310/700 ] loss = 7.66108, acc = 0.65749


100%|██████████| 157/157 [00:33<00:00,  4.72it/s]


[ Train | 311/700 ] loss = 4.95986, acc = 0.74442


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 311/700 ] loss = 7.73129, acc = 0.71503 -> best
Best model found at epoch 311, saving model


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 312/700 ] loss = 4.99203, acc = 0.75383


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 312/700 ] loss = 7.41441, acc = 0.68073


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 313/700 ] loss = 4.96682, acc = 0.74922


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 313/700 ] loss = 7.69501, acc = 0.64756


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 314/700 ] loss = 4.97598, acc = 0.74502


100%|██████████| 70/70 [00:13<00:00,  5.04it/s]


[ Valid | 314/700 ] loss = 7.55908, acc = 0.64937


100%|██████████| 157/157 [00:33<00:00,  4.63it/s]


[ Train | 315/700 ] loss = 4.95669, acc = 0.75203


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 315/700 ] loss = 7.34015, acc = 0.68953


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 316/700 ] loss = 4.96809, acc = 0.74992


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 316/700 ] loss = 8.19336, acc = 0.64102


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 317/700 ] loss = 4.98614, acc = 0.74872


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 317/700 ] loss = 7.47749, acc = 0.67261


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 318/700 ] loss = 4.96011, acc = 0.75123


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 318/700 ] loss = 7.25527, acc = 0.70307


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 319/700 ] loss = 4.97895, acc = 0.74312


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 319/700 ] loss = 7.27786, acc = 0.70939


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 320/700 ] loss = 5.00472, acc = 0.74942


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 320/700 ] loss = 8.26925, acc = 0.61823


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 321/700 ] loss = 4.97075, acc = 0.74792


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 321/700 ] loss = 7.26119, acc = 0.68299


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 322/700 ] loss = 4.96697, acc = 0.75603


100%|██████████| 70/70 [00:13<00:00,  5.09it/s]


[ Valid | 322/700 ] loss = 7.92848, acc = 0.67193


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 323/700 ] loss = 4.94659, acc = 0.75623


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 323/700 ] loss = 7.39044, acc = 0.70623


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 324/700 ] loss = 4.93105, acc = 0.75263


100%|██████████| 70/70 [00:12<00:00,  5.46it/s]


[ Valid | 324/700 ] loss = 8.44758, acc = 0.61236


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 325/700 ] loss = 4.93010, acc = 0.75253


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 325/700 ] loss = 7.67013, acc = 0.67712


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 326/700 ] loss = 4.94813, acc = 0.74902


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 326/700 ] loss = 7.99737, acc = 0.64463


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 327/700 ] loss = 4.97119, acc = 0.74772


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 327/700 ] loss = 7.07746, acc = 0.70307


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 328/700 ] loss = 4.93855, acc = 0.75073


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 328/700 ] loss = 8.09537, acc = 0.65681


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 329/700 ] loss = 4.94261, acc = 0.75053


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 329/700 ] loss = 7.28318, acc = 0.69246


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 330/700 ] loss = 4.92000, acc = 0.75243


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 330/700 ] loss = 7.27486, acc = 0.69156


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 331/700 ] loss = 4.94135, acc = 0.75123


100%|██████████| 70/70 [00:14<00:00,  4.82it/s]


[ Valid | 331/700 ] loss = 7.28290, acc = 0.69585


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 332/700 ] loss = 4.93784, acc = 0.75123


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 332/700 ] loss = 7.70645, acc = 0.68457


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 333/700 ] loss = 4.90709, acc = 0.75403


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 333/700 ] loss = 8.25605, acc = 0.64711


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 334/700 ] loss = 4.90624, acc = 0.75683


100%|██████████| 70/70 [00:14<00:00,  4.92it/s]


[ Valid | 334/700 ] loss = 7.84157, acc = 0.62906


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 335/700 ] loss = 4.90008, acc = 0.75203


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 335/700 ] loss = 8.66757, acc = 0.62861


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 336/700 ] loss = 4.89767, acc = 0.75463


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 336/700 ] loss = 7.07329, acc = 0.71367


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 337/700 ] loss = 4.90137, acc = 0.75173


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 337/700 ] loss = 7.11963, acc = 0.71300


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 338/700 ] loss = 4.90774, acc = 0.75763


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 338/700 ] loss = 8.21749, acc = 0.69292


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 339/700 ] loss = 4.89518, acc = 0.75433


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 339/700 ] loss = 7.36006, acc = 0.69698


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 340/700 ] loss = 4.87066, acc = 0.75463


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 340/700 ] loss = 7.55134, acc = 0.67690


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 341/700 ] loss = 4.86019, acc = 0.75383


100%|██████████| 70/70 [00:12<00:00,  5.46it/s]


[ Valid | 341/700 ] loss = 6.98638, acc = 0.68818


100%|██████████| 157/157 [00:31<00:00,  4.96it/s]


[ Train | 342/700 ] loss = 4.87088, acc = 0.75963


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 342/700 ] loss = 8.14840, acc = 0.67058


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 343/700 ] loss = 4.86336, acc = 0.75153


100%|██████████| 70/70 [00:13<00:00,  5.25it/s]


[ Valid | 343/700 ] loss = 7.12650, acc = 0.69292


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 344/700 ] loss = 4.86377, acc = 0.75233


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 344/700 ] loss = 7.08133, acc = 0.70555


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 345/700 ] loss = 4.87661, acc = 0.75693


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 345/700 ] loss = 7.29363, acc = 0.64350


100%|██████████| 157/157 [00:30<00:00,  5.22it/s]


[ Train | 346/700 ] loss = 4.85953, acc = 0.75723


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 346/700 ] loss = 7.25284, acc = 0.69314


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 347/700 ] loss = 4.86290, acc = 0.75463


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 347/700 ] loss = 7.36910, acc = 0.67960


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 348/700 ] loss = 4.85740, acc = 0.75763


100%|██████████| 70/70 [00:15<00:00,  4.65it/s]


[ Valid | 348/700 ] loss = 7.67918, acc = 0.69946


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 349/700 ] loss = 4.88723, acc = 0.75383


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 349/700 ] loss = 7.83236, acc = 0.67915


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 350/700 ] loss = 4.85049, acc = 0.75683


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 350/700 ] loss = 7.17729, acc = 0.69517


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 351/700 ] loss = 4.83743, acc = 0.75643


100%|██████████| 70/70 [00:14<00:00,  4.85it/s]


[ Valid | 351/700 ] loss = 7.83036, acc = 0.68344


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 352/700 ] loss = 4.84785, acc = 0.76253


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 352/700 ] loss = 7.55986, acc = 0.70600


100%|██████████| 157/157 [00:33<00:00,  4.66it/s]


[ Train | 353/700 ] loss = 4.82048, acc = 0.75953


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 353/700 ] loss = 7.91182, acc = 0.66403


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 354/700 ] loss = 4.82425, acc = 0.76163


100%|██████████| 70/70 [00:13<00:00,  5.29it/s]


[ Valid | 354/700 ] loss = 7.59726, acc = 0.64079


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 355/700 ] loss = 4.83748, acc = 0.75973


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 355/700 ] loss = 7.46330, acc = 0.69134


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 356/700 ] loss = 4.82901, acc = 0.75603


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 356/700 ] loss = 7.44898, acc = 0.68028


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 357/700 ] loss = 4.82255, acc = 0.76103


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 357/700 ] loss = 7.62128, acc = 0.63493


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 358/700 ] loss = 4.84403, acc = 0.75933


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 358/700 ] loss = 7.59492, acc = 0.70081


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 359/700 ] loss = 4.83265, acc = 0.75793


100%|██████████| 70/70 [00:13<00:00,  5.21it/s]


[ Valid | 359/700 ] loss = 7.36011, acc = 0.69201


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 360/700 ] loss = 4.81911, acc = 0.76173


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 360/700 ] loss = 7.15029, acc = 0.69607


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 361/700 ] loss = 4.85201, acc = 0.75843


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 361/700 ] loss = 7.66165, acc = 0.68276


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 362/700 ] loss = 4.78670, acc = 0.76263


100%|██████████| 70/70 [00:14<00:00,  4.89it/s]


[ Valid | 362/700 ] loss = 8.37781, acc = 0.64666


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 363/700 ] loss = 4.82228, acc = 0.75663


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 363/700 ] loss = 6.99176, acc = 0.71006


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 364/700 ] loss = 4.81654, acc = 0.75913


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 364/700 ] loss = 7.80574, acc = 0.67486


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 365/700 ] loss = 4.80782, acc = 0.76093


100%|██████████| 70/70 [00:13<00:00,  5.04it/s]


[ Valid | 365/700 ] loss = 7.32455, acc = 0.66358


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 366/700 ] loss = 4.79705, acc = 0.76514


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 366/700 ] loss = 7.69238, acc = 0.69630


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 367/700 ] loss = 4.79637, acc = 0.76383


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 367/700 ] loss = 8.04619, acc = 0.65636


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 368/700 ] loss = 4.78950, acc = 0.75753


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 368/700 ] loss = 7.58440, acc = 0.68231


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 369/700 ] loss = 4.76906, acc = 0.75993


100%|██████████| 70/70 [00:12<00:00,  5.64it/s]


[ Valid | 369/700 ] loss = 7.04290, acc = 0.70939


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 370/700 ] loss = 4.79110, acc = 0.76113


100%|██████████| 70/70 [00:14<00:00,  4.69it/s]


[ Valid | 370/700 ] loss = 7.95013, acc = 0.68547


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 371/700 ] loss = 4.79008, acc = 0.76323


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 371/700 ] loss = 7.32667, acc = 0.70465


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 372/700 ] loss = 4.76056, acc = 0.76003


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 372/700 ] loss = 7.55764, acc = 0.68863


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 373/700 ] loss = 4.74939, acc = 0.76464


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 373/700 ] loss = 7.53879, acc = 0.70149


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 374/700 ] loss = 4.79400, acc = 0.76333


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 374/700 ] loss = 7.58187, acc = 0.67735


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 375/700 ] loss = 4.77101, acc = 0.76123


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 375/700 ] loss = 8.04464, acc = 0.64440


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 376/700 ] loss = 4.78054, acc = 0.76173


100%|██████████| 70/70 [00:14<00:00,  4.82it/s]


[ Valid | 376/700 ] loss = 7.22755, acc = 0.69856


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 377/700 ] loss = 4.78003, acc = 0.76133


100%|██████████| 70/70 [00:12<00:00,  5.62it/s]


[ Valid | 377/700 ] loss = 8.38688, acc = 0.66178


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 378/700 ] loss = 4.76314, acc = 0.76403


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 378/700 ] loss = 10.50438, acc = 0.50790


100%|██████████| 157/157 [00:29<00:00,  5.24it/s]


[ Train | 379/700 ] loss = 4.76696, acc = 0.75923


100%|██████████| 70/70 [00:13<00:00,  5.03it/s]


[ Valid | 379/700 ] loss = 7.40487, acc = 0.70532


100%|██████████| 157/157 [00:35<00:00,  4.44it/s]


[ Train | 380/700 ] loss = 4.74594, acc = 0.76474


100%|██████████| 70/70 [00:13<00:00,  5.18it/s]


[ Valid | 380/700 ] loss = 8.56736, acc = 0.62116


100%|██████████| 157/157 [00:33<00:00,  4.64it/s]


[ Train | 381/700 ] loss = 4.74905, acc = 0.76123


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 381/700 ] loss = 7.46155, acc = 0.70036


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 382/700 ] loss = 4.74464, acc = 0.76383


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 382/700 ] loss = 7.18685, acc = 0.71029


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 383/700 ] loss = 4.73310, acc = 0.76904


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 383/700 ] loss = 7.01131, acc = 0.70713


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 384/700 ] loss = 4.75393, acc = 0.76634


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 384/700 ] loss = 7.29773, acc = 0.67532


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 385/700 ] loss = 4.74492, acc = 0.76073


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 385/700 ] loss = 8.00629, acc = 0.64305


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 386/700 ] loss = 4.72381, acc = 0.76634


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 386/700 ] loss = 7.71954, acc = 0.69675


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 387/700 ] loss = 4.70189, acc = 0.76413


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 387/700 ] loss = 7.17286, acc = 0.67329


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 388/700 ] loss = 4.70268, acc = 0.76804


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 388/700 ] loss = 8.88834, acc = 0.58394


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 389/700 ] loss = 4.71198, acc = 0.76624


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 389/700 ] loss = 7.16675, acc = 0.70397


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 390/700 ] loss = 4.71545, acc = 0.76454


100%|██████████| 70/70 [00:14<00:00,  4.86it/s]


[ Valid | 390/700 ] loss = 7.33195, acc = 0.69878


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 391/700 ] loss = 4.70319, acc = 0.76894


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 391/700 ] loss = 7.79353, acc = 0.68637


100%|██████████| 157/157 [00:34<00:00,  4.57it/s]


[ Train | 392/700 ] loss = 4.72577, acc = 0.76333


100%|██████████| 70/70 [00:13<00:00,  5.21it/s]


[ Valid | 392/700 ] loss = 8.15612, acc = 0.63290


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 393/700 ] loss = 4.71703, acc = 0.76684


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 393/700 ] loss = 7.04274, acc = 0.71525 -> best
Best model found at epoch 393, saving model


100%|██████████| 157/157 [00:33<00:00,  4.70it/s]


[ Train | 394/700 ] loss = 4.71297, acc = 0.77004


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 394/700 ] loss = 8.44655, acc = 0.65050


100%|██████████| 157/157 [00:32<00:00,  4.86it/s]


[ Train | 395/700 ] loss = 4.70554, acc = 0.76634


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 395/700 ] loss = 8.10702, acc = 0.65320


100%|██████████| 157/157 [00:30<00:00,  5.22it/s]


[ Train | 396/700 ] loss = 4.69343, acc = 0.76804


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 396/700 ] loss = 7.97804, acc = 0.68344


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 397/700 ] loss = 4.74191, acc = 0.76804


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 397/700 ] loss = 7.16660, acc = 0.70487


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 398/700 ] loss = 4.70248, acc = 0.76634


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 398/700 ] loss = 7.75847, acc = 0.66088


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 399/700 ] loss = 4.69099, acc = 0.76784


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 399/700 ] loss = 7.29138, acc = 0.71142


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 400/700 ] loss = 4.69522, acc = 0.76313


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 400/700 ] loss = 7.74131, acc = 0.68569


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 401/700 ] loss = 4.70103, acc = 0.77054


100%|██████████| 70/70 [00:14<00:00,  4.67it/s]


[ Valid | 401/700 ] loss = 7.74685, acc = 0.66065


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 402/700 ] loss = 4.63946, acc = 0.77134


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 402/700 ] loss = 7.46463, acc = 0.67938


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 403/700 ] loss = 4.65715, acc = 0.76804


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 403/700 ] loss = 7.83045, acc = 0.66629


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 404/700 ] loss = 4.68929, acc = 0.77684


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 404/700 ] loss = 7.38561, acc = 0.69698


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 405/700 ] loss = 4.69647, acc = 0.76914


100%|██████████| 70/70 [00:12<00:00,  5.65it/s]


[ Valid | 405/700 ] loss = 7.84973, acc = 0.68389


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 406/700 ] loss = 4.69122, acc = 0.76854


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 406/700 ] loss = 8.59755, acc = 0.67125


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 407/700 ] loss = 4.66810, acc = 0.76814


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 407/700 ] loss = 7.91257, acc = 0.70916


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 408/700 ] loss = 4.65173, acc = 0.76654


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 408/700 ] loss = 7.46181, acc = 0.68818


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 409/700 ] loss = 4.66424, acc = 0.77354


100%|██████████| 70/70 [00:14<00:00,  4.89it/s]


[ Valid | 409/700 ] loss = 7.83962, acc = 0.67915


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 410/700 ] loss = 4.66480, acc = 0.77384


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 410/700 ] loss = 7.26252, acc = 0.71006


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 411/700 ] loss = 4.65261, acc = 0.76754


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 411/700 ] loss = 7.42503, acc = 0.68321


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 412/700 ] loss = 4.67506, acc = 0.77304


100%|██████████| 70/70 [00:14<00:00,  4.86it/s]


[ Valid | 412/700 ] loss = 7.45845, acc = 0.72247 -> best
Best model found at epoch 412, saving model


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 413/700 ] loss = 4.64429, acc = 0.77174


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 413/700 ] loss = 7.66811, acc = 0.68976


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 414/700 ] loss = 4.67517, acc = 0.76944


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 414/700 ] loss = 7.49567, acc = 0.70036


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 415/700 ] loss = 4.66555, acc = 0.76834


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 415/700 ] loss = 8.10061, acc = 0.66922


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 416/700 ] loss = 4.65159, acc = 0.77174


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 416/700 ] loss = 7.56041, acc = 0.72292 -> best
Best model found at epoch 416, saving model


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 417/700 ] loss = 4.66172, acc = 0.77114


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 417/700 ] loss = 7.49495, acc = 0.71006


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 418/700 ] loss = 4.64158, acc = 0.77304


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 418/700 ] loss = 7.64079, acc = 0.71728


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 419/700 ] loss = 4.63327, acc = 0.77194


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 419/700 ] loss = 7.73372, acc = 0.69156


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 420/700 ] loss = 4.64327, acc = 0.77344


100%|██████████| 70/70 [00:14<00:00,  4.79it/s]


[ Valid | 420/700 ] loss = 7.60375, acc = 0.71683


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 421/700 ] loss = 4.65050, acc = 0.77354


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 421/700 ] loss = 6.95986, acc = 0.70826


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 422/700 ] loss = 4.62841, acc = 0.77254


100%|██████████| 70/70 [00:12<00:00,  5.60it/s]


[ Valid | 422/700 ] loss = 7.05740, acc = 0.72292


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 423/700 ] loss = 4.60735, acc = 0.76994


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 423/700 ] loss = 7.34663, acc = 0.69156


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 424/700 ] loss = 4.65669, acc = 0.77124


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 424/700 ] loss = 7.69669, acc = 0.69878


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 425/700 ] loss = 4.61231, acc = 0.76864


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 425/700 ] loss = 7.86913, acc = 0.66787


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 426/700 ] loss = 4.63627, acc = 0.77394


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 426/700 ] loss = 7.50860, acc = 0.71322


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 427/700 ] loss = 4.62978, acc = 0.77404


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 427/700 ] loss = 7.07530, acc = 0.70239


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 428/700 ] loss = 4.62284, acc = 0.77004


100%|██████████| 70/70 [00:14<00:00,  4.81it/s]


[ Valid | 428/700 ] loss = 7.87091, acc = 0.66742


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 429/700 ] loss = 4.59664, acc = 0.77384


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 429/700 ] loss = 8.03056, acc = 0.66471


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 430/700 ] loss = 4.62338, acc = 0.77204


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 430/700 ] loss = 7.74374, acc = 0.70329


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 431/700 ] loss = 4.59330, acc = 0.77534


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 431/700 ] loss = 7.71628, acc = 0.69201


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 432/700 ] loss = 4.58343, acc = 0.77634


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 432/700 ] loss = 7.01179, acc = 0.72789 -> best
Best model found at epoch 432, saving model


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 433/700 ] loss = 4.60549, acc = 0.77034


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 433/700 ] loss = 7.94615, acc = 0.70081


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 434/700 ] loss = 4.64037, acc = 0.77014


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 434/700 ] loss = 7.89999, acc = 0.69134


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 435/700 ] loss = 4.57870, acc = 0.77384


100%|██████████| 70/70 [00:14<00:00,  4.73it/s]


[ Valid | 435/700 ] loss = 8.69057, acc = 0.61575


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 436/700 ] loss = 4.57584, acc = 0.77534


100%|██████████| 70/70 [00:16<00:00,  4.29it/s]


[ Valid | 436/700 ] loss = 7.16954, acc = 0.72338


100%|██████████| 157/157 [00:31<00:00,  4.95it/s]


[ Train | 437/700 ] loss = 4.56864, acc = 0.77644


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 437/700 ] loss = 7.25447, acc = 0.70758


100%|██████████| 157/157 [00:34<00:00,  4.59it/s]


[ Train | 438/700 ] loss = 4.60436, acc = 0.77154


100%|██████████| 70/70 [00:14<00:00,  4.74it/s]


[ Valid | 438/700 ] loss = 7.73662, acc = 0.67351


100%|██████████| 157/157 [00:30<00:00,  5.20it/s]


[ Train | 439/700 ] loss = 4.57465, acc = 0.77564


100%|██████████| 70/70 [00:14<00:00,  4.77it/s]


[ Valid | 439/700 ] loss = 7.45283, acc = 0.68795


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 440/700 ] loss = 4.57562, acc = 0.78045


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 440/700 ] loss = 7.63133, acc = 0.68186


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 441/700 ] loss = 4.61638, acc = 0.77684


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 441/700 ] loss = 6.65098, acc = 0.72022


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 442/700 ] loss = 4.57223, acc = 0.77494


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 442/700 ] loss = 7.50060, acc = 0.71525


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 443/700 ] loss = 4.58791, acc = 0.77814


100%|██████████| 70/70 [00:13<00:00,  5.26it/s]


[ Valid | 443/700 ] loss = 7.36764, acc = 0.69179


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 444/700 ] loss = 4.58882, acc = 0.77254


100%|██████████| 70/70 [00:14<00:00,  4.94it/s]


[ Valid | 444/700 ] loss = 7.53614, acc = 0.69066


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 445/700 ] loss = 4.56025, acc = 0.77865


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 445/700 ] loss = 7.34166, acc = 0.70307


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 446/700 ] loss = 4.56265, acc = 0.78035


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 446/700 ] loss = 7.59794, acc = 0.68931


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 447/700 ] loss = 4.56036, acc = 0.77905


100%|██████████| 70/70 [00:14<00:00,  4.93it/s]


[ Valid | 447/700 ] loss = 7.47864, acc = 0.68073


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 448/700 ] loss = 4.53852, acc = 0.77784


100%|██████████| 70/70 [00:12<00:00,  5.62it/s]


[ Valid | 448/700 ] loss = 7.07903, acc = 0.70961


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 449/700 ] loss = 4.56858, acc = 0.77885


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 449/700 ] loss = 7.59532, acc = 0.69968


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 450/700 ] loss = 4.55641, acc = 0.77614


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 450/700 ] loss = 7.77805, acc = 0.68524


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 451/700 ] loss = 4.53636, acc = 0.78005


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 451/700 ] loss = 8.40205, acc = 0.70329


100%|██████████| 157/157 [00:31<00:00,  5.06it/s]


[ Train | 452/700 ] loss = 4.56064, acc = 0.77814


100%|██████████| 70/70 [00:14<00:00,  4.94it/s]


[ Valid | 452/700 ] loss = 7.44353, acc = 0.71683


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 453/700 ] loss = 4.55559, acc = 0.77945


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 453/700 ] loss = 8.00184, acc = 0.70104


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 454/700 ] loss = 4.52405, acc = 0.77764


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 454/700 ] loss = 7.17861, acc = 0.71209


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 455/700 ] loss = 4.53902, acc = 0.78075


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 455/700 ] loss = 8.31235, acc = 0.65411


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 456/700 ] loss = 4.53101, acc = 0.77694


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 456/700 ] loss = 7.59107, acc = 0.70713


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 457/700 ] loss = 4.55261, acc = 0.78255


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 457/700 ] loss = 7.75701, acc = 0.66697


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 458/700 ] loss = 4.49947, acc = 0.78245


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 458/700 ] loss = 8.33936, acc = 0.63335


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 459/700 ] loss = 4.54290, acc = 0.77574


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 459/700 ] loss = 7.18395, acc = 0.71345


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 460/700 ] loss = 4.50633, acc = 0.78075


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 460/700 ] loss = 7.49312, acc = 0.69021


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 461/700 ] loss = 4.55396, acc = 0.77875


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 461/700 ] loss = 7.29135, acc = 0.69449


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 462/700 ] loss = 4.49885, acc = 0.78275


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 462/700 ] loss = 7.43194, acc = 0.71480


100%|██████████| 157/157 [00:30<00:00,  5.22it/s]


[ Train | 463/700 ] loss = 4.52582, acc = 0.78155


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 463/700 ] loss = 8.60996, acc = 0.67351


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 464/700 ] loss = 4.50491, acc = 0.78045


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 464/700 ] loss = 8.49879, acc = 0.61868


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 465/700 ] loss = 4.53256, acc = 0.78435


100%|██████████| 70/70 [00:13<00:00,  5.20it/s]


[ Valid | 465/700 ] loss = 7.14310, acc = 0.72879 -> best
Best model found at epoch 465, saving model


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 466/700 ] loss = 4.49269, acc = 0.78085


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 466/700 ] loss = 7.82022, acc = 0.68434


100%|██████████| 157/157 [00:32<00:00,  4.80it/s]


[ Train | 467/700 ] loss = 4.49631, acc = 0.78575


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 467/700 ] loss = 7.48109, acc = 0.72834


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 468/700 ] loss = 4.49393, acc = 0.78515


100%|██████████| 70/70 [00:14<00:00,  4.80it/s]


[ Valid | 468/700 ] loss = 7.49246, acc = 0.69743


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 469/700 ] loss = 4.51057, acc = 0.78185


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 469/700 ] loss = 7.61122, acc = 0.69495


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 470/700 ] loss = 4.48523, acc = 0.78165


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 470/700 ] loss = 8.21748, acc = 0.62162


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 471/700 ] loss = 4.50662, acc = 0.78345


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 471/700 ] loss = 7.52580, acc = 0.70352


100%|██████████| 157/157 [00:32<00:00,  4.78it/s]


[ Train | 472/700 ] loss = 4.51138, acc = 0.78255


100%|██████████| 70/70 [00:12<00:00,  5.63it/s]


[ Valid | 472/700 ] loss = 9.13227, acc = 0.62681


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 473/700 ] loss = 4.45573, acc = 0.78515


100%|██████████| 70/70 [00:14<00:00,  4.76it/s]


[ Valid | 473/700 ] loss = 7.22028, acc = 0.71886


100%|██████████| 157/157 [00:30<00:00,  5.20it/s]


[ Train | 474/700 ] loss = 4.50923, acc = 0.77895


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 474/700 ] loss = 8.15365, acc = 0.67622


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 475/700 ] loss = 4.49663, acc = 0.78295


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 475/700 ] loss = 7.50526, acc = 0.67171


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 476/700 ] loss = 4.49815, acc = 0.78095


100%|██████████| 70/70 [00:14<00:00,  4.94it/s]


[ Valid | 476/700 ] loss = 7.53051, acc = 0.70623


100%|██████████| 157/157 [00:33<00:00,  4.67it/s]


[ Train | 477/700 ] loss = 4.46751, acc = 0.78525


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 477/700 ] loss = 8.47726, acc = 0.67374


100%|██████████| 157/157 [00:33<00:00,  4.71it/s]


[ Train | 478/700 ] loss = 4.45411, acc = 0.78415


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 478/700 ] loss = 7.49549, acc = 0.71841


100%|██████████| 157/157 [00:31<00:00,  5.06it/s]


[ Train | 479/700 ] loss = 4.45277, acc = 0.78535


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 479/700 ] loss = 7.97956, acc = 0.67554


100%|██████████| 157/157 [00:33<00:00,  4.66it/s]


[ Train | 480/700 ] loss = 4.49200, acc = 0.78035


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 480/700 ] loss = 7.30109, acc = 0.70623


100%|██████████| 157/157 [00:31<00:00,  5.02it/s]


[ Train | 481/700 ] loss = 4.50060, acc = 0.78335


100%|██████████| 70/70 [00:14<00:00,  4.82it/s]


[ Valid | 481/700 ] loss = 7.55637, acc = 0.70871


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 482/700 ] loss = 4.48546, acc = 0.78485


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 482/700 ] loss = 7.57386, acc = 0.68931


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 483/700 ] loss = 4.43951, acc = 0.78645


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 483/700 ] loss = 7.74725, acc = 0.68028


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 484/700 ] loss = 4.46985, acc = 0.78625


100%|██████████| 70/70 [00:14<00:00,  4.90it/s]


[ Valid | 484/700 ] loss = 7.43172, acc = 0.71006


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 485/700 ] loss = 4.47219, acc = 0.78475


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 485/700 ] loss = 8.64156, acc = 0.65884


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 486/700 ] loss = 4.46273, acc = 0.78145


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 486/700 ] loss = 7.52482, acc = 0.70329


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 487/700 ] loss = 4.44954, acc = 0.78365


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 487/700 ] loss = 7.88014, acc = 0.69088


100%|██████████| 157/157 [00:34<00:00,  4.58it/s]


[ Train | 488/700 ] loss = 4.44094, acc = 0.78645


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 488/700 ] loss = 7.26957, acc = 0.70894


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 489/700 ] loss = 4.46586, acc = 0.78205


100%|██████████| 70/70 [00:14<00:00,  4.72it/s]


[ Valid | 489/700 ] loss = 7.74899, acc = 0.68051


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 490/700 ] loss = 4.43216, acc = 0.78495


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 490/700 ] loss = 7.54378, acc = 0.68073


100%|██████████| 157/157 [00:33<00:00,  4.70it/s]


[ Train | 491/700 ] loss = 4.43197, acc = 0.78685


100%|██████████| 70/70 [00:13<00:00,  5.19it/s]


[ Valid | 491/700 ] loss = 7.01653, acc = 0.71977


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 492/700 ] loss = 4.43663, acc = 0.78495


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 492/700 ] loss = 7.65240, acc = 0.70262


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 493/700 ] loss = 4.45140, acc = 0.78645


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 493/700 ] loss = 8.39676, acc = 0.66674


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 494/700 ] loss = 4.45127, acc = 0.78955


100%|██████████| 70/70 [00:13<00:00,  5.17it/s]


[ Valid | 494/700 ] loss = 7.47363, acc = 0.71412


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 495/700 ] loss = 4.45102, acc = 0.78465


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 495/700 ] loss = 7.88506, acc = 0.64170


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 496/700 ] loss = 4.39868, acc = 0.78865


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 496/700 ] loss = 7.72109, acc = 0.66020


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 497/700 ] loss = 4.43000, acc = 0.78535


100%|██████████| 70/70 [00:14<00:00,  4.84it/s]


[ Valid | 497/700 ] loss = 7.98986, acc = 0.69472


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 498/700 ] loss = 4.42817, acc = 0.79115


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 498/700 ] loss = 7.96962, acc = 0.67622


100%|██████████| 157/157 [00:32<00:00,  4.76it/s]


[ Train | 499/700 ] loss = 4.43519, acc = 0.78595


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 499/700 ] loss = 7.84114, acc = 0.67193


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 500/700 ] loss = 4.40093, acc = 0.79125


100%|██████████| 70/70 [00:17<00:00,  4.11it/s]


[ Valid | 500/700 ] loss = 7.69547, acc = 0.67983


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 501/700 ] loss = 4.41173, acc = 0.78655


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 501/700 ] loss = 7.34233, acc = 0.71142


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 502/700 ] loss = 4.44400, acc = 0.78205


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 502/700 ] loss = 7.60966, acc = 0.68908


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 503/700 ] loss = 4.43397, acc = 0.78575


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 503/700 ] loss = 7.41262, acc = 0.71458


100%|██████████| 157/157 [00:32<00:00,  4.76it/s]


[ Train | 504/700 ] loss = 4.39233, acc = 0.78605


100%|██████████| 70/70 [00:12<00:00,  5.60it/s]


[ Valid | 504/700 ] loss = 7.28181, acc = 0.72225


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 505/700 ] loss = 4.41230, acc = 0.78585


100%|██████████| 70/70 [00:15<00:00,  4.65it/s]


[ Valid | 505/700 ] loss = 7.75650, acc = 0.71909


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 506/700 ] loss = 4.40613, acc = 0.78925


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 506/700 ] loss = 8.11216, acc = 0.66516


100%|██████████| 157/157 [00:32<00:00,  4.76it/s]


[ Train | 507/700 ] loss = 4.40430, acc = 0.78685


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 507/700 ] loss = 7.06574, acc = 0.72879


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 508/700 ] loss = 4.39404, acc = 0.79075


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 508/700 ] loss = 7.45107, acc = 0.70375


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 509/700 ] loss = 4.41610, acc = 0.78735


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 509/700 ] loss = 7.35003, acc = 0.72586


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 510/700 ] loss = 4.42072, acc = 0.78575


100%|██████████| 70/70 [00:15<00:00,  4.47it/s]


[ Valid | 510/700 ] loss = 7.15989, acc = 0.73601 -> best
Best model found at epoch 510, saving model


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 511/700 ] loss = 4.37743, acc = 0.79225


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 511/700 ] loss = 7.76457, acc = 0.71097


100%|██████████| 157/157 [00:33<00:00,  4.72it/s]


[ Train | 512/700 ] loss = 4.38923, acc = 0.79296


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 512/700 ] loss = 7.45621, acc = 0.70871


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 513/700 ] loss = 4.37647, acc = 0.78885


100%|██████████| 70/70 [00:14<00:00,  4.79it/s]


[ Valid | 513/700 ] loss = 7.25666, acc = 0.72699


100%|██████████| 157/157 [00:32<00:00,  4.85it/s]


[ Train | 514/700 ] loss = 4.38033, acc = 0.79346


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 514/700 ] loss = 7.80657, acc = 0.68186


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 515/700 ] loss = 4.37715, acc = 0.78955


100%|██████████| 70/70 [00:13<00:00,  5.05it/s]


[ Valid | 515/700 ] loss = 8.27187, acc = 0.68186


100%|██████████| 157/157 [00:30<00:00,  5.20it/s]


[ Train | 516/700 ] loss = 4.38238, acc = 0.78725


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 516/700 ] loss = 7.47874, acc = 0.70487


100%|██████████| 157/157 [00:36<00:00,  4.29it/s]


[ Train | 517/700 ] loss = 4.36945, acc = 0.79085


100%|██████████| 70/70 [00:17<00:00,  3.99it/s]


[ Valid | 517/700 ] loss = 7.76064, acc = 0.69224


100%|██████████| 157/157 [00:38<00:00,  4.10it/s]


[ Train | 518/700 ] loss = 4.37401, acc = 0.79135


100%|██████████| 70/70 [00:13<00:00,  5.09it/s]


[ Valid | 518/700 ] loss = 7.41197, acc = 0.69585


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 519/700 ] loss = 4.35764, acc = 0.79105


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 519/700 ] loss = 7.92202, acc = 0.70217


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 520/700 ] loss = 4.38423, acc = 0.79225


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 520/700 ] loss = 7.58396, acc = 0.71616


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 521/700 ] loss = 4.36735, acc = 0.79175


100%|██████████| 70/70 [00:15<00:00,  4.56it/s]


[ Valid | 521/700 ] loss = 7.33140, acc = 0.73195


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 522/700 ] loss = 4.38010, acc = 0.78985


100%|██████████| 70/70 [00:12<00:00,  5.61it/s]


[ Valid | 522/700 ] loss = 7.09119, acc = 0.69675


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 523/700 ] loss = 4.36094, acc = 0.79265


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 523/700 ] loss = 8.02972, acc = 0.66042


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 524/700 ] loss = 4.38352, acc = 0.79255


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 524/700 ] loss = 7.62921, acc = 0.70284


100%|██████████| 157/157 [00:35<00:00,  4.41it/s]


[ Train | 525/700 ] loss = 4.35089, acc = 0.79426


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 525/700 ] loss = 7.28728, acc = 0.71683


100%|██████████| 157/157 [00:35<00:00,  4.47it/s]


[ Train | 526/700 ] loss = 4.36286, acc = 0.79085


100%|██████████| 70/70 [00:14<00:00,  4.68it/s]


[ Valid | 526/700 ] loss = 7.39200, acc = 0.68727


100%|██████████| 157/157 [00:31<00:00,  4.91it/s]


[ Train | 527/700 ] loss = 4.35693, acc = 0.79015


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 527/700 ] loss = 7.42816, acc = 0.72292


100%|██████████| 157/157 [00:33<00:00,  4.72it/s]


[ Train | 528/700 ] loss = 4.37355, acc = 0.79245


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 528/700 ] loss = 7.90717, acc = 0.68479


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 529/700 ] loss = 4.32599, acc = 0.79255


100%|██████████| 70/70 [00:16<00:00,  4.18it/s]


[ Valid | 529/700 ] loss = 7.93311, acc = 0.66922


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 530/700 ] loss = 4.37835, acc = 0.78935


100%|██████████| 70/70 [00:14<00:00,  4.83it/s]


[ Valid | 530/700 ] loss = 7.19581, acc = 0.70645


100%|██████████| 157/157 [00:34<00:00,  4.52it/s]


[ Train | 531/700 ] loss = 4.34201, acc = 0.78785


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 531/700 ] loss = 7.49203, acc = 0.70014


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 532/700 ] loss = 4.32625, acc = 0.79616


100%|██████████| 70/70 [00:15<00:00,  4.41it/s]


[ Valid | 532/700 ] loss = 7.95807, acc = 0.72044


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 533/700 ] loss = 4.33745, acc = 0.79155


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 533/700 ] loss = 7.51995, acc = 0.71074


100%|██████████| 157/157 [00:33<00:00,  4.67it/s]


[ Train | 534/700 ] loss = 4.35607, acc = 0.79165


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 534/700 ] loss = 7.18718, acc = 0.72518


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 535/700 ] loss = 4.34939, acc = 0.79346


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 535/700 ] loss = 8.18908, acc = 0.68276


100%|██████████| 157/157 [00:33<00:00,  4.70it/s]


[ Train | 536/700 ] loss = 4.36982, acc = 0.79175


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 536/700 ] loss = 7.77712, acc = 0.68727


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 537/700 ] loss = 4.34715, acc = 0.79275


100%|██████████| 70/70 [00:15<00:00,  4.47it/s]


[ Valid | 537/700 ] loss = 7.80965, acc = 0.69562


100%|██████████| 157/157 [00:30<00:00,  5.20it/s]


[ Train | 538/700 ] loss = 4.33223, acc = 0.79376


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 538/700 ] loss = 7.66200, acc = 0.71683


100%|██████████| 157/157 [00:33<00:00,  4.66it/s]


[ Train | 539/700 ] loss = 4.32191, acc = 0.79766


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 539/700 ] loss = 8.16923, acc = 0.68096


100%|██████████| 157/157 [00:30<00:00,  5.23it/s]


[ Train | 540/700 ] loss = 4.32664, acc = 0.79185


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 540/700 ] loss = 7.46807, acc = 0.71390


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 541/700 ] loss = 4.32245, acc = 0.79526


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 541/700 ] loss = 8.20520, acc = 0.69720


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 542/700 ] loss = 4.31367, acc = 0.79346


100%|██████████| 70/70 [00:15<00:00,  4.65it/s]


[ Valid | 542/700 ] loss = 7.57560, acc = 0.72270


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 543/700 ] loss = 4.29490, acc = 0.79796


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 543/700 ] loss = 7.17166, acc = 0.73105


100%|██████████| 157/157 [00:33<00:00,  4.69it/s]


[ Train | 544/700 ] loss = 4.30858, acc = 0.79526


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 544/700 ] loss = 7.81111, acc = 0.71796


100%|██████████| 157/157 [00:30<00:00,  5.21it/s]


[ Train | 545/700 ] loss = 4.29387, acc = 0.79596


100%|██████████| 70/70 [00:13<00:00,  5.08it/s]


[ Valid | 545/700 ] loss = 7.73722, acc = 0.70600


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 546/700 ] loss = 4.33717, acc = 0.79145


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 546/700 ] loss = 8.07393, acc = 0.68299


100%|██████████| 157/157 [00:31<00:00,  4.98it/s]


[ Train | 547/700 ] loss = 4.30113, acc = 0.79316


100%|██████████| 70/70 [00:14<00:00,  4.92it/s]


[ Valid | 547/700 ] loss = 7.54514, acc = 0.71593


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 548/700 ] loss = 4.29355, acc = 0.79756


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 548/700 ] loss = 7.17490, acc = 0.73849 -> best
Best model found at epoch 548, saving model


100%|██████████| 157/157 [00:33<00:00,  4.72it/s]


[ Train | 549/700 ] loss = 4.31918, acc = 0.79696


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 549/700 ] loss = 7.42164, acc = 0.72112


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 550/700 ] loss = 4.32714, acc = 0.79606


100%|██████████| 70/70 [00:14<00:00,  4.99it/s]


[ Valid | 550/700 ] loss = 7.63404, acc = 0.70442


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 551/700 ] loss = 4.30141, acc = 0.79716


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 551/700 ] loss = 7.82499, acc = 0.64170


100%|██████████| 157/157 [00:32<00:00,  4.87it/s]


[ Train | 552/700 ] loss = 4.31211, acc = 0.79636


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 552/700 ] loss = 8.05258, acc = 0.66561


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 553/700 ] loss = 4.31320, acc = 0.79436


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 553/700 ] loss = 7.37646, acc = 0.69404


100%|██████████| 157/157 [00:33<00:00,  4.73it/s]


[ Train | 554/700 ] loss = 4.30414, acc = 0.79866


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 554/700 ] loss = 7.57133, acc = 0.69179


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 555/700 ] loss = 4.28496, acc = 0.79836


100%|██████████| 70/70 [00:14<00:00,  4.85it/s]


[ Valid | 555/700 ] loss = 8.66279, acc = 0.68163


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 556/700 ] loss = 4.27054, acc = 0.79676


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 556/700 ] loss = 7.47231, acc = 0.73579


100%|██████████| 157/157 [00:32<00:00,  4.83it/s]


[ Train | 557/700 ] loss = 4.28557, acc = 0.79476


100%|██████████| 70/70 [00:13<00:00,  5.06it/s]


[ Valid | 557/700 ] loss = 8.29273, acc = 0.66020


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 558/700 ] loss = 4.27066, acc = 0.79986


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 558/700 ] loss = 7.40154, acc = 0.72315


100%|██████████| 157/157 [00:33<00:00,  4.68it/s]


[ Train | 559/700 ] loss = 4.28668, acc = 0.80176


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 559/700 ] loss = 7.77430, acc = 0.70465


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 560/700 ] loss = 4.29434, acc = 0.79726


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 560/700 ] loss = 7.41886, acc = 0.71841


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 561/700 ] loss = 4.29434, acc = 0.79846


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 561/700 ] loss = 7.84917, acc = 0.69359


100%|██████████| 157/157 [00:33<00:00,  4.67it/s]


[ Train | 562/700 ] loss = 4.29641, acc = 0.79686


100%|██████████| 70/70 [00:14<00:00,  5.00it/s]


[ Valid | 562/700 ] loss = 8.17637, acc = 0.71209


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 563/700 ] loss = 4.27988, acc = 0.80146


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 563/700 ] loss = 7.78675, acc = 0.69720


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 564/700 ] loss = 4.28077, acc = 0.79556


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 564/700 ] loss = 7.74819, acc = 0.68908


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 565/700 ] loss = 4.26814, acc = 0.80036


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 565/700 ] loss = 7.62068, acc = 0.71480


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 566/700 ] loss = 4.28574, acc = 0.79786


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 566/700 ] loss = 7.35094, acc = 0.68773


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 567/700 ] loss = 4.26399, acc = 0.79195


100%|██████████| 70/70 [00:14<00:00,  4.76it/s]


[ Valid | 567/700 ] loss = 7.54595, acc = 0.70217


100%|██████████| 157/157 [00:34<00:00,  4.54it/s]


[ Train | 568/700 ] loss = 4.28316, acc = 0.80116


100%|██████████| 70/70 [00:13<00:00,  5.28it/s]


[ Valid | 568/700 ] loss = 9.20058, acc = 0.62816


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 569/700 ] loss = 4.23799, acc = 0.79986


100%|██████████| 70/70 [00:14<00:00,  4.68it/s]


[ Valid | 569/700 ] loss = 7.69845, acc = 0.69517


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 570/700 ] loss = 4.26160, acc = 0.80636


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 570/700 ] loss = 7.68588, acc = 0.71525


100%|██████████| 157/157 [00:34<00:00,  4.55it/s]


[ Train | 571/700 ] loss = 4.27199, acc = 0.79786


100%|██████████| 70/70 [00:13<00:00,  5.23it/s]


[ Valid | 571/700 ] loss = 7.43080, acc = 0.67329


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 572/700 ] loss = 4.26498, acc = 0.79856


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 572/700 ] loss = 7.59344, acc = 0.70803


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 573/700 ] loss = 4.25612, acc = 0.79726


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 573/700 ] loss = 7.57558, acc = 0.68795


100%|██████████| 157/157 [00:33<00:00,  4.68it/s]


[ Train | 574/700 ] loss = 4.24585, acc = 0.79586


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 574/700 ] loss = 7.81480, acc = 0.69427


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 575/700 ] loss = 4.23617, acc = 0.79816


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 575/700 ] loss = 7.33009, acc = 0.71142


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 576/700 ] loss = 4.25826, acc = 0.80156


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 576/700 ] loss = 7.62419, acc = 0.72180


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 577/700 ] loss = 4.26365, acc = 0.80046


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 577/700 ] loss = 7.34512, acc = 0.70014


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 578/700 ] loss = 4.25187, acc = 0.80286


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 578/700 ] loss = 7.77395, acc = 0.66968


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 579/700 ] loss = 4.28551, acc = 0.80166


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 579/700 ] loss = 7.21367, acc = 0.72089


100%|██████████| 157/157 [00:33<00:00,  4.68it/s]


[ Train | 580/700 ] loss = 4.24458, acc = 0.80116


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 580/700 ] loss = 8.07854, acc = 0.68840


100%|██████████| 157/157 [00:30<00:00,  5.21it/s]


[ Train | 581/700 ] loss = 4.24042, acc = 0.80006


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 581/700 ] loss = 8.79811, acc = 0.67599


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 582/700 ] loss = 4.25026, acc = 0.80306


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 582/700 ] loss = 7.49271, acc = 0.70420


100%|██████████| 157/157 [00:33<00:00,  4.66it/s]


[ Train | 583/700 ] loss = 4.23736, acc = 0.79886


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 583/700 ] loss = 7.66719, acc = 0.72789


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 584/700 ] loss = 4.22697, acc = 0.80136


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 584/700 ] loss = 7.62538, acc = 0.68005


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 585/700 ] loss = 4.24453, acc = 0.80316


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 585/700 ] loss = 7.86658, acc = 0.69562


100%|██████████| 157/157 [00:33<00:00,  4.64it/s]


[ Train | 586/700 ] loss = 4.24736, acc = 0.80306


100%|██████████| 70/70 [00:12<00:00,  5.56it/s]


[ Valid | 586/700 ] loss = 7.42174, acc = 0.72383


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 587/700 ] loss = 4.22404, acc = 0.79756


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 587/700 ] loss = 8.03869, acc = 0.69066


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 588/700 ] loss = 4.22632, acc = 0.79886


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 588/700 ] loss = 7.51691, acc = 0.71074


100%|██████████| 157/157 [00:33<00:00,  4.67it/s]


[ Train | 589/700 ] loss = 4.21182, acc = 0.80276


100%|██████████| 70/70 [00:13<00:00,  5.38it/s]


[ Valid | 589/700 ] loss = 7.67520, acc = 0.72766


100%|██████████| 157/157 [00:29<00:00,  5.24it/s]


[ Train | 590/700 ] loss = 4.21592, acc = 0.80246


100%|██████████| 70/70 [00:13<00:00,  5.24it/s]


[ Valid | 590/700 ] loss = 7.77995, acc = 0.72360


100%|██████████| 157/157 [00:31<00:00,  5.00it/s]


[ Train | 591/700 ] loss = 4.22326, acc = 0.80206


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 591/700 ] loss = 7.51713, acc = 0.71638


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 592/700 ] loss = 4.20347, acc = 0.80246


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 592/700 ] loss = 8.13636, acc = 0.70036


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 593/700 ] loss = 4.19842, acc = 0.80486


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 593/700 ] loss = 7.71772, acc = 0.70059


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 594/700 ] loss = 4.21709, acc = 0.80506


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 594/700 ] loss = 9.02673, acc = 0.66088


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 595/700 ] loss = 4.18457, acc = 0.80446


100%|██████████| 70/70 [00:13<00:00,  5.19it/s]


[ Valid | 595/700 ] loss = 8.19586, acc = 0.67329


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 596/700 ] loss = 4.21478, acc = 0.80246


100%|██████████| 70/70 [00:12<00:00,  5.60it/s]


[ Valid | 596/700 ] loss = 7.85624, acc = 0.70397


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 597/700 ] loss = 4.20057, acc = 0.80106


100%|██████████| 70/70 [00:13<00:00,  5.35it/s]


[ Valid | 597/700 ] loss = 8.06774, acc = 0.67216


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 598/700 ] loss = 4.19813, acc = 0.80636


100%|██████████| 70/70 [00:13<00:00,  5.10it/s]


[ Valid | 598/700 ] loss = 7.39571, acc = 0.71954


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 599/700 ] loss = 4.21408, acc = 0.80326


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 599/700 ] loss = 7.50385, acc = 0.69562


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 600/700 ] loss = 4.19087, acc = 0.80616


100%|██████████| 70/70 [00:13<00:00,  5.33it/s]


[ Valid | 600/700 ] loss = 7.60251, acc = 0.72744


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 601/700 ] loss = 4.18862, acc = 0.80466


100%|██████████| 70/70 [00:14<00:00,  4.77it/s]


[ Valid | 601/700 ] loss = 7.65473, acc = 0.72157


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 602/700 ] loss = 4.20181, acc = 0.79976


100%|██████████| 70/70 [00:12<00:00,  5.60it/s]


[ Valid | 602/700 ] loss = 8.40981, acc = 0.65659


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 603/700 ] loss = 4.17779, acc = 0.80186


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 603/700 ] loss = 7.31654, acc = 0.73014


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 604/700 ] loss = 4.17765, acc = 0.80977


100%|██████████| 70/70 [00:15<00:00,  4.54it/s]


[ Valid | 604/700 ] loss = 7.59379, acc = 0.71480


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 605/700 ] loss = 4.19257, acc = 0.80316


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 605/700 ] loss = 7.29134, acc = 0.73285


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 606/700 ] loss = 4.20073, acc = 0.80276


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 606/700 ] loss = 7.46037, acc = 0.70668


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 607/700 ] loss = 4.19846, acc = 0.80626


100%|██████████| 70/70 [00:15<00:00,  4.61it/s]


[ Valid | 607/700 ] loss = 8.37931, acc = 0.67103


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 608/700 ] loss = 4.19277, acc = 0.80206


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 608/700 ] loss = 8.02630, acc = 0.72811


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 609/700 ] loss = 4.17585, acc = 0.80516


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 609/700 ] loss = 7.48347, acc = 0.70803


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 610/700 ] loss = 4.20139, acc = 0.80376


100%|██████████| 70/70 [00:14<00:00,  4.75it/s]


[ Valid | 610/700 ] loss = 7.35540, acc = 0.70239


100%|██████████| 157/157 [00:31<00:00,  5.01it/s]


[ Train | 611/700 ] loss = 4.17756, acc = 0.80206


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 611/700 ] loss = 7.24464, acc = 0.73669


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 612/700 ] loss = 4.17523, acc = 0.80356


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 612/700 ] loss = 7.48355, acc = 0.73082


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 613/700 ] loss = 4.17464, acc = 0.81067


100%|██████████| 70/70 [00:13<00:00,  5.07it/s]


[ Valid | 613/700 ] loss = 7.73258, acc = 0.69246


100%|██████████| 157/157 [00:32<00:00,  4.76it/s]


[ Train | 614/700 ] loss = 4.14578, acc = 0.80777


100%|██████████| 70/70 [00:12<00:00,  5.60it/s]


[ Valid | 614/700 ] loss = 7.57625, acc = 0.71886


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 615/700 ] loss = 4.18942, acc = 0.80296


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 615/700 ] loss = 7.63433, acc = 0.74323 -> best
Best model found at epoch 615, saving model


100%|██████████| 157/157 [00:32<00:00,  4.89it/s]


[ Train | 616/700 ] loss = 4.18400, acc = 0.80336


100%|██████████| 70/70 [00:13<00:00,  5.12it/s]


[ Valid | 616/700 ] loss = 7.46777, acc = 0.72180


100%|██████████| 157/157 [00:34<00:00,  4.53it/s]


[ Train | 617/700 ] loss = 4.15674, acc = 0.80646


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 617/700 ] loss = 7.13660, acc = 0.72766


100%|██████████| 157/157 [00:32<00:00,  4.84it/s]


[ Train | 618/700 ] loss = 4.18338, acc = 0.80356


100%|██████████| 70/70 [00:14<00:00,  4.76it/s]


[ Valid | 618/700 ] loss = 7.90578, acc = 0.71322


100%|██████████| 157/157 [00:32<00:00,  4.88it/s]


[ Train | 619/700 ] loss = 4.14903, acc = 0.80706


100%|██████████| 70/70 [00:14<00:00,  4.89it/s]


[ Valid | 619/700 ] loss = 7.85165, acc = 0.71999


100%|██████████| 157/157 [00:33<00:00,  4.75it/s]


[ Train | 620/700 ] loss = 4.17736, acc = 0.80747


100%|██████████| 70/70 [00:13<00:00,  5.31it/s]


[ Valid | 620/700 ] loss = 7.72965, acc = 0.69788


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 621/700 ] loss = 4.16595, acc = 0.80066


100%|██████████| 70/70 [00:13<00:00,  5.16it/s]


[ Valid | 621/700 ] loss = 7.58650, acc = 0.70758


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 622/700 ] loss = 4.13834, acc = 0.80947


100%|██████████| 70/70 [00:14<00:00,  5.00it/s]


[ Valid | 622/700 ] loss = 7.30874, acc = 0.71886


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 623/700 ] loss = 4.14178, acc = 0.80636


100%|██████████| 70/70 [00:14<00:00,  4.99it/s]


[ Valid | 623/700 ] loss = 7.40035, acc = 0.73601


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 624/700 ] loss = 4.13577, acc = 0.80787


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 624/700 ] loss = 8.10036, acc = 0.67554


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 625/700 ] loss = 4.15332, acc = 0.80807


100%|██████████| 70/70 [00:15<00:00,  4.62it/s]


[ Valid | 625/700 ] loss = 8.04696, acc = 0.70984


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 626/700 ] loss = 4.13076, acc = 0.80757


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 626/700 ] loss = 7.51036, acc = 0.72541


100%|██████████| 157/157 [00:31<00:00,  4.93it/s]


[ Train | 627/700 ] loss = 4.15716, acc = 0.80146


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 627/700 ] loss = 7.14952, acc = 0.72902


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 628/700 ] loss = 4.14890, acc = 0.80917


100%|██████████| 70/70 [00:14<00:00,  4.71it/s]


[ Valid | 628/700 ] loss = 7.64917, acc = 0.72879


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 629/700 ] loss = 4.13989, acc = 0.80807


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 629/700 ] loss = 8.11400, acc = 0.68321


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 630/700 ] loss = 4.13526, acc = 0.80837


100%|██████████| 70/70 [00:12<00:00,  5.42it/s]


[ Valid | 630/700 ] loss = 8.04803, acc = 0.67870


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 631/700 ] loss = 4.14078, acc = 0.80506


100%|██████████| 70/70 [00:14<00:00,  4.80it/s]


[ Valid | 631/700 ] loss = 7.53065, acc = 0.69765


100%|██████████| 157/157 [00:31<00:00,  4.94it/s]


[ Train | 632/700 ] loss = 4.13365, acc = 0.80566


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 632/700 ] loss = 7.81346, acc = 0.69833


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 633/700 ] loss = 4.13241, acc = 0.80977


100%|██████████| 70/70 [00:12<00:00,  5.57it/s]


[ Valid | 633/700 ] loss = 7.35815, acc = 0.71751


100%|██████████| 157/157 [00:31<00:00,  5.05it/s]


[ Train | 634/700 ] loss = 4.10792, acc = 0.81237


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 634/700 ] loss = 7.43449, acc = 0.72699


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 635/700 ] loss = 4.13025, acc = 0.80957


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 635/700 ] loss = 7.47972, acc = 0.71841


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 636/700 ] loss = 4.10276, acc = 0.80646


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 636/700 ] loss = 7.22960, acc = 0.71841


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 637/700 ] loss = 4.13041, acc = 0.80977


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 637/700 ] loss = 8.14911, acc = 0.70758


100%|██████████| 157/157 [00:33<00:00,  4.64it/s]


[ Train | 638/700 ] loss = 4.14218, acc = 0.80917


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 638/700 ] loss = 8.59523, acc = 0.66832


100%|██████████| 157/157 [00:32<00:00,  4.79it/s]


[ Train | 639/700 ] loss = 4.12183, acc = 0.81037


100%|██████████| 70/70 [00:12<00:00,  5.59it/s]


[ Valid | 639/700 ] loss = 7.71351, acc = 0.69878


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 640/700 ] loss = 4.12856, acc = 0.80666


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 640/700 ] loss = 7.99874, acc = 0.68931


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 641/700 ] loss = 4.13296, acc = 0.80316


100%|██████████| 70/70 [00:12<00:00,  5.54it/s]


[ Valid | 641/700 ] loss = 7.35935, acc = 0.71841


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 642/700 ] loss = 4.10549, acc = 0.81257


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 642/700 ] loss = 7.30448, acc = 0.73466


100%|██████████| 157/157 [00:30<00:00,  5.22it/s]


[ Train | 643/700 ] loss = 4.11207, acc = 0.80696


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 643/700 ] loss = 7.70901, acc = 0.72811


100%|██████████| 157/157 [00:33<00:00,  4.62it/s]


[ Train | 644/700 ] loss = 4.12936, acc = 0.81097


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 644/700 ] loss = 7.26847, acc = 0.71616


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 645/700 ] loss = 4.09046, acc = 0.80967


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 645/700 ] loss = 7.56856, acc = 0.70510


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 646/700 ] loss = 4.12708, acc = 0.80837


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 646/700 ] loss = 7.58426, acc = 0.72292


100%|██████████| 157/157 [00:34<00:00,  4.57it/s]


[ Train | 647/700 ] loss = 4.09722, acc = 0.80777


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 647/700 ] loss = 7.30240, acc = 0.71954


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 648/700 ] loss = 4.12289, acc = 0.80887


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 648/700 ] loss = 8.01270, acc = 0.70081


100%|██████████| 157/157 [00:30<00:00,  5.18it/s]


[ Train | 649/700 ] loss = 4.08247, acc = 0.81237


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 649/700 ] loss = 7.51773, acc = 0.71841


100%|██████████| 157/157 [00:33<00:00,  4.65it/s]


[ Train | 650/700 ] loss = 4.10415, acc = 0.81217


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 650/700 ] loss = 7.38873, acc = 0.69833


100%|██████████| 157/157 [00:30<00:00,  5.20it/s]


[ Train | 651/700 ] loss = 4.11992, acc = 0.81097


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 651/700 ] loss = 7.63112, acc = 0.73398


100%|██████████| 157/157 [00:30<00:00,  5.19it/s]


[ Train | 652/700 ] loss = 4.10981, acc = 0.81027


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 652/700 ] loss = 7.61261, acc = 0.73195


100%|██████████| 157/157 [00:33<00:00,  4.63it/s]


[ Train | 653/700 ] loss = 4.11437, acc = 0.81137


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 653/700 ] loss = 7.44254, acc = 0.72518


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 654/700 ] loss = 4.08575, acc = 0.80947


100%|██████████| 70/70 [00:12<00:00,  5.49it/s]


[ Valid | 654/700 ] loss = 7.65379, acc = 0.68592


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 655/700 ] loss = 4.09577, acc = 0.81067


100%|██████████| 70/70 [00:13<00:00,  5.32it/s]


[ Valid | 655/700 ] loss = 7.84165, acc = 0.72450


100%|██████████| 157/157 [00:33<00:00,  4.70it/s]


[ Train | 656/700 ] loss = 4.10194, acc = 0.80947


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 656/700 ] loss = 8.31093, acc = 0.70736


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 657/700 ] loss = 4.11486, acc = 0.81107


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 657/700 ] loss = 7.39778, acc = 0.73872


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 658/700 ] loss = 4.07972, acc = 0.81407


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 658/700 ] loss = 7.89135, acc = 0.69698


100%|██████████| 157/157 [00:32<00:00,  4.77it/s]


[ Train | 659/700 ] loss = 4.10430, acc = 0.81107


100%|██████████| 70/70 [00:13<00:00,  5.14it/s]


[ Valid | 659/700 ] loss = 7.64434, acc = 0.71006


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 660/700 ] loss = 4.09915, acc = 0.81167


100%|██████████| 70/70 [00:12<00:00,  5.43it/s]


[ Valid | 660/700 ] loss = 7.24143, acc = 0.71616


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 661/700 ] loss = 4.07403, acc = 0.81457


100%|██████████| 70/70 [00:14<00:00,  4.97it/s]


[ Valid | 661/700 ] loss = 7.92237, acc = 0.71345


100%|██████████| 157/157 [00:34<00:00,  4.53it/s]


[ Train | 662/700 ] loss = 4.09833, acc = 0.81397


100%|██████████| 70/70 [00:14<00:00,  4.84it/s]


[ Valid | 662/700 ] loss = 7.28825, acc = 0.72766


100%|██████████| 157/157 [00:32<00:00,  4.90it/s]


[ Train | 663/700 ] loss = 4.09011, acc = 0.81277


100%|██████████| 70/70 [00:13<00:00,  5.13it/s]


[ Valid | 663/700 ] loss = 7.73131, acc = 0.70781


100%|██████████| 157/157 [00:31<00:00,  5.03it/s]


[ Train | 664/700 ] loss = 4.06853, acc = 0.81127


100%|██████████| 70/70 [00:12<00:00,  5.41it/s]


[ Valid | 664/700 ] loss = 7.63117, acc = 0.71277


100%|██████████| 157/157 [00:32<00:00,  4.82it/s]


[ Train | 665/700 ] loss = 4.08685, acc = 0.81137


100%|██████████| 70/70 [00:14<00:00,  4.85it/s]


[ Valid | 665/700 ] loss = 8.60521, acc = 0.68750


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 666/700 ] loss = 4.07023, acc = 0.81607


100%|██████████| 70/70 [00:12<00:00,  5.52it/s]


[ Valid | 666/700 ] loss = 8.40390, acc = 0.69201


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 667/700 ] loss = 4.08772, acc = 0.80727


100%|██████████| 70/70 [00:12<00:00,  5.46it/s]


[ Valid | 667/700 ] loss = 7.88222, acc = 0.71006


100%|██████████| 157/157 [00:31<00:00,  4.92it/s]


[ Train | 668/700 ] loss = 4.07428, acc = 0.81567


100%|██████████| 70/70 [00:15<00:00,  4.58it/s]


[ Valid | 668/700 ] loss = 7.82641, acc = 0.70284


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 669/700 ] loss = 4.06663, acc = 0.81457


100%|██████████| 70/70 [00:12<00:00,  5.58it/s]


[ Valid | 669/700 ] loss = 7.55906, acc = 0.67915


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 670/700 ] loss = 4.06684, acc = 0.81307


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 670/700 ] loss = 7.53929, acc = 0.71480


100%|██████████| 157/157 [00:30<00:00,  5.11it/s]


[ Train | 671/700 ] loss = 4.08451, acc = 0.80737


100%|██████████| 70/70 [00:15<00:00,  4.44it/s]


[ Valid | 671/700 ] loss = 7.30306, acc = 0.73060


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 672/700 ] loss = 4.08144, acc = 0.81307


100%|██████████| 70/70 [00:12<00:00,  5.46it/s]


[ Valid | 672/700 ] loss = 7.40487, acc = 0.72699


100%|██████████| 157/157 [00:31<00:00,  5.04it/s]


[ Train | 673/700 ] loss = 4.04805, acc = 0.81587


100%|██████████| 70/70 [00:14<00:00,  4.73it/s]


[ Valid | 673/700 ] loss = 7.80830, acc = 0.68344


100%|██████████| 157/157 [00:32<00:00,  4.81it/s]


[ Train | 674/700 ] loss = 4.05535, acc = 0.81337


100%|██████████| 70/70 [00:15<00:00,  4.52it/s]


[ Valid | 674/700 ] loss = 7.37928, acc = 0.73827


100%|██████████| 157/157 [00:30<00:00,  5.08it/s]


[ Train | 675/700 ] loss = 4.04543, acc = 0.81497


100%|██████████| 70/70 [00:13<00:00,  5.30it/s]


[ Valid | 675/700 ] loss = 7.75535, acc = 0.71435


100%|██████████| 157/157 [00:30<00:00,  5.14it/s]


[ Train | 676/700 ] loss = 4.07809, acc = 0.80907


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 676/700 ] loss = 7.56357, acc = 0.72067


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 677/700 ] loss = 4.06519, acc = 0.81127


100%|██████████| 70/70 [00:15<00:00,  4.42it/s]


[ Valid | 677/700 ] loss = 7.92335, acc = 0.72631


100%|██████████| 157/157 [00:30<00:00,  5.07it/s]


[ Train | 678/700 ] loss = 4.05490, acc = 0.81137


100%|██████████| 70/70 [00:13<00:00,  5.34it/s]


[ Valid | 678/700 ] loss = 7.71084, acc = 0.69991


100%|██████████| 157/157 [00:30<00:00,  5.10it/s]


[ Train | 679/700 ] loss = 4.02730, acc = 0.81327


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 679/700 ] loss = 7.57657, acc = 0.71255


100%|██████████| 157/157 [00:36<00:00,  4.28it/s]


[ Train | 680/700 ] loss = 4.04430, acc = 0.81477


100%|██████████| 70/70 [00:21<00:00,  3.20it/s]


[ Valid | 680/700 ] loss = 8.36248, acc = 0.67667


100%|██████████| 157/157 [00:44<00:00,  3.51it/s]


[ Train | 681/700 ] loss = 4.04373, acc = 0.81307


100%|██████████| 70/70 [00:19<00:00,  3.68it/s]


[ Valid | 681/700 ] loss = 8.23714, acc = 0.68660


100%|██████████| 157/157 [00:31<00:00,  4.99it/s]


[ Train | 682/700 ] loss = 4.04704, acc = 0.81407


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 682/700 ] loss = 7.61795, acc = 0.71119


100%|██████████| 157/157 [00:35<00:00,  4.46it/s]


[ Train | 683/700 ] loss = 4.05540, acc = 0.81607


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 683/700 ] loss = 7.28693, acc = 0.73669


100%|██████████| 157/157 [00:31<00:00,  4.97it/s]


[ Train | 684/700 ] loss = 4.03258, acc = 0.81507


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 684/700 ] loss = 8.75057, acc = 0.67374


100%|██████████| 157/157 [00:33<00:00,  4.74it/s]


[ Train | 685/700 ] loss = 4.03317, acc = 0.81147


100%|██████████| 70/70 [00:17<00:00,  4.00it/s]


[ Valid | 685/700 ] loss = 7.73587, acc = 0.71164


100%|██████████| 157/157 [00:34<00:00,  4.57it/s]


[ Train | 686/700 ] loss = 4.03637, acc = 0.80967


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 686/700 ] loss = 7.62268, acc = 0.72699


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 687/700 ] loss = 4.04203, acc = 0.81557


100%|██████████| 70/70 [00:12<00:00,  5.39it/s]


[ Valid | 687/700 ] loss = 7.35395, acc = 0.71796


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 688/700 ] loss = 4.02317, acc = 0.81337


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 688/700 ] loss = 7.94336, acc = 0.71255


100%|██████████| 157/157 [00:34<00:00,  4.58it/s]


[ Train | 689/700 ] loss = 4.05152, acc = 0.81547


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]


[ Valid | 689/700 ] loss = 7.96144, acc = 0.71548


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 690/700 ] loss = 4.03516, acc = 0.81247


100%|██████████| 70/70 [00:12<00:00,  5.40it/s]


[ Valid | 690/700 ] loss = 8.40932, acc = 0.68434


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 691/700 ] loss = 4.02438, acc = 0.81557


100%|██████████| 70/70 [00:12<00:00,  5.55it/s]


[ Valid | 691/700 ] loss = 7.99154, acc = 0.70826


100%|██████████| 157/157 [00:34<00:00,  4.57it/s]


[ Train | 692/700 ] loss = 4.02142, acc = 0.81627


100%|██████████| 70/70 [00:13<00:00,  5.37it/s]


[ Valid | 692/700 ] loss = 7.45189, acc = 0.72338


100%|██████████| 157/157 [00:30<00:00,  5.13it/s]


[ Train | 693/700 ] loss = 4.02902, acc = 0.81477


100%|██████████| 70/70 [00:12<00:00,  5.51it/s]


[ Valid | 693/700 ] loss = 8.01920, acc = 0.69765


100%|██████████| 157/157 [00:30<00:00,  5.12it/s]


[ Train | 694/700 ] loss = 4.00280, acc = 0.81447


100%|██████████| 70/70 [00:12<00:00,  5.44it/s]


[ Valid | 694/700 ] loss = 7.48763, acc = 0.71773


100%|██████████| 157/157 [00:34<00:00,  4.53it/s]


[ Train | 695/700 ] loss = 4.03668, acc = 0.81577


100%|██████████| 70/70 [00:12<00:00,  5.53it/s]


[ Valid | 695/700 ] loss = 7.11699, acc = 0.74458 -> best
Best model found at epoch 695, saving model


100%|██████████| 157/157 [00:30<00:00,  5.09it/s]


[ Train | 696/700 ] loss = 4.02651, acc = 0.81357


100%|██████████| 70/70 [00:12<00:00,  5.50it/s]


[ Valid | 696/700 ] loss = 7.87720, acc = 0.70984


100%|██████████| 157/157 [00:30<00:00,  5.16it/s]


[ Train | 697/700 ] loss = 3.99333, acc = 0.82107


100%|██████████| 70/70 [00:12<00:00,  5.45it/s]


[ Valid | 697/700 ] loss = 8.24737, acc = 0.69382


100%|██████████| 157/157 [00:34<00:00,  4.59it/s]


[ Train | 698/700 ] loss = 4.02049, acc = 0.81537


100%|██████████| 70/70 [00:12<00:00,  5.47it/s]


[ Valid | 698/700 ] loss = 7.38936, acc = 0.72721


100%|██████████| 157/157 [00:30<00:00,  5.15it/s]


[ Train | 699/700 ] loss = 4.01909, acc = 0.81717


100%|██████████| 70/70 [00:12<00:00,  5.48it/s]


[ Valid | 699/700 ] loss = 8.43644, acc = 0.70532


100%|██████████| 157/157 [00:30<00:00,  5.17it/s]


[ Train | 700/700 ] loss = 3.99728, acc = 0.81747


100%|██████████| 70/70 [00:13<00:00,  5.36it/s]

[ Valid | 700/700 ] loss = 7.85650, acc = 0.69269
Finish training





### Inference
load the best model of the experiment and generate submission.csv

In [17]:
# create dataloader for evaluation
eval_set = FoodDataset(os.path.join(cfg['dataset_root'], "evaluation"), tfm=test_tfm)
eval_loader = DataLoader(eval_set, batch_size=cfg['batch_size'], shuffle=False, num_workers=0, pin_memory=True)

One /kaggle/input/ml2023spring-hw13/Food-11/evaluation sample /kaggle/input/ml2023spring-hw13/Food-11/evaluation/0000.jpg


In [18]:
# Load model from {exp_name}/student_best.ckpt
student_model_best = get_student_model() # get a new student model to avoid reference before assignment.
ckpt_path = f"{save_path}/student_best.ckpt" # the ckpt path of the best student model.
student_model_best.load_state_dict(torch.load(ckpt_path, map_location='cpu')) # load the state dict and set it to the student model
student_model_best.to(device) # set the student model to device

# Start evaluate
student_model_best.eval()
eval_preds = [] # storing predictions of the evaluation dataset

# Iterate the validation set by batches.
for batch in tqdm(eval_loader):
    # A batch consists of image data and corresponding labels.
    imgs, _ = batch
    # We don't need gradient in evaluation.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = student_model_best(imgs.to(device))
        preds = list(logits.argmax(dim=-1).squeeze().cpu().numpy())
    # loss and acc can not be calculated because we do not have the true labels of the evaluation set.
    eval_preds += preds

def pad4(i):
    return "0"*(4-len(str(i))) + str(i)

# Save prediction results
ids = [pad4(i) for i in range(0,len(eval_set))]
categories = eval_preds

df = pd.DataFrame()
df['Id'] = ids
df['Category'] = categories
df.to_csv(f"{save_path}/submission.csv", index=False) # now you can download the submission.csv and upload it to the kaggle competition.

100%|██████████| 35/35 [00:27<00:00,  1.26it/s]


> Don't forget to answer the report questions on GradeScope! 