# The Springfield Identity - OPTIMIZED Training Pipeline
## High-Performance CNN Training with Detailed Progress Tracking

### Optimizations:
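- **Full CPU Utilization**: Up to 8 DataLoader workers with persistent workers (0 workers on Windows), plus one PyTorch thread per CPU core
- **Efficient Data Loading**: Prefetch factor of 4 per worker, and pinned memory when training on CUDA
- **Detailed Progress Bars**: Multi-level tracking with real-time loss, macro F1, and learning rate
- **Early Stopping**: Training stops after 5 epochs without validation F1 improvement, to limit overfitting
- **Gradient Accumulation**: Larger effective batch size (off by default, `grad_accum_steps=1`)
- **Learning Rate Scheduling**: `ReduceLROnPlateau` halves the learning rate when validation F1 plateaus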
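
As a rough illustration of the early-stopping item in the list above, the bookkeeping can be isolated in a small helper. This is a minimal sketch rather than the notebook's own code: the `EarlyStopper` class and its `update` method are hypothetical names, the metric values are made up, and it assumes a higher-is-better metric such as validation F1.

```python
class EarlyStopper:
    """Track a higher-is-better metric and signal when to stop (hypothetical helper)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def update(self, metric, epoch):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative values only, not results from this notebook
stopper = EarlyStopper(patience=2)
val_f1_per_epoch = [0.10, 0.20, 0.19, 0.18, 0.30]
stopped_at = None
for epoch, f1 in enumerate(val_f1_per_epoch, start=1):
    if stopper.update(f1, epoch):
        stopped_at = epoch
        break
```

With a patience of 2, the loop stops after the second consecutive epoch that fails to beat the best score, so the later value of 0.30 is never reached. That is the usual trade-off of a short patience.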

In [1]:
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from sklearn.metrics import f1_score
from tqdm.auto import tqdm
import json
from collections import defaultdict
import time
from datetime import datetime
import multiprocessing
import platform

if platform.system() == 'Windows':
    NUM_WORKERS = 0
    print("   Windows detected: Using single-threaded data loading (num_workers=0)")
    print("   This avoids slow multiprocessing spawn overhead on Windows")
else:
    NUM_WORKERS = min(multiprocessing.cpu_count(), 8)
print(f"CPU Cores Detected: {multiprocessing.cpu_count()}")
print(f"Using {NUM_WORKERS} workers for data loading")

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ['PYTHONHASHSEED'] = str(seed)
    
SEED = 42
set_seed(SEED)

torch.set_num_threads(multiprocessing.cpu_count())

print("\n" + "="*80)
print("THE SPRINGFIELD IDENTITY - OPTIMIZED CNN TRAINING")
print("="*80)
print(f"Random Seed: {SEED}")
print(f"PyTorch Version: {torch.__version__}")
print(f"PyTorch Threads: {torch.get_num_threads()}")
print(f"Training Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)

⚠️  Windows detected: Using single-threaded data loading (num_workers=0)
   This avoids slow multiprocessing spawn overhead on Windows
CPU Cores Detected: 32
Using 0 workers for data loading

THE SPRINGFIELD IDENTITY - OPTIMIZED CNN TRAINING
Random Seed: 42
PyTorch Version: 2.9.1+cpu
PyTorch Threads: 32
Training Started: 2025-12-10 23:35:03




In [2]:
CONFIG = {
    'data_dir': 'characters_train',
    'img_size': 128,               
    'batch_size': 64,  # Increased for better CPU utilization
    'epochs': 25,                   
    'lr': 0.001,
    'weight_decay': 1e-4,
    'early_stopping_patience': 5,
    'device': 'cuda' if torch.cuda.is_available() else 'cpu',
    'model_save_path': 'model.pth',
    'num_workers': NUM_WORKERS,
    'prefetch_factor': 4,
    'persistent_workers': NUM_WORKERS > 0,
    'grad_clip': 1.0,
    'grad_accum_steps': 1
}

print("\n" + "="*80)
print("OPTIMIZED CONFIGURATION")
print("="*80)
for key, value in CONFIG.items():
    print(f"{key:.<35} {value}")
print("="*80)
print(f"\nEffective Batch Size: {CONFIG['batch_size'] * CONFIG['grad_accum_steps']}")
print(f"Total Data Loading Workers: {CONFIG['num_workers']}")
print(f"Prefetch Buffer per Worker: {CONFIG['prefetch_factor']} batches")


OPTIMIZED CONFIGURATION
data_dir........................... characters_train
img_size........................... 128
batch_size......................... 64
epochs............................. 25
lr................................. 0.001
weight_decay....................... 0.0001
early_stopping_patience............ 5
device............................. cpu
model_save_path.................... model.pth
num_workers........................ 0
prefetch_factor.................... 4
persistent_workers................. False
grad_clip.......................... 1.0
grad_accum_steps................... 1

Effective Batch Size: 64
Total Data Loading Workers: 0
Prefetch Buffer per Worker: 4 batches
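
The configuration above reports a prefetch factor of 4 even though no workers are used. In PyTorch, `prefetch_factor` and `persistent_workers=True` are only accepted by `DataLoader` when `num_workers > 0`. One way to keep those options consistent is to build the keyword arguments in one place. The sketch below is an assumption about how that could look; `loader_kwargs` is a hypothetical helper, not part of the notebook.

```python
def loader_kwargs(num_workers, device, batch_size=64, prefetch_factor=4):
    """Build DataLoader keyword arguments (hypothetical helper)."""
    kwargs = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        # Pinned host memory only helps host-to-GPU copies
        "pin_memory": device == "cuda",
    }
    if num_workers > 0:
        # These two options are only meaningful with worker processes
        kwargs["prefetch_factor"] = prefetch_factor
        kwargs["persistent_workers"] = True
    return kwargs


windows_cpu = loader_kwargs(num_workers=0, device="cpu")
linux_gpu = loader_kwargs(num_workers=8, device="cuda")
```

The result could then be unpacked into the loader, for example `DataLoader(train_data, shuffle=True, **loader_kwargs(CONFIG['num_workers'], CONFIG['device']))`.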


In [3]:
print("\n" + "="*80)
print("DATA LOADING")
print("="*80)

train_transforms = transforms.Compose([
    transforms.Resize((CONFIG['img_size'], CONFIG['img_size']), interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((CONFIG['img_size'], CONFIG['img_size']), interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

print("\n Loading dataset...")
load_start = time.time()

try:
    full_dataset = datasets.ImageFolder(root=CONFIG['data_dir'])
    print(f"✓ Dataset loaded in {time.time() - load_start:.2f}s")
except FileNotFoundError:
    print(f"✗ Error: '{CONFIG['data_dir']}' not found!")
    raise

class_names = full_dataset.classes
num_classes = len(class_names)
total_images = len(full_dataset)

print(f"\n Dataset Statistics:")
print(f"  Total Images............ {total_images:,}")
print(f"  Number of Classes....... {num_classes}")
print(f"  Classes: {', '.join(class_names[:5])}{'...' if num_classes > 5 else ''}")

print("\n Analyzing class distribution...")
class_counts = defaultdict(int)
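# ImageFolder keeps every label in .targets, so counting classes
# does not require decoding each image from disk
for label in tqdm(full_dataset.targets, desc="Scanning", unit="img", leave=False):
    class_counts[label] += 1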

print(f"  Min samples/class....... {min(class_counts.values()):,}")
print(f"  Max samples/class....... {max(class_counts.values()):,}")
print(f"  Mean samples/class...... {np.mean(list(class_counts.values())):.1f}")
print(f"  Median samples/class.... {np.median(list(class_counts.values())):.1f}")

train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_dataset, val_dataset = random_split(
    full_dataset, 
    [train_size, val_size], 
    generator=torch.Generator().manual_seed(SEED)
)

print(f"\n Data Split:")
print(f"  Train............. {train_size:,} ({train_size/total_images*100:.1f}%)")
print(f"  Validation........ {val_size:,} ({val_size/total_images*100:.1f}%)")

class TransformSubset(torch.utils.data.Dataset):
    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform
        
    def __getitem__(self, index):
        x, y = self.subset[index]
        if self.transform:
            x = self.transform(x)
        return x, y
        
    def __len__(self):
        return len(self.subset)

train_data = TransformSubset(train_dataset, train_transforms)
val_data = TransformSubset(val_dataset, val_transforms)

print(f"\n⚡ Creating optimized DataLoaders...")

train_loader = DataLoader(
    train_data, 
    batch_size=CONFIG['batch_size'], 
    shuffle=True, 
    num_workers=CONFIG['num_workers'],
    pin_memory=CONFIG['device'] == 'cuda',
    prefetch_factor=CONFIG['prefetch_factor'] if CONFIG['num_workers'] > 0 else None,
    persistent_workers=CONFIG['persistent_workers']
)

val_loader = DataLoader(
    val_data, 
    batch_size=CONFIG['batch_size'], 
    shuffle=False, 
    num_workers=CONFIG['num_workers'],
    pin_memory=CONFIG['device'] == 'cuda',
    prefetch_factor=CONFIG['prefetch_factor'] if CONFIG['num_workers'] > 0 else None,
    persistent_workers=CONFIG['persistent_workers']
)

print(f"✓ DataLoaders ready")
print(f"  Batch size.............. {CONFIG['batch_size']}")
print(f"  Train batches........... {len(train_loader)}")
print(f"  Val batches............. {len(val_loader)}")
print(f"  Workers................. {CONFIG['num_workers']}")
print(f"  Prefetch factor......... {CONFIG['prefetch_factor']}")
print(f"  Persistent workers...... {CONFIG['persistent_workers']}")
print("="*80)


DATA LOADING

📁 Loading dataset...
✓ Dataset loaded in 0.02s

📊 Dataset Statistics:
  Total Images............ 16,764
  Number of Classes....... 42
  Classes: abraham_grampa_simpson, agnes_skinner, apu_nahasapeemapetilon, barney_gumble, bart_simpson...

🔍 Analyzing class distribution...


                                                                  

  Min samples/class....... 3
  Max samples/class....... 1,797
  Mean samples/class...... 399.1
  Median samples/class.... 124.0

✂️ Data Split:
  Train............. 13,411 (80.0%)
  Validation........ 3,353 (20.0%)

⚡ Creating optimized DataLoaders...
✓ DataLoaders ready
  Batch size.............. 64
  Train batches........... 210
  Val batches............. 53
  Workers................. 0
  Prefetch factor......... 4
  Persistent workers...... False
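
The statistics above show classes with as few as 3 images. With a purely random 80/20 split, such a class can end up entirely in one part. A per-class split is one possible alternative. The sketch below uses only the standard library; `stratified_split` is a hypothetical helper, and in this notebook the labels would presumably come from `full_dataset.targets`, with the returned index lists passed to `torch.utils.data.Subset`.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split indices per class so each class with 2+ samples lands in both parts (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Aim for at least one validation sample, but never take the whole class
        n_val = max(1, round(len(indices) * val_fraction))
        n_val = min(n_val, len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels: class 0 has 10 samples, class 1 has only 3 (like the rarest class here)
toy_labels = [0] * 10 + [1] * 3
train_idx, val_idx = stratified_split(toy_labels)
```

In the toy case the rare class contributes one validation index and two training indices. A class with a single image would stay entirely in the training part under this rule.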




In [4]:
print("\n" + "="*80)
print("MODEL ARCHITECTURE")
print("="*80)

class SimpsonsCNN(nn.Module):
    """Optimized CNN for Simpsons Character Classification"""
    
    def __init__(self, num_classes):
        super(SimpsonsCNN, self).__init__()
        
        def conv_block(in_c, out_c, pool=True):
            layers = [
                nn.Conv2d(in_c, out_c, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_c, out_c, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True)
            ]
            if pool:
                layers.append(nn.MaxPool2d(2))
            return nn.Sequential(*layers)
        
        self.block1 = conv_block(3, 32)
        self.block2 = conv_block(32, 64)
        self.block3 = conv_block(64, 128)
        self.block4 = conv_block(128, 256)
        
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.global_pool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)
        return x

print("\n🏗️ Initializing model...")
model = SimpsonsCNN(num_classes).to(CONFIG['device'])

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"✓ Model initialized")
print(f"  Architecture............ SimpsonsCNN")
print(f"  Total parameters........ {total_params:,}")
print(f"  Trainable parameters.... {trainable_params:,}")
print(f"  Model size.............. {total_params * 4 / (1024**2):.2f} MB")
print(f"  Device.................. {CONFIG['device']}")
print("="*80)


MODEL ARCHITECTURE

🏗️ Initializing model...
✓ Model initialized
  Architecture............ SimpsonsCNN
  Total parameters........ 1,184,010
  Trainable parameters.... 1,184,010
  Model size.............. 4.52 MB
  Device.................. cpu
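
The reported total of 1,184,010 parameters can be checked by hand from the layer shapes. The sketch below redoes that arithmetic in plain Python, assuming the architecture defined above (bias-free 3x3 convolutions, `BatchNorm2d` with a learnable weight and bias per channel, and a final `Linear(256, 42)`). BatchNorm running statistics are buffers, so they are not counted. `conv_block_params` is a name introduced here for illustration.

```python
def conv_block_params(in_c, out_c):
    """Parameters of one conv_block: two bias-free 3x3 convs, each followed by BatchNorm2d."""
    conv1 = in_c * out_c * 3 * 3
    conv2 = out_c * out_c * 3 * 3
    batch_norms = 2 * (2 * out_c)  # weight and bias per channel, two layers
    return conv1 + conv2 + batch_norms


channels = [(3, 32), (32, 64), (64, 128), (128, 256)]
block_totals = [conv_block_params(i, o) for i, o in channels]
fc_params = 256 * 42 + 42  # Linear(256, 42): weights plus biases
total = sum(block_totals) + fc_params
size_mb = total * 4 / 1024**2  # 4 bytes per float32 parameter
```

Roughly three quarters of the parameters sit in the last block (885,760 of 1,184,010), which is typical when channel counts double at every stage.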


In [5]:
print("\n" + "="*80)
print("TRAINING SETUP")
print("="*80)

criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(
    model.parameters(), 
    lr=CONFIG['lr'], 
    weight_decay=CONFIG['weight_decay']
)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, 
    mode='max', 
    factor=0.5, 
    patience=2,
)

print(f"\n⚙️ Optimizer Configuration:")
print(f"  Loss function........... CrossEntropyLoss")
print(f"  Optimizer............... AdamW")
print(f"  Learning rate........... {CONFIG['lr']}")
print(f"  Weight decay............ {CONFIG['weight_decay']}")
print(f"  LR scheduler............ ReduceLROnPlateau")
print(f"  Gradient clipping....... {CONFIG['grad_clip']}")
print(f"  Gradient accum steps.... {CONFIG['grad_accum_steps']}")
print(f"  Early stopping.......... {CONFIG['early_stopping_patience']} epochs")
print("="*80)


TRAINING SETUP

⚙️ Optimizer Configuration:
  Loss function........... CrossEntropyLoss
  Optimizer............... AdamW
  Learning rate........... 0.001
  Weight decay............ 0.0001
  LR scheduler............ ReduceLROnPlateau
  Gradient clipping....... 1.0
  Gradient accum steps.... 1
  Early stopping.......... 5 epochs
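
The setup above uses a single accumulation step, so the optimizer steps after every batch. With a larger value it would step once per group of batches, and 210 batches do not always divide evenly into groups. The sketch below lists the batch numbers after which a step would happen, under the assumption that a trailing partial group is also applied on the final batch (dropping it would be another valid choice). `optimizer_step_batches` is a hypothetical helper.

```python
def optimizer_step_batches(num_batches, grad_accum_steps):
    """Return the 1-based batch numbers after which the optimizer would step (hypothetical helper)."""
    steps = []
    for batch_number in range(1, num_batches + 1):
        group_full = batch_number % grad_accum_steps == 0
        last_batch = batch_number == num_batches
        if group_full or last_batch:
            steps.append(batch_number)
    return steps


# 210 training batches of 64 images, as reported by the data loading cell
no_accum = optimizer_step_batches(210, 1)
accum_4 = optimizer_step_batches(210, 4)
effective_batch = 64 * 4
```

With 4 accumulation steps there would be 53 optimizer steps per epoch instead of 210, each averaging over up to 256 images. The last step covers only the two leftover batches.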


In [6]:
def train_one_epoch(model, loader, optimizer, criterion, epoch, total_epochs, grad_accum_steps=1):
    """Train for one epoch with detailed progress tracking"""
    model.train()
    running_loss = 0.0
    all_preds = []
    all_labels = []
    
    pbar = tqdm(
        enumerate(loader), 
        total=len(loader),
        desc=f"Epoch {epoch:2d}/{total_epochs} [TRAIN]",
        unit="batch",
        leave=False,
        bar_format='{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}] {postfix}',
        colour='blue'
    )
    
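    # Clear any gradients left over from a previous epoch before accumulating
    optimizer.zero_grad()
    
    for batch_idx, (images, labels) in pbar:
        images, labels = images.to(CONFIG['device']), labels.to(CONFIG['device'])
        
        outputs = model(images)
        # Scale the loss so the accumulated gradient is an average over the group
        loss = criterion(outputs, labels) / grad_accum_steps
        
        loss.backward()
        
        # Step once per full group, and also on the final batch so a trailing
        # partial group is applied instead of leaking into the next epoch
        if (batch_idx + 1) % grad_accum_steps == 0 or (batch_idx + 1) == len(loader):
            torch.nn.utils.clip_grad_norm_(model.parameters(), CONFIG['grad_clip'])
            optimizer.step()
            optimizer.zero_grad()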
        
        running_loss += loss.item() * grad_accum_steps
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
        
        if (batch_idx + 1) % 5 == 0:
            current_loss = running_loss / (batch_idx + 1)
            current_f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
            pbar.set_postfix({
                'loss': f'{current_loss:.4f}',
                'f1': f'{current_f1:.3f}',
                'lr': f'{optimizer.param_groups[0]["lr"]:.6f}'
            })
    
    pbar.close()
    
    epoch_loss = running_loss / len(loader)
    epoch_f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    return epoch_loss, epoch_f1

def validate(model, loader, criterion, epoch, total_epochs):
    model.eval()
    running_loss = 0.0
    all_preds = []
    all_labels = []
    
    pbar = tqdm(
        enumerate(loader),
        total=len(loader),
        desc=f"Epoch {epoch:2d}/{total_epochs} [VALID]",
        unit="batch",
        leave=False,
        bar_format='{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}] {postfix}',
        colour='green'
    )
    
    with torch.no_grad():
        for batch_idx, (images, labels) in pbar:
            images, labels = images.to(CONFIG['device']), labels.to(CONFIG['device'])
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            running_loss += loss.item()
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            
            if (batch_idx + 1) % 5 == 0:
                current_loss = running_loss / (batch_idx + 1)
                current_f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
                pbar.set_postfix({
                    'loss': f'{current_loss:.4f}',
                    'f1': f'{current_f1:.3f}'
                })
    
    pbar.close()
    
    epoch_loss = running_loss / len(loader)
    epoch_f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    return epoch_loss, epoch_f1

print("\n✓ Training functions initialized with detailed progress tracking")


✓ Training functions initialized with detailed progress tracking
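
Both functions above report macro-averaged F1, which weights every class equally regardless of size. On a dataset as imbalanced as this one, that metric can sit far below accuracy. The sketch below is a simplified pure-Python version of the idea, written for illustration only: `macro_f1` is a hypothetical helper that treats an undefined per-class F1 as 0, similar in spirit to `zero_division=0`, and the labels are toy data.

```python
def macro_f1(labels, preds):
    """Unweighted mean of per-class F1 over classes seen in labels or preds; undefined F1 counts as 0."""
    classes = sorted(set(labels) | set(preds))
    scores = []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: 9 samples of a common class, 1 of a rare class;
# the model always predicts the common class
labels = [0] * 9 + [1]
preds = [0] * 10
accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)
score = macro_f1(labels, preds)
```

Here accuracy is 0.9 while macro F1 is about 0.47, because the missed rare class contributes an F1 of 0 to the average. The same effect makes the running F1 shown in the progress bars noisy during the first few batches, when many classes have not yet appeared.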


In [7]:
print("\n" + "="*80)
print("🚀 STARTING OPTIMIZED TRAINING")
print("="*80)

best_f1 = 0.0
best_epoch = 0
epochs_without_improvement = 0
training_history = {
    'train_loss': [],
    'train_f1': [],
    'val_loss': [],
    'val_f1': [],
    'learning_rates': [],
    'epoch_times': []
}

start_time = time.time()

epoch_pbar = tqdm(
    range(1, CONFIG['epochs'] + 1),
    desc="📊 Training Progress",
    unit="epoch",
    bar_format='{desc}: {percentage:3.0f}%|{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}] {postfix}',
    colour='cyan',
    position=0
)

print("\n" + "-"*80)
print(f"{'Epoch':^6} | {'Time':^7} | {'LR':^10} | {'Train Loss':^11} | {'Train F1':^9} | {'Val Loss':^11} | {'Val F1':^9} | {'Status':^15}")
print("-"*80)

for epoch in epoch_pbar:
    epoch_start_time = time.time()
    
    current_lr = optimizer.param_groups[0]['lr']
    
    train_loss, train_f1 = train_one_epoch(
        model, train_loader, optimizer, criterion, 
        epoch, CONFIG['epochs'], CONFIG['grad_accum_steps']
    )
    
    val_loss, val_f1 = validate(
        model, val_loader, criterion, epoch, CONFIG['epochs']
    )
    
    scheduler.step(val_f1)
    
    epoch_time = time.time() - epoch_start_time
    training_history['train_loss'].append(train_loss)
    training_history['train_f1'].append(train_f1)
    training_history['val_loss'].append(val_loss)
    training_history['val_f1'].append(val_f1)
    training_history['learning_rates'].append(current_lr)
    training_history['epoch_times'].append(epoch_time)
    
    status = ""
    if val_f1 > best_f1:
        improvement = val_f1 - best_f1
        best_f1 = val_f1
        best_epoch = epoch
        epochs_without_improvement = 0
        status = f"⭐ BEST (+{improvement:.4f})"
        
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'best_f1': best_f1,
            'class_names': class_names,
            'config': CONFIG,
            'training_history': training_history
        }
        torch.save(checkpoint, CONFIG['model_save_path'])
    else:
        epochs_without_improvement += 1
        status = f"({epochs_without_improvement}/{CONFIG['early_stopping_patience']})"
    
    print(f"{epoch:^6} | {epoch_time:>6.1f}s | {current_lr:>10.6f} | "
          f"{train_loss:>11.4f} | {train_f1:>9.4f} | "
          f"{val_loss:>11.4f} | {val_f1:>9.4f} | {status:^15}")
    
    epoch_pbar.set_postfix({
        'Best_F1': f'{best_f1:.4f}',
        'Val_F1': f'{val_f1:.4f}',
        'ETA': f'{np.mean(training_history["epoch_times"]) * (CONFIG["epochs"] - epoch) / 60:.1f}m'
    })
    
    if epochs_without_improvement >= CONFIG['early_stopping_patience']:
        print("-"*80)
        print(f"\n🛑 EARLY STOPPING: No improvement for {CONFIG['early_stopping_patience']} epochs")
        print(f"   Best F1: {best_f1:.4f} at epoch {best_epoch}")
        break

epoch_pbar.close()
print("-"*80)

total_time = time.time() - start_time
print("\n" + "="*80)
print("✅ TRAINING COMPLETE")
print("="*80)
print(f"\n⏱️ Time Statistics:")
print(f"  Total time.............. {total_time/60:.2f} minutes ({total_time:.0f}s)")
print(f"  Average per epoch....... {np.mean(training_history['epoch_times']):.2f}s")
print(f"  Fastest epoch........... {min(training_history['epoch_times']):.2f}s")
print(f"  Slowest epoch........... {max(training_history['epoch_times']):.2f}s")

print(f"\n📈 Performance:")
print(f"  Best Validation F1...... {best_f1:.4f} (Epoch {best_epoch})")
print(f"  Final Train F1.......... {training_history['train_f1'][-1]:.4f}")
print(f"  Final Val F1............ {training_history['val_f1'][-1]:.4f}")
print(f"  Initial Val F1.......... {training_history['val_f1'][0]:.4f}")
print(f"  Total Improvement....... {training_history['val_f1'][-1] - training_history['val_f1'][0]:+.4f}")

print(f"\n💾 Output Files:")
print(f"  Model checkpoint........ {CONFIG['model_save_path']}")

history_file = 'training_history.json'
with open(history_file, 'w') as f:
    json.dump(training_history, f, indent=4)
print(f"  Training history........ {history_file}")

print("\n" + "="*80)
print("🎉 SUCCESS! Model is ready for inference.")
print("="*80)


🚀 STARTING OPTIMIZED TRAINING


--------------------------------------------------------------------------------
Epoch  |  Time   |     LR     | Train Loss  | Train F1  |  Val Loss   |  Val F1   |     Status     
--------------------------------------------------------------------------------



  1    |  163.8s |   0.001000 |      2.7872 |    0.0767 |      2.3561 |    0.1367 | ⭐ BEST (+0.1367)



  2    |  163.4s |   0.001000 |      2.0954 |    0.1663 |      1.7541 |    0.2196 | ⭐ BEST (+0.0829)



  3    |  162.3s |   0.001000 |      1.6651 |    0.2344 |      1.5482 |    0.2595 | ⭐ BEST (+0.0399)



  4    |  178.8s |   0.001000 |      1.3548 |    0.2856 |      1.0892 |    0.3407 | ⭐ BEST (+0.0812)



  5    |  152.4s |   0.001000 |      1.1490 |    0.3328 |      1.0085 |    0.3699 | ⭐ BEST (+0.0292)



  6    |  149.7s |   0.001000 |      1.0040 |    0.3642 |      0.9115 |    0.3917 | ⭐ BEST (+0.0219)



  7    |  150.9s |   0.001000 |      0.9113 |    0.3872 |      1.0111 |    0.3999 | ⭐ BEST (+0.0082)



  8    |  151.2s |   0.001000 |      0.8051 |    0.4309 |      0.6950 |    0.4566 | ⭐ BEST (+0.0566)



  9    |  151.5s |   0.001000 |      0.7228 |    0.4510 |      0.7674 |    0.4672 | ⭐ BEST (+0.0107)



  10   |  151.4s |   0.001000 |      0.6568 |    0.4926 |      0.5501 |    0.4927 | ⭐ BEST (+0.0255)



  11   |  203.7s |   0.001000 |      0.5936 |    0.5317 |      0.5434 |    0.5612 | ⭐ BEST (+0.0684)
  12   |  203.2s |   0.001000 |      0.5532 |    0.5525 |      0.5617 |    0.5568 |      (1/5)
  13   |  224.0s |   0.001000 |      0.5019 |    0.5783 |      0.6538 |    0.5605 |      (2/5)
  14   |  153.0s |   0.001000 |      0.4695 |    0.6174 |      0.5792 |    0.5835 | ⭐ BEST (+0.0223)
  15   |  150.8s |   0.001000 |      0.4262 |    0.6309 |      0.4898 |    0.6084 | ⭐ BEST (+0.0248)
  16   |  150.7s |   0.001000 |      0.4029 |    0.6675 |      0.4411 |    0.6715 | ⭐ BEST (+0.0632)
  17   |  168.1s |   0.001000 |      0.3716 |    0.6896 |      0.3550 |    0.6918 | ⭐ BEST (+0.0203)
  18   |  218.8s |   0.001000 |      0.3528 |    0.7140 |      0.3372 |    0.6957 | ⭐ BEST (+0.0039)
  19   |  151.7s |   0.001000 |      0.3321 |    0.7245 |      0.3033 |    0.7511 | ⭐ BEST (+0.0554)
  20   |  154.0s |   0.001000 |      0.3096 |    0.7547 |      0.2996 |    0.7389 |      (1/5)
  21   |  152.7s |   0.001000 |      0.2857 |    0.7639 |      0.4131 |    0.6935 |      (2/5)
  22   |  152.3s |   0.001000 |      0.2715 |    0.7811 |      0.3091 |    0.7660 | ⭐ BEST (+0.0149)
  23   |  151.7s |   0.001000 |      0.2617 |    0.7898 |      0.3080 |    0.7648 |      (1/5)
  24   |  153.9s |   0.001000 |      0.2468 |    0.7843 |      0.2719 |    0.7775 | ⭐ BEST (+0.0115)
  25   |  152.7s |   0.001000 |      0.2375 |    0.8018 |      0.3170 |    0.7056 |      (1/5)     
--------------------------------------------------------------------------------

✅ TRAINING COMPLETE

⏱️ Time Statistics:
  Total time.............. 68.62 minutes (4117s)
  Average per epoch....... 164.68s
  Fastest epoch........... 149.67s
  Slowest epoch........... 224.04s

📈 Performance:
  Best Validation F1...... 0.7775 (Epoch 24)
  Final Train F1.......... 0.8018
  Final Val F1............ 0.7056
  Initial Val F1.......... 0.1367
  Total Improvement....... +0.5688

💾 Output Files:
  Model checkpoint........ model.pth
  Training history........ training_history.json

🎉 SUCCESS! Model is ready for inference.
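
The last column of the epoch table above is consistent with a patience-based early-stopping counter: a new best validation F1 resets the counter, and any other epoch increments it toward the limit of 5 shown in the log. The sketch below replays the validation F1 values printed for epochs 9 to 25 through such a counter. It is an illustration only: `track_patience` is a hypothetical helper, not the notebook's own function, and it assumes that any strict improvement over the previous best counts as a new best.

```python
# Validation F1 per epoch, copied from the log rows for epochs 9-25.
val_f1 = {
    9: 0.4672, 10: 0.4927, 11: 0.5612, 12: 0.5568, 13: 0.5605,
    14: 0.5835, 15: 0.6084, 16: 0.6715, 17: 0.6918, 18: 0.6957,
    19: 0.7511, 20: 0.7389, 21: 0.6935, 22: 0.7660, 23: 0.7648,
    24: 0.7775, 25: 0.7056,
}


def track_patience(scores, patience=5):
    """Replay scores through a patience counter (hypothetical helper).

    Returns (statuses, best_epoch, best_score, stopped_early).
    Assumption: any strict improvement over the best so far resets the counter.
    """
    best_score = float("-inf")
    best_epoch = None
    waited = 0
    statuses = {}
    for epoch, score in scores.items():
        if score > best_score:
            best_score, best_epoch, waited = score, epoch, 0
            statuses[epoch] = "BEST"
        else:
            waited += 1
            statuses[epoch] = f"({waited}/{patience})"
            if waited >= patience:
                # Patience exhausted: training would stop here.
                return statuses, best_epoch, best_score, True
    return statuses, best_epoch, best_score, False


statuses, best_epoch, best_score, stopped_early = track_patience(val_f1)
for epoch, status in statuses.items():
    print(f"{epoch:>3} | {val_f1[epoch]:.4f} | {status}")
print(f"Best Validation F1: {best_score:.4f} (Epoch {best_epoch})")
```

Under these assumptions the replay agrees with the summary block: the best epoch is 24 and the counter never reaches 5, so all 25 epochs run. It also shows why the final validation F1 (0.7056) is lower than the best one (0.7775): the last epoch was not an improvement.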



