# Verification task or Siamese neural networks training

This notebook presents the paper ["Siamese Neural Networks for One-shot Image Recognition"](https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) coded with PyTorch framework. 

In this part we train Siamese network on the Omniglot dataset to perform the classification task to distinguish two images of the same class or different classes.


References:
- [paper](https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf)
- [omniglot](https://github.com/brendenlake/omniglot)
- [keras-oneshot](https://github.com/sorenbouma/keras-oneshot)


In [1]:
# https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
%load_ext autoreload
%autoreload 2

In [2]:
import os, sys
import numpy as np
import cv2

In [3]:
sys.path.append("..")

In [4]:
HAS_GPU = True

## Setup dataflow

In [5]:
from dataflow import OmniglotDataset, SameOrDifferentPairsDataset, PairTransformedDataset
from common_utils.imgaug import RandomAffine, RandomApply
from common_utils.dataflow import TransformedDataset, OnGPUDataLoader, ResizedDataset
from torchvision.transforms import Compose, ToTensor, Normalize
from torch.utils.data import DataLoader
import torch

In [6]:
np.random.seed(12345)

OMNIGLOT_REPO_PATH='omniglot'

TRAIN_DATA_PATH = os.path.join(OMNIGLOT_REPO_PATH, 'python', 'images_background')
train_alphabets = !ls {TRAIN_DATA_PATH}
train_alphabets = list(train_alphabets)

TEST_DATA_PATH = os.path.join(OMNIGLOT_REPO_PATH, 'python', 'images_evaluation')
test_alphabets = !ls {TEST_DATA_PATH}
test_alphabets = list(test_alphabets)

assert len(train_alphabets) > 1 and len(test_alphabets) > 1, "%s \n %s" % (train_alphabets[0], test_alphabets[0])

train_alphabet_char_id_drawer_ids = {}
for a in train_alphabets:
    res = !ls "{os.path.join(TRAIN_DATA_PATH, a)}"
    char_ids = list(res)
    train_alphabet_char_id_drawer_ids[a] = {}
    for char_id in char_ids:
        res = !ls "{os.path.join(TRAIN_DATA_PATH, a, char_id)}"
        train_alphabet_char_id_drawer_ids[a][char_id] = [_id[:-4] for _id in list(res)]
        
        
test_alphabet_char_id_drawer_ids = {}
for a in test_alphabets:
    res = !ls "{os.path.join(TEST_DATA_PATH, a)}"
    char_ids = list(res)
    test_alphabet_char_id_drawer_ids[a] = {}
    for char_id in char_ids:
        res = !ls "{os.path.join(TEST_DATA_PATH, a, char_id)}"
        test_alphabet_char_id_drawer_ids[a][char_id] = [_id[:-4] for _id in list(res)]


# Sample 12 drawers out of 20
all_drawers_ids = np.arange(20) 
train_drawers_ids = np.random.choice(all_drawers_ids, size=12, replace=False)
# Sample 4 drawers out of remaining 8
val_drawers_ids = np.random.choice(list(set(all_drawers_ids) - set(train_drawers_ids)), size=4, replace=False)
test_drawers_ids = np.array(list(set(all_drawers_ids) - set(val_drawers_ids) - set(train_drawers_ids)))

def create_str_drawers_ids(drawers_ids):
    return ["_{0:0>2}".format(_id) for _id in drawers_ids]

train_drawers_ids = create_str_drawers_ids(train_drawers_ids)
val_drawers_ids = create_str_drawers_ids(val_drawers_ids)
test_drawers_ids = create_str_drawers_ids(test_drawers_ids)

train_ds = OmniglotDataset("Train", data_path=TRAIN_DATA_PATH, 
                           alphabet_char_id_drawers_ids=train_alphabet_char_id_drawer_ids, 
                           drawers_ids=train_drawers_ids)

val_ds = OmniglotDataset("Test", data_path=TEST_DATA_PATH, 
                         alphabet_char_id_drawers_ids=test_alphabet_char_id_drawer_ids, 
                         drawers_ids=val_drawers_ids)

test_ds = OmniglotDataset("Test", data_path=TEST_DATA_PATH, 
                          alphabet_char_id_drawers_ids=test_alphabet_char_id_drawer_ids, 
                          drawers_ids=test_drawers_ids)

#train_ds = ResizedDataset(train_ds, output_size=(80, 80))
#val_ds = ResizedDataset(val_ds, output_size=(80, 80))
#test_ds = ResizedDataset(test_ds, output_size=(80, 80))

In [7]:
train_pairs = SameOrDifferentPairsDataset(train_ds, nb_pairs=int(30e3))
val_pairs = SameOrDifferentPairsDataset(val_ds, nb_pairs=int(10e3))
test_pairs = SameOrDifferentPairsDataset(test_ds, nb_pairs=int(10e3))

len(train_pairs), len(val_pairs), len(test_pairs)

(30000, 10000, 10000)

In [8]:
train_data_aug = Compose([
    RandomApply(
        RandomAffine(rotation=(-10, 10), scale=(0.8, 1.2), translate=(-0.05, 0.05)),
        proba=0.5
    ),
    ToTensor()
])

test_data_aug = Compose([
    ToTensor()
])

y_transform = lambda y: torch.FloatTensor([y])

train_aug_pairs = PairTransformedDataset(train_pairs, x_transforms=train_data_aug, y_transforms=y_transform)
val_aug_pairs = PairTransformedDataset(val_pairs, x_transforms=test_data_aug, y_transforms=y_transform)
test_aug_pairs = PairTransformedDataset(test_pairs, x_transforms=test_data_aug, y_transforms=y_transform)

In [67]:
batch_size = 64

_DataLoader = OnGPUDataLoader if HAS_GPU and torch.cuda.is_available() else DataLoader

train_batches = _DataLoader(train_aug_pairs, batch_size=batch_size, 
                            shuffle=True, num_workers=12, 
                            drop_last=True)

val_batches = _DataLoader(val_aug_pairs, batch_size=batch_size, 
                          shuffle=True, num_workers=12,
                          pin_memory=True, drop_last=True)

test_batches = _DataLoader(test_aug_pairs, batch_size=batch_size, 
                           shuffle=False, num_workers=12,                   
                           pin_memory=True, drop_last=False)


len(train_batches), len(val_batches), len(test_batches)

(468, 156, 157)

In [68]:
for (x1, x2), y in train_batches:
    print(x1.size(), x2.size(), y.size())
    print(type(x1), type(x1), type(y))    
    break

torch.Size([64, 1, 105, 105]) torch.Size([64, 1, 105, 105]) torch.Size([64, 1])
<class 'torch.cuda.FloatTensor'> <class 'torch.cuda.FloatTensor'> <class 'torch.cuda.FloatTensor'>


## Setup model, loss function and optimisation algorithm

#### Weight regularization

L2 weights regularization: 

#### Loss function

Binary cross-entropy

In [91]:
from torch.autograd import Variable
from torch.nn import BCEWithLogitsLoss
from torch.nn.functional import sigmoid
from torch.optim import Adam, RMSprop, SGD
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau

In [92]:
from datetime import datetime
from common_utils.training_utils import train_one_epoch, validate, write_csv_log, write_conf_log, verbose_optimizer, save_checkpoint
from common_utils.training_utils import accuracy

In [93]:
from model import SiameseNetworks

In [121]:
siamese_net = SiameseNetworks(input_shape=(105, 105, 1))
if HAS_GPU and torch.cuda.is_available():
    siamese_net = siamese_net.cuda()

In [122]:
conf = {
    'weight_decay': 0.01,
    
    'lr_features': 0.00006,
    'lr_classifier': 0.00008,
    
    'n_epochs': 50,    
    'gamma': 0.77
}

In [123]:
def accuracy_logits(y_logits, y_true):
    y_pred = sigmoid(y_logits).data
    return accuracy(y_pred, y_true)

In [124]:
criterion = BCEWithLogitsLoss()
if HAS_GPU and torch.cuda.is_available():
    criterion = criterion.cuda()

In [125]:
# Test single forward pass and loss function computation
siamese_net.eval()
for i, ((batch_x1, batch_x2), batch_y) in enumerate(train_batches):
    
    batch_x1 = Variable(batch_x1, requires_grad=True)
    batch_x2 = Variable(batch_x2, requires_grad=True)    
    batch_y = Variable(batch_y)
    batch_y_logits = siamese_net(batch_x1, batch_x2)
    print(type(batch_y.data), type(batch_y_logits.data), batch_y.size(), batch_y_logits.size())    
    loss = criterion(batch_y_logits, batch_y)
    print("Loss : ", loss.data)
    
    print("Accuracy : ", accuracy_logits(batch_y_logits.data, batch_y.data))
    break

<class 'torch.cuda.FloatTensor'> <class 'torch.cuda.FloatTensor'> torch.Size([64, 1]) torch.Size([64, 1])
Loss :  
 0.6933
[torch.cuda.FloatTensor of size 1 (GPU 0)]

Accuracy :  0.53125


In [126]:
optimizer = Adam([{
    'params': siamese_net.net.features.parameters(),
    'lr': conf['lr_features'],    
}, {
    'params': siamese_net.classifier.parameters(),
    'lr': conf['lr_classifier']
}],
    weight_decay=conf['weight_decay']
)

Note that we define L2 regularization weights through optimizer API as `weight_decay` parameter, [ref](http://pytorch.org/docs/master/optim.html?highlight=adam#torch.optim.Adam)

In [127]:
# lr <- lr_init * gamma ** epoch
scheduler = ExponentialLR(optimizer, gamma=conf['gamma'])
onplateau_scheduler = ReduceLROnPlateau(optimizer, factor=0.5, patience=2, verbose=True)

### Start training

In [128]:
now = datetime.now()
logs_path = os.path.join('logs', 'seamese_networks_verification_task_%s' % (now.strftime("%Y%m%d_%H%M")))
if not os.path.exists(logs_path):
    os.makedirs(logs_path)

In [129]:
write_conf_log(logs_path, "{}".format(conf))
write_conf_log(logs_path, verbose_optimizer(optimizer))

write_csv_log(logs_path, "epoch,train_loss,train_acc,val_loss,val_acc")

best_acc = 0.0
for epoch in range(conf['n_epochs']):
    scheduler.step()
    # Verbose learning rates:
    print(verbose_optimizer(optimizer))

    # train for one epoch
    ret = train_one_epoch(siamese_net, train_batches, 
                          criterion, optimizer,                                               
                          epoch, conf['n_epochs'], avg_metrics=[accuracy_logits,])
    if ret is None:
        break
    train_loss, train_acc = ret

    # evaluate on validation set
    ret = validate(siamese_net, val_batches, criterion, avg_metrics=[accuracy_logits, ])
    if ret is None:
        break
    val_loss, val_acc = ret
    
    onplateau_scheduler.step(val_loss)

    # Write a csv log file
    write_csv_log(logs_path, "%i,%f,%f,%f,%f" % (epoch, train_loss, train_acc, val_loss, val_acc))

    # remember best accuracy and save checkpoint
    if val_acc > best_acc:
        best_prec1 = max(val_acc, best_acc)
        save_checkpoint(logs_path, 'val_acc', 
                        {'epoch': epoch + 1,
                         'state_dict': siamese_net.state_dict(),
                         'val_acc': val_acc,           
                         'optimizer': optimizer.state_dict()})        

  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 6e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 8e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 1/50: 100%|##########| 468/468 [00:42<00:00, 11.02it/s, Loss 0.6317 | accuracy_logits 0.567]
100%|##########| 156/156 [00:04<00:00, 34.97it/s, Loss 0.6219 | accuracy_logits 0.611]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 4.6200000000000005e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 6.16e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 2/50: 100%|##########| 468/468 [00:43<00:00, 10.88it/s, Loss 0.5921 | accuracy_logits 0.654]
100%|##########| 156/156 [00:04<00:00, 34.55it/s, Loss 0.5797 | accuracy_logits 0.694]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 3.5574e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 4.7432e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 3/50: 100%|##########| 468/468 [00:43<00:00, 10.78it/s, Loss 0.5429 | accuracy_logits 0.721]
100%|##########| 156/156 [00:04<00:00, 34.51it/s, Loss 0.5546 | accuracy_logits 0.705]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 2.7391980000000003e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 3.6522640000000004e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 4/50: 100%|##########| 468/468 [00:43<00:00, 10.87it/s, Loss 0.4975 | accuracy_logits 0.762]
100%|##########| 156/156 [00:04<00:00, 34.46it/s, Loss 0.5334 | accuracy_logits 0.725]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 2.1091824600000002e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.8122432800000004e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 5/50: 100%|##########| 468/468 [00:43<00:00, 10.86it/s, Loss 0.4611 | accuracy_logits 0.791]
100%|##########| 156/156 [00:04<00:00, 34.32it/s, Loss 0.5067 | accuracy_logits 0.756]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.6240704942e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.1654273256e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 6/50: 100%|##########| 468/468 [00:43<00:00, 10.86it/s, Loss 0.4326 | accuracy_logits 0.808]
100%|##########| 156/156 [00:04<00:00, 34.82it/s, Loss 0.4967 | accuracy_logits 0.771]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.2505342805340002e-05
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.667379040712e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 7/50: 100%|##########| 468/468 [00:43<00:00, 10.85it/s, Loss 0.4099 | accuracy_logits 0.826]
100%|##########| 156/156 [00:04<00:00, 34.85it/s, Loss 0.4960 | accuracy_logits 0.765]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 9.629113960111801e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.2838818613482402e-05
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 8/50: 100%|##########| 468/468 [00:43<00:00, 10.79it/s, Loss 0.3901 | accuracy_logits 0.838]
100%|##########| 156/156 [00:04<00:00, 35.14it/s, Loss 0.4937 | accuracy_logits 0.773]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 7.4144177492860875e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 9.88589033238145e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 9/50: 100%|##########| 468/468 [00:43<00:00, 10.79it/s, Loss 0.3804 | accuracy_logits 0.844]
100%|##########| 156/156 [00:04<00:00, 34.35it/s, Loss 0.4850 | accuracy_logits 0.780]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 5.709101666950288e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 7.612135555933717e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 10/50: 100%|##########| 468/468 [00:43<00:00, 10.75it/s, Loss 0.3695 | accuracy_logits 0.850]
100%|##########| 156/156 [00:04<00:00, 35.15it/s, Loss 0.4825 | accuracy_logits 0.777]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 4.396008283551722e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 5.861344378068963e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 11/50: 100%|##########| 468/468 [00:43<00:00, 10.75it/s, Loss 0.3641 | accuracy_logits 0.852]
100%|##########| 156/156 [00:04<00:00, 34.63it/s, Loss 0.4753 | accuracy_logits 0.778]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 3.3849263783348255e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 4.513235171113101e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 12/50: 100%|##########| 468/468 [00:43<00:00, 10.66it/s, Loss 0.3542 | accuracy_logits 0.858]
100%|##########| 156/156 [00:04<00:00, 34.72it/s, Loss 0.4724 | accuracy_logits 0.782]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 2.606393311317816e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 3.4751910817570877e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 13/50: 100%|##########| 468/468 [00:43<00:00, 10.66it/s, Loss 0.3524 | accuracy_logits 0.860]
100%|##########| 156/156 [00:04<00:00, 34.67it/s, Loss 0.4805 | accuracy_logits 0.779]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 2.006922849714718e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.6758971329529576e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 14/50: 100%|##########| 468/468 [00:44<00:00, 10.61it/s, Loss 0.3499 | accuracy_logits 0.860]
100%|##########| 156/156 [00:04<00:00, 35.05it/s, Loss 0.4833 | accuracy_logits 0.783]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.5453305942803332e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.060440792373778e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 15/50: 100%|##########| 468/468 [00:44<00:00, 10.63it/s, Loss 0.3473 | accuracy_logits 0.862]
100%|##########| 156/156 [00:04<00:00, 35.21it/s, Loss 0.4755 | accuracy_logits 0.784]
  0%|          | 0/468 [00:00<?, ?it/s]

Epoch    14: reducing learning rate of group 0 to 7.7267e-07.
Epoch    14: reducing learning rate of group 1 to 1.0302e-06.

Optimizer: Adam
- Param group: 
	lr: 1.1899045575958565e-06
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.5865394101278086e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 16/50: 100%|##########| 468/468 [00:44<00:00, 10.60it/s, Loss 0.3429 | accuracy_logits 0.864]
100%|##########| 156/156 [00:04<00:00, 34.81it/s, Loss 0.4745 | accuracy_logits 0.785]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 9.162265093488095e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.2216353457984127e-06
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 17/50: 100%|##########| 468/468 [00:44<00:00, 10.59it/s, Loss 0.3447 | accuracy_logits 0.864]
100%|##########| 156/156 [00:04<00:00, 35.47it/s, Loss 0.4742 | accuracy_logits 0.784]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 7.054944121985834e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 9.406592162647779e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 18/50: 100%|##########| 468/468 [00:44<00:00, 10.59it/s, Loss 0.3428 | accuracy_logits 0.865]
100%|##########| 156/156 [00:04<00:00, 34.88it/s, Loss 0.4719 | accuracy_logits 0.785]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 5.432306973929092e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 7.24307596523879e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 19/50: 100%|##########| 468/468 [00:44<00:00, 10.62it/s, Loss 0.3424 | accuracy_logits 0.866]
100%|##########| 156/156 [00:04<00:00, 34.90it/s, Loss 0.4733 | accuracy_logits 0.786]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 4.1828763699254005e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 5.577168493233868e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 20/50: 100%|##########| 468/468 [00:44<00:00, 10.59it/s, Loss 0.3410 | accuracy_logits 0.866]
100%|##########| 156/156 [00:04<00:00, 34.37it/s, Loss 0.4732 | accuracy_logits 0.786]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 3.220814804842559e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 4.294419739790079e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 21/50: 100%|##########| 468/468 [00:44<00:00, 10.58it/s, Loss 0.3380 | accuracy_logits 0.868]
100%|##########| 156/156 [00:04<00:00, 35.26it/s, Loss 0.4708 | accuracy_logits 0.788]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 2.4800273997287704e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 3.3067031996383607e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 22/50: 100%|##########| 468/468 [00:44<00:00, 10.60it/s, Loss 0.3414 | accuracy_logits 0.864]
100%|##########| 156/156 [00:04<00:00, 34.72it/s, Loss 0.4718 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.9096210977911532e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.5461614637215375e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 23/50: 100%|##########| 468/468 [00:44<00:00, 10.57it/s, Loss 0.3377 | accuracy_logits 0.868]
100%|##########| 156/156 [00:04<00:00, 34.74it/s, Loss 0.4735 | accuracy_logits 0.786]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.470408245299188e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.960544327065584e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 24/50: 100%|##########| 468/468 [00:44<00:00, 10.58it/s, Loss 0.3383 | accuracy_logits 0.868]
100%|##########| 156/156 [00:04<00:00, 34.84it/s, Loss 0.4723 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]

Epoch    23: reducing learning rate of group 0 to 7.3520e-08.
Epoch    23: reducing learning rate of group 1 to 9.8027e-08.

Optimizer: Adam
- Param group: 
	lr: 1.1322143488803749e-07
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.5096191318405e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 25/50: 100%|##########| 468/468 [00:44<00:00, 10.58it/s, Loss 0.3389 | accuracy_logits 0.866]
100%|##########| 156/156 [00:04<00:00, 34.81it/s, Loss 0.4732 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 8.718050486378885e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.1624067315171848e-07
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 26/50: 100%|##########| 468/468 [00:44<00:00, 10.56it/s, Loss 0.3371 | accuracy_logits 0.869]
100%|##########| 156/156 [00:04<00:00, 34.81it/s, Loss 0.4727 | accuracy_logits 0.788]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 6.712898874511743e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 8.950531832682324e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 27/50: 100%|##########| 468/468 [00:44<00:00, 10.55it/s, Loss 0.3358 | accuracy_logits 0.870]
100%|##########| 156/156 [00:04<00:00, 34.83it/s, Loss 0.4729 | accuracy_logits 0.787]


Epoch    26: reducing learning rate of group 0 to 3.3564e-08.
Epoch    26: reducing learning rate of group 1 to 4.4753e-08.


  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 5.168932133374042e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 6.89190951116539e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 28/50: 100%|##########| 468/468 [00:44<00:00, 10.59it/s, Loss 0.3351 | accuracy_logits 0.870]
100%|##########| 156/156 [00:04<00:00, 34.53it/s, Loss 0.4727 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 3.980077742698012e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 5.3067703235973496e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 29/50: 100%|##########| 468/468 [00:44<00:00, 10.62it/s, Loss 0.3391 | accuracy_logits 0.866]
100%|##########| 156/156 [00:04<00:00, 35.23it/s, Loss 0.4731 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 3.0646598618774695e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 4.08621314916996e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 30/50: 100%|##########| 468/468 [00:44<00:00, 10.59it/s, Loss 0.3370 | accuracy_logits 0.868]
100%|##########| 156/156 [00:04<00:00, 34.91it/s, Loss 0.4734 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]

Epoch    29: reducing learning rate of group 0 to 1.5323e-08.
Epoch    29: reducing learning rate of group 1 to 2.0431e-08.

Optimizer: Adam
- Param group: 
	lr: 2.3597880936456514e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 3.1463841248608685e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 31/50: 100%|##########| 468/468 [00:44<00:00, 10.60it/s, Loss 0.3361 | accuracy_logits 0.871]
100%|##########| 156/156 [00:04<00:00, 35.22it/s, Loss 0.4733 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.8170368321071516e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 2.422715776142869e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 32/50: 100%|##########| 468/468 [00:44<00:00, 10.60it/s, Loss 0.3391 | accuracy_logits 0.867]
100%|##########| 156/156 [00:04<00:00, 34.64it/s, Loss 0.4731 | accuracy_logits 0.787]
  0%|          | 0/468 [00:00<?, ?it/s]


Optimizer: Adam
- Param group: 
	lr: 1.3991183607225069e-08
	betas: (0.9, 0.999)
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
- Param group: 
	lr: 1.865491147630009e-08
	betas: (0.9, 0.999)
	initial_lr: 8e-05
	weight_decay: 0.01
	eps: 1e-08



Epoch: 33/50:  73%|#######2  | 340/468 [00:32<00:12, 10.66it/s, Loss 0.3374 | accuracy_logits 0.868]Process Process-3272:
Process Process-3281:
Process Process-3283:
Process Process-3276:
Process Process-3275:
Process Process-3277:
Process Process-3273:
Process Process-3278:
Process Process-3279:
Process Process-3274:
Traceback (most recent call last):
Traceback (most recent call last):
Process Process-3280:
Traceback (most recent call last):
Traceback (most recent call last):
Process Process-3282:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 34, in _worker_loop
    r = index_queue.get()
  File "/usr/lib/pytho

KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
Epoch: 33/50:  73%|#######2  | 341/468 [00:32<00:12, 10.53it/s, Loss 0.3373 | accuracy_logits 0.868]


### Inference on testing dataset

In [130]:
from common_utils.training_utils import load_checkpoint
from glob import glob

In [131]:
best_model_filenames = glob(os.path.join(logs_path, "model_val_acc=*"))
assert len(best_model_filenames) == 1
load_checkpoint(best_model_filenames[0], siamese_net)

Load checkpoint: logs/seamese_networks_verification_task_20171123_2101/model_val_acc=0.7871.pth.tar


In [132]:
# evaluate on validation set
test_loss, test_acc = validate(siamese_net, test_batches, criterion, avg_metrics=[accuracy_logits, ])
test_loss, test_acc

100%|##########| 157/157 [00:04<00:00, 34.73it/s, Loss 0.4653 | accuracy_logits 0.784]


(0.46532357335090635, 0.7845)

### Run training script

In [None]:
!python3 train_verification_task.py


Optimizer: Adam
- Param group: 
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 6e-05
	betas: (0.9, 0.999)
- Param group: 
	initial_lr: 7e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 7e-05
	betas: (0.9, 0.999)

Epoch: 1/50: 100%|#| 468/468 [00:43<00:00, 10.80it/s, Loss 0.6218 | accuracy_logits 0.588]
100%|####| 156/156 [00:04<00:00, 34.25it/s, Loss 0.6078 | accuracy_logits 0.630]

Optimizer: Adam
- Param group: 
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 4.7340000000000004e-05
	betas: (0.9, 0.999)
- Param group: 
	initial_lr: 7e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 5.523e-05
	betas: (0.9, 0.999)

Epoch: 2/50: 100%|#| 468/468 [00:43<00:00, 10.80it/s, Loss 0.5789 | accuracy_logits 0.680]
100%|####| 156/156 [00:04<00:00, 34.74it/s, Loss 0.5836 | accuracy_logits 0.677]

Optimizer: Adam
- Param group: 
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 3.735126000000001e-05
	betas: (0.9, 0.999)
- Param group: 
	initial_lr: 7e-05
	weight_decay: 0.01
	eps: 1e-08
	lr:

Epoch: 20/50: 100%|#| 468/468 [00:44<00:00, 10.61it/s, Loss 0.2982 | accuracy_logits 0.889]
100%|####| 156/156 [00:04<00:00, 34.84it/s, Loss 0.4481 | accuracy_logits 0.795]
Epoch    19: reducing learning rate of group 0 to 3.3234e-07.
Epoch    19: reducing learning rate of group 1 to 3.8773e-07.

Optimizer: Adam
- Param group: 
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 5.2443457304313e-07
	betas: (0.9, 0.999)
- Param group: 
	initial_lr: 7e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 6.11840335216985e-07
	betas: (0.9, 0.999)

Epoch: 21/50: 100%|#| 468/468 [00:44<00:00, 10.62it/s, Loss 0.2952 | accuracy_logits 0.890]
100%|####| 156/156 [00:04<00:00, 35.06it/s, Loss 0.4475 | accuracy_logits 0.795]

Optimizer: Adam
- Param group: 
	initial_lr: 6e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 4.137788781310296e-07
	betas: (0.9, 0.999)
- Param group: 
	initial_lr: 7e-05
	weight_decay: 0.01
	eps: 1e-08
	lr: 4.827420244862011e-07
	betas: (0.9, 0.999)

Epoch: 22/50: 100%|#| 468/468 [00:44<