# testing on wellcome images
We can now test our models' performance when transferred onto the Wellcome images dataset. In doing so, we'll get a better understanding of how well they generalise and which gaps in their knowledge we'll need to fill as we continue to modify them.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (20, 20)

import os
import io
import requests
import numpy as np
import pandas as pd
from PIL import Image
from scipy.spatial.distance import cdist
from scipy.io import loadmat
from bs4 import BeautifulSoup

import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms

from tqdm.notebook import tqdm
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load word vectors
Although we can get rid of our WordNet files in this inference-only section, we still need to build the word-vector space in which we'll perform searches. This allows us to search for words that didn't appear in the original training set of word-photo pairs.

In [None]:
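wv_path = '/mnt/efs/nlp/word_vectors/fasttext/crawl-300d-2M.vec'

word_vectors = {}
with io.open(wv_path, 'r', encoding='utf-8', newline='\n', errors='ignore') as wv_file:
    next(wv_file)  # skip the header line (vocabulary size and dimensionality)
    for line in tqdm(wv_file):
        tokens = line.rstrip().split(' ')
        # np.float was removed from NumPy; np.float64 is the equivalent dtype
        word_vectors[tokens[0]] = np.array(tokens[1:]).astype(np.float64)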
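
For reference, a fastText `.vec` file is plain text: a header line holding the vocabulary size and the dimensionality, followed by one line per word containing the word and its space-separated components. The sketch below parses a tiny in-memory sample in that layout. The sample words and values are made up for illustration, and `read_vec` is a hypothetical helper rather than part of this notebook.

```python
import io
import numpy as np

# Made-up sample in the fastText .vec layout: a header, then one word per line.
sample_vec = io.StringIO(
    "2 3\n"
    "cat 0.1 0.2 0.3\n"
    "dog 0.4 0.5 0.6\n"
)

def read_vec(handle):
    """Parse a .vec-style text stream into a {word: vector} dictionary."""
    # The first line is a header: "<vocabulary size> <dimensionality>".
    vocab_size, dim = (int(value) for value in next(handle).split())
    vectors = {}
    for line in handle:
        tokens = line.rstrip().split(' ')
        vectors[tokens[0]] = np.array(tokens[1:]).astype(np.float64)
    return vectors, vocab_size, dim

sample_vectors, sample_vocab_size, sample_dim = read_vec(sample_vec)
print(sorted(sample_vectors), sample_vectors['cat'].shape)
```

Reading the header separately keeps it out of the dictionary and gives a cheap sanity check on the number and length of the parsed vectors.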

# model
We'll now build our model, with the exact same configuration as in the last few notebooks. The head of the model is first initialised with random weights, which we then overwrite with the trained weights saved in the previous notebooks. As we've used the same model throughout, all we need to do is modify the path to the `.pt` weights file.

In [None]:
backbone = models.vgg16_bn(pretrained=True).features

In [None]:
for param in backbone.parameters():
    param.requires_grad = False

In [None]:
class DeViSE(nn.Module):
    def __init__(self, backbone, target_size=300):
        super(DeViSE, self).__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(in_features=(25088), out_features=target_size*2),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(in_features=target_size*2, out_features=target_size),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(in_features=target_size, out_features=target_size),
        )

    def forward(self, x):
        x = self.backbone(x)
        x = x.view(x.size(0), -1)
        x = self.head(x)
        x = x / x.max()
        return x
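
The `in_features=25088` of the first linear layer follows from the shape of the backbone's output. Assuming 224x224 inputs (the crop size used by the transform later in this notebook), the five 2x2 max-pooling layers in VGG16 each halve the spatial size, and the final convolutional block has 512 channels. The hypothetical helper below just reproduces that arithmetic.

```python
def vgg16_flat_features(image_size=224, n_pools=5, out_channels=512):
    """Length of the flattened VGG16 feature map for a square input image."""
    side = image_size
    for _ in range(n_pools):
        side //= 2  # each 2x2 max-pool with stride 2 halves height and width
    return out_channels * side * side

flat_features = vgg16_flat_features()
print(flat_features)  # 25088, i.e. 512 * 7 * 7
```

Feeding images of a different size through the backbone would change this number, and the head's first layer would then need to change with it.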

In [None]:
devise_model = DeViSE(backbone).to(device)

In [None]:
# map_location lets weights saved on a GPU load on a CPU-only machine too
devise_model.load_state_dict(torch.load('/mnt/efs/models/devise-google-2018-10-03.pt',
                                        map_location=device))

# wellcome images dataset and dataloader
We can now define the process for ingesting Wellcome images. This is much simpler than in previous notebooks, as we no longer need to assign each image a target word. In this section we are only _inferring_ the word-vector positions according to our DeViSE network.

In [None]:
df = {}

for subdir in os.listdir('/mnt/efs/images/wellcome_images/'):
    subdir_path = '/mnt/efs/images/wellcome_images/{}/'.format(subdir)
    for file_name in os.listdir(subdir_path):
        df[subdir_path + file_name] = subdir

df = pd.Series(df).to_frame().reset_index()
df.columns = ['path', 'word']

In [None]:
df = df.sample(frac=1).reset_index(drop=True) 

In [None]:
class ImageDataset(Dataset):
    def __init__(self, dataframe, transform=transforms.ToTensor()):
        self.image_paths = dataframe['path'].values
        self.transform = transform

    def __getitem__(self, index):
        image = Image.open(self.image_paths[index]).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image

    def __len__(self):
        return len(self.image_paths)

In [None]:
transform = transforms.Compose([transforms.RandomResizedCrop(224, scale=[0.6, 0.9]),
                                transforms.ToTensor()])

In [None]:
dataset = ImageDataset(df, transform)

In [None]:
test_loader = DataLoader(dataset=dataset,
                         batch_size=128,
                         num_workers=5)

# make predictions for wellcome images

In [None]:
preds = []

devise_model.eval()
with torch.no_grad():
    test_loop = tqdm(test_loader, desc='Test set')
    for images in test_loop:
        # move the batch to the same device as the model (GPU if available)
        images = images.to(device, non_blocking=True)
        predictions = devise_model(images)
        preds.append(predictions.cpu().numpy())

In [None]:
preds = np.concatenate(preds).reshape(-1, 300)
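
Because the loader iterates over the dataset in order (no shuffling), concatenating the per-batch outputs gives one row per image, in the same order as the rows of `df`. The search further down relies on that alignment. A small sketch with stand-in arrays, where the batch sizes are hypothetical and the last batch is smaller than the rest:

```python
import numpy as np

batch_sizes = [128, 128, 10]  # hypothetical: 266 images with batch_size=128

# Stand-in predictions: every row in batch i is filled with the value i.
fake_batches = [np.full((size, 300), fill_value=i, dtype=np.float32)
                for i, size in enumerate(batch_sizes)]

stacked = np.concatenate(fake_batches)
print(stacked.shape)  # (266, 300)
```

Row 0 of `stacked` comes from the first batch and row 256 from the last, so row `i` corresponds to image `i` in iteration order.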

# run a search on the predictions
We're now ready to run a search in the new, shared space of words and images. Note that the shared space has the exact same geometry as the word-vector space, but is now populated by the predicted word vectors of our images. In other words, we've _projected_ our images onto the manifold in 300-dimensional space which is occupied by word vectors, in the hope that we've preserved their visual-semantic meanings.

In [None]:
preds.shape

In [None]:
def search(query, n=5):
    image_paths = df['path'].values
    distances = cdist(word_vectors[query].reshape(1, -1), preds)
    closest_n_paths = image_paths[np.argsort(distances)].squeeze()[:n]
    close_images = [np.array(Image.open(image_path).convert('RGB').resize((224,224)))
                    for image_path in closest_n_paths]
    return Image.fromarray(np.concatenate(close_images, axis=1))
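
The core of `search` is a nearest-neighbour lookup: measure the distance from the query's word vector to every predicted image vector, then keep the `n` smallest. `cdist` uses Euclidean distance by default, and the sketch below reproduces that step with plain NumPy on toy two-dimensional vectors. The vectors and the helper name `nearest_indices` are made up for illustration.

```python
import numpy as np

def nearest_indices(query_vector, candidate_vectors, n=2):
    """Return the indices of the n candidates closest to the query (Euclidean)."""
    distances = np.linalg.norm(candidate_vectors - query_vector, axis=1)
    return np.argsort(distances)[:n]

# Toy stand-ins for the image predictions and a query word vector.
toy_preds = np.array([[0.0, 0.0],
                      [1.0, 1.0],
                      [0.9, 1.2],
                      [5.0, 5.0]])
toy_query = np.array([1.0, 1.0])

toy_top = nearest_indices(toy_query, toy_preds, n=2)
print(toy_top)  # [1 2]
```

In the notebook the returned indices select rows of `df['path']`, which is why the predictions have to stay in the same order as the dataframe.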

In [None]:
search('sad')

That works reasonably well, but the difference in style between the datasets is clear. We'll need to address this later.