# inference only demo
We're done! We have a working pair of models which produce meaningful shared embeddings for text and images, which we can use to run image searches without relying on detailed metadata. The only thing left to do is ensure that the search process is fast enough to be practical, and lay out all of the pieces we need to run this outside of a notebook environment.

In [None]:
import torch
import pickle
import nmslib
import urllib.parse
import numpy as np
from torch import nn

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download('punkt')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load data
First we'll load a bunch of the lookup data we need to make this thing work. Nothing new here.

In [None]:
index_to_wordvec = np.load('/mnt/efs/models/index_to_wordvec.npy')
with open('/mnt/efs/models/word_to_index.pkl', 'rb') as f:
    word_to_index = pickle.load(f)

path_to_id = lambda x: x.split('/')[-1].split('.')[0]
image_ids = np.array(list(map(path_to_id, 
                              np.load('/mnt/efs/models/image_ids.npy'))))

# load devise'd embeddings for all images
We pre-computed the learned visual-semantic embeddings for all of our images at the end of the last notebook, so we can just reload them here. Remember, they're sentence-space representations of the images, so all that needs to happen at query-time is the embedding of the query sentence into the same space, and a KNN lookup of the most similar images.

In [None]:
embeddings = np.load('/mnt/efs/models/embeddings.npy').reshape(-1, 4096)

# utils
Again, we'll create a couple of utility functions to shrink the sentence embedding process down to a single function call.

In [None]:
def sentence_to_indexes(sentence):
    tokenised = word_tokenize(sentence)
    indexes = [word_to_index[word] 
               for word in tokenised 
               if word in word_to_index]
    return indexes

def embed(sentence):
    indexes = ([word_to_index['<s>']] + 
               sentence_to_indexes(sentence) +
               [word_to_index['</s>']])
    wvs = np.stack([index_to_wordvec[i] for i in indexes])
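    # add a batch dimension and run on whichever device the model lives on
    batch = torch.from_numpy(wvs).float().unsqueeze(0).to(device)
    with torch.no_grad():
        embedding = model(batch).cpu().numpy()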
    return embedding.squeeze()

def embed_paragraph(paragraph):
    sentences = sent_tokenize(paragraph)
    if len(sentences) == 0:
        return None
    else:
        embeddings = [embed(sentence) for sentence in sentences]
        return np.array(embeddings).max(axis=0)
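
To make the behaviour of these helpers concrete, here is a minimal sketch on toy data. The vocabulary, the two-dimensional vectors and the whitespace tokeniser are all stand-ins invented for illustration (the real code uses `word_to_index`, 4096-dimensional embeddings and NLTK's `word_tokenize`). It shows that out-of-vocabulary words are silently dropped, and that the paragraph embedding is an element-wise maximum over its sentence embeddings.

```python
import numpy as np

# hypothetical toy vocabulary, standing in for word_to_index
toy_word_to_index = {'<s>': 0, '</s>': 1, 'brain': 2, 'scan': 3}


def toy_sentence_to_indexes(sentence):
    # str.split() is a simplification of nltk's word_tokenize
    return [toy_word_to_index[word]
            for word in sentence.split()
            if word in toy_word_to_index]


# 'mri' is not in the toy vocabulary, so it is dropped without error
known = toy_sentence_to_indexes('mri brain scan')

# two made-up sentence embeddings, pooled as in embed_paragraph()
sentence_embeddings = np.array([[0.1, 0.9],
                                [0.5, 0.2]])
paragraph_embedding = sentence_embeddings.max(axis=0)

print(known)
print(paragraph_embedding)
```

Because unknown words vanish without warning, a query made up entirely of out-of-vocabulary words would be embedded from the `<s>` and `</s>` tokens alone, which is worth bearing in mind when a search returns odd results.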

# sentence embedding model
Now that we're only inferring an embedding for each sentence, we can ignore the `NLINet()` part of the network from notebook 8. We no longer need to classify sentence pairs or backpropagate gradients, so the remaining network is small enough to run without much trouble on a CPU. We saved the weights for this half of the network at the end of the last notebook, and we can load them into the matching network architecture here.

In [None]:
hidden_size = 2048

class SentenceEncoder(nn.Module):
    def __init__(self):
        super(SentenceEncoder, self).__init__()
        self.enc_lstm = nn.LSTM(input_size=300, 
                                hidden_size=hidden_size, 
                                num_layers=1,
                                bidirectional=True)
        
    def forward(self, wv_batch):
        embedded, _ = self.enc_lstm(wv_batch)
        max_pooled = torch.max(embedded, 1)[0] 
        return max_pooled

In [None]:
model = SentenceEncoder().to(device)
model_path = '/mnt/efs/models/sentence-encoder-2018-10-08.pt'

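# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()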

# create nmslib search index
In the previous notebooks we ran searches by brute force, measuring the distance from our query embedding to every other point in sentence-space. This is exact, but _super_ inefficient, especially in a high-volume, high-dimensional case like ours. Here, and in our demo app, we'll use an _approximate_ nearest neighbours (ANN) algorithm, which organises our sentence-space embeddings into a hierarchical graph structure that can be traversed with very few distance calculations. The approximation error is small, so in the end we lose very little accuracy by searching this structure instead of the raw data.  
Similar libraries like [annoy](https://github.com/spotify/annoy) use roughly the same idea to find nearest neighbours in high-dimensional space, but [nmslib came out as the most efficient in benchmarks](https://www.benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/) and we have no reason not to use it here.  
Pre-computing the index takes a while, but it vastly reduces the search time when we run a query. The index can also be saved in binary form and reloaded elsewhere, so we don't have to re-run that expensive computation every time we restart our demo. The Python bindings for nmslib are very straightforward: we can create a fully functional index in just three lines of code.

In [None]:
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(embeddings)
index.createIndex({'post': 2}, print_progress=True)
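
One way to check how much accuracy the approximate index gives up is to compare its results against an exact brute-force search. The sketch below is not part of the original pipeline: `exact_cosine_knn` and `recall_at_k` are hypothetical helpers, and the random vectors stand in for the real `embeddings` array. In practice, the approximate IDs would come from `index.knnQuery`.

```python
import numpy as np


def exact_cosine_knn(query, vectors, k=10):
    """Return indexes of the k vectors with the highest cosine similarity."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    similarities = unit_vectors @ unit_query
    # sort descending by similarity and keep the top k
    return np.argsort(-similarities)[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# random stand-in data; the real embeddings are 4096-dimensional
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(1000, 64)).astype(np.float32)
toy_query = toy_embeddings[42]

exact_ids = exact_cosine_knn(toy_query, toy_embeddings, k=10)

# querying with a vector from the dataset returns that vector first
print(exact_ids[0])
# an approximate search that found every true neighbour would score 1.0
print(recall_at_k(exact_ids, exact_ids))
```

Averaging `recall_at_k` over a sample of queries gives a rough measure of what the index trades away for speed.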

# search
Let's run a search, returning the MIRO IDs of the closest images and attaching them to a `/works` query URL.

In [None]:
def search(query):
    neighbour_indexes, _ = index.knnQuery(embed(query), k=10)
    return image_ids[neighbour_indexes]

In [None]:
results = search('mri brain scan')
base_url = 'https://wellcomecollection.org/works?query='
url_query = urllib.parse.quote_plus(' '.join(results))
print(base_url + url_query)
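
As a small illustration of what the URL-building step produces, the sketch below uses two placeholder IDs (invented here, not real MIRO IDs). `quote_plus` encodes the spaces between the joined IDs as `+`, so the whole list becomes a single query-string value.

```python
from urllib.parse import quote_plus

base_url = 'https://wellcomecollection.org/works?query='

# placeholder IDs for illustration only
example_ids = ['A0000001', 'B0000002']

example_url = base_url + quote_plus(' '.join(example_ids))
print(example_url)
```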

That's it - super fast, super effective image search with no metadata necessary! 

We've turned this notebook into a demo app hosted on AWS, which you can play with [here](http://labs.wellcomecollection.org/devise/index.html).