# Semantic Search using Language Models and Nearest Neighbor Indexes

Start by installing the required libraries.


In [None]:
%%capture
!pip install transformers ir-measures torch

Cross-encoders are an easy-to-understand application of language models because they treat relevance as a binary classification task. A query and a document are concatenated, separated by a [SEP] token, and the model learns to predict whether the query-document pair is relevant. This method is very effective despite its simplicity, but it is not efficient at inference time. To score 10,000 documents against 10,000 queries, a cross-encoder needs 100 million model calls (one per query-document pair), while a bi-encoder needs only 20,000 (one per query plus one per document). This inefficiency is why cross-encoders are mostly used for reranking a small set of candidates.
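The call counts above follow directly from how each architecture consumes its inputs. The snippet below is a small arithmetic sketch; the function names are made up for illustration and are not part of any library.

```python
def cross_encoder_calls(num_queries: int, num_documents: int) -> int:
    # A cross-encoder scores each (query, document) pair in its own forward pass.
    return num_queries * num_documents


def bi_encoder_calls(num_queries: int, num_documents: int) -> int:
    # A bi-encoder encodes every query once and every document once;
    # pair scores then come from cheap vector comparisons.
    return num_queries + num_documents


n_queries, n_documents = 10_000, 10_000
print(cross_encoder_calls(n_queries, n_documents))  # 100000000
print(bi_encoder_calls(n_queries, n_documents))     # 20000
```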

Next we explore cross-encoders for reranking and relevance estimation. We will use a pretrained cross-encoder from the Sentence Transformers project, a language-model library optimized for representing sentences as vectors, and load it with the Transformers library.

## ReRanking With Transformers

Import the needed libraries.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

Load the model and its tokenizer. The tokenizer converts text into the token IDs the model expects.

In [None]:
model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-electra-base')
tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-electra-base')

Let's look at some examples. Note that the query is repeated for each comparison: every query-document pair requires its own complete forward pass through the model.

In [None]:
features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], 
                     ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],  
                     padding=True, truncation=True, return_tensors="pt")
print(features)

{'input_ids': tensor([[  101,  2129,  2116,  2111,  2444,  1999,  4068,  1029,   102,  4068,
          2038,  1037,  2313,  1997,  1017,  1010, 19611,  1010,  6021,  2487,
          5068,  4864,  1999,  2019,  2181,  1997,  6486,  2487,  1012,  6445,
          2675,  7338,  1012,   102],
        [  101,  2129,  2116,  2111,  2444,  1999,  4068,  1029,   102,  2047,
          2259,  2103,  2003,  3297,  2005,  1996,  4956,  2688,  1997,  2396,
          1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 

In [None]:
model.eval()  # inference mode: turns off dropout
with torch.no_grad():  # skip gradient tracking to save memory and time
    scores = model(**features).logits  # one relevance logit per query-document pair
    print(scores)

tensor([[  4.9342],
        [-10.6064]])
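The model returns one raw logit per pair, and a higher value means the pair is judged more relevant. If a score between 0 and 1 is easier to read, one option is to pass the logits through a sigmoid. This is a sketch under that assumption and is not a step the notebook itself performs. The two logits are copied from the output above, and plain `math` stands in for `torch.sigmoid` to keep the example dependency-free.

```python
import math


def sigmoid(logit: float) -> float:
    # Map a raw logit to the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-logit))


# Logits copied from the printed output above (Berlin passage, New York passage).
logits = [4.9342, -10.6064]
probabilities = [sigmoid(x) for x in logits]
print(probabilities)
```

Because the sigmoid is monotonic, it never changes the ordering of documents, so the raw logits are sufficient for reranking.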


Using this same approach, you will implement a reranking function that, given a query and a set of candidate documents, reranks the documents with a cross-encoder.
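One possible shape for such a function is sketched below. It is not the reference solution. The names `rerank`, `make_cross_encoder_scorer`, `PairScorer`, and the `batch_size` parameter are illustrative assumptions. The scorer is passed in as an argument so the sorting logic can be checked without loading a model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical alias: a scorer maps (query, document) pairs to one float per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], List[float]]


def make_cross_encoder_scorer(model, tokenizer, batch_size: int = 32) -> PairScorer:
    """Wrap a sequence-classification model and its tokenizer as a PairScorer."""
    import torch  # imported lazily so rerank() itself does not depend on torch

    def score(pairs: Sequence[Tuple[str, str]]) -> List[float]:
        scores: List[float] = []
        model.eval()  # inference mode: turns off dropout
        with torch.no_grad():
            for start in range(0, len(pairs), batch_size):
                batch = pairs[start:start + batch_size]
                queries = [query for query, _ in batch]
                documents = [document for _, document in batch]
                features = tokenizer(queries, documents, padding=True,
                                     truncation=True, return_tensors="pt")
                logits = model(**features).logits  # shape: (batch, 1)
                scores.extend(logits.squeeze(-1).tolist())
        return scores

    return score


def rerank(query: str, documents: Sequence[str],
           scorer: PairScorer) -> List[Tuple[str, float]]:
    """Return (document, score) tuples sorted from most to least relevant."""
    pairs = [(query, document) for document in documents]
    scores = scorer(pairs)
    return sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
```

With the `model` and `tokenizer` loaded earlier, a call would look like `rerank('How many people live in Berlin?', candidates, make_cross_encoder_scorer(model, tokenizer))`, where `candidates` is a list of passage strings.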