In [1]:
%run supportvectors-common.ipynb






# Approximate k-NN using FAISS

In this lab, we will experiment with `FAISS` (Facebook AI Similarity Search), a library for efficient similarity search and approximate nearest-neighbor search over dense vectors. It scales to very large collections of vectors and is very fast. Let us start by installing it.
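To make "approximate" concrete, here is a toy sketch of one classic idea that FAISS implements in its IVF indexes: cluster the vectors, and at query time compare the query only against the vectors in the `nprobe` closest clusters instead of against every vector. This is a minimal NumPy illustration under that assumption, not FAISS's implementation. The names `kmeans` and `IVFSketch` are hypothetical, and only `nprobe` mirrors the name of FAISS's own IVF parameter.

```python
import numpy as np

def kmeans(x, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (n_clusters, d)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every vector to its nearest centroid (squared L2)
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        for c in range(n_clusters):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    return centroids

class IVFSketch:
    """Hypothetical toy inverted-file index: search only the nprobe closest clusters."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in range(len(centroids))]  # vector ids per cluster
        self.vectors = []

    def _nearest_centroids(self, v, n):
        dist = ((self.centroids - v) ** 2).sum(1)
        return np.argsort(dist)[:n]

    def add(self, x):
        for v in x:
            c = self._nearest_centroids(v, 1)[0]
            self.lists[c].append(len(self.vectors))
            self.vectors.append(v)

    def search(self, q, k, nprobe=1):
        data = np.asarray(self.vectors)
        D = np.full((len(q), k), np.inf, dtype=np.float32)
        I = np.full((len(q), k), -1, dtype=np.int64)
        for row, v in enumerate(q):
            # Candidates: only the vectors stored in the nprobe nearest clusters
            cand = [i for c in self._nearest_centroids(v, nprobe) for i in self.lists[c]]
            if not cand:
                continue
            cand = np.array(cand)
            dist = ((data[cand] - v) ** 2).sum(1)
            order = np.argsort(dist)[:k]
            D[row, :len(order)] = dist[order]
            I[row, :len(order)] = cand[order]
        return D, I

# Toy data: 1000 random 16-d vectors; queries are slightly perturbed copies of the first 5
rng = np.random.default_rng(42)
xb = rng.standard_normal((1000, 16)).astype(np.float32)
xq = xb[:5] + 0.01 * rng.standard_normal((5, 16)).astype(np.float32)

ivf = IVFSketch(kmeans(xb, n_clusters=10))
ivf.add(xb)
D_ivf, I_ivf = ivf.search(xq, k=3, nprobe=2)  # scans roughly 2/10 of the data per query
```

Raising `nprobe` toward the number of clusters makes the search more accurate but slower; with `nprobe` equal to the number of clusters, it degenerates into an exact, exhaustive search.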

In [2]:
#!pip install faiss-cpu   # use this instead on machines without a CUDA GPU
!pip install faiss-gpu



In [3]:
import faiss                   # make faiss available
d = 768                        # dimension of the sentence embeddings used below
index = faiss.IndexFlatL2(d)   # build an exact (brute-force) L2 index
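Note that `IndexFlatL2` stores the vectors as-is and compares each query against every one of them, so its results are exact; FAISS's approximate indexes, such as `IndexIVFFlat` or `IndexHNSWFlat`, are the ones that trade some accuracy for speed. As a rough mental model, here is a minimal NumPy sketch of what a flat L2 index computes. `FlatL2Sketch` is a hypothetical stand-in, not FAISS code; like `IndexFlatL2`, it returns squared L2 distances in `D` and row ids in `I`, nearest first. Unlike FAISS, it assumes `k` does not exceed the number of stored vectors.

```python
import numpy as np

class FlatL2Sketch:
    """Hypothetical NumPy stand-in for the behaviour of faiss.IndexFlatL2."""
    def __init__(self, d):
        self.d = d
        self.xb = np.empty((0, d), dtype=np.float32)

    @property
    def ntotal(self):
        return len(self.xb)

    def add(self, x):
        # Append vectors as float32 rows of width d
        x = np.asarray(x, dtype=np.float32).reshape(-1, self.d)
        self.xb = np.vstack([self.xb, x])

    def search(self, q, k):
        q = np.asarray(q, dtype=np.float32).reshape(-1, self.d)
        # Squared L2 distance from every query to every stored vector
        dists = ((q[:, None, :] - self.xb[None, :, :]) ** 2).sum(-1)
        # Keep the k smallest per query, nearest first
        I = np.argsort(dists, axis=1)[:, :k]
        D = np.take_along_axis(dists, I, axis=1)
        return D, I

index_sketch = FlatL2Sketch(d=4)
index_sketch.add([[0, 0, 0, 0], [1, 0, 0, 0], [3, 0, 0, 0]])
D_demo, I_demo = index_sketch.search([[0.9, 0, 0, 0]], k=2)
print(I_demo)  # [[1 0]]
```

The cost is one distance computation per stored vector per query, which is exactly what approximate indexes try to avoid on large collections.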

In [4]:
%run Lesson_34___corpus.ipynb

### Load the sentence transformer

We need a sentence transformer to convert documents or sentences into vector embeddings.

In [5]:
from sentence_transformers import SentenceTransformer

MODEL = 'msmarco-distilbert-base-v4'
embedder = SentenceTransformer(MODEL)

#### Search index of sentence embeddings

Let us now compute the sentence embeddings that will go into the search index.

In [6]:
embeddings = embedder.encode(sentences)
embeddings.shape

(16, 768)

In [7]:
embeddings.dtype

dtype('float32')
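The `float32` dtype matters: FAISS works on `float32` vectors laid out as a 2-D array of shape `(n, d)`, and its Python wrapper expects C-contiguous memory. The encoder above already returns that, but vectors from other sources (for example `float64` arrays or transposed views) may need converting first. Here is a small sketch of such a guard; `to_faiss_array` is a hypothetical helper, not part of FAISS.

```python
import numpy as np

def to_faiss_array(x, d):
    """Hypothetical helper: return x as a C-contiguous float32 array of shape (n, d)."""
    arr = np.ascontiguousarray(x, dtype=np.float32)
    if arr.ndim == 1:  # a single vector becomes one row
        arr = arr.reshape(1, -1)
    if arr.ndim != 2 or arr.shape[1] != d:
        raise ValueError(f'expected shape (n, {d}), got {arr.shape}')
    return arr

v = to_faiss_array(np.arange(768, dtype=np.float64), 768)
print(v.shape, v.dtype)  # (1, 768) float32
```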

There are 16 embeddings, each a 768-dimensional vector. Let us glance at a sentence and its embedding:

In [8]:
print(f'{sentences[0]}  {embeddings[0]}')


’Twas brillig, and the slithy toves
      Did gyre and gimble in the wabe:
All mimsy were the borogoves,
      And the mome raths outgrabe.

“Beware the Jabberwock, my son!
      The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
      The frumious Bandersnatch!”

He took his vorpal sword in hand;
      Long time the manxome foe he sought—
So rested he by the Tumtum tree
      And stood awhile in thought.

And, as in uffish thought he stood,
      The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
      And burbled as it came!

One, two! One, two! And through and through
      The vorpal blade went snicker-snack!
He left it dead, and with its head
      He went galumphing back.

“And hast thou slain the Jabberwock?
      Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
      He chortled in his joy.

’Twas brillig, and the slithy toves
      Did gyre and gimble in the wabe:
All mimsy were the borogoves,
      And the mo

In [9]:
index.add(embeddings)

## Query for approximate k-nearest neighbors
Finally, let us run a nearest-neighbor search. We will use the query phrase `a love of dogs`. We must remember to first convert the query text into a sentence embedding with the same model.

In [10]:
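query_text = 'a love of dogs'
# encode() returns a 1-D vector for a single string; FAISS expects shape (n_queries, d)
query = embedder.encode(query_text).reshape(1, d)
k = 3                          # number of neighbors to retrieve
D, I = index.search(query, k)  # D: squared L2 distances, I: row indices into sentences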

I

array([[ 4,  7, 10]])
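Because an approximate index may miss some of the true neighbors, a common quality check is recall@k: the fraction of the exact top-k ids (for instance from a flat index like the one used here) that the approximate search also returned. Since this lab's `IndexFlatL2` is exact, its recall against itself is 1 by construction. The sketch below assumes both inputs are `(n_queries, k)` id arrays shaped like `I`; `recall_at_k` and the toy ids are hypothetical.

```python
import numpy as np

def recall_at_k(I_approx, I_exact):
    """Fraction of exact top-k ids that the approximate search also returned.

    -1 entries (FAISS's padding for missing results) never count as hits.
    """
    I_approx = np.asarray(I_approx)
    I_exact = np.asarray(I_exact)
    hits = 0
    for approx_row, exact_row in zip(I_approx, I_exact):
        valid = set(exact_row[exact_row >= 0].tolist())
        hits += len(valid & set(approx_row.tolist()))
    total = int((I_exact >= 0).sum())
    return hits / total if total else 0.0

exact_ids = np.array([[0, 1, 2], [3, 4, 5]])
approx_ids = np.array([[0, 2, 9], [3, 4, 5]])  # id 1 was missed in the first query
print(recall_at_k(approx_ids, exact_ids))      # 0.8333333333333334
```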

### ANN-Search results


Let us now print the documents at those indices.

In [11]:
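import html
from IPython.display import HTML, display

for i, idx in enumerate(I[0]):
    result = sentences[idx]
    # Escape special characters and keep the poem's line breaks in the HTML rendering
    result_html = html.escape(result).replace('\n', '<br/>')
    display_result = f'<p><b>Search result # {i}</b></p><p>{result_html}</p>'
    print('-'*80)
    display(HTML(display_result))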


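The loop above shows only the text of each hit. A plain-text variant that also reports each hit's squared L2 distance from `D`, and skips `-1` ids (which FAISS returns when the index holds fewer than `k` vectors), might look like the sketch below. `format_hits` is a hypothetical helper, and the documents and distances in the example are made-up toy values.

```python
import numpy as np

def format_hits(D, I, docs):
    """Hypothetical helper: pair each id in I[0] with its distance and first line of text."""
    lines = []
    for rank, (dist, idx) in enumerate(zip(D[0], I[0])):
        if idx < 0:  # FAISS pads with -1 when fewer than k results exist
            continue
        first_line = (docs[idx].strip().splitlines() or [''])[0]
        lines.append(f'#{rank}  id={idx}  dist={dist:.3f}  {first_line}')
    return lines

docs = ['alpha line\nmore text', 'beta line', 'gamma line']
D_toy = np.array([[0.25, 1.5, np.inf]], dtype=np.float32)
I_toy = np.array([[2, 0, -1]])
for line in format_hits(D_toy, I_toy, docs):
    print(line)
# #0  id=2  dist=0.250  gamma line
# #1  id=0  dist=1.500  alpha line
```

In the notebook, the same call would be `format_hits(D, I, sentences)`.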
