# FAISS (Facebook AI Similarity Search)
-  Designed for efficient similarity search in dense vector spaces.
-  Extremely fast because it keeps the index in RAM.
-  Best suited when your dataset can fit into main memory (RAM).

| Feature         | Description                                                                  |
| --------------- | ---------------------------------------------------------------------------- |
| **Storage**     | RAM (primarily)                                                              |
| **Performance** | Very fast (sub-millisecond latency possible)                                 |
| **Scalability** | Limited by system memory                                                     |
| **Persistence** | You can **save/load indexes**, but during search, the data must be in memory |
| **Sharding**    | Possible, but needs custom infrastructure                                    |
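
To make the "limited by system memory" row concrete, here is a rough sizing sketch. It assumes a flat index that stores every vector as uncompressed 32-bit floats and ignores any per-index overhead; the helper name `flat_index_bytes` is hypothetical and not part of FAISS.

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Approximate raw vector storage: one float32 (4 bytes) per component."""
    return num_vectors * dim * bytes_per_component


# One million 768-dimensional vectors (768 is the embedding size used in this notebook).
size = flat_index_bytes(1_000_000, 768)
print(f"{size / 1024**3:.2f} GiB")  # 2.86 GiB
```

An estimate like this is a quick way to check whether a dataset is likely to fit in RAM before choosing a flat index.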



## Indexing
#### Exact Matching
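- FLAT (e.g., IndexFlatL2 or IndexFlatIP)
    -   Use when: the dataset fits in memory and you want exact results (100% recall).
    -   Searches by brute force: every query is compared with every stored vector, so search time grows linearly with dataset size.
    -   L2 -> Euclidean distance (smaller is closer)
    -   IP -> inner product (larger is closer); it equals cosine similarity only when the vectors are L2-normalized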
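
The following pure-Python sketch illustrates what a flat index does conceptually: score the query against every stored vector and keep the best `k`. It is not FAISS code; the helper names (`flat_search`, `normalize`, and so on) and the toy vectors are made up for illustration. It also shows why inner product and cosine similarity can disagree unless the vectors are normalized first.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so inner product becomes cosine similarity."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


def flat_search(query, vectors, k, score, largest_first):
    """Brute force: score the query against every vector, then keep the top k ids."""
    scored = [(score(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=largest_first)
    return [i for _, i in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
query = [1.0, 0.1]

nearest_l2 = flat_search(query, vectors, 1, l2_distance, largest_first=False)
best_ip = flat_search(query, vectors, 1, inner_product, largest_first=True)
best_cos = flat_search(
    normalize(query), [normalize(v) for v in vectors], 1, inner_product, largest_first=True
)

print(nearest_l2)  # [0]
print(best_ip)     # [2]  the long vector wins on raw inner product
print(best_cos)    # [0]  after normalization the ranking follows direction only
```

With an inner-product index, normalizing the embeddings before adding and querying is what gives cosine-style ranking.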

#### Approximate Matching
- IVF: Inverted File Index (e.g., IndexIVFFlat, IndexIVFPQ)
    -   Approximate (cluster-based): vectors are grouped into clusters when the index is trained
    -   Fast because a query is compared only with vectors in the closest clusters
- HNSW: Hierarchical Navigable Small World graph, a graph-based index (e.g., IndexHNSWFlat)
    -   Approximate (graph-based traversal)
    -   Very fast: a query walks the graph toward its nearest neighbors
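
As a rough illustration of the cluster-based idea behind IVF, the toy sketch below assigns each vector to its closest centroid and, at query time, scans only the `nprobe` closest clusters. It is a simplified pure-Python model rather than the FAISS implementation: the centroids are hard-coded here (a real IVF index learns them during training), and all function names are hypothetical.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vector, centroids, n):
    """Return the ids of the n centroids closest to the vector."""
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(vector, centroids[c]))
    return order[:n]


def build_inverted_lists(vectors, centroids):
    """Put every vector id into the list of its closest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        lists[nearest_centroids(v, centroids, 1)[0]].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe closest clusters instead of the whole dataset."""
    candidates = [i for c in nearest_centroids(query, centroids, nprobe) for i in lists[c]]
    candidates.sort(key=lambda i: squared_l2(query, vectors[i]))
    return candidates[:k]


# Hard-coded centroids for illustration only.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.5, 0.2], [0.1, 0.9], [9.5, 10.2], [10.1, 9.7], [4.9, 4.9]]
lists = build_inverted_lists(vectors, centroids)

query = [5.2, 5.2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # [2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # [4]
```

With `nprobe=1` the search misses the true nearest neighbor (vector 4) because it sits in a cluster that was not scanned. That is the accuracy-for-speed trade-off of approximate indexes: probing more clusters recovers the exact answer at the cost of more comparisons.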

In [1]:
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore


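import os
from dotenv import load_dotenv
load_dotenv()  # loads variables from a local .env file into os.environ
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")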


from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
test_embedding = embeddings_model.embed_query("This is a good test")
print(len(test_embedding))

768


In [2]:
index = faiss.IndexFlatL2(768)  # exact matching based on L2 (Euclidean) distance; 768 = embedding size
index

<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x137592940> >

In [3]:
vector_store=FAISS(
    embedding_function=embeddings_model,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

In [4]:
documents = [
    "Artificial intelligence (AI) is a set of technologies that enable computers to perform a variety of advanced functions, including the ability to see, understand and translate spoken and written language, analyze data, make recommendations, and more. ",
    "Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in such a way that would normally require human intelligence or that involves data whose scale exceeds what humans can analyze. ",
    "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.",
    "The dog (Canis familiaris or Canis lupus familiaris) is a domesticated descendant of the gray wolf. Also called the domestic dog, it was selectively bred from an extinct population of wolves during the Late Pleistocene by hunter-gatherers. The dog was the first species to be domesticated by humans, over 14,000 years ago and before the development of agriculture. Due to their long association with humans, dogs have gained the ability to thrive on a starch-rich diet that would be inadequate for other canids.",
    "Dogs have been bred for desired behaviors, sensory capabilities, and physical attributes. Dog breeds vary widely in shape, size, and color. They have the same number of bones (with the exception of the tail), powerful jaws that house around 42 teeth, and well-developed senses of smell, hearing, and sight. Compared to humans, dogs possess a superior sense of smell and hearing, but inferior visual acuity. Dogs perform many roles for humans, such as hunting, herding, pulling loads, protection, companionship, therapy, aiding disabled people, and assisting police and the military.",
]

vector_store.add_texts(documents)

['a23c5102-faab-45d8-bd13-eea2c2252d46',
 '537e3d9b-9f30-44cd-8a4c-16bffc680ea5',
 'd97d01b1-202a-42b3-8260-6fa42cd0edb4',
 '4ca3df6e-df6f-40a2-9250-d5d0b8fd2b23',
 'bfe22dd3-e2dc-4e44-8f23-3cfc54a338e0']

In [5]:
vector_store.index_to_docstore_id

{0: 'a23c5102-faab-45d8-bd13-eea2c2252d46',
 1: '537e3d9b-9f30-44cd-8a4c-16bffc680ea5',
 2: 'd97d01b1-202a-42b3-8260-6fa42cd0edb4',
 3: '4ca3df6e-df6f-40a2-9250-d5d0b8fd2b23',
 4: 'bfe22dd3-e2dc-4e44-8f23-3cfc54a338e0'}
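
The mapping above is what connects FAISS, which only knows integer row positions, to the stored documents. A minimal sketch of that lookup chain, using plain dictionaries as hypothetical stand-ins for `index_to_docstore_id` and the docstore (the ids and texts below are invented):

```python
# Hypothetical stand-ins for the vector store's two lookup tables.
index_to_docstore_id = {0: "id-a", 1: "id-b", 2: "id-c"}
docstore = {"id-a": "first text", "id-b": "second text", "id-c": "third text"}


def resolve(positions):
    """Translate row positions returned by the index into stored documents."""
    return [docstore[index_to_docstore_id[p]] for p in positions]


# Suppose the index reports rows 2 and 0 as the nearest neighbors, in that order.
hits = resolve([2, 0])
print(hits)  # ['third text', 'first text']
```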

In [9]:
query = "Why do dogs tilt their heads when they hear certain sounds?"
results = vector_store.similarity_search(query, k=2)
results

[Document(id='bfe22dd3-e2dc-4e44-8f23-3cfc54a338e0', metadata={}, page_content='Dogs have been bred for desired behaviors, sensory capabilities, and physical attributes. Dog breeds vary widely in shape, size, and color. They have the same number of bones (with the exception of the tail), powerful jaws that house around 42 teeth, and well-developed senses of smell, hearing, and sight. Compared to humans, dogs possess a superior sense of smell and hearing, but inferior visual acuity. Dogs perform many roles for humans, such as hunting, herding, pulling loads, protection, companionship, therapy, aiding disabled people, and assisting police and the military.'),
 Document(id='4ca3df6e-df6f-40a2-9250-d5d0b8fd2b23', metadata={}, page_content='The dog (Canis familiaris or Canis lupus familiaris) is a domesticated descendant of the gray wolf. Also called the domestic dog, it was selectively bred from an extinct population of wolves during the Late Pleistocene by hunter-gatherers. The dog was th

In [13]:
query = "What is branch of computer science in which we deal with machine learning?"
results = vector_store.similarity_search(query, k=3)
results

[Document(id='d97d01b1-202a-42b3-8260-6fa42cd0edb4', metadata={}, page_content='Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.'),
 Document(id='537e3d9b-9f30-44cd-8a4c-16bffc680ea5', metadata={}, page_content='Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in such a way that would normally require human intelligence or that involves data whose scale exceeds what humans can analyze. '),
 Document(id='a23c5102-faab-45d8-bd13-eea2c2252d46', metadata={}, page_content='Artificial intelligence (AI) is a se

In [19]:
# from uuid import uuid4
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "website"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

In [20]:
index = faiss.IndexFlatIP(768)  # exact matching based on inner product (larger score = more similar)
vector_store=FAISS(
    embedding_function=embeddings_model,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

In [21]:
vector_store.add_documents(documents=documents)

['9e98c691-9620-4381-b242-24baf164ce42',
 '1452bce2-c61c-4a40-99e1-214a948b0ed0',
 '0acfacdd-0333-436a-b175-fe0383a2ceb6',
 '8e350ff1-4533-42fe-ab41-56277fa1aa2c',
 '4ab49481-bbce-4742-9afc-25ccd819cfe7',
 '19ad680d-f985-427e-9e0f-e28bf75c1aea',
 '62e61641-e260-4275-ad16-9f8282cb35f0',
 '00a23dad-d08b-4f68-88f2-2870a6e29782',
 'fe18031e-298e-4d8c-8992-336d6a3d607d',
 'cc72d686-71d8-4c26-ad64-293066fe22e7']

In [22]:
vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2
)

[Document(id='00a23dad-d08b-4f68-88f2-2870a6e29782', metadata={'source': 'website'}, page_content='LangGraph is the best framework for building stateful, agentic applications!'),
 Document(id='0acfacdd-0333-436a-b175-fe0383a2ceb6', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')]

In [24]:
vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    filter={"source": {"$eq": "tweet"}},
    k=2
)

[Document(id='0acfacdd-0333-436a-b175-fe0383a2ceb6', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='cc72d686-71d8-4c26-ad64-293066fe22e7', metadata={'source': 'tweet'}, page_content='I have a bad feeling I am going to get deleted :(')]

In [29]:
result = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    filter={"source": {"$eq": "website"}},
    k=2
)

print(result)

print(result[0].metadata)
print(result[0].page_content)

[Document(id='00a23dad-d08b-4f68-88f2-2870a6e29782', metadata={'source': 'website'}, page_content='LangGraph is the best framework for building stateful, agentic applications!'), Document(id='19ad680d-f985-427e-9e0f-e28bf75c1aea', metadata={'source': 'website'}, page_content='Is the new iPhone worth the price? Read this review to find out.')]
{'source': 'website'}
LangGraph is the best framework for building stateful, agentic applications!
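
Metadata filtering with a FAISS-backed store is typically a post-filter: the vector search returns a ranked candidate list first, and the metadata condition is applied to those candidates afterwards. The sketch below models that behavior under the assumption that only the top `fetch_k` candidates are considered (LangChain's FAISS search methods accept a `fetch_k` argument). The function `filtered_search` and the sample data are hypothetical.

```python
def filtered_search(ranked_docs, k, source, fetch_k=20):
    """Keep the first k of the top fetch_k hits whose metadata matches."""
    candidates = ranked_docs[:fetch_k]
    matches = [doc for doc in candidates if doc["metadata"].get("source") == source]
    return matches[:k]


# Invented documents, already ranked by similarity to some query.
ranked_docs = [
    {"text": "website A", "metadata": {"source": "website"}},
    {"text": "tweet A", "metadata": {"source": "tweet"}},
    {"text": "tweet B", "metadata": {"source": "tweet"}},
    {"text": "news A", "metadata": {"source": "news"}},
    {"text": "website B", "metadata": {"source": "website"}},
]

narrow = filtered_search(ranked_docs, k=2, source="website", fetch_k=3)
wide = filtered_search(ranked_docs, k=2, source="website", fetch_k=5)
print([d["text"] for d in narrow])  # ['website A']
print([d["text"] for d in wide])    # ['website A', 'website B']
```

If a filtered query returns fewer than `k` results, a candidate pool that is too small is one possible cause, and raising `fetch_k` is worth trying.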


## Retriever

In [32]:
retriever=vector_store.as_retriever(
    search_kwargs={
        "k": 3
    }
)

retriever.invoke("LangChain provides abstractions to make working with LLMs easy")

[Document(id='00a23dad-d08b-4f68-88f2-2870a6e29782', metadata={'source': 'website'}, page_content='LangGraph is the best framework for building stateful, agentic applications!'),
 Document(id='0acfacdd-0333-436a-b175-fe0383a2ceb6', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='cc72d686-71d8-4c26-ad64-293066fe22e7', metadata={'source': 'tweet'}, page_content='I have a bad feeling I am going to get deleted :(')]

In [34]:
vector_store.save_local("vector-store")

In [35]:
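# load_local unpickles the saved docstore, so the flag must be set explicitly.
# Only enable it for files you created yourself or otherwise trust.
new_vector_store = FAISS.load_local(
    "vector-store", embeddings_model, allow_dangerous_deserialization=True
)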

In [36]:
new_vector_store.similarity_search("langchain")

[Document(id='0acfacdd-0333-436a-b175-fe0383a2ceb6', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='00a23dad-d08b-4f68-88f2-2870a6e29782', metadata={'source': 'website'}, page_content='LangGraph is the best framework for building stateful, agentic applications!'),
 Document(id='cc72d686-71d8-4c26-ad64-293066fe22e7', metadata={'source': 'tweet'}, page_content='I have a bad feeling I am going to get deleted :('),
 Document(id='4ab49481-bbce-4742-9afc-25ccd819cfe7', metadata={'source': 'tweet'}, page_content="Wow! That was an amazing movie. I can't wait to see it again.")]