## Building a RAG System with Langchain and FAISS
## Introduction to RAG
Retrieval-Augmented Generation (RAG) combines the power of retrieval systems with generative AI models. Instead of relying solely on the model's training data, RAG:
- Retrieves relevant documents from a knowledge base
- Uses these documents as context for the LLM
- Generates responses based on both the retrieved context and the model's knowledge
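
The three steps above can be sketched without any external service. This is a toy illustration only: word overlap stands in for embedding similarity, `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical names, and the final LLM call is left out.

```python
# Toy retrieve-then-generate loop; no real LLM or vector store is involved.
import re

KNOWLEDGE_BASE = [
    "Machine Learning is a subset of AI that enables systems to learn from data.",
    "Deep Learning is a subset of machine learning based on artificial neural networks.",
    "Natural Language Processing helps computers understand human language.",
]

def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, k: int = 1) -> list:
    """Step 1: rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list) -> str:
    """Step 2: place the retrieved documents in the prompt as context."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

question = "What are neural networks used for in deep learning?"
top_docs = retrieve(question)
prompt = build_prompt(question, top_docs)
print(prompt)  # Step 3 would send this prompt to an LLM
```

The rest of the notebook replaces each toy piece with a real component: embeddings and FAISS for retrieval, a prompt template for the context, and a chat model for generation.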

## FAISS
Faiss is a library for efficient similarity search and clustering of dense vectors.
Key advantages:
- Extremely fast similarity search
- Memory efficient
- Supports GPU acceleration
- Can handle millions of vectors

## How it works
- Indexes vectors for fast nearest-neighbor search
- Returns the most similar vectors based on a distance metric
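
To make the idea of nearest-neighbor search concrete, here is a minimal pure-Python sketch of exhaustive search with L2 distance. It is not how Faiss is implemented (Faiss uses optimized native code and also offers approximate index types); `nearest_neighbors` and the sample vectors are made up for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(index, query, k=2):
    """Compare the query with every stored vector and return the k closest as (distance, id)."""
    scored = sorted((l2_distance(vec, query), i) for i, vec in enumerate(index))
    return scored[:k]

# A tiny "index" of four 2-dimensional vectors; the list position is the vector id.
index = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 3.0],
    [5.0, 5.0],
]

hits = nearest_neighbors(index, [0.9, 0.1], k=2)
for distance, vector_id in hits:
    print(f"id={vector_id} distance={distance:.3f}")
```

Exhaustive search costs one distance computation per stored vector, which is why dedicated index structures matter once the collection grows to millions of vectors.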

In [1]:
# load libraries
import os
from dotenv import load_dotenv
import numpy as np
import warnings
warnings.filterwarnings("ignore")

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import (
    RunnablePassthrough
)

from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage

# LangChain specific imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# Load environment variables
load_dotenv()

True

## Data ingestion and processing

In [2]:
sample_documents = [
    Document(
        page_content="""
        Artificial Intelligence (AI) is the simulation of human intelligence in machines.
        These systems are designed to think like humans and mimic their actions.
        AI can be categorized into narrow AI and general AI.
        """,
        metadata={"source": "AI Introduction", "page": 1, "topic": "AI"}
    ),
    Document(
        page_content="""
        Machine Learning is a subset of AI that enables systems to learn from data.
        Instead of being explicitly programmed, ML algorithms find patterns in data.
        Common types include supervised, unsupervised, and reinforcement learning.
        """,
        metadata={"source": "ML Basics", "page": 1, "topic": "ML"}
    ),
    Document(
        page_content="""
        Deep Learning is a subset of machine learning based on artificial neural networks.
        It uses multiple layers to progressively extract higher-level features from raw input.
        Deep learning has revolutionized computer vision, NLP, and speech recognition.
        """,
        metadata={"source": "Deep Learning", "page": 1, "topic": "DL"}
    ),
    Document(
        page_content="""
        Natural Language Processing (NLP) is a branch of AI that helps computers understand human language.
        It combines computational linguistics with machine learning and deep learning models.
        Applications include chatbots, translation, sentiment analysis, and text summarization.
        """,
        metadata={"source": "NLP Overview", "page": 1, "topic": "NLP"}
    )
]

print(sample_documents)

[Document(metadata={'source': 'AI Introduction', 'page': 1, 'topic': 'AI'}, page_content='\n        Artificial Intelligence (AI) is the simulation of human intelligence in machines.\n        These systems are designed to think like humans and mimic their actions.\n        AI can be categorized into narrow AI and general AI.\n        '), Document(metadata={'source': 'ML Basics', 'page': 1, 'topic': 'ML'}, page_content='\n        Machine Learning is a subset of AI that enables systems to learn from data.\n        Instead of being explicitly programmed, ML algorithms find patterns in data.\n        Common types include supervised, unsupervised, and reinforcement learning.\n        '), Document(metadata={'source': 'Deep Learning', 'page': 1, 'topic': 'DL'}, page_content='\n        Deep Learning is a subset of machine learning based on artificial neural networks.\n        It uses multiple layers to progressively extract higher-level features from raw input.\n        Deep learning has revolu

In [3]:
## text splitting
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50,
    length_function = len,
    separators = [" "]
)

## split the documents into chunks

chunks = text_splitter.split_documents(sample_documents)
print(chunks[0])
print(chunks[1])


page_content='Artificial Intelligence (AI) is the simulation of human intelligence in machines.
        These systems are designed to think like humans and mimic their actions.
        AI can be categorized into narrow AI and general AI.' metadata={'source': 'AI Introduction', 'page': 1, 'topic': 'AI'}
page_content='Machine Learning is a subset of AI that enables systems to learn from data.
        Instead of being explicitly programmed, ML algorithms find patterns in data.
        Common types include supervised, unsupervised, and reinforcement learning.' metadata={'source': 'ML Basics', 'page': 1, 'topic': 'ML'}
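
Each sample document is shorter than 500 characters, so every document ends up as a single chunk. To see what `chunk_size` and `chunk_overlap` mean on longer text, the sketch below uses a plain sliding window over characters. This is a simplification: `RecursiveCharacterTextSplitter` splits on separators and then merges pieces, so its chunk boundaries differ. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)
```

Each chunk repeats the last character of the previous one, which is the role overlap plays in the real splitter: context at a chunk boundary is not lost entirely.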


In [4]:
print(chunks[0].page_content)
print(chunks[1].metadata)

Artificial Intelligence (AI) is the simulation of human intelligence in machines.
        These systems are designed to think like humans and mimic their actions.
        AI can be categorized into narrow AI and general AI.
{'source': 'ML Basics', 'page': 1, 'topic': 'ML'}


In [5]:
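load_dotenv()
# Fail early with a clear message if the key is missing from the environment or .env file
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")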


In [6]:
## Initialize OpenAI embeddings with the text-embedding-3-small model
embedding = OpenAIEmbeddings(
    model = "text-embedding-3-small",
    dimensions=1536
)
sample_text = "what is data"
sample_emb = embedding.embed_query(sample_text)
sample_emb

[0.017455190420150757,
 0.0031962047796696424,
 0.02346927486360073,
 -0.016685493290424347,
 0.055052995681762695,
 -0.047721292823553085,
 -0.019333776086568832,
 0.008427543565630913,
 -0.07336920499801636,
 -0.025021716952323914,
 0.021303681656718254,
 -0.02093840204179287,
 -0.04292046278715134,
 -0.06528084725141525,
 0.04913023114204407,
 0.021864648908376694,
 0.013658883050084114,
 -0.005459639243781567,
 -0.0034995179157704115,
 0.03668460249900818,
 0.0020563337020576,
 0.02378237247467041,
 0.025139128789305687,
 0.013828476890921593,
 0.026039283722639084,
 0.008668890222907066,
 -0.0008108738693408668,
 0.021721145138144493,
 0.031022753566503525,
 -0.04490341246128082,
 -0.02708294242620468,
 -0.03931983932852745,
 0.03321443870663643,
 -0.031205393373966217,
 0.08521472662687302,
 0.011349787935614586,
 -0.007586094085127115,
 0.022634347900748253,
 0.0021606995724141598,
 -0.02114713378250599,
 -0.055157359689474106,
 -0.030292192474007607,
 -0.006966421380639076,
 0.

In [7]:
texts = ["AI", "Machine Learning", "Deep Learning", "Neural Network"]
batch_emb = embedding.embed_documents(texts)
print(batch_emb[0])

[-0.008172601461410522, -0.024610374122858047, 0.002937441924586892, 0.025139344856142998, 0.0065063429065048695, -0.02827349863946438, -0.0050384486094117165, 0.020947251468896866, -0.03689572587609291, 0.012867218814790249, -0.003003563266247511, -0.020167019218206406, 0.00029114066273905337, -0.03274330496788025, 0.006460058037191629, -0.025311261415481567, -0.03102414682507515, -0.054378215223550797, 0.03276975080370903, -0.018355293199419975, 0.016583239659667015, 0.048321500420570374, -0.024874860420823097, 0.014388010837137699, 0.02935788966715336, 0.004000342916697264, 0.009290052577853203, 0.013376353308558464, 0.002496081870049238, -0.022547388449311256, 0.03210853785276413, -0.028035461902618408, 0.005335994530469179, -0.03816525638103485, -0.016741931438446045, 0.01437478605657816, -0.03861488029360771, -0.010407503694295883, -0.010546359233558178, -0.019148750230669975, 0.03208208829164505, 0.014652496203780174, -0.021529119461774826, 0.016067493706941605, -0.0118225011974

In [8]:
## Compare Embedding using cosine similarity
def compare_emb(text1:str, text2:str):
    emb1 = np.array(embedding.embed_query(text1))
    emb2 = np.array(embedding.embed_query(text2))

    # Calculate the similarity score
    similarity = np.dot(emb1, emb2)/(np.linalg.norm(emb1) * np.linalg.norm(emb2))
    return similarity

In [9]:
# test semantic similarity
print("{:.3f}".format(compare_emb("AI", "artificial intelligence")))


0.540


In [10]:
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"{compare_emb('shashlik', 'pizza'):.3f}")

0.322
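
As a sanity check on the formula used in `compare_emb`, the same cosine similarity can be computed on small hand-made vectors where the expected result is obvious. This sketch uses only the standard library; the vectors are arbitrary examples, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # unrelated directions
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(round(same_direction, 3), round(orthogonal, 3), round(opposite, 3))
```

Cosine similarity depends only on direction, not length, so scaling a vector leaves the score unchanged. Values near 1 mean the two texts point the same way in embedding space.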


## Create FAISS vector Store

In [11]:
vectorstore = FAISS.from_documents(
    documents=chunks,
    embedding=embedding
)
print(f"Vector store created with `{vectorstore.index.ntotal}` vectors")

Vector store created with `4` vectors


In [12]:
## Save vector store
vectorstore.save_local("faiss_index")


In [13]:
## load vector store
loaded_vector = FAISS.load_local(
    "faiss_index",
    embedding,
    allow_dangerous_deserialization=True
)

loaded_vector

<langchain_community.vectorstores.faiss.FAISS at 0x7fceb23c1310>

In [14]:
## similarity Search
query = "what is deep learning"
results = vectorstore.similarity_search(query, k=3)
results

[Document(id='4077d5ff-a5ad-4255-b97c-6f1ac76b5e41', metadata={'source': 'Deep Learning', 'page': 1, 'topic': 'DL'}, page_content='Deep Learning is a subset of machine learning based on artificial neural networks.\n        It uses multiple layers to progressively extract higher-level features from raw input.\n        Deep learning has revolutionized computer vision, NLP, and speech recognition.'),
 Document(id='c93c0f82-2873-4800-8b21-2bee1d72c2c8', metadata={'source': 'ML Basics', 'page': 1, 'topic': 'ML'}, page_content='Machine Learning is a subset of AI that enables systems to learn from data.\n        Instead of being explicitly programmed, ML algorithms find patterns in data.\n        Common types include supervised, unsupervised, and reinforcement learning.'),
 Document(id='56da2352-36da-47d3-b144-4d6c93404691', metadata={'source': 'NLP Overview', 'page': 1, 'topic': 'NLP'}, page_content='Natural Language Processing (NLP) is a branch of AI that helps computers understand human la

In [15]:
print(f"Query: {query}\n")
print("Top 3 similar chunks:")
for i, doc in enumerate(results):
    print(f"\n{i+1}. Source: {doc.metadata['source']}")
    print(f"   Content: {doc.page_content[:200]}...")

Query: what is deep learning

Top 3 similar chunks:

1. Source: Deep Learning
   Content: Deep Learning is a subset of machine learning based on artificial neural networks.
        It uses multiple layers to progressively extract higher-level features from raw input.
        Deep learning ...

2. Source: ML Basics
   Content: Machine Learning is a subset of AI that enables systems to learn from data.
        Instead of being explicitly programmed, ML algorithms find patterns in data.
        Common types include supervised...

3. Source: NLP Overview
   Content: Natural Language Processing (NLP) is a branch of AI that helps computers understand human language.
        It combines computational linguistics with machine learning and deep learning models.
      ...


In [16]:
## Similarity search with score (the score is a distance, so lower means more similar)
results2 = vectorstore.similarity_search_with_score(query, k=3)
print("\n\n Similarity search with scores:")
for doc, score in results2:
    print(f"\nScore: {score:.3f}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Content preview: {doc.page_content[:100]}")



 Similarity search with scores:

Score: 0.538
Source: Deep Learning
Content preview: Deep Learning is a subset of machine learning based on artificial neural networks.
        It uses m

Score: 1.168
Source: ML Basics
Content preview: Machine Learning is a subset of AI that enables systems to learn from data.
        Instead of being

Score: 1.287
Source: NLP Overview
Content preview: Natural Language Processing (NLP) is a branch of AI that helps computers understand human language.
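
The scores above grow as the match gets worse, unlike the cosine similarities computed earlier. Assuming the store uses its default L2 index and that the embeddings have unit length (both are assumptions about this setup, not something the output proves), a squared L2 distance d² and a cosine similarity c are related by d² = 2 - 2c. The sketch below checks that identity on toy unit vectors; `distance_to_cosine` is a hypothetical helper.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    # For unit vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def distance_to_cosine(d2):
    """Invert d^2 = 2 - 2c; only valid for unit-length vectors."""
    return 1 - d2 / 2

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
d2 = squared_l2(a, b)
c = dot(a, b)
print(f"squared L2 = {d2:.3f}, cosine = {c:.3f}, 2 - 2c = {2 - 2 * c:.3f}")
```

Under those assumptions, the top score of 0.538 would correspond to a cosine similarity of roughly 0.73.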



In [17]:
## search with metadata filtering
filter_dict = {"topic":"DL"}
filtered_result = vectorstore.similarity_search(
    query,
    k =3,
    filter=filter_dict
)
filtered_result

[Document(id='4077d5ff-a5ad-4255-b97c-6f1ac76b5e41', metadata={'source': 'Deep Learning', 'page': 1, 'topic': 'DL'}, page_content='Deep Learning is a subset of machine learning based on artificial neural networks.\n        It uses multiple layers to progressively extract higher-level features from raw input.\n        Deep learning has revolutionized computer vision, NLP, and speech recognition.')]
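
Only one document comes back even though `k=3`, because only one chunk carries `topic == "DL"`. The sketch below shows the matching rule in plain Python: a document passes when every key in the filter equals the corresponding metadata value. The `Doc` class and helper names are hypothetical, and the real wrapper appears to filter a fetched candidate pool rather than the whole store, so treat this as an approximation.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    text: str
    metadata: dict = field(default_factory=dict)

def matches(metadata: dict, filter_dict: dict) -> bool:
    """True when every filter key is present in the metadata with the same value."""
    return all(metadata.get(key) == value for key, value in filter_dict.items())

def filtered_top_k(ranked_docs: list, filter_dict: dict, k: int) -> list:
    """Keep the ranking order, drop non-matching documents, then cut to k."""
    return [d for d in ranked_docs if matches(d.metadata, filter_dict)][:k]

# Documents already sorted by similarity to the query (best first).
ranked = [
    Doc("Deep Learning is ...", {"source": "Deep Learning", "topic": "DL"}),
    Doc("Machine Learning is ...", {"source": "ML Basics", "topic": "ML"}),
    Doc("Natural Language Processing ...", {"source": "NLP Overview", "topic": "NLP"}),
]

kept = filtered_top_k(ranked, {"topic": "DL"}, k=3)
print([d.metadata["source"] for d in kept])
```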

## Build RAG Chain with LCEL

In [18]:
from langchain.chat_models import init_chat_model

# Fail early with a clear message if the Groq key is missing
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set")

llm = init_chat_model(model = "groq:gemma2-9b-it")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7fceb0828980>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7fceb082a120>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [19]:
# Simple RAG Chain with LCEL
simple_prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
Context: {context}
Question: {question}
Answer:
""")

In [20]:
## basic retriever
retriever = vectorstore.as_retriever(
    search_type = "similarity",
    search_kwargs = {"k":3}
)
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7fceb23fd550>, search_kwargs={'k': 3})

In [21]:
from typing import List
def format_docs(docs: List[Document])-> str:
    formatted = []
    for i, doc in enumerate(docs):
        source = doc.metadata.get("source", "Unknown")
        formatted.append(f"Document {i+1} (Source: {source}):\n {doc.page_content}")
    # Join into one string so the return value matches the declared str type
    return "\n\n".join(formatted)

In [22]:
simple_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    |simple_prompt
    |llm
    |StrOutputParser()
)
simple_rag_chain

{
  context: VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7fceb23fd550>, search_kwargs={'k': 3})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\nContext: {context}\nQuestion: {question}\nAnswer:\n'), additional_kwargs={})])
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7fceb0828980>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7fceb082a120>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))
| StrOutputParser()
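
The `|` syntax can look opaque at first. The sketch below mimics the composition pattern with a tiny `Step` class: piping builds a new step that feeds the output of the left side into the right side, and a dict-like fan-out runs several branches on the same input. This is a simplified model for intuition, not LangChain's actual implementation; every name in it is hypothetical.

```python
class Step:
    """Wraps a function and supports `a | b` composition."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # The composed step runs self first, then passes the result to other.
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)

def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda x: {name: step.invoke(x) for name, step in branches.items()})

# Stand-ins for the retriever, formatter, prompt, and model.
retrieve = Step(lambda q: ["doc about " + q])
format_docs_step = Step(lambda docs: "\n".join(docs))
passthrough = Step(lambda x: x)
prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: p.upper())

chain = fan_out(context=retrieve | format_docs_step, question=passthrough) | prompt | fake_llm
result = chain.invoke("faiss")
print(result)
```

The shape matches `simple_rag_chain`: the question goes to both branches, the prompt receives a dict with `context` and `question`, and the model consumes the rendered prompt.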

In [23]:
## Conversational RAG Chain
conversational_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the provided context to answer questions."),
    ("placeholder", "{chat_history}"),
    ("human", "Context: {context}\n\nQuestion: {input}"),
])

In [24]:
def create_conversational_rag():
    """Create a conversational RAG chain; the caller passes chat_history on each invoke."""
    return(
        RunnablePassthrough.assign(
            context = lambda x: format_docs(retriever.invoke(x["input"]))
        )
        |conversational_prompt
        |llm
        |StrOutputParser()
    )
conversational_rag = create_conversational_rag()
conversational_rag

RunnableAssign(mapper={
  context: RunnableLambda(lambda x: format_docs(retriever.invoke(x['input'])))
})
| ChatPromptTemplate(input_variables=['context', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag=

In [25]:
## streaming RAG chain
streaming_rag_chain = (
    {"context": retriever|format_docs, "question": RunnablePassthrough()}
    |simple_prompt
    |llm
)
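
Streaming means the caller receives the answer piece by piece instead of waiting for the full text. The consumption pattern can be sketched with an ordinary generator; `fake_stream` is a hypothetical stand-in for a model's `.stream()` call and simply yields one word at a time.

```python
from typing import Iterator

def fake_stream(answer: str) -> Iterator[str]:
    """Yield the answer word by word, the way a streaming model yields tokens."""
    for word in answer.split(" "):
        yield word + " "

pieces = []
for piece in fake_stream("Deep learning uses many layers"):
    print(piece, end="", flush=True)  # show each piece as soon as it arrives
    pieces.append(piece)
print()

full_answer = "".join(pieces).strip()
```

Because the streaming chain in this notebook ends at the model rather than at a string parser, each streamed item is a message chunk, which is why the test function further down reads `chunk.content`.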


## Modern RAG chains created successfully!
Available chains:
- simple_rag_chain: Basic Q&A
- conversational_rag: Maintains conversation history
- streaming_rag_chain: Supports token streaming


In [26]:
## Test function for different chain types
def test_rag_chains(question:str):
    """Test all RAG chain variants"""
    print(f"Question: {question}")
    print("=" * 50)

    # Simple RAG: one blocking call that returns the full answer
    print("\n1. Simple RAG chain:")
    answer = simple_rag_chain.invoke(question)
    print(f"Answer: {answer}")

    # Streaming RAG: print each chunk as soon as it arrives
    print("\n2. Streaming RAG")
    print("Answer: ", end="", flush=True)
    for chunk in streaming_rag_chain.stream(question):
        print(chunk.content, end="", flush=True)
    print()

In [27]:
test_rag_chains("what is the difference between AI and machine learning?")

Question: what is the difference between AI and machine learning?

1. Simple RAG chain:
Answer: Artificial Intelligence (AI) is the broad concept of machines simulating human intelligence, while Machine Learning (ML) is a *subset* of AI that focuses on enabling systems to learn from data instead of explicit programming.  


2. Streaming RAG
Artificial Intelligence (AI) is the broad concept of machines simulating human intelligence, while Machine Learning (ML) is a specific subset of AI that focuses on enabling systems to learn from data rather than being explicitly programmed.  



In [28]:
## Test with multiple questions
test_question = [
    "What is the difference between AI and Machine Learning?",
    "Explain deep learning in simple terms",
    "How does NLP work?"
]

for question in test_question:
    print("\n"+ "="* 80 + "\n")
    test_rag_chains(question)



Question: What is the difference between AI and Machine Learning?

1. Simple RAG chain:
Answer: Artificial Intelligence (AI) is the broad concept of machines simulating human intelligence, while Machine Learning (ML) is a specific subset of AI that allows systems to learn from data without explicit programming. 


2. Streaming RAG
Artificial Intelligence (AI) is the broad concept of machines simulating human intelligence. Machine Learning (ML) is a *subset* of AI that focuses on enabling systems to learn from data without explicit programming.  



Question: Explain deep learning in simple terms

1. Simple RAG chain:
Answer: Deep learning is a type of machine learning that uses many layers of artificial neurons to learn complex patterns from data.  Think of it like teaching a computer to see by showing it lots of pictures and letting it figure out what makes a cat a cat, a dog a dog, etc. 


2. Streaming RAG
Deep learning is a type of machine learning that uses many layers of artific

In [None]:
## Conversational example
print("\n3. Conversational RAG Example:")
chat_history = []

# First question
q1 = "What is machine learning?"
a1 = conversational_rag.invoke({
    "input": q1,
    "chat_history": chat_history
})

print(f"Q1: {q1}")
print(f"A1: {a1}")

In [None]:
# Update history
chat_history.extend([
    HumanMessage(content=q1),
    AIMessage(content=a1)
])

In [None]:
q2 = "How is it different from traditional programming?"
a2 = conversational_rag.invoke({
    "input": q2,
    "chat_history": chat_history
})
print(f"\nQ2: {q2}")
print(f"A2: {a2}")
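
The chain itself keeps no state, so the conversation only works because the caller appends each turn to `chat_history` and passes it back in. The sketch below shows that bookkeeping with plain tuples and a stub model; `build_messages` and `fake_llm` are hypothetical and only mirror the structure of `conversational_prompt`.

```python
def build_messages(chat_history, context, question):
    """Assemble system message, prior turns, and the new question, in that order."""
    system = ("system", "You are a helpful AI assistant. Use the provided context to answer questions.")
    human = ("human", f"Context: {context}\n\nQuestion: {question}")
    return [system] + list(chat_history) + [human]

def fake_llm(messages):
    """Stub model that only reports how many messages it received."""
    return f"(answer based on {len(messages)} messages)"

chat_history = []
questions = ["What is machine learning?", "How is it different from traditional programming?"]

for q in questions:
    messages = build_messages(chat_history, "retrieved text", q)
    a = fake_llm(messages)
    # Record both sides of the turn so the next call can see them.
    chat_history.extend([("human", q), ("ai", a)])
    print(f"Q: {q}\nA: {a}")
```

One design point worth noting: in the chain above, retrieval uses only the latest `input`, so a follow-up such as "How is it different..." is searched without knowing what "it" refers to. Rewriting the question with the history before retrieval is a common way to address that.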