<a href="https://colab.research.google.com/github/tpadmapriyaGitHub/AgenticAI/blob/Training/Naive_RAG_Agent_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Building a RAG Agent with LangChain, Google GenAI, and FAISS

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) agent using LangChain, Google GenAI, and FAISS. The objective is to create an agent that answers questions using information retrieved from an external knowledge source, in this case a blog post.

The notebook covers the following steps:
1. **Installation of necessary libraries**: Installing LangChain, langchain-text-splitters, langchain-community, bs4, langchain-google-genai, and faiss-cpu.
2. **Setting up Google Generative AI**: Importing and initializing the `ChatGoogleGenerativeAI` model and `GoogleGenerativeAIEmbeddings` for generating text and embeddings.
3. **Creating a Vector Store with FAISS**: Setting up FAISS for efficient similarity search of document embeddings.
4. **Loading and processing data**: Using `WebBaseLoader` to load content from a blog post and `RecursiveCharacterTextSplitter` to split the document into smaller chunks.
5. **Adding documents to the vector store**: Embedding the document chunks and adding them to the FAISS vector store.
6. **Defining a retrieval tool**: Creating a LangChain tool to retrieve relevant document chunks based on a user query.
7. **Creating an agent**: Building an agent using the initialized model and the retrieval tool.
8. **Running the agent**: Executing the agent with a sample query to demonstrate its ability to retrieve information and formulate a response.

The outcome of this notebook is a functional RAG agent that can use external knowledge to answer questions.

### Installation of necessary libraries

This cell installs `langchain` along with `langchain-text-splitters` for splitting text into chunks, `langchain-community` for community-maintained integrations (the web loader and the FAISS vector store used below), and `bs4` for parsing HTML.

In [1]:
!pip install langchain langchain-text-splitters langchain-community bs4

Collecting langchain-text-splitters
  Downloading langchain_text_splitters-1.0.0-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain-community)
  Downloading langchain_classic-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain-community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain-community)
  Downloading typing_inspec

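This cell installs `langchain-google-genai` through the `google-genai` extra of `langchain`. The package provides the integrations needed to use Google's Generative AI models with LangChain. The commented-out cell shows the alternative of installing the package directly.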

In [None]:
# !pip install -qU langchain-google-genai

In [2]:
!pip install -U "langchain[google-genai]"

Collecting langchain-google-genai (from langchain[google-genai])
  Downloading langchain_google_genai-3.2.0-py3-none-any.whl.metadata (2.7 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain-google-genai->langchain[google-genai])
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting google-ai-generativelanguage<1.0.0,>=0.9.0 (from langchain-google-genai->langchain[google-genai])
  Downloading google_ai_generativelanguage-0.9.0-py3-none-any.whl.metadata (10 kB)
Downloading langchain_google_genai-3.2.0-py3-none-any.whl (57 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 3.5 MB/s eta 0:00:00
Downloading filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Downloading google_ai_generativelanguage-0.9.0-py3-none-any.whl (1.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 55.5 MB/s eta 0:00:00
Installing collected packages: filetype, google-ai-generativelangu

This cell installs `faiss-cpu`, a library for efficient similarity search of embeddings. FAISS will be used to build the vector store.

In [3]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.6/23.6 MB 59.8 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.0
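
After the install cells have run, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip commands in this notebook.
REQUIRED = [
    "langchain",
    "langchain-text-splitters",
    "langchain-community",
    "bs4",
    "langchain-google-genai",
    "faiss-cpu",
]

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at an install cell that needs to be re-run before continuing.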


### Setting up Google Generative AI

This cell imports the `ChatGoogleGenerativeAI` class from `langchain_google_genai`, reads the Google API key from Colab's secrets manager (`userdata`), and exports it as the `GOOGLE_API_KEY` environment variable. It then initializes the `ChatGoogleGenerativeAI` model, which will be used for generating responses.

In [5]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata

# Read the key from Colab secrets; never hard-code API keys in a notebook.
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")
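
Outside Colab, `google.colab.userdata` is not available. One possible fallback, sketched below, is to read the key from the environment and prompt for it only when it is missing. The helper `resolve_api_key` and its parameters are assumptions made for illustration; the demo call uses a plain dict and a fake prompt so that no real key is involved.

```python
import os
from getpass import getpass


def resolve_api_key(name="GOOGLE_API_KEY", environ=os.environ, prompt=getpass):
    """Return the key stored under `name`, prompting once if it is not set."""
    key = environ.get(name)
    if not key:
        # Nothing configured yet: ask interactively and cache the answer.
        key = prompt(f"Enter {name}: ")
        environ[name] = key
    return key


# Demo with a throwaway environment and a fake prompt (no real secret used).
demo_env = {}
print(resolve_api_key(environ=demo_env, prompt=lambda _msg: "demo-key"))
```

In a real session, calling `resolve_api_key()` with its defaults would prompt without echoing the key to the notebook output.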

This cell imports `GoogleGenerativeAIEmbeddings` and initializes it. Embeddings are numerical vector representations of text; the same model converts both the document chunks and the user queries into vectors that can be compared for similarity.

In [6]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Embedding model used for both document chunks and queries.
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

### Creating a Vector Store with FAISS

This cell sets up the FAISS vector store. It embeds a sample string to discover the embedding dimension, creates a flat L2 FAISS index of that dimension, and initializes the `FAISS` vector store with the embedding function, the index, and an in-memory document store.

In [7]:
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

embedding_dim = len(embeddings.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
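
To make the role of the index concrete, the sketch below reproduces in plain Python what a flat L2 index does conceptually: it compares the query against every stored vector and returns the `k` closest by squared Euclidean distance. This is an illustration on toy 2-dimensional vectors, not FAISS's implementation, and the function names are made up for the example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exhaustive search: return the k (distance, position) pairs closest to query."""
    scored = sorted((squared_l2(vec, query), pos) for pos, vec in enumerate(vectors))
    return scored[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(flat_l2_search(stored, [1.0, 2.0], k=2))  # [(1.0, 1), (5.0, 0)]
```

The returned positions play the role that `index_to_docstore_id` plays in the vector store: they map a row in the index back to a stored document.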

### Loading and processing data

This cell loads the content from a blog post using `WebBaseLoader`. It specifically filters for the post title, headers, and content using `bs4.SoupStrainer` to focus on relevant information.

In [8]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")



Total characters: 43047
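
The effect of the class filter can be illustrated without any network access. The sketch below uses the standard library's `HTMLParser` on a small hand-written page to keep only text inside elements carrying one of the wanted classes. It is a simplified stand-in for what `bs4.SoupStrainer` achieves, not its implementation, and it assumes well-formed HTML with no unclosed void tags such as `<br>`; the `ClassFilter` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class ClassFilter(HTMLParser):
    """Collect text found inside elements whose class is in keep_classes."""

    def __init__(self, keep_classes):
        super().__init__()
        self.keep_classes = set(keep_classes)
        self.depth = 0  # greater than 0 while inside a kept element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Start tracking at a kept element, and keep counting nested tags.
        if self.depth or self.keep_classes.intersection(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


page = (
    '<nav class="menu">Home</nav>'
    '<h1 class="post-title">Agents</h1>'
    '<div class="post-content"><p>Planning and memory.</p></div>'
    '<footer class="site-footer">Contact</footer>'
)
parser = ClassFilter(["post-title", "post-header", "post-content"])
parser.feed(page)
parser.close()
print(parser.texts)  # ['Agents', 'Planning and memory.']
```

Navigation and footer text are dropped, which is the same reason the notebook restricts parsing to the post title, header, and content.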


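This cell splits the loaded document into smaller chunks using `RecursiveCharacterTextSplitter`. Chunks are at most 1,000 characters long with a 200-character overlap, which keeps each piece small enough to embed and retrieve individually while letting text near a boundary appear in both neighboring chunks. Setting `add_start_index=True` records each chunk's character offset in the original document.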

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Split blog post into 63 sub-documents.
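
The interaction between `chunk_size` and `chunk_overlap` is easiest to see with a plain fixed-size sliding window. The sketch below is deliberately simpler than `RecursiveCharacterTextSplitter`, which also tries to break at separators such as paragraphs and sentences; the function `sliding_chunks` is an illustrative helper, not a LangChain API.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into (start_index, chunk) pairs using a fixed-size window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so with the notebook's settings a new window would begin every 800 characters. The recorded start offsets correspond to what `add_start_index=True` stores in each chunk's metadata.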


### Adding documents to the vector store

This cell adds the document chunks to the FAISS vector store. It processes the chunks in batches, embeds them using the `embeddings` object, and adds them to the vector store for efficient retrieval later.

In [10]:
# Define batch size for embedding
batch_size = 16

# Initialize an empty list to store document IDs
document_ids = []

# Process documents in batches
for i in range(0, len(all_splits), batch_size):
    batch = all_splits[i:i + batch_size]
    batch_ids = vector_store.add_documents(documents=batch)
    document_ids.extend(batch_ids)

print(document_ids[:3])

['bcfaa482-bf93-4fb5-8c3e-dcba0237705c', '953ce687-0bec-459c-b1fc-fe69b0029362', '8417a401-e2b5-4eb5-a8aa-281b6026ace3']
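
The batching pattern in the loop above can be isolated into a small helper. The sketch below is a generic illustration with a hypothetical function name, `in_batches`, applied to 63 placeholder items to mirror the 63 chunks produced earlier.

```python
def in_batches(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder strings standing in for the 63 document chunks.
chunks = [f"chunk-{i}" for i in range(63)]
sizes = [len(batch) for batch in in_batches(chunks, 16)]
print(sizes)  # [16, 16, 16, 15]
```

With a batch size of 16, the 63 chunks are sent in four calls, the last one carrying the remaining 15. Smaller batches mean more calls to the embedding service but smaller individual requests.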


### Defining a retrieval tool

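This cell defines a LangChain tool called `retrieve_context`. The tool runs a similarity search on the `vector_store` for the user query and retrieves the two most relevant document chunks (`k=2`). Because it is declared with `response_format="content_and_artifact"`, it returns a pair: a serialized string that is passed to the model, and the raw documents, which are attached to the tool message as an artifact.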

In [11]:
from langchain.tools import tool

@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
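
To show what the model actually receives from the tool, the sketch below applies the same serialization to two hand-made documents. The `Doc` dataclass is a hypothetical stand-in for LangChain's document type, and the chunk texts are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs):
    """Join documents into the text block that is handed to the model."""
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


retrieved = [
    Doc("Chunk A", {"start_index": 0}),
    Doc("Chunk B", {"start_index": 800}),
]
# Same shape as the tool's return value: (content, artifact).
content, artifact = serialize(retrieved), retrieved
print(content)
```

The string is what ends up in the conversation, while the artifact keeps the original objects available to downstream code that needs the metadata in structured form.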

### Creating an agent

This cell creates the RAG agent. It uses the initialized `model` (ChatGoogleGenerativeAI) and the `retrieve_context` tool. A system prompt is also defined to guide the agent's behavior.

In [12]:
from langchain.agents import create_agent


tools = [retrieve_context]
# If desired, specify custom instructions
prompt = (
    "You have access to a tool that retrieves context from a blog post. "
    "Use the tool to help answer user queries."
)
agent = create_agent(model, tools, system_prompt=prompt)
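
Conceptually, a tool-calling agent alternates between asking the model what to do and executing the tool it requests, until the model produces a final answer. The sketch below shows that loop with a scripted stand-in for the model. It is not how `create_agent` is implemented; the message format, `run_tool_loop`, `scripted_model`, and `fake_retrieve` are all assumptions made for illustration.

```python
def run_tool_loop(model_step, tools, user_query, max_turns=5):
    """Alternate between model replies and tool executions until a final answer."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = model_step(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            break  # no tool requested: the reply is the final answer
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages


def scripted_model(messages):
    """Hypothetical model: request context once, then answer from it."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "ai",
            "content": "",
            "tool_call": {
                "name": "retrieve_context",
                "args": {"query": messages[0]["content"]},
            },
        }
    return {"role": "ai", "content": "Answer based on: " + messages[-1]["content"]}


def fake_retrieve(query):
    """Hypothetical tool that returns a canned context string."""
    return f"[context for: {query}]"


history = run_tool_loop(
    scripted_model, {"retrieve_context": fake_retrieve}, "What is task decomposition?"
)
print([m["role"] for m in history])  # ['user', 'ai', 'tool', 'ai']
print(history[-1]["content"])
```

The `max_turns` cap is a simple guard against a model that keeps requesting tools without ever answering.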

### Running the agent

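This cell runs the RAG agent with a sample query. It demonstrates how the agent uses the `retrieve_context` tool to find relevant information from the blog post and then formulates a response based on the retrieved context. With `stream_mode="values"`, each streamed event carries the full message list so far, and the loop prints the most recent message.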

In [13]:
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
Tool Calls:
  retrieve_context (720741c6-f938-4e19-9158-2bb0dda49424)
 Call ID: 720741c6-f938-4e19-9158-2bb0dda49424
  Args:
    query: What is the standard method for Task Decomposition?
Name: retrieve_context

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2578}
Content: Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the