In [112]:
%pip install -r requirements.txt

Collecting langchain_core (from -r requirements.txt (line 5))
  Obtaining dependency information for langchain_core from https://files.pythonhosted.org/packages/6a/10/285fa149ce95300d91ea0bb124eec28889e5ebbcb59434d1fe2f31098d72/langchain_core-0.1.53-py3-none-any.whl.metadata
  Using cached langchain_core-0.1.53-py3-none-any.whl.metadata (5.9 kB)
INFO: pip is looking at multiple versions of langchain to determine which version is compatible with other requirements. This could take a while.
Collecting langchain (from -r requirements.txt (line 1))
  Obtaining dependency information for langchain from https://files.pythonhosted.org/packages/e9/65/e5cc2876078fa5f1a621c8429f0174855c7e9831060d350626dbf8d2a10c/langchain-0.3.17-py3-none-any.whl.metadata
  Using cached langchain-0.3.17-py3-none-any.whl.metadata (7.1 kB)
  Obtaining dependency information for langchain from https://files.pythonhosted.org/packages/9e/42/1e98ac16fe273be60d1bc199f61ece9b751158ff65e65329221397b5fc8a/langchain-0.3.16-

<h1>Step 1: Data Ingestion (loaders)</h1>

In [113]:
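# Data ingestion technique 1: load a plain-text file
# (loaders now live in langchain_community; the langchain.document_loaders path is deprecated)
from langchain_community.document_loaders import TextLoader
loader = TextLoader("speech.txt")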

In [114]:
docs = loader.load()
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Speech on Democracy\n\nGood morning everyone,\n\nIt is an honor to stand before you today to talk about one of the most significant pillars of human civilization – Democracy. Democracy is not just a system of governance; it is an ideal that empowers people, upholds their rights, and ensures that every individual has a voice in shaping their future.\n\nThe word democracy comes from the Greek words demos, meaning "people," and kratos, meaning "power"—together, "power of the people." It is a form of government where the citizens elect their leaders, participate in decision-making, and enjoy fundamental rights and freedoms. The essence of democracy lies in its core principles: equality, liberty, justice, and fraternity.\n\nDemocracy is not merely about casting votes during elections. It is about participation—active engagement in governance, policymaking, and societal development. It thrives on transparency, where governments are a

In [115]:
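import os
from dotenv import load_dotenv
load_dotenv()  # read variables from a local .env file
# os.environ rejects None, so fail early if the key is missing.
api_key = os.getenv("GEMINI_API_KEY")
if api_key is None:
    raise RuntimeError("GEMINI_API_KEY is not set")
os.environ["GEMINI_API_KEY"] = api_key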



In [116]:
# Web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

# Load only the post title, header, and body from the HTML page.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()
docs



[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [117]:
# PDF READER
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('_Resume_1.pdf.pdf')
docs = loader.load()
docs

[Document(metadata={'producer': 'Canva', 'creator': 'Canva', 'creationdate': '2025-02-01T04:11:07+00:00', 'title': '_Resume_1.pdf', 'moddate': '2025-02-01T04:11:05+00:00', 'keywords': 'DAGd03GVGGo,BAFTHfLtHBY,0', 'author': 'Urvashi Urvashi', 'source': '_Resume_1.pdf.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content="URVASHI AGRAWAL\nFull Stack Developer\n2022 - 2024\xa0 Gorakhpur\n+919120112701\nhttps://urvashi-prof.vercel.app/\nurvashi16may@gmail.com India\nSKILLS\nTechnologies\nEXPERIENCE\nJunior Frontend Developer\nHyathi Technologies\nPROJECTS\nSpace Lens (PWA app)\nSpaceLens is a progressive web app. Space Lens\nhelps you explore the new wonders from space\neveryday. I made this app for to satisfy my own\ncuriosity to learn and see new amazing images\nfrom the space everyday, but it turned out my\nfriends and people doing astrophysics loved it\ntoo!\nCreated custom components, forms and interfaces for user\ninteractions in react.\nUsed Redux state management and 

<h1>Step 2: Splitting Huge Data into Chunks</h1>

In [118]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunks of up to 1000 characters; neighboring chunks may share up to 200 characters.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks_text = text_splitter.split_documents(docs)
chunks_text



[Document(metadata={'producer': 'Canva', 'creator': 'Canva', 'creationdate': '2025-02-01T04:11:07+00:00', 'title': '_Resume_1.pdf', 'moddate': '2025-02-01T04:11:05+00:00', 'keywords': 'DAGd03GVGGo,BAFTHfLtHBY,0', 'author': 'Urvashi Urvashi', 'source': '_Resume_1.pdf.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='URVASHI AGRAWAL\nFull Stack Developer\n2022 - 2024\xa0 Gorakhpur\n+919120112701\nhttps://urvashi-prof.vercel.app/\nurvashi16may@gmail.com India\nSKILLS\nTechnologies\nEXPERIENCE\nJunior Frontend Developer\nHyathi Technologies\nPROJECTS\nSpace Lens (PWA app)\nSpaceLens is a progressive web app. Space Lens\nhelps you explore the new wonders from space\neveryday. I made this app for to satisfy my own\ncuriosity to learn and see new amazing images\nfrom the space everyday, but it turned out my\nfriends and people doing astrophysics loved it\ntoo!\nCreated custom components, forms and interfaces for user\ninteractions in react.\nUsed Redux state management and 
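To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also prefers to break on separators such as paragraph and line breaks), and the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Small values make the overlap easy to see.
sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

With these toy settings, the last four characters of each chunk reappear at the start of the next one, which is the context a retriever relies on when a sentence straddles a chunk boundary.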

<h1>Step 3: Embeddings - Convert the chunks of data into vectors</h1>

In [119]:
%pip install langchain-google-genai

Note: you may need to restart the kernel to use updated packages.


In [120]:
%pip install -qU langchain-community faiss-cpu

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-google-genai 1.0.3 requires langchain-core<0.2,>=0.1.45, but you have langchain-core 0.3.34 which is incompatible.
langchain-openai 0.1.7 requires langchain-core<0.3,>=0.1.46, but you have langchain-core 0.3.34 which is incompatible.
langserve 0.2.3 requires langchain-core<0.3,>=0.1, but you have langchain-core 0.3.34 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [121]:
%pip install -qU langchain-google-vertexai

Note: you may need to restart the kernel to use updated packages.


In [122]:
!gcloud auth login

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=u7hdoH1a1sXg5LT9peEZhmzwe99RRG&access_type=offline&code_challenge=OXcW3Q0QgxD0PGsLbYln0DQ6b3VDNoDaxMYWy6MyfC8&code_challenge_method=S256


You are now logged in as [urvashi16may@gmail.com].
Your current project is [None].  You can change this setting by running:
  $ gcloud config set project PROJECT_ID


In [123]:
# Vector embeddings and vector store
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

# Alternative (not used here): Vertex AI embeddings.
# from langchain_google_vertexai import VertexAIEmbeddings
# embeddings = VertexAIEmbeddings(model="text-embedding-004")

In [133]:
# Read the key from the environment instead of hardcoding it in the notebook.
db = FAISS.from_documents(
    chunks_text,
    GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=os.environ["GEMINI_API_KEY"],
    ),
)

db

<langchain_community.vectorstores.faiss.FAISS at 0x30a3164d0>

In [135]:
query = "Urvashi's education"
retrieved_result = db.similarity_search(query)
print(retrieved_result[0].page_content)

E
D
F
C
A
B
A
B
C
D
E
F
EDUCATION
BTech in Computer Science with AI as specialisation
Bennett University
ACTIVITIES AND HOBBIES ACHIEVEMENTS
 
FIND ME ONLINE
Guitarist
Pianist
Roller Skater
Metaphysics
Writing about anything that 
captures my interest
Sketching
https://github.com/urvashi912
https://www.linkedin.com/in/urvashi-
agrawal-12b97623a/
All the students from all the years
participated into this hackthon. My team
had only freshers including me and we
made it to the top 23!
The idea for this hackthon was to how we
can leverage technology for Women
Safety.
Achieved 23rd place out of over
200teams in the SIH BU Hackathon
held at my University.
Got Selected for the Tech Team in
my University's GFG club out of over
200 registrations, where only top 3
members were meant to be
selected
2023 - 2027 
Github
LinkedIn
www.enhancv.com
 
Powered by
 
 
 
WON $10k in a Hedera
International Hackthon
 
Won the first prize in Microsoft AI
Innovation hackthon among 280+
teams participating
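
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below illustrates that idea with brute-force Euclidean distance over toy 3-dimensional vectors. The vectors, the shortened texts, and the helper name `top_k` are invented for illustration; a real FAISS index does the same kind of nearest-neighbor lookup far more efficiently.

```python
import math


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors have the smallest Euclidean distance to query_vec."""
    scored = [(math.dist(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [text for _, text in scored[:k]]


# Toy "vector store": (text, embedding) pairs with made-up 3-d embeddings.
toy_store = [
    ("EDUCATION: BTech in Computer Science", [0.9, 0.1, 0.0]),
    ("HOBBIES: guitar, piano, sketching", [0.0, 0.2, 0.9]),
    ("EXPERIENCE: Junior Frontend Developer", [0.5, 0.8, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]  # stands in for the embedded query text

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```

Because ranking depends only on vector distance, a chunk that mixes several resume sections (as in the output above) can still come back first if its embedding happens to sit nearest to the query.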
 


In [130]:
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Read the key from the environment instead of hardcoding it in the notebook.
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.environ["GEMINI_API_KEY"],
)


In [131]:
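# embed_documents expects a list of strings. A bare string is iterated character by
# character, which is consistent with the 13 vectors below (one per character).
# To embed the whole string, pass a list: embeddings.embed_documents(["hello, world!"]).
vector = embeddings.embed_documents("hello, world!")
print(len(vector))
print(vector[0])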

13
[0.029026422649621964, -0.01834726519882679, -0.07433483004570007, -0.008478997275233269, 0.07129383087158203, 0.01129910908639431, 0.02824152261018753, -0.021319091320037842, 0.003001588862389326, 0.03199738636612892, -0.019613198935985565, 0.0349922701716423, -0.03391840681433678, 0.03200799971818924, -0.003912163432687521, -0.004259414970874786, 0.004753096494823694, 0.010103949345648289, 0.04085555672645569, -0.02268749102950096, 0.015761787071824074, 0.002574122743681073, -0.00962142925709486, 0.0002575013495516032, 0.031116677448153496, 0.0028279172256588936, -0.016650434583425522, -0.056548845022916794, -0.05027077719569206, -0.0011164209572598338, -0.07096167653799057, 0.0013611518079414964, -0.06286182999610901, 0.015642981976270676, -0.00445942860096693, -0.031905051320791245, -0.015190227888524532, 0.0029025282710790634, -0.0047320113517344, 0.03807353973388672, 0.015366894192993641, -0.018328066915273666, -0.03473326191306114, 0.004799453541636467, 0.012583659030497074, 
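
The count of 13 above matches the number of characters in "hello, world!", which suggests the string was treated as a sequence of one-character documents. The sketch below reproduces that behavior with a stand-in embedder: `fake_embed_documents` is hypothetical, returns dummy 2-d vectors, and does not call the Gemini API. It only shows how iterating over a string differs from iterating over a one-element list.

```python
def fake_embed_documents(texts):
    """Stand-in for an embedding call: one dummy 2-d vector per item in texts."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


as_string = fake_embed_documents("hello, world!")    # iterates over 13 characters
as_list = fake_embed_documents(["hello, world!"])    # iterates over one document

print(len(as_string))
print(len(as_list))
```

Wrapping the text in a list is the safer habit whenever a method is documented as taking a list of documents.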