In [17]:
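# Loaders live in langchain_community and splitters in langchain_text_splitters;
# the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter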

loader=TextLoader('data/example.txt')
documents=loader.load()

splitter=RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=200)
chunks=splitter.split_documents(documents)
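
To make the chunk_size and chunk_overlap parameters concrete, here is a minimal sketch of overlapping chunking using plain fixed-size windows. It is a simplification and not what RecursiveCharacterTextSplitter does internally (that splitter also tries to break on separators such as paragraphs and sentences). The function name split_with_overlap and the sample string are assumptions made for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample text, small enough to inspect by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each window starts chunk_size - chunk_overlap characters after the previous one, so a larger overlap produces more chunks for the same text.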

EMBEDDING MODEL

In [18]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

In [19]:
print(len(chunks))

138


EMBEDDING CHUNKS INTO THE VECTOR DB
FAISS (Facebook AI Similarity Search)

In [None]:
# The FAISS wrapper lives in langchain_community; langchain.vectorstores is deprecated.
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(chunks, embedding=embeddings)

In [None]:
vectorstore.save_local('data/faiss_index')

In [None]:
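# Loading deserializes pickled data, so enable this flag only for index files you created yourself.
vectorstore=FAISS.load_local('data/faiss_index',embeddings,allow_dangerous_deserialization=True)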

In [None]:
print(vectorstore)

In [None]:
query = 'what is chatgpt'
res=vectorstore.similarity_search(query,k=3)
print(res)

In [24]:
for i, doc in enumerate(res, start=1):
    print(i, doc.page_content)

1 Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way, that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable. [...] It's also a way to understand the
2 ChatGPT is a conversational chatbot and artificial intelligence assistant based on large language models.[28] It can write and debug computer programs;[29] compose music, teleplays, fairy tales, and student essays; answer test questions (sometimes, depending on the test, at a level above the average human test-taker);[30] generate business ideas;[31] write poetry and song lyrics;[32] translate and summarize text;[33] simulate a Linux system; simulate entire chat rooms; or play games like
3 Countries
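
The search above always returns k results, which is why a weak match such as "Countries" can appear in third place. The sketch below shows the general idea of top-k retrieval with cosine similarity on toy vectors. It is an illustration only: the three-dimensional vectors and texts are made up, the helper names are assumptions, and the metric a real store uses depends on how it is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
indexed = [
    ("ChatGPT is a chatbot", [0.9, 0.1, 0.0]),
    ("Countries", [0.0, 0.2, 0.9]),
    ("ChatGPT as a blurry JPEG", [0.8, 0.3, 0.1]),
    ("Weather report", [0.1, 0.9, 0.2]),
]
query_vec = [1.0, 0.2, 0.0]

hits = top_k(query_vec, indexed, k=3)
for score, text in hits:
    print(round(score, 3), text)
```

The third hit scores far lower than the first two, so inspecting scores (or applying a threshold) is one way to spot results that were returned only to fill the requested k.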

LangChain Retrieval Chain

In [26]:
from dotenv import load_dotenv
load_dotenv()

True

In [40]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    temperature=0.7,
    model="gpt-4o-mini",
)
qa_chain = RetrievalQA.from_llm(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
response=qa_chain.invoke({'query':'what is the use of chatgpt'})
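
Conceptually, a retrieval QA chain fetches the most similar chunks and passes them to the model together with the question. The sketch below shows that "stuffing" step with plain strings. The prompt wording and the function name build_stuffed_prompt are assumptions for illustration; this is not the template LangChain actually uses.

```python
def build_stuffed_prompt(question: str, passages: list[str]) -> str:
    """Join the retrieved passages into one context block and append the question."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for retrieved chunks.
passages = [
    "ChatGPT is a conversational chatbot based on large language models.",
    "It can write and debug computer programs.",
]
prompt = build_stuffed_prompt("what is the use of chatgpt", passages)
print(prompt)
```

Because every retrieved chunk lands in the prompt, the chunk size and the number of retrieved documents together determine how much context the model receives.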

In [41]:
response

{'query': 'what is the use of chatgpt',
 'result': 'ChatGPT can be used for a variety of purposes, including:\n\n1. Writing and debugging computer programs.\n2. Composing music, teleplays, fairy tales, and student essays.\n3. Answering test questions, sometimes at a level above the average human test-taker.\n4. Generating business ideas.\n5. Writing poetry and song lyrics.\n6. Translating and summarizing text.\n7. Simulating a Linux system.\n8. Simulating entire chat rooms.\n9. Playing games.\n\nOverall, it serves as a conversational chatbot and AI assistant that can create human-like responses in text, speech, and images.',
 'source_documents': [Document(id='4400fcc0-d860-4539-b00b-f56ea7774ace', metadata={'source': 'data/example.txt'}, page_content="Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way, that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact se

In [43]:
print(response['result'])

ChatGPT can be used for a variety of purposes, including:

1. Writing and debugging computer programs.
2. Composing music, teleplays, fairy tales, and student essays.
3. Answering test questions, sometimes at a level above the average human test-taker.
4. Generating business ideas.
5. Writing poetry and song lyrics.
6. Translating and summarizing text.
7. Simulating a Linux system.
8. Simulating entire chat rooms.
9. Playing games.

Overall, it serves as a conversational chatbot and AI assistant that can create human-like responses in text, speech, and images.


In [60]:
for doc in response['source_documents']:
    print("page_content :"+doc.page_content)
    print("page_source : "+doc.metadata['source'])
    print("-------------")

page_content :Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way, that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable. [...] It's also a way to understand the
page_source : data/example.txt
-------------
page_content :ChatGPT is a conversational chatbot and artificial intelligence assistant based on large language models.[28] It can write and debug computer programs;[29] compose music, teleplays, fairy tales, and student essays; answer test questions (sometimes, depending on the test, at a level above the average human test-taker);[30] generate business ideas;[31] write poetry and song lyrics;[32] translate and summarize text;[33] simulate a Lin

USING CHROMADB FOR VECTOR SIMILARITY SEARCH

Loading the PDF into docs

In [2]:
# PyPDFLoader lives in langchain_community; langchain.document_loaders is deprecated.
from langchain_community.document_loaders import PyPDFLoader
loader=PyPDFLoader('data/ai_example.pdf')
docs=loader.load()


The PDF is now loaded into docs.


Chunks Generation

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter=RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=200)
chunks=splitter.split_documents(docs)

The chunks are generated: 177 in total.


Embedding with Hugging Face

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

Hugging Face embeddings are used here.


Chroma db setup

In [5]:
from langchain_chroma import Chroma
vector_store=Chroma(collection_name="example_collection",
                    embedding_function=embeddings,
                    persist_directory='data/chroma_langchain_db')

Generating a unique ID for each chunk

In [7]:
from uuid import uuid4
uuids=[str(uuid4()) for _ in range(len(chunks))]
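
Because uuid4 IDs are random, re-running the notebook gives the same chunks new IDs each time. If that is a concern, one possible alternative is to derive IDs deterministically with uuid5, as sketched below. The helper name stable_id, the key format, and the sample texts are assumptions for illustration; how a store treats a repeated ID depends on the store.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(source: str, position: int, text: str) -> str:
    """Same source, position, and text always yield the same ID."""
    return str(uuid5(NAMESPACE_URL, f"{source}#{position}#{text}"))


# Hypothetical (source, text) pairs standing in for real chunks.
sample_chunks = [
    ("data/ai_example.pdf", "first chunk text"),
    ("data/ai_example.pdf", "second chunk text"),
]
ids = [stable_id(src, i, text) for i, (src, text) in enumerate(sample_chunks)]
print(ids)
```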

Adding the chunks and IDs to the vector DB

In [8]:
vector_store.add_documents(documents=chunks,ids=uuids)

['20e48c8d-8d23-4f7e-a9df-dba93c70d6a7',
 '476b2987-68ae-4339-872b-84e302c20e1f',
 'ce05ad3a-e4c3-406f-9a4b-3027b245f47e',
 'd2cff11c-bff6-4c80-8385-91fad0f72739',
 '48082439-bed5-4b53-8903-3ad57e7127cf',
 'df859ee2-c1ad-4cd1-aef8-050fcaede0aa',
 '1982dfb4-681f-4847-96d3-9b9553907612',
 '2a8a6b95-9218-4b25-b847-9f2f885e213e',
 'efe987a7-4de0-4db0-8847-45594dfd29bb',
 'aa7371d8-c0d8-4bcb-9d09-0521ae9c5804',
 '80f31ac5-9775-46b2-8fe8-1a06804c4f95',
 '282dde8c-8696-44a3-9fd1-f589a830252a',
 '8a98edd5-b315-4a96-b575-3a5435852bdd',
 'f3473a5f-eeae-465a-abb3-de5bca391c2a',
 'cc4bb652-26f4-4215-83c6-138a3ecf1338',
 '4c7c3d7b-640e-4360-90c0-ee9179f38755',
 'd2616945-e360-4796-a4c8-009ef164df78',
 '6eded412-4738-4d22-8e5c-206621179adc',
 'abc2a08a-69e5-40cb-9d58-14f72f11f070',
 '368a1f7a-032f-4209-a565-cf6ce9b38c90',
 '089c71c7-372d-4c44-bc49-fa876dc62979',
 '5f81504c-fb9d-4bb6-8f39-8182abbe5491',
 '8669295b-f85f-4ffe-a3d5-f2120996b57f',
 '2c051f1c-cc35-478c-8ed5-f5b750c67fef',
 'e9708e8e-4149-

Similarity search query

In [9]:
results=vector_store.similarity_search('What is ai',k=3)
print(results)

[Document(id='20e48c8d-8d23-4f7e-a9df-dba93c70d6a7', metadata={'creator': 'Microsoft® Word 2013', 'page_label': '1', 'page': 0, 'producer': 'Microsoft® Word 2013', 'moddate': '2024-11-18T12:03:19+05:30', 'creationdate': '2024-11-18T12:03:19+05:30', 'total_pages': 34, 'source': 'data/ai_example.pdf', 'author': 'acer'}, page_content='ARTIFICIAL INTELLIGENCE \n1 | P a g e   \n \n Artificial Intelligence \nDefinition:  \nIn simple terms, we can define AI as a machine that can simulate human thought \nprocess and can take actions based on those thoughts and even draw  \nconclusions. It should also be able to correct itself, if it makes a mistake. This \nalso means that AI based computer would be able to make a decision in a given \nsituation like human beings and in some cases even better \nAccording to Niti Aayog:'), Document(id='82f6ccd6-b90c-402b-92c3-15d3f8b3cd83', metadata={'total_pages': 34, 'source': 'data/ai_example.pdf', 'moddate': '2024-11-18T12:03:19+05:30', 'producer': 'Microsof

In [16]:
for res in results:
    print(res.page_content+"\n"+"source :"+res.metadata['source'])

ARTIFICIAL INTELLIGENCE 
1 | P a g e   
 
 Artificial Intelligence 
Definition:  
In simple terms, we can define AI as a machine that can simulate human thought 
process and can take actions based on those thoughts and even draw  
conclusions. It should also be able to correct itself, if it makes a mistake. This 
also means that AI based computer would be able to make a decision in a given 
situation like human beings and in some cases even better 
According to Niti Aayog:
source :data/ai_example.pdf
may contribute to social isolation and disconnection by reducing face -to-face interactions and 
interpersonal relationships. 
 
AI, ML & DL 
AI-Artificial Intelligence (AI) Refers to any technique that enables computers to 
mimic human intelligence. It gives the ability to machines to recognize a 
human’s face; to move and manipulate objects; to understand the voice 
commands by humans, and also do other tasks. The AI-enabled machines
source :data/ai_example.pdf
also means that AI based c