## Building a RAG System with LangChain and Chroma DB
Introduction

Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of lange language models with external knowledge retrieval.  
This notebook will walk you through building a complete  RAG system using

- LangChain: A framework for developing applications  powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- Open AI: For Embeddings and language model 

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
## langchain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document
from langchain_community.vectorstores import Chroma
import numpy as np
from typing import List

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
# RAG Architecture overview
print("""
RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
""")


RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge



# Sample Data

In [4]:
## create sample documents
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interconnected 
    nodes. Deep learning has revolutionized computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs


['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processin

In [5]:
## Save sample documents to file
import tempfile
temp_file = "/data/oybek/PKD/xudamadarsam/mashq/vector/doc"
# temp_file = tempfile.mkdtemp()
for i, doc in enumerate(sample_docs):
    with open(f"{temp_file}/doc_{i}.txt", "w") as f:
        f.write(doc)
print(temp_file)

/data/oybek/PKD/xudamadarsam/mashq/vector/doc


In [6]:
temp_file

'/data/oybek/PKD/xudamadarsam/mashq/vector/doc'

## Document Loading

In [7]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
loader = DirectoryLoader(
    "/data/oybek/PKD/xudamadarsam/mashq/vector/doc",
    glob='*.txt',
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()

print(f"Loaded {len(documents)} documents")
print(f"\nFirst document preview:")
print(documents[2].page_content[:200] + "...")

Loaded 3 documents

First document preview:

    Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. Ther...


In [8]:
documents

[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at sequential data processing.\n    '),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_2.txt'}, page_content='\n    Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine tra

In [9]:
sample_docs[0]

'\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    '

## Document splitting

In [10]:
# Initialize text splitter
text_splitter  = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50,
    length_function = len,
    separators=[" "]
)

chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 5 chunks from 3 documents

Chunk example:
Content: Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspire...
Metadata: {'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}


In [11]:
chunks

[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Networks (RNNs) and Transformers \n    excel at sequential data processing.'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computer

## Embedding models

In [12]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")


In [13]:
sample_text  = "Machine learning is fascinating"
embedding = OpenAIEmbeddings()
embedding

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7fabe0c88830>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7fabe0c89160>, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [14]:
vector = embedding.embed_query(sample_text)
vector

[-0.02645929902791977,
 0.012394296936690807,
 0.011785591021180153,
 -0.017730191349983215,
 -0.00440340768545866,
 0.018610872328281403,
 0.0003296484937891364,
 0.01021849550306797,
 -0.01793741062283516,
 -0.041029397398233414,
 0.00979110598564148,
 0.03447609022259712,
 -0.015696853399276733,
 -0.011578371748328209,
 -0.0019766767509281635,
 0.009104692377150059,
 0.04157334938645363,
 0.009965947829186916,
 0.0008418279467150569,
 -0.011630176566541195,
 -0.030305804684758186,
 0.02052764967083931,
 -0.0036587135400623083,
 -0.031859949231147766,
 -0.021058648824691772,
 0.0056596738286316395,
 0.0018455458339303732,
 -0.03525316342711449,
 -0.012607991695404053,
 -0.007828999310731888,
 0.026485200971364975,
 -0.007582926657050848,
 -0.01756182499229908,
 -0.04128842055797577,
 -0.008185157552361488,
 -0.014272221364080906,
 -0.005853941664099693,
 0.001787265413440764,
 -0.016849510371685028,
 -0.002315026707947254,
 0.02227606251835823,
 0.024179888889193535,
 -0.007051927503

In [15]:
len(vector)

1536

## Initialize the ChromaDb Vector store and store the chunks in Vector Representation

In [16]:
# Create a ChromaDB vector store

persist_dir = "./chroma_db"

# Initialize ChromaDB with OpenAI embedding
vectore_store = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory=persist_dir,
    collection_name="rag_collection"
)



In [17]:
vectore_store._collection.count()

35

## Test Similarity Search

In [18]:
query = "What are the types of machine learning?"
similar_docs = vectore_store.similarity_search(query, k =3)
similar_docs

[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinfo

In [19]:
query = "What is deep learning?"
similar_docs = vectore_store.similarity_search(query, k =3)
similar_docs

[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n

In [20]:
print(f"Query: {query}")
print(f"\n Top {len(similar_docs)} similar chunks")
for i, doc in enumerate(similar_docs):
    print(f"\n -- Chunks {i+1} ---")
    print(doc.page_content[:200] + "...")
    print(f"Source:m{doc.metadata.get("source", "Unknown")}")

Query: What is deep learning?

 Top 3 similar chunks

 -- Chunks 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interco...
Source:m/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt

 -- Chunks 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interco...
Source:m/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt

 -- Chunks 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interco...
Source:m/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt


## Advanced Similaity Search with Score

In [21]:
result_score = vectore_store.similarity_search_with_score(query, k=3)
result_score

[(Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
  0.2371392846107483),
 (Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vi

`--------------------------------------------------------------------------------------------------`

## The Similarity score represents how closely related a document chunk is to your query. The scoring depends on the distance metric used:
# ChromaDB Default: Uses L2 distance (Euclidean distance)
- Lower socres == More similar (closer in vector space)
- Score of 0 == identical vectors
- Typical range: 0 to 2 (but can be higher)
# Cosine similarity (if configured):
- Higher scores == More similar
- range -1 to 1 (1 being identical)

`-------------------------------------------------`. `------------------------------------`

# initialize LLm RAG Chain, Prompt Template, Query the RAG system

In [22]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name = "gpt-3.5-turbo"
)

In [23]:
test1 = llm.invoke("What is Large Language model")
test1.content

'Large Language Models (LLMs) are advanced natural language processing models that are capable of understanding and generating human-like text. These models are trained on massive amounts of data and use powerful neural networks to process and generate text. LLMs have become increasingly popular in recent years due to their ability to perform a wide range of language-related tasks, such as text generation, language translation, and sentiment analysis. Examples of large language models include GPT-3 (Generative Pre-trained Transformer 3) and BERT (Bidirectional Encoder Representations from Transformers).'

In [24]:
from langchain.chat_models.base import init_chat_model
llm = init_chat_model('openai:gpt-3.5-turbo')
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7fabe0027b10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7fabde44ead0>, root_client=<openai.OpenAI object at 0x7fabde449e00>, root_async_client=<openai.AsyncOpenAI object at 0x7fabde44ab10>, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [25]:
llm.invoke("What is AI")

AIMessage(content='AI, or artificial intelligence, refers to the simulation of human intelligence processes by machines, especially computer systems. This includes learning, reasoning, problem-solving, perception, and language understanding. AI technologies are used to develop systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 68, 'prompt_tokens': 10, 'total_tokens': 78, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-CNZjmKehaxKRiseFybZRxvionjt2E', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--35fe9b04-7e09-47c7-a

In [26]:
s = llm.invoke("how do think about AGI")
s.content

'As an AI, I believe that AGI (Artificial General Intelligence) has the potential to greatly benefit society by revolutionizing industries, solving complex problems, and advancing technology. However, there are also ethical concerns and potential risks associated with AGI, such as job displacement, loss of privacy, and misuse of power. It is important for researchers and policymakers to carefully consider these implications and ensure that AGI is developed and used responsibly.'

## Modern RAG Chain

In [25]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [26]:
## Convert vector store to retiever

retriever = vectore_store.as_retriever(
    search_kwarg = {"k":3}
)

retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7fabe0c896a0>, search_kwargs={})

In [27]:
## Create a prompt remplate
from langchain_core.prompts import ChatPromptTemplate

system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

In [28]:
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

`----------------------------------------------------------`

##### What is create_stuff_documents_chain?
`create_stuff_documents_chain` creates a chain that "stuffs" (inserts) all retrieved documents into a single prompt and sends it to the LLM. It's called "stuff" because it literally stuffs all the documents into the context window at once.

In [29]:
## Create a document chain
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7fabe0027b10>, async_client=<openai.resourc

`>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>`

### `This chain:`
- Takes retrievel documents
- `Stuffs` them into the prompts placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response

What is create_retrieval_chain?
create_retrieval_chain is a function that combines a retriever (which fetches relevant documents) with a document chain (which processes those documents with an LLM) to create a complete RAG pipeline.

In [30]:
### Create The Final RAG Chain
from langchain.chains import create_retrieval_chain
rag_chain = create_retrieval_chain(retriever, document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7fabe0c896a0>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't kn

In [31]:
response = rag_chain.invoke({"input":"What is Deep learning"})
response

{'input': 'What is Deep learning',
 'context': [Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
  Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has rev

In [32]:
response['answer']

'Deep learning is a subset of machine learning that uses artificial neural networks inspired by the human brain. It involves interconnected layers of nodes that enable the model to learn complex patterns and representations. Deep learning has made significant advancements in various fields such as computer vision, natural language processing, and speech recognition.'

In [33]:
## function to query the modern RAG system
def query_rag_modern(question):
    print(f'Question: \033[91m{question}\033[0m')
    print("-"* 50)

    # Using create_retrievel_chain approach
    result = rag_chain.invoke({'input': question})
    print(f"\033[32mAnswer!\033[0m: \033[34m{result['answer']}\033[0m")
    for i, doc in enumerate(result['context']):
        print(f"\n--Source {i+1}--")
        print(doc.page_content[:200]+ "...")
    return result

test_q  = [
    "What are the three types of machine learning?",
    "What is deep learning and how does it relate to neural networks?",
    "What are CNNs best used for?"
]

for q in test_q:
    result = query_rag_modern(q)
    print("\n" + "="*80 + "\n")

Question: [91mWhat are the three types of machine learning?[0m
--------------------------------------------------
[32mAnswer![0m: [34mThe three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through a trial-and-error process.[0m

--Source 1--
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--Source 2--
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--Source 3--
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    

## Create RAG Chain alternative -Using LCEL ( Langchain expression Language)

In [34]:
#  more flexible approach using LCEL
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel


In [35]:
# Create a custom prompt
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [36]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7fabe0c896a0>, search_kwargs={})

In [37]:
def format_doc(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [39]:
# Building the output documents for the prompt
rag_chain_icel = (
    {
        "context": retriever | format_doc,
        "question": RunnablePassthrough()
    }
    | custom_prompt
    | llm
    | StrOutputParser()
)
rag_chain_icel

{
  context: VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7fabe0c896a0>, search_kwargs={})
           | RunnableLambda(format_doc),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7fabe0027b10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7fabde44ead0>, root_client=<o

In [42]:
responcse = rag_chain_icel.invoke("What is Deep learning")
responcse

'Deep learning is a subset of machine learning based on artificial neural networks. It involves interconnected layers of nodes inspired by the human brain and has revolutionized computer vision, natural language processing, and speech recognition.'

In [43]:
retriever.get_relevant_documents("What is Deep learning")

  retriever.get_relevant_documents("What is Deep learning")


[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n

In [46]:
# Query using the LCEL approach - Fixed version
def query_rag_lcel(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # Method 1: Pass string directly (when using RunnablePassthrough)
    answer = rag_chain_icel.invoke(question)
    print(f"Answer: {answer}")
    
    # Get source documents separately if needed
    docs = retriever.get_relevant_documents(question)
    print("\nSource Documents:")
    for i, doc in enumerate(docs):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")

In [47]:
query_rag_lcel("What are the key concepts in reinforcement learning?")

Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: The key concepts in reinforcement learning are learning through interaction with an environment, using rewards and penalties.

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 4 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....


In [48]:
query_rag_lcel("What is deeplearning learning?")

Question: What is deeplearning learning?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning based on artificial neural networks inspired by the human brain. It consists of layers of interconnected nodes. Deep learning has revolutionized computer vision, natural language processing, and speech recognition.

Source Documents:

--- Source 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interco...

--- Source 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these networks and consists of layers of interco...

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    The human brain inspires these 

In [49]:
query_rag_lcel("What is AGI")

Question: What is AGI
--------------------------------------------------
Answer: Based on the context provided, the term AGI (Artificial General Intelligence) is not mentioned. Therefore, I do not know what AGI is in relation to this context.

Source Documents:

--- Source 1 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recogn...

--- Source 2 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recogn...

--- Source 3 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recogn...

--- Source 4 ---
Natural Language Processing (NLP)

    NLP is a field 

## Add new Documents to existing Vector Store

In [50]:
vectore_store

<langchain_community.vectorstores.chroma.Chroma at 0x7fabe0c896a0>

In [55]:
# Add new documents to the existing vector store
new_document = """
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and learns to maximize cumulative reward over time. Key concepts 
in RL include: states, actions, rewards, policies, and value functions. Popular RL 
algorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems.
"""

new_document

'\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n'

In [52]:
chunks

[Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    The human brain inspires these networks and consists of layers of interconnected \n    nodes. Deep learning has revolutionized computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers \n    excel at'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_1.txt'}, page_content='Networks (RNNs) and Transformers \n    excel at sequential data processing.'),
 Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computer

In [57]:
new_doc = Document(
    page_content=new_document,
    metadata={"source": "manual_addition", "topic": "reinforcement_learning"}
)

# split the documents
new_chunks = text_splitter.split_documents([new_doc])
new_chunks

[Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been'),
 Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.')]

In [58]:
# Add new documents to vectorstore
vectore_store.add_documents(new_chunks)

['824b44f3-cfc5-4dcb-a0a7-aca7d7b84fd1',
 'e47f0610-7047-4178-ba72-2776d98567f7']

In [60]:
print(f"Added {len(new_chunks)} new chunks to vector store")
print(f"Total vectors now {vectore_store._collection.count()}")

Added 2 new chunks to vector store
Total vectors now 37


In [61]:
query_rag_lcel("What are the key concepts in reinforcement learning?")

Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: The key concepts in reinforcement learning are interaction with an environment, rewards, and penalties.

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 4 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....


In [63]:
query_rag_lcel("Why Did Song Leave?")

Question: Why Did Song Leave??
--------------------------------------------------
Answer: I don't know.

Source Documents:

--- Source 1 ---
methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems....

--- Source 2 ---
Networks (RNNs) and Transformers 
    excel at sequential data processing....

--- Source 3 ---
Networks (RNNs) and Transformers 
    excel at sequential data processing....

--- Source 4 ---
Networks (RNNs) and Transformers 
    excel at sequential data processing....


In [65]:
new_document2 = """
Professor Iickho Song, a leading expert in communications and signal processing, 
retired from KAIST in February 2024 after 37 years and joined UESTC’s Institute
for Basic and Advanced Sciences in Chengdu. Despite KAIST’s program allowing retired
professors to continue research with 300 million won (about US$213,000) in annual funding,
Song chose UESTC, likely due to its stable research environment and resources, even though 
it faces U.S. sanctions limiting collaborations and technology imports."""

new_doc2 = Document(
    page_content=new_document2,
    metadata={"source": "manual_addition", "topic": "Professor Iickho Song To Chinese"}
)
new_chunks2 = text_splitter.split_documents([new_doc2])
new_chunks2

[Document(metadata={'source': 'manual_addition', 'topic': 'Professor Iickho Song To Chinese'}, page_content='Professor Iickho Song, a leading expert in communications and signal processing, \nretired from KAIST in February 2024 after 37 years and joined UESTC’s Institute\nfor Basic and Advanced Sciences in Chengdu. Despite KAIST’s program allowing retired\nprofessors to continue research with 300 million won (about US$213,000) in annual funding,\nSong chose UESTC, likely due to its stable research environment and resources, even though \nit faces U.S. sanctions limiting collaborations and technology'),
 Document(metadata={'source': 'manual_addition', 'topic': 'Professor Iickho Song To Chinese'}, page_content='sanctions limiting collaborations and technology imports.')]

In [66]:
vectore_store.add_documents(new_chunks2)

['ee56f665-b2cd-4d3b-9e38-6bb61a81bb91',
 '15b4c670-21fb-4423-8ee5-87460018be39']

In [68]:
print(f"Added {len(new_chunks2)} new chunks to the vector store")
print(f"Total vectors now: {vectore_store._collection.count()}")

Added 2 new chunks to the vector store
Total vectors now: 39


In [69]:
query_rag_lcel("Why Did Song Leave?")

Question: Why Did Song Leave?
--------------------------------------------------
Answer: Song left KAIST and joined UESTC likely due to the stable research environment and resources at UESTC. This can be inferred from the context where it is mentioned that despite KAIST's program allowing retired professors to continue research with funding, Song chose UESTC.

Source Documents:

--- Source 1 ---
Professor Iickho Song, a leading expert in communications and signal processing, 
retired from KAIST in February 2024 after 37 years and joined UESTC’s Institute
for Basic and Advanced Sciences in Che...

--- Source 2 ---
sanctions limiting collaborations and technology imports....

--- Source 3 ---
methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems....

--- Source 4 ---
Networks (RNNs) and Transformers 
    excel at sequential data processing....


`---------------------------------------------------------------------`

# Advanced RAG Techniques Conversational Memory
Understanding Conversational Memory in RAG Conversational memory enables the RAG system to maintain context across multiple interactions. This is crucial for:
Follow-up questions that reference previous answer Pronoun resolution (e.g., "it", "they", "that") Context-dependent queries that build on prior discussion Natural dialogue flow where users don't repeat context

Key Challenge: Traditional RAG retrieves documents based only on the current query, missing important context from the conversation. For example:

User: "Tell me about Python" Bot: explains Python programming language User: "What are its main libraries?" ← "its" refers to Python, but retriever doesn't know this

Solution: The modern approach uses a two-step process:

Query Reformulation: Transform context-dependent questions into standalone queries Context-Aware Retrieval: Use the reformulated query to fetch relevant documents



- `create_histiry_aware_retriever:` Makes the retriever understand conversation context
- `MessagesPlaceholder:`            Placeholder for chat history in prompts.
- `HumanMessage/AIMessage:`         Structured message types for conversation history

In [70]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

In [71]:
## Create a prompt that includes the chat history
contextualize_q_system_prompt =  """Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question 
which can be understood without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""


contextualize_q_prompt =  ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

In [73]:
history_awere_retriever = create_history_aware_retriever(
    llm,
    retriever,
    contextualize_q_prompt
)
history_awere_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7fabe0c896a0>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessa

In [74]:
# Create a new document chain with history
qa_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

qa_propmt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ('human', "{input}")
]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_propmt)

# Create conversational RAG chain

connersational_rag_chain = create_retrieval_chain(
    history_awere_retriever,
    question_answer_chain
)


In [77]:
chat_history = []
# first question
res1 = connersational_rag_chain.invoke({
    "chat_history": chat_history,
    "input":"What is machine learning?"
})

print(f"Q: What is machine learning?")
print(f"A: {res1['answer']}")

Q: What is machine learning?
A: Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without explicit programming. It involves three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.


In [78]:
chat_history.extend([
    HumanMessage(content="what is machine learning"),
    AIMessage(content=res1["answer"])
])
chat_history

[HumanMessage(content='what is machine learning', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without explicit programming. It involves three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.', additional_kwargs={}, response_metadata={})]

In [79]:
## Second question
res2 = connersational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "what are its main types"
})
res2

{'chat_history': [HumanMessage(content='what is machine learning', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Machine learning is a subset of artificial intelligence that allows systems to learn and improve from experience without explicit programming. It involves three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.', additional_kwargs={}, response_metadata={})],
 'input': 'what are its main types',
 'context': [Document(metadata={'source': '/data/oybek/PKD/xudamadarsam/mashq/vector/doc/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning

In [80]:
res2['answer']

'The main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data to train models, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning learns through trial and error by interacting with an environment.'

In [82]:
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7fabe0027b10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7fabde44ead0>, root_client=<openai.OpenAI object at 0x7fabde449e00>, root_async_client=<openai.AsyncOpenAI object at 0x7fabde44ab10>, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [85]:
os.environ["GROQ_API_KEY"]=os.getenv("GROQ_API_KEY")

In [86]:
from langchain_groq import ChatGroq
from langchain.chat_models import init_chat_model

In [87]:
llm_grok = ChatGroq(model="gemma2-9b-it")
llm_grok

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7faa363c81a0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7faa363c9160>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))