<a href="https://colab.research.google.com/github/wajihh/genai_projects/blob/main/langchain_rag_qdrant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Development of RAG QA Bot With LangChain, Qdrant, and OpenAI**

  *Acknowledgement: https://medium.com/data-science-at-microsoft/how-i-built-a-superhero-facts-rag-qa-bot-using-langchain-qdrant-and-openai-20a04202c6b1*
  *By: Deepsha Menghani*

#**What is RAG?**

Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for improving the response accuracy of AI language models. By combining the generative power of these models with the ability to retrieve relevant information from a knowledge base, RAG systems can provide more accurate, contextual, and factual responses.

##Set up the environment

In [1]:
!pip install langchain langchain_openai langchain_community langchain_qdrant python-dotenv qdrant-client



##Connect with Google Drive

In [2]:
# Connect with Google Drive

from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#Connect with OpenAI
Set up an account on the OpenAI Platform and generate a unique API key. Then, I store this API key in my .env file as OPENAI_API_KEY so that the code can read it.

Next, create a client that can interact with OpenAI’s language models. This is where the ChatOpenAI class from LangChain comes into play.
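
To make the .env step concrete, here is a minimal standard-library sketch of the file format and of roughly what a loader does with it. The helper `read_env_file` and the placeholder key value are assumptions for illustration only; the notebook itself relies on `load_dotenv` from python-dotenv, which handles more cases than this sketch.

```python
import os
import tempfile


def read_env_file(path):
    """Parse simple KEY=VALUE lines (hypothetical helper, not python-dotenv itself)."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without an equals sign
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Write a throwaway .env file containing a placeholder (not a real) key
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('# OpenAI credentials\nOPENAI_API_KEY="sk-placeholder"\n')
    env_values = read_env_file(env_path)

print(env_values["OPENAI_API_KEY"])
```

Keeping the key in a separate file like this means it never has to appear in the notebook itself.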

In [3]:
import os
from dotenv import load_dotenv

# Load OpenAI and Qdrant API keys from .env and .env.qdrant files
load_dotenv("/content/drive/MyDrive/Colab Notebooks/.env")  # For OpenAI
openai_api_key = os.getenv("OPENAI_API_KEY")

load_dotenv("/content/drive/MyDrive/Colab Notebooks/.env.qdrant")  # For Qdrant
qdrant_url = os.getenv("QDRANT_URL")
qdrant_api_key = os.getenv("QDRANT_API_KEY")

# Set OpenAI API key
import openai
openai.api_key = openai_api_key

# Check if the keys are loaded correctly
print(f"OpenAI API Key Loaded: {openai_api_key is not None}")
print(f"Qdrant URL Loaded: {qdrant_url}")


OpenAI API Key Loaded: True
Qdrant URL Loaded: https://a53aab87-4d9d-4775-b259-285e3a5cefb5.europe-west3-0.gcp.cloud.qdrant.io


In [4]:
from langchain_openai import ChatOpenAI
#llmclient = ChatOpenAI(openai_api_key=openai_api_key)
llmclient = ChatOpenAI(openai_api_key=openai_api_key, model_name="gpt-3.5-turbo", temperature=0.2, max_tokens=100)


# Selection of Model and Parameters

ChatOpenAI() creates an instance of the ChatOpenAI class, which can interact with OpenAI’s chat models.
By default, this uses the gpt-3.5-turbo model, which can be updated by adding a model_name parameter, e.g., model_name="gpt-4".
Additional parameters include the following:
temperature: Controls randomness in outputs. OpenAI's chat models accept values from 0 to 2; higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
max_tokens: The maximum number of tokens to generate in the chat completion.
request_timeout: How many seconds to wait for the request to complete before timing out.
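
As an illustration of why temperature changes how random the output is, the sketch below applies the textbook temperature-scaled softmax to a few made-up token scores. It is not OpenAI's actual sampling code, and the function name and scores are assumptions; it only shows that a lower temperature concentrates probability on the top-scoring token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
focused = softmax_with_temperature(toy_logits, 0.2)
varied = softmax_with_temperature(toy_logits, 0.8)

print([round(p, 3) for p in focused])
print([round(p, 3) for p in varied])
```

With the lower temperature, almost all of the probability lands on the first token, which is why low values give more repeatable answers.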

In [5]:
# Choose Model Parameters as per requirement

import os
from dotenv import load_dotenv
import openai
from langchain_openai import ChatOpenAI

# Load OpenAI and Qdrant API keys from .env and .env.qdrant files
load_dotenv("/content/drive/MyDrive/Colab Notebooks/.env")  # For OpenAI
openai_api_key = os.getenv("OPENAI_API_KEY")

load_dotenv("/content/drive/MyDrive/Colab Notebooks/.env.qdrant")  # For Qdrant
qdrant_url = os.getenv("QDRANT_URL")
qdrant_api_key = os.getenv("QDRANT_API_KEY")

# Set OpenAI API key
openai.api_key = openai_api_key

# Check if the keys are loaded correctly
print(f"OpenAI API Key Loaded: {openai_api_key is not None}")
print(f"Qdrant URL Loaded: {qdrant_url}")

llmclient = ChatOpenAI(openai_api_key=openai_api_key, model_name="gpt-3.5-turbo", temperature=0, max_tokens=100)

OpenAI API Key Loaded: True
Qdrant URL Loaded: https://a53aab87-4d9d-4775-b259-285e3a5cefb5.europe-west3-0.gcp.cloud.qdrant.io


# Load Document
The key differentiating component of a RAG system is the knowledge base from which information needs to be retrieved. In this project, the knowledge base is stored in a text file. LangChain provides multiple types of loaders for different file types. In this case, TextLoader is imported.

TextLoader("name--.txt") creates an instance of the TextLoader class, specifying the path to the text file.
loader.load() loads the content of the file into memory and returns a list of Document objects with attributes page_content and metadata (aka payload).
Loading the data into this structured format prepares it for the next steps in the RAG system, such as splitting the text into chunks and creating embeddings.
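
To show the shape of what a text loader returns, here is a small standard-library sketch. `SimpleDocument` and `load_text_file` are hypothetical stand-ins written for this example, not LangChain classes; they only mimic the idea of a one-element list whose item carries page_content and a metadata dictionary recording the source path.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path):
    """Read a whole text file into a single-document list, recording its source."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Create a tiny sample file so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "facts.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("1. First fact.\n2. Second fact.\n")
    sample_docs = load_text_file(sample_path)

print(len(sample_docs))
print(sample_docs[0].page_content.splitlines()[0])
print(sorted(sample_docs[0].metadata))
```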

In [6]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("/content/drive/MyDrive/2024_Advance_AI/Data/superhero_facts.txt")
documents = loader.load()


# Chunk it up
Chunking is a technique used in Generative AI to handle large amounts of data by splitting it into smaller, more manageable pieces. Chunking helps with:

Efficient processing: Smaller chunks of text are easier and faster to process, especially when creating embeddings.

Relevant retrieval: It allows for more precise retrieval of information by fetching specific, relevant chunks rather than entire documents.

Context management: It helps in maintaining appropriate context size for the language model.

Adjust parameters with CharacterTextSplitter; this class from LangChain splits text based on character count. Parameters of CharacterTextSplitter include:

separator = "\n": This tells the splitter to break chunks at newline characters. It is useful for preserving the structure of the input file, where each fact is separated by a new line. The separator could be changed to split on different characters (e.g., ". " to split on sentences) depending on the structure of the data.

chunk_size = 200: This sets the target size for each chunk to 200 characters. The splitter first splits the text on the separator and then merges neighbouring pieces until adding another one would exceed this limit, so each chunk ends at a newline. Increasing this value would create larger chunks, which might preserve more context but could lead to less precise retrieval. In large-scale scenarios this can have a more visible impact.

chunk_overlap: This sets how many characters may be repeated between consecutive chunks. With a value of 0 there is no overlap, and each character from the original text appears in exactly one chunk. The code in this notebook uses a small value of 10. It can be adjusted based on the specific needs and nature of the data. For instance, if I wanted some text to carry forward from one chunk to the next because the continuity of it is as important as the content itself, then I would enter a higher overlap number.
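
The interplay between separator and chunk_size can be sketched in plain Python. The function below is a simplified illustration written for this notebook, not LangChain's implementation: it ignores chunk_overlap entirely and keeps any single piece longer than chunk_size whole. The sample text and the limit of 70 are made up for the example.

```python
def split_by_separator(text, separator="\n", chunk_size=200):
    """Split on the separator, then greedily merge pieces up to chunk_size characters."""
    pieces = [piece for piece in text.split(separator) if piece]
    chunks = []
    current = []
    current_len = 0
    for piece in pieces:
        # Characters this piece would add, including a joining separator if needed
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # The limit would be exceeded, so close the chunk at this separator
            chunks.append(separator.join(current))
            current = []
            current_len = 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


sample_text = "\n".join([
    "1. Alpha fact that is fairly short.",
    "2. Beta fact that is also short.",
    "3. Gamma fact to force a second chunk.",
])
sample_chunks = split_by_separator(sample_text, separator="\n", chunk_size=70)
for chunk in sample_chunks:
    print(repr(chunk))
```

The first two lines fit together under the limit, while the third would push the chunk past it and therefore starts a new one.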

In [7]:
# Text splitters live in the langchain_text_splitters package in recent LangChain releases
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
  separator = "\n",
  chunk_size = 200,
  chunk_overlap = 10
)

loader = TextLoader("/content/drive/MyDrive/2024_Advance_AI/Data/superhero_facts.txt")
documents = loader.load_and_split(
  text_splitter=text_splitter
)

# Inspect the chunking output by printing it:

In [8]:
for doc in documents:
  print(doc.page_content)
  print("\n")

1. Ant-Man (Hank Pym) discovered subatomic particles known as "Pym Particles."
2. Ant-Man created serums that could shrink or grow objects and people.


3. Ant-Man developed a cybernetic helmet to communicate with ants.
4. Ant-Man battled various villains like Egghead using his size-changing abilities.


5. Ant-Man is a founding member of the Avengers alongside Wasp.
6. Aquaman's real name is Arthur Curry, also known as Orin.
7. Aquaman is the telepathic ruler of Atlantis and the Earth's oceans.


8. Aquaman has superhuman strength, speed, and the ability to command sea life.
9. Aquaman is a founding member of the Justice League of America.
10. Aquaman's most consistent nemesis is Black Manta.


11. Bane was born and raised in the Peña Duro prison in Santa Prisca.
12. Bane developed extraordinary physical and mental skills while imprisoned.


13. Bane became a test subject for a drug called Venom, which enhanced his strength.
14. Bane is famous for breaking Batman's back in the "Knight

# Embeddings and vector database setup

Embeddings convert text into numerical vectors that capture semantic meaning, which enables efficient similarity searches. For this project, OpenAI’s embeddings are used, with Qdrant as the vector database.

These are the steps I followed in the code:

1. Creating embeddings: The OpenAIEmbeddings class handles the API calls to OpenAI’s embedding endpoint.

2. Setting up Qdrant client: Qdrant is a vector similarity search engine. It is used to store and retrieve the embeddings efficiently. A QdrantClient can store data locally in an embeddings directory (the commented-out line in the code), which is great for development and smaller datasets. For production or larger datasets this might not be feasible, and other approaches such as cloud offerings or dedicated servers are required. The code here connects to a hosted Qdrant instance using the URL and API key loaded earlier.

3. Creating or accessing the collection: Check to see whether the collection already exists. If not, create it with specific vector parameters. Set size=1536 in VectorParams, corresponding to the dimensionality of the embeddings produced by OpenAI’s text-embedding-ada-002 model, the default for OpenAIEmbeddings. Here cosine similarity is used for vector comparisons, which is effective for measuring the similarity between two vectors regardless of magnitude.

4.  Initializing the Qdrant vector store: Initialize the Qdrant vector store with the client, collection name, and embeddings. This step connects LangChain’s Qdrant integration with Qdrant client and OpenAI embeddings. It creates a db object that I use to add documents and perform similarity searches.

5.  Adding documents to the vector store: This converts each text chunk into an embedding and stores it in the Qdrant collection.
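
The steps above can be sketched without any external service. The toy store below is a hypothetical in-memory stand-in for a Qdrant collection, using made-up 3-dimensional vectors instead of real 1536-dimensional embeddings. It illustrates two ideas from the list: a collection has a fixed vector size, and cosine similarity ranks vectors by direction regardless of magnitude.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; independent of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector collection."""

    def __init__(self, size):
        self.size = size
        self.points = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        # Reject vectors that do not match the collection's dimensionality
        if len(vector) != self.size:
            raise ValueError(f"expected {self.size} dimensions, got {len(vector)}")
        self.points.append((vector, payload))

    def search(self, query_vector, limit=2):
        # Score every stored vector against the query and keep the best matches
        scored = [(cosine_similarity(query_vector, vec), payload)
                  for vec, payload in self.points]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]


store = TinyVectorStore(size=3)
store.add([1.0, 0.0, 0.0], "Ant-Man fact")
store.add([0.0, 1.0, 0.0], "Aquaman fact")
store.add([0.9, 0.1, 0.0], "Another Ant-Man fact")

# The query is twice as long as the first stored vector but points the same way
top_hits = store.search([2.0, 0.0, 0.0], limit=2)
print([payload for _, payload in top_hits])
```

A real vector database adds indexing so that it does not have to compare the query against every stored vector, which is what makes it practical at scale.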

In [9]:
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Provide openai_api_key as a keyword argument
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
# Embeddings stored locally in drive folder
# client = QdrantClient(path="./embeddings")
# Instead of using local Qdrant, connect to your Qdrant server:
client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key) # Use qdrant_url and qdrant_api_key from your environment
collection_name = "superhero_embeddings"
if collection_name not in [c.name for c in client.get_collections().collections]:
  client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
  )

db = Qdrant(client=client, collection_name=collection_name, embeddings=embeddings)
db.add_documents(documents)



['dda7824b5bb74b4f9f0a57eec147c508',
 '804087fb82234d9899a2dc366bf42baa',
 '0adea64c5293427cb52fb70578a99718',
 '00d28b4bee7f4d069032dec6a3096cff',
 '8e0ca55b8e014655ac545386de4be788',
 'bd02e163f3f84fd08f627e5893ff67cc',
 '679c13d674ad4b9c9ff55053a1ecd711',
 '83d546b3548a42a693e515dab4908b96',
 'eecfdc6326cd4d74a521ff3090ac102a',
 'b8e45d1203324e1391720a709c462f98',
 '8a08deb1fd344ec4873529aaab80579e',
 '49d230b7e7414a9eafff27fd96a587da',
 '0e5a3d6b70234d058d259f6f41a08f9a',
 '19bac8e0edd6445b8764b581111f7e53',
 '5c23cb06485745768f1ea2775ecdef17',
 '44b819d55a8f44b7b822a5489a86b74f',
 '95179a18ea004f83a742028797e9bb93',
 '303f85f13c984d3c958c0880b2291ce0',
 '09cd96859f074331a01457f524c8d443',
 '973b819df54b4fb497b838ac083262d5',
 '006156f5e31c49eabf26310352a1e690',
 '64aa8bf275bd4da68b059018127125dd',
 '6f5140fc51c646a48c9c487e160101f0',
 '3751770b5dbf4bbe9fedbffaaf15a045',
 'd16faa2ce36e4266b27976f2cbc241b7',
 '3720ade0fdd4449f92a9c3d693fdb6a4',
 '369819cf2b51480a9ba0eb5eff8a6822']

#Create the Retrieval QA Chain
The Retrieval QA Chain brings together the retrieval mechanism (the vector store) and the language model to create a system that can answer questions based on the knowledge base:

1.  Setting up the retriever: First, convert the Qdrant vector store into a retriever object. The retriever is responsible for fetching the most relevant documents from the vector store based on a given query. There are additional parameters that you can play around with, such as search_type (e.g., “similarity”, “mmr” for maximum marginal relevance) and search_kwargs (e.g., to specify the number of documents to retrieve).

2.  Creating the RetrievalQA Chain: Now use the RetrievalQA class from LangChain and create a chain specifying the language model (llm), retriever, and chain_type. The chain type specifies how to process the retrieved documents.
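
To illustrate how the “similarity” and “mmr” search types can differ, here is a small pure-Python sketch of maximum marginal relevance on made-up 2-dimensional vectors. It is an illustration of the general technique, not the retriever's actual code; the function `mmr_select` and the vectors are assumptions, and the parameter name `lambda_mult` is borrowed from the name LangChain uses for the relevance/diversity trade-off.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Pick k candidate indices, trading query relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(index):
            relevance = cosine(query, candidates[index])
            # Penalise similarity to anything already selected
            redundancy = max(
                (cosine(candidates[index], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.3]
candidate_vecs = [
    [1.0, 0.0],    # relevant
    [0.99, 0.05],  # relevant, but nearly a duplicate of the first
    [0.6, 0.8],    # less relevant, but different
]

plain_top2 = sorted(
    range(len(candidate_vecs)),
    key=lambda i: cosine(query_vec, candidate_vecs[i]),
    reverse=True,
)[:2]
mmr_top2 = mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5)
print(plain_top2, mmr_top2)
```

Plain similarity returns the two near-duplicates, whereas MMR keeps the best match and then prefers the candidate that adds something new.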


In [10]:
from langchain.chains import RetrievalQA

retriever = db.as_retriever()

chain = RetrievalQA.from_chain_type(
  llm=llmclient,
  retriever=retriever,
  chain_type="stuff"
)

The “stuff” chain type means that all retrieved documents are stuffed into the prompt sent to the language model. While this is suitable for smaller sets of retrieved documents, other chain types (“map_reduce”, “refine”, or “map_rerank”) might be more suitable for other scenarios. For instance, “map_reduce” processes each document separately and then combines the results, while “refine” iteratively updates the answer with each document. “map_rerank” applies the language model to each retrieved document individually, has it score each answer, and returns the highest-scoring answer.
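
The difference between “stuff” and “map_reduce” can be sketched with a stand-in for the language model. Everything here is hypothetical scaffolding for illustration (the recording `llm` function, the prompt wording, and the two helper functions); it does not reproduce LangChain's prompts, only the pattern of one model call versus one call per document plus a combining call.

```python
def make_recording_llm():
    """Return a fake model plus the list of prompts it has received."""
    prompts = []

    def llm(prompt):
        prompts.append(prompt)
        return f"answer derived from {len(prompt)} prompt characters"

    return llm, prompts


def stuff_answer(question, docs, llm):
    """One call: every retrieved document goes into a single prompt."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce_answer(question, docs, llm):
    """One call per document, then one more call to combine the partial answers."""
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    joined = "\n".join(partials)
    return llm(f"Partial answers:\n{joined}\n\nQuestion: {question}")


retrieved = [
    "Ant-Man developed a cybernetic helmet to communicate with ants.",
    "Aquaman's most consistent nemesis is Black Manta.",
]

stuff_llm, stuff_prompts = make_recording_llm()
stuff_answer("Who talks to ants?", retrieved, stuff_llm)

map_llm, map_prompts = make_recording_llm()
map_reduce_answer("Who talks to ants?", retrieved, map_llm)

print(len(stuff_prompts), len(map_prompts))
```

The single stuffed prompt grows with every retrieved document, which is why this approach suits small retrieval sets, while the map-reduce pattern keeps each prompt small at the cost of more model calls.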

# Run the question loop

Use a while loop to read questions, invoke the chain, and print the results to test them out.

In [None]:
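while True:
  humaninput = input(">> ")
  # Stop on an empty line or "exit"/"quit" so the cell can finish cleanly
  if humaninput.strip().lower() in ("", "exit", "quit"):
    break
  result = chain.invoke(humaninput)
  print(result['result'])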

>> describe ant man in detail
Ant-Man is a superhero in the Marvel Universe who has the ability to shrink down to the size of an ant while retaining his full strength. His real name is Hank Pym, a brilliant scientist who developed the Pym Particles that allow him to change size. He also created a cybernetic helmet that enables him to communicate with and control ants. Ant-Man has used his size-changing abilities to battle various villains, such as Egghead, and has been a member of superhero teams like the Avengers
>> which super heros are mentioned?
The superheroes mentioned in the context are Captain America 2099 and Captain America (Venomized).
>> describe their adventures
After witnessing her father's death while defending against Parademons, Batwoman teamed up with her friend Supergirl. Together, they were unexpectedly transported to another universe through a Boom Tube. In this new universe, they faced various challenges and enemies while trying to find a way back home. Their adve

#**Next Steps**

A RAG system using LangChain and Qdrant has been developed that can load documents, store them efficiently in a vector database, and retrieve relevant information to answer user queries. Important aspects:

1.  Semantic search: Converting text chunks into embeddings enables semantic search capabilities. This means it can find relevant information based on meaning, and not just keyword matching.

2.  Efficiency: Vector databases like Qdrant are optimized for fast similarity searches in high-dimensional spaces, which is crucial for quick retrieval in RAG systems.

3.  Scalability: This setup can handle growing amounts of data efficiently, making it suitable for both small- and large-scale applications; however, you may need to play with parameters and storage based on the scale of your scenario.

4.  Flexibility: By using LangChain’s abstractions, I can easily switch to different embedding models or vector stores if needed in the future.

The power of this approach is truly unlocked in real-world scenarios. The approach followed can be applied to any domain-specific knowledge base, from customer support to educational tools. For instance:

In healthcare, it could assist doctors by quickly retrieving relevant medical literature for specific patient cases.

In legal firms, it could help lawyers find pertinent case law and precedents for ongoing litigation.

In e-commerce, it could enhance product recommendations by understanding nuanced customer queries.

In financial services, it could provide personalized investment advice based on vast amounts of market data and individual client profiles.

As you continue to explore RAG systems, consider experimenting with different document types, larger datasets, and more complex chunking and retrieval strategies. This notebook can help you get started. Happy coding!



#Resources
1.  https://github.com/deepshamenghani/langchain_rag_qdrant
2.  LangChain RAG guide
3.  OpenAI Cookbook
4.  Qdrant tutorials and vector database documentation
5.  Hugging Face RAG guide