
## Building a Conversational Chatbot with your data using Ollama  

This notebook guides you through building a chatbot over your own documents. It uses HuggingFaceEmbeddings and FAISS to turn documents into vectors stored in a local vector database, then answers questions with a Llama model served locally through Ollama (`llama3.2` in the code below). The `langchain` library handles document chunking, vector-database indexing, history-aware conversation chains, and prompt templates.


### Key Features:

- **PDF Content Processing**: The notebook loads the PDF files in a directory, extracts their text, splits it into manageable chunks, and indexes these chunks in a local vector database using HuggingFaceEmbeddings and FAISS.
- **Data-Driven Query Handling**: Users can pose questions to the chatbot, which searches the indexed data for relevant answers.
- **Integrating Vector Database and LLMs**: We use `langchain` to connect the vector index to a Llama model running under Ollama, giving a conversational experience with chat memory and retrieval.
- **Hallucination Check**: The notebook scores each answer by its embedding similarity to the retrieved passages, so that poorly grounded answers can be flagged for review.

### Prerequisites for Running the Notebook:


1. **Library Requirements**: Install all libraries listed in the `requirements (local rag).txt` file with `pip install -r "requirements (local rag).txt"` (the quotes are needed because the file name contains spaces).
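
Before running the import cell, it can help to check which packages are actually importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` and the list of module names are assumptions derived from the imports in this notebook, not the contents of the requirements file.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names, taken from the import cell of this notebook
required = [
    "PyPDF2",
    "langchain",
    "langchain_community",
    "langchain_core",
    "sentence_transformers",
    "faiss",
    "tiktoken",
]
print("Missing:", missing_packages(required) or "none")
```

An empty result only means the modules can be located; it does not verify their versions.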




The cell below imports the libraries required to run this notebook.

In [26]:

import PyPDF2
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_community.llms import Ollama

from langchain.embeddings import HuggingFaceEmbeddings # import hf embedding
from langchain.vectorstores import FAISS
from sentence_transformers import SentenceTransformer, util

from langchain.prompts import PromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain



### Enter your pdf file name below


### Step 1: Load the documents and split them into chunks
The `prepare_and_split_docs` function below loads every PDF under `pdf_directory` with `DirectoryLoader` and `PyPDFLoader`, which yields one document per page with its source file and page number kept as metadata. It then splits the pages with `RecursiveCharacterTextSplitter`, measuring length in `tiktoken` tokens: chunks are at most 512 tokens long and consecutive chunks overlap by up to 256 tokens. The function returns the list of chunks, each carrying the metadata of the page it came from.

In [13]:
def prepare_and_split_docs(pdf_directory):
    # Load the documents
    loader = DirectoryLoader(pdf_directory, glob="**/*.pdf", show_progress=True, loader_cls=PyPDFLoader)
    documents = loader.load()

    # Initialize a text splitter
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512,  # maximum chunk length, measured in tiktoken tokens
        chunk_overlap=256,  # tokens shared between consecutive chunks
        disallowed_special=(),
        separators=["\n\n", "\n", " "]
    )

    # Split the documents and keep metadata
    split_docs = splitter.split_documents(documents)

    print(f"Documents are split into {len(split_docs)} passages")
    return split_docs
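
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators and merges the pieces); it only shows a fixed sliding window over a toy token list, and the helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Toy tokens stand in for tiktoken tokens
tokens = [f"t{i}" for i in range(10)]
chunks = sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

With an overlap of half the chunk size, as in the notebook's 512/256 setting, most tokens end up in two chunks. That roughly doubles the index size in exchange for a lower chance that a relevant sentence is cut at a chunk boundary.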


### Step 3: Ingest into Vector Database locally

The `ingest_into_vectordb` function is designed for processing and indexing a collection of documents into a vector database using FAISS (Facebook AI Similarity Search) for efficient similarity searches. It operates as follows:

1. **Embedding Creation**: It generates embeddings for the input documents (`split_docs`) using the Hugging Face model `'sentence-transformers/all-MiniLM-L6-v2'`. This model is specifically chosen for its efficiency in creating sentence-level embeddings and is set to run on the CPU.

2. **Vector Database Indexing**: Utilizes the generated embeddings to create a FAISS vector database. FAISS is used for its ability to efficiently handle large-scale similarity searches and clustering of dense vectors.

3. **Local Storage**: After creating the vector database, the function saves it locally to the path specified by `DB_FAISS_PATH`, ensuring the data can be easily accessed for future similarity searches or retrieval tasks.

The primary purpose of this function is to transform textual data into a structured, searchable vector format, facilitating efficient and scalable retrieval tasks such as document similarity searches or clustering.

In [15]:
def ingest_into_vectordb(split_docs):
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})
    db = FAISS.from_documents(split_docs, embeddings)

    DB_FAISS_PATH = 'vectorstore/db_faiss'
    db.save_local(DB_FAISS_PATH)
    print("Documents are inserted into FAISS vectorstore")
    return db
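
Conceptually, a similarity search over the index compares the query embedding with every stored embedding and returns the closest passages. The sketch below illustrates that idea in plain Python with made-up 3-dimensional vectors. It assumes an exhaustive search with Euclidean distance, which to my understanding matches the default of the LangChain FAISS wrapper; the function names and toy data are hypothetical, and this is not how FAISS is implemented internally.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_passages(query_vec, indexed, k=2):
    """Return the k (distance, text) pairs closest to query_vec, nearest first."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored)[:k]


# Toy "index": (passage text, embedding) pairs with invented values
toy_index = [
    ("passage about education", [0.9, 0.1, 0.0]),
    ("passage about work history", [0.1, 0.9, 0.0]),
    ("passage about hobbies", [0.0, 0.1, 0.9]),
]
results = nearest_passages([0.8, 0.2, 0.0], toy_index, k=2)
for distance, text in results:
    print(f"{distance:.3f}  {text}")
```

Real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, and FAISS performs the same kind of comparison with optimized native code.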

### Step 4: Set up Conversation Chain using LLM
The `get_conversation_chain(retriever)` function creates a stateful conversational RAG system.

1. It initializes the `llama3.2` model and defines two prompts:
   - A contextualization prompt to handle the user's query in light of the chat history.
   - A system prompt for answering concisely with 2-3 sentences based on retrieved documents.

2. It builds a `history_aware_retriever` using the retriever, LLM, and the contextualization prompt to ensure responses are context-aware.

3. A `question_answer_chain` is set up to respond with answers limited to 50 words.

4. These components are combined into a RAG chain using `create_retrieval_chain`.

5. To manage chat history across sessions, it defines `get_session_history`, which stores and retrieves message history by session ID.

6. Finally, a `RunnableWithMessageHistory` integrates the RAG chain with chat history management, ensuring the bot maintains state and provides contextually relevant responses throughout the conversation.

This function sets up a conversational RAG system that combines a Llama model for language generation with a vector database for retrieval, and keeps per-session chat history so that follow-up questions are answered in context.

In [8]:
def get_conversation_chain(retriever):
    llm = Ollama(model="llama3.2")
    contextualize_q_system_prompt = (
        "Given the chat history and the latest user question, "
        "provide a response that directly addresses the user's query based on the provided  documents. "
        "Do not rephrase the question or ask follow-up questions."
    )


    contextualize_q_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", contextualize_q_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )


    ### Answer question ###
    system_prompt = (
        "As a personal chat assistant, provide accurate and relevant information based on the provided document in 2-3 sentences. "
        "Answer should be limited to 50 words and 2-3 sentences. Do not prompt to select answers or formulate a standalone question. Do not ask questions in the response. "
        "{context}"
    )

    qa_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )
    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


    ### Statefully manage chat history ###
    store = {}


    def get_session_history(session_id: str) -> BaseChatMessageHistory:
        if session_id not in store:
            store[session_id] = ChatMessageHistory()
        return store[session_id]


    conversational_rag_chain = RunnableWithMessageHistory(
        rag_chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    return conversational_rag_chain
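
The session handling above boils down to a dictionary keyed by session ID. The sketch below isolates that pattern so it can be run without an LLM; `ToyHistory` is a hypothetical stand-in for `ChatMessageHistory` and stores plain `(role, text)` tuples.

```python
class ToyHistory:
    """Hypothetical stand-in for ChatMessageHistory: an ordered list of messages."""

    def __init__(self):
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))


store = {}


def get_session_history(session_id):
    # Create the history lazily the first time a session ID is seen
    if session_id not in store:
        store[session_id] = ToyHistory()
    return store[session_id]


# Two turns in one session, one turn in another
get_session_history("abc123").add("human", "who is vikram bhat?")
get_session_history("abc123").add("ai", "A data scientist.")
get_session_history("xyz789").add("human", "hello")

for session_id, history in store.items():
    print(session_id, len(history.messages))
```

Because `store` is created inside `get_conversation_chain`, every call to that function starts with empty histories, and all history is lost when the kernel restarts.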

### Step 5: Calculate Document Similarity in the LLM's Response
The `calculate_similarity_score` function computes the cosine similarity between a given answer and a list of context documents using Sentence Transformers. It first encodes the answer and the context documents into embeddings, then calculates the cosine similarity between the answer embedding and each context embedding. The function returns the maximum similarity score, indicating how closely the answer relates to the most relevant context document. Cosine similarity can in principle range from -1 to 1, but for sentence embeddings of related text it typically falls between 0 (unrelated) and 1 (near-identical), with higher scores reflecting better alignment with the context.

Essentially, this function checks how well the chatbot's response aligns with the information in the source documents; a low score suggests the answer may not be grounded in them. It measures semantic closeness, not factual correctness.

In [39]:
def calculate_similarity_score(answer: str, context_docs: list) -> float:
    model = SentenceTransformer('all-MiniLM-L6-v2')
    context_docs = [doc.page_content for doc in context_docs]
    # Encode the answer and context documents
    answer_embedding = model.encode(answer, convert_to_tensor=True)
    context_embeddings = model.encode(context_docs, convert_to_tensor=True)

    # Calculate cosine similarities
    similarities = util.pytorch_cos_sim(answer_embedding, context_embeddings)

    # Return the maximum similarity score from the context documents
    max_score = similarities.max().item() 
    return max_score
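
The arithmetic behind the score can be reproduced by hand. The sketch below computes cosine similarity in plain Python on made-up 2-dimensional vectors, takes the maximum over the context vectors as the function above does, and applies a cutoff. The `looks_grounded` helper and its threshold of 0.5 are hypothetical; the notebook itself only reports the score, and a suitable cutoff would have to be tuned on your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_context_similarity(answer_vec, context_vecs):
    """Similarity of the answer to its best-matching context vector."""
    return max(cosine_similarity(answer_vec, vec) for vec in context_vecs)


def looks_grounded(score, threshold=0.5):
    # The threshold is an arbitrary illustrative value, not taken from the notebook
    return score >= threshold


# Invented vectors: orthogonal, 45 degrees apart, and opposite to the answer
answer_vec = [1.0, 0.0]
context_vecs = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
score = max_context_similarity(answer_vec, context_vecs)
print(round(score, 2), looks_grounded(score))
```

Taking the maximum means a single well-matching passage is enough for a high score, even if the rest of the answer is unsupported.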

Now that we have crafted all the necessary functions, it's time to put them into action and test their functionality.

In [30]:
pdf_directory = "data_directory"
split_docs = prepare_and_split_docs(pdf_directory)
vector_db = ingest_into_vectordb(split_docs)



100%|██████████| 1/1 [00:00<00:00, 20.76it/s]


Documents are split into 3 passages


In [31]:

retriever = vector_db.as_retriever()
conversational_rag_chain = get_conversation_chain(retriever)


Parent run 12c63dea-4e8b-4270-888f-677896102224 not found for run b59d81db-dca9-4a57-9c57-80f9fbc0ed09. Treating as a root run.


Vikram Bhat is a data scientist with extensive expertise in data science, analytics, and machine learning. He has worked in various roles, including Data Scientist at IBM, Data Analyst at Voxpro Groups, and BI Developer at Cognizant Technology Solutions, utilizing skills such as Python, R, SQL, and AWS Sagemaker to analyze problems, build and deploy machine learning models, and create data visualizations.


Context Similarity Score: 0.49


### Ask your Question

We have created a conversational chain and are now ready to chat with your own data.


### Question 1

In [14]:

qa1=conversational_rag_chain.invoke(
    {"input": "who is vikram bhat?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)
print(qa1["answer"])

Llama.generate: prefix-match hit


Vikram Bhat is a data scientist with expertise in data science and analytics, machine learning, and data visualization. He has experience working in various industries, including finance and healthcare, and has developed end-to-end pipelines for building machine learning models using Watson Studio and AWS Sagemaker.



llama_print_timings:        load time =     429.30 ms
llama_print_timings:      sample time =       5.85 ms /    68 runs   (    0.09 ms per token, 11615.99 tokens per second)
llama_print_timings: prompt eval time =   68382.68 ms /  1435 tokens (   47.65 ms per token,    20.98 tokens per second)
llama_print_timings:        eval time =    4554.99 ms /    67 runs   (   67.98 ms per token,    14.71 tokens per second)
llama_print_timings:       total time =   73274.66 ms /  1502 tokens


In [None]:

answer = qa1["answer"]
context_docs = qa1["context"]
similarity_score = calculate_similarity_score(answer, context_docs)

print("Context Similarity Score:", round(similarity_score,2))

We now have an answer to the question. The response also contains the conversation history (`chat_history`) and the retrieved source documents (`context`).
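
To see which passages an answer was based on, the documents in `qa1["context"]` can be listed with their metadata. The helper below is a sketch: `format_sources` is a hypothetical name, the `source` and `page` metadata keys are what `PyPDFLoader` typically sets (treat them as an assumption), and the `SimpleNamespace` objects with an invented file name stand in for real documents so that the example runs on its own.

```python
from types import SimpleNamespace


def format_sources(context_docs):
    """Return one 'source (page N): snippet' line per retrieved document."""
    lines = []
    for doc in context_docs:
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        # Collapse whitespace and keep only the first 80 characters
        snippet = " ".join(doc.page_content.split())[:80]
        lines.append(f"{source} (page {page}): {snippet}")
    return lines


# Stand-in documents with invented content and file name
fake_context = [
    SimpleNamespace(
        page_content="Vikram Bhat is a\ndata scientist.",
        metadata={"source": "data_directory/cv.pdf", "page": 0},
    ),
    SimpleNamespace(page_content="MSc in Data Science.", metadata={}),
]
for line in format_sources(fake_context):
    print(line)
```

In the notebook, the same helper could be called as `format_sources(qa1["context"])`.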


### Question 2

In [45]:
user_question = "where did he graduate?"

qa2 = conversational_rag_chain.invoke(
    {"input": user_question},
    config={
        "configurable": {"session_id": "abc123"}
    },  # reuses the "abc123" history in `store`, so "he" can be resolved from the earlier turn
)
print(qa2["answer"])

Parent run e6355841-0c20-4804-b045-67e04e91d182 not found for run 88f984a5-9fb8-49f6-a08f-a9a4e0b3f007. Treating as a root run.


Vikram Bhat graduated from University College Cork, Cork with an MSc in Data Science and Analytics.
