### Ollama PDF RAG Notebook V2

#### Import Libraries

In [59]:
# Imports
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Jupyter-specific imports
from IPython.display import display, Markdown

# Set environment variable for protobuf
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

#### Load PDF

In [60]:
# Load PDF
local_path = "../data/Think_and_Grow_Rich.pdf"
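# Fail early if the file is missing; later cells depend on `data`
if os.path.exists(local_path):
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
    print(f"PDF loaded successfully: {local_path}")
else:
    raise FileNotFoundError(f"PDF not found: {local_path}")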

PDF loaded successfully: ../data/Think_and_Grow_Rich.pdf


#### Split text into chunks

In [61]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)
print(f"Text split into {len(chunks)} chunks")

Text split into 709 chunks
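
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap in plain Python. It is a simplified illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which also prefers to break on separators such as paragraph and line breaks). The helper name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width chunks where each chunk repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample text, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

With these toy settings, each chunk starts with the last four characters of the previous one. At the notebook's settings, the same idea means up to 200 characters are shared between neighbouring chunks, so a sentence cut at a boundary is still likely to appear whole in one of them.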


#### Create Vector DB

In [62]:
# Create vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434"),
    collection_name="local-rag"
)
print("Vector database created successfully")

Vector database created successfully
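
The vector store answers a query by ranking stored chunks by how close their embeddings are to the query's embedding. The sketch below shows that idea with toy three-dimensional vectors and cosine similarity. The vectors, texts, and the helper names `cosine_similarity` and `top_k` are made up for illustration. Real embeddings come from the embedding model, and the distance metric Chroma is configured with may differ from cosine.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs; the numbers are invented for illustration
toy_store = [
    ("chunk about desire", [0.9, 0.1, 0.0]),
    ("chunk about persistence", [0.1, 0.9, 0.0]),
    ("chunk about fear", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.3, 0.0]
results = top_k(toy_query, toy_store, k=2)
print(results)
```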


#### Set up LLM and Retrieval

In [63]:
# Set up LLM and retrieval
local_model = "mistral-small:24b"
llm = ChatOllama(model=local_model)

In [64]:
# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 2
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(),
    llm=llm,
    prompt=QUERY_PROMPT
)
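
Conceptually, multi-query retrieval asks the LLM for alternative phrasings of the question, searches the vector store once per phrasing, and returns the de-duplicated union of the hits. The sketch below mimics that flow in plain Python. It is not LangChain's internal implementation: `fake_generate_queries` and `fake_search` are hypothetical stand-ins for the LLM call and the vector search, and the chunk IDs are invented.

```python
def parse_queries(llm_output: str) -> list[str]:
    """Split newline-separated LLM output into non-empty query strings."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def unique_union(result_lists: list[list[str]]) -> list[str]:
    """Merge several result lists, dropping duplicates, keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_generate_queries(question: str) -> str:
    # Stand-in for the LLM call; returns two hard-coded rewordings
    return "What is the central theme of the text?\n\nWhat is this book mainly about?\n"


def fake_search(query: str) -> list[str]:
    # Stand-in for vector search; maps each query to hard-coded chunk IDs
    index = {
        "What is the central theme of the text?": ["chunk-3", "chunk-7"],
        "What is this book mainly about?": ["chunk-7", "chunk-12"],
    }
    return index.get(query, [])


queries = parse_queries(fake_generate_queries("What is the main idea of this document?"))
merged_docs = unique_union([fake_search(q) for q in queries])
print(queries)
print(merged_docs)
```

Because the two rewordings surface partly different chunks, the merged list covers more of the document than either query alone, which is the point of prompting for multiple perspectives.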

#### Create Chain

In [65]:
# RAG prompt template
template = """Answer the question based ONLY on the following context. Do not use any external information. Give citations. If the question is not relevant to the context, answer with "Not relevant".

Context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [66]:
# Create chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
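
The chain above passes the retriever's output straight into `{context}`, so the prompt most likely receives the string form of a list of documents, metadata included. A common alternative is to join only the page text before it reaches the prompt. The sketch below uses a stand-in `Doc` dataclass that mirrors the `page_content` and `metadata` attributes of a LangChain document. The `format_docs` name, the sample texts, and the `example.pdf` source are hypothetical, and the wiring shown in the final comment is an assumption that is not executed here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (mirrors page_content / metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Hypothetical retrieved documents
docs = [
    Doc("Riches begin in the form of thought.", {"source": "example.pdf"}),
    Doc("Faith removes limitations.", {"source": "example.pdf"}),
]
context = format_docs(docs)
print(context)

# Assumed wiring in the chain (not run here):
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
```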

#### Chat with PDF

In [67]:
def chat_with_pdf(question):
    """Run the RAG chain on a question and render the answer as Markdown."""
    # display() returns None, so there is nothing useful to return
    display(Markdown(chain.invoke(question)))

In [68]:
# Example 1
chat_with_pdf("What is the main idea of this document?")

This document appears to be an excerpt from a self-help or personal development book, likely "Think and Grow Rich" by Napoleon Hill. The main idea is that individuals have the power to shape their own destiny and achieve success through the application of certain principles, such as setting clear goals, cultivating a positive mindset, and overcoming fear and negativity.

In [69]:
chat_with_pdf("Why you are the master of your fate?")

You are the master of your fate because you have the ability to control your thoughts and with this control, you can open or close your mind to different thought impulses. This means that you have the power to choose what influences your life and shape your destiny.

In [70]:
chat_with_pdf("Who is the author of this book?")

The text doesn't explicitly mention the author's name. However, based on the content and style, it appears to be an excerpt from the book "Think and Grow Rich" by Napoleon Hill.

In [71]:
chat_with_pdf("Who went to the moon?")

None of the documents mention anyone going to the moon.

In [72]:
chat_with_pdf("how to become rich?")

According to the text, becoming rich starts with desire. To accumulate wealth, one should:

1. Have a strong desire for money
2. Be willing to put in effort and persistence
3. Take inventory of oneself and identify areas for improvement
4. Create a definite plan for carrying out one's desires
5. Write out a clear statement of the amount of money intended to be acquired, the time limit, what is being given in return, and the plan for accumulation
6. Read the written statement aloud twice daily, with conviction and feeling

Additionally, it is emphasized that riches begin in the form of thought, and faith removes limitations. The text also suggests that one should study and answer questions truthfully to gain knowledge about oneself and identify areas for improvement.

It's worth noting that the text does not provide a get-rich-quick scheme or a magic formula, but rather emphasizes the importance of mindset, planning, and hard work in achieving financial success.

#### Clean Up

In [73]:
# Optional: clean up when done
vector_db.delete_collection()
print("Vector database deleted successfully")

Vector database deleted successfully
