### Ollama PDF RAG Notebook V2

#### Import Libraries

In [22]:
# Imports
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Jupyter-specific imports
from IPython.display import display, Markdown

# Set environment variable for protobuf
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

#### Load PDF

In [23]:
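# Load PDF (fail early if the file is missing so later cells do not hit an undefined `data`)
local_path = "../data/Think_and_Grow_Rich.pdf"
if os.path.exists(local_path):
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
    print(f"PDF loaded successfully: {local_path}")
else:
    raise FileNotFoundError(f"PDF not found: {local_path}")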

PDF loaded successfully: ../data/Think_and_Grow_Rich.pdf


#### Split Text into Chunks

In [24]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)
print(f"Text split into {len(chunks)} chunks")

Text split into 709 chunks
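
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and sentences). The helper `sliding_window_chunks` and the sample string are hypothetical, chosen only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width chunks where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(demo_chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one. That is the purpose of the overlap: a sentence cut at a chunk boundary still appears intact in at least one chunk.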


#### Create Vector DB

In [25]:
# Create vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434"),
    collection_name="local-rag"
)
print("Vector database created successfully")

Vector database created successfully
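
The vector database answers queries by comparing the query embedding with the stored chunk embeddings. The sketch below illustrates that idea with cosine similarity over toy 3-dimensional vectors. The vectors, the chunk labels, and the helper `cosine_similarity` are made up for illustration; real `nomic-embed-text` embeddings have many more dimensions, and the metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "index": chunk label -> made-up embedding vector
toy_index = {
    "chunk about desire": [0.9, 0.1, 0.0],
    "chunk about faith": [0.1, 0.9, 0.0],
    "chunk about planning": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # made-up embedding of a question about desire

# Rank chunks from most to least similar to the query
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_vector, toy_index[name]),
    reverse=True,
)
print(ranked)
```

The chunk whose vector points in nearly the same direction as the query ranks first, which is what a retriever returns as the most relevant context.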


#### Set up LLM and Retrieval

In [47]:
# Set up LLM and retrieval
local_model = "llama3.3:latest"
llm = ChatOllama(model=local_model, temperature=0.1)

In [48]:
# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 2
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)
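
Conceptually, multi-query retrieval runs one search per generated question variant and keeps the unique union of the results. The sketch below shows only that merging step. The keyword-based `toy_search`, the `toy_corpus`, and the query variants are hypothetical stand-ins for the vector store and the LLM-generated questions, not part of LangChain.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


toy_corpus = [
    "Desire is the starting point of all achievement.",
    "Faith removes limitations.",
    "Organized planning turns desire into action.",
]


def toy_search(query: str) -> list[str]:
    """Stand-in for a vector search: return documents sharing any word with the query."""
    words = set(query.lower().split())
    return [doc for doc in toy_corpus if words & set(doc.lower().rstrip(".").split())]


# Two hypothetical rephrasings, as the LLM might produce for one user question
query_variants = ["what is desire", "how does planning help"]
merged_docs = unique_union(toy_search(q) for q in query_variants)
print(merged_docs)
```

The third document matches both variants but appears only once in `merged_docs`, so the downstream prompt does not receive the same chunk twice.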

#### Create Chain

In [49]:
# RAG prompt template
template = """Answer the question based ONLY on the following context. Do not use any external information. Give citations. If the question is not relevant to the context, answer with "Not relevant".
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [50]:
# Create chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
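
The chain above passes the retrieved documents to the prompt as they are. If the model should cite sources, it can help to put each chunk's metadata into the context text explicitly. The following is a minimal sketch under stated assumptions: `ToyDocument` is a stand-in for a LangChain document, `format_docs_with_sources` is a hypothetical helper, and the metadata keys `source` and `page` are assumptions (which keys exist depends on the loader and its settings). The sample texts and page value are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Stand-in for a retrieved document: text plus a metadata dictionary."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs) -> str:
    """Join chunks into one context string, labelling each with its source and page."""
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "not specified")  # assumed key, may be absent
        blocks.append(f"[source: {source}; page: {page}]\n{doc.page_content}")
    return "\n\n".join(blocks)


toy_docs = [
    ToyDocument("First illustrative chunk.", {"source": "example.pdf", "page": 1}),
    ToyDocument("Second illustrative chunk.", {"source": "example.pdf"}),
]
context_text = format_docs_with_sources(toy_docs)
print(context_text)
```

A formatter like this could be piped after the retriever (for example `retriever | format_docs_with_sources`) so that the `{context}` slot receives labelled text. When the loader does not record page numbers, the label falls back to "not specified", and the model has no page to cite.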

#### Chat with PDF

In [58]:
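def chat_with_pdf(question):
    """Ask the RAG chain a question about the PDF and render the answer as Markdown."""
    # The leading space keeps the instruction from fusing with the question text.
    citation_instruction = (
        " Give citations in the format document name : <document name> ; "
        "page number : <page number> ; paragraph number : <paragraph number> ; "
        "first sentence : <first sentence>"
    )
    display(Markdown(chain.invoke(question + citation_instruction)))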

In [59]:
chat_with_pdf("whats the purpose of this book")

The purpose of this book is to provide a description of thirteen principles that can help achieve success, and to instruct the reader on how to transmute the definite purpose of desire for money into its monetary equivalent.

Here are the citations:
Document name : Think_and_Grow_Rich.pdf ; page number : 1 ; paragraph number : 1 ; first sentence : "In answer to these questions, this book was written."
Additionally, 
Document name : Think_and_Grow_Rich.pdf ; page number : not specified ; paragraph number : 3 ; first sentence : "This book has been confined, exclusively, to instructing the reader how to transmute the DEFINITE PURPOSE OF DESIRE FOR MONEY, into its monetary equivalent." 

Note: The page numbers are not explicitly mentioned in the provided context for all documents.

In [60]:
# Example 1
chat_with_pdf("What is the main idea of this document?")

The main idea of this document is to provide instructions and principles for achieving success, specifically in accumulating wealth, by developing a positive mindset, following certain steps, and using one's creative faculties.

Here are the citations:
Think_and_Grow_Rich.pdf : Think_and_Grow_Rich.pdf ; page number : Not specified ; paragraph number : 1 ; first sentence : "The fact that you are reading this book is an indication that you earnestly seek knowledge."
Additionally, 
Think_and_Grow_Rich.pdf : Think_and_Grow_Rich.pdf ; page number : Not specified ; paragraph number : 3 ; first sentence : "One sound idea is all that one needs to achieve success."
Also,
Think_and_Grow_Rich.pdf : Think_and_Grow_Rich.pdf ; page number : 57 ; paragraph number : 1 ; first sentence : "The creative faculty becomes more alert, more receptive to vibrations from the sources mentioned, in proportion to its development through USE."

In [61]:
chat_with_pdf("Why you are the master of your fate?")

You are the master of your own earthly destiny because you have the power to control your own thoughts. 
Document name : Think_and_Grow_Rich.pdf ; page number : 153 ; paragraph number : 1 ; first sentence : You may control your own mind, you have the power to feed it whatever thought impulses you choose. 

This is stated in the provided context as: "You are the master of your own earthly destiny just as surely as you have the power to control your own thoughts." (Think_and_Grow_Rich.pdf, page 153)

In [62]:
chat_with_pdf("Who is the author of this book?")

Not relevant 

(Note: The context does not explicitly mention the author's name. Although it mentions "the man to whom Carnegie disclosed the astounding secret of his riches-- the same man to whom the 500 wealthy men revealed the source of their riches" and "the author's introduction", it does not provide a specific name.)

In [63]:
chat_with_pdf("Who went to the moon?")

Not relevant. 

The context does not mention anyone going to the moon. The provided text appears to be excerpts from the book "Think and Grow Rich" by Napoleon Hill, discussing topics such as desire, imagination, and success, but it does not reference space travel or the moon.

In [64]:
chat_with_pdf("how to become rich?")

To become rich, one can follow the principles outlined in the book "Think and Grow Rich" by Napoleon Hill. Here are some steps to become rich, along with citations:

1. **Intelligent planning**: Intelligent planning is essential for success in any undertaking designed to accumulate riches. (Document name: Think_and_Grow_Rich ; Page number: Not specified; Paragraph number: Not specified; First sentence: Intelligent planning is essential for success in any undertaking designed to accumulate riches.)
2. **Develop a leader mindset**: Decide at the outset whether you intend to become a leader in your chosen calling, or remain a follower. (Document name: Think_and_Grow_Rich ; Page number: Not specified; Paragraph number: Not specified; First sentence: Broadly speaking, there are two types of people in the world.)
3. **Think and grow rich**: RICHES begin in the form of THOUGHT! (Document name: Think_and_Grow_Rich ; Page number: 43; Paragraph number: Not specified; First sentence: RICHES begin in the form of THOUGHT!)
4. **Remove limitations**: FAITH removes limitations! (Document name: Think_and_Grow_Rich ; Page number: 43; Paragraph number: Not specified; First sentence: FAITH removes limitations!)
5. **Avoid fatal mistakes**: QUICK RICHES are more dangerous than poverty, and INTENTIONAL DISHONESTY is fatal to success. (Document name: Think_and_Grow_Rich ; Page number: 80; Paragraph number: Not specified; First sentence: 26.)
6. **Acquire facts and think accurately**: Most people are too indifferent or lazy to acquire FACTS with which to THINK ACCURATELY. (Document name: Think_and_Grow_Rich ; Page number: 80; Paragraph number: Not specified; First sentence: 29.)

Note that the page numbers and paragraph numbers are not always specified in the provided text, so I couldn't include them in all citations.

Additionally, here are some general principles from the book:

* The Thirteen Steps to Riches described in this book offer the shortest dependable philosophy of individual achievement ever presented for the benefit of the man or woman who is searching for a definite goal in life. (Document name: Think_and_Grow_Rich ; Page number: 2; Paragraph number: Not specified; First sentence: The Thirteen Steps to Riches described in this book offer the shortest dependable philosophy of individual achievement ever presented for the benefit of the man or woman who is searching for a definite goal in life.)
* The amount of riches is limited only by the person in whose mind the THOUGHT is put into motion. (Document name: Think_and_Grow_Rich ; Page number: 43; Paragraph number: Not specified; First sentence: RICHES begin in the form of THOUGHT!)

#### Clean Up

In [17]:
# Optional: delete the Chroma collection when done
vector_db.delete_collection()
print("Vector database deleted successfully")

Vector database deleted successfully
