## Ingesting PDF

In [None]:
%pip install -q unstructured langchain langchain-community
%pip install -q "unstructured[all-docs]"

In [1]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [4]:
local_path = "book2.pdf"

# Load the PDF from a local file path
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("Set local_path to a PDF file")
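The `else` branch above only triggers for an empty path, so a misspelled or missing file still reaches the loader. A minimal sketch of an earlier check, using only the standard library; `require_pdf` is a hypothetical helper name and not part of LangChain or unstructured:

```python
from pathlib import Path


def require_pdf(path_str: str) -> Path:
    """Return the path if it points to an existing .pdf file, else raise."""
    path = Path(path_str)
    # Reject anything that does not look like a PDF by extension.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name!r}")
    # Fail early with a readable message instead of deep inside the loader.
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path
```

Calling `require_pdf(local_path)` before constructing the loader would turn a missing file into an immediate, readable error.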

In [3]:
# Preview the first loaded Document (in the loader's default "single" mode this is the whole PDF, not just page 1)
data[0].page_content



## Vector Embeddings

In [54]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B
verifying sha256 digest

In [55]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED      
llama3:latest          	71a106a91016	4.7 GB	19 hours ago 	
llava:latest           	8dd30f6b0cb1	4.7 GB	3 hours ago  	
nomic-embed-text:latest	0a109f422b47	274 MB	9 seconds ago	
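
To confirm from code that the embedding model is available, the `ollama list` table can be parsed. A rough sketch, assuming the whitespace-separated layout shown above stays stable; `installed_models` is a hypothetical helper, and `sample` is copied from the output above rather than fetched live:

```python
def installed_models(listing: str) -> set[str]:
    """Extract model names (without the tag) from `ollama list` output."""
    names = set()
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            # "llama3:latest" -> "llama3"
            names.add(fields[0].split(":")[0])
    return names


sample = """NAME                    ID            SIZE    MODIFIED
llama3:latest           71a106a91016  4.7 GB  19 hours ago
llava:latest            8dd30f6b0cb1  4.7 GB  3 hours ago
nomic-embed-text:latest 0a109f422b47  274 MB  9 seconds ago
"""

print("nomic-embed-text" in installed_models(sample))
```

In a live notebook the listing could come from `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`, assuming the `ollama` binary is on the path.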


In [26]:
%pip install --q chromadb
%pip install --q langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.


In [5]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [6]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
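
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch with fixed character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph breaks and newlines); `chunk_with_overlap` is a hypothetical function that only illustrates what the two parameters mean:

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, at the cost of embedding some text twice.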

In [7]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks, 
    embedding=OllamaEmbeddings(model="nomic-embed-text",show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████| 163/163 [03:14<00:00,  1.19s/it]
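
Conceptually, the vector database stores one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query's. The sketch below shows that idea with a toy word-count "embedding" and cosine similarity. Both are assumptions for illustration: `nomic-embed-text` produces dense vectors, and the distance metric Chroma actually uses depends on how the collection is configured. `toy_embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "diary studies capture behavior over time",
    "card sorting reveals how people group concepts",
    "focus groups need a skilled moderator",
]
print(top_k("how do people group concepts", docs, k=1))
```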


## Retrieval

In [8]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [9]:
# LLM from Ollama
local_model = "llama3"
llm = ChatOllama(model=local_model)

In [10]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

In [11]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
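
The multi-query idea can be sketched without any LLM: rewrite the question several ways, search once per rewrite, and keep the unique union of the hits. The code below is a simplified illustration under that assumption, not the `MultiQueryRetriever` source; `multi_query_retrieve`, `fake_rewrite`, and `index` are hypothetical stand-ins for the retriever, the LLM rewriter, and the vector store.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    rewrite: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run `search` for each rewritten query and merge the hits, dropping duplicates."""
    seen = set()
    merged = []
    for query in rewrite(question):
        for doc in search(query):
            if doc not in seen:  # keep the first occurrence, preserve order
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_rewrite(question: str) -> list[str]:
    # Hypothetical rewrites; in the notebook the LLM generates these from QUERY_PROMPT.
    return [
        "when is a diary study appropriate?",
        "what situations call for a diary study?",
        "diary study timing",
    ]


# Hypothetical search results keyed by query text.
index = {
    "when is a diary study appropriate?": ["chunk A", "chunk B"],
    "what situations call for a diary study?": ["chunk B", "chunk C"],
    "diary study timing": ["chunk A", "chunk D"],
}

result = multi_query_retrieve(
    "when should i do a diary study?", fake_rewrite, lambda q: index.get(q, [])
)
print(result)  # ['chunk A', 'chunk B', 'chunk C', 'chunk D']
```

Because every rewrite is embedded and searched separately, one question likely accounts for several `OllamaEmbeddings` progress bars in the outputs.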

In [12]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
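
Read left to right, the chain fans the question out into `context` and `question`, fills the template, calls the model, and extracts the reply text. A plain-Python sketch of that data flow, assuming stub functions in place of the retriever and the LLM; `run_chain`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the real pipeline passes LangChain objects rather than plain strings and dicts:

```python
from typing import Callable

TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def run_chain(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], dict],
) -> str:
    # Step 1: the same input feeds both keys; "context" goes through the retriever.
    inputs = {"context": "\n".join(retrieve(question)), "question": question}
    # Step 2: fill the prompt template.
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: call the model, which returns a message-like object.
    message = generate(prompt_text)
    # Step 4: keep only the text, as the output parser does.
    return message["content"]


def fake_retrieve(question: str) -> list[str]:
    return ["Diary studies suit behavior that unfolds over days or weeks."]


def fake_generate(prompt_text: str) -> dict:
    # Echo the final prompt line so the flow is visible without a model.
    return {"content": prompt_text.splitlines()[-1]}


answer = run_chain("when should i do a diary study?", fake_retrieve, fake_generate)
print(answer)  # Question: when should i do a diary study?
```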

In [13]:
# Type your question at the input prompt and press Enter
chain.invoke(input(""))

 whats this book about?


OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.05it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.68it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.98it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.25it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.87it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.73it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.67it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.05it/s]
OllamaEmbeddings: 100%|█████████████████

'Based on the provided snippets from "Book 2", it appears that this book is about user research methods and techniques for understanding people\'s thoughts, feelings, and behaviors. The book seems to focus on qualitative research methods, such as observation, interviews, focus groups, and object-based techniques like photo elicitation, collage, and card sorting.\n\nThe book also touches on the importance of reflection and debriefing after field visits and focus groups, and provides guidance on how to analyze and make sense of the data collected through these methods. Additionally, it mentions the value of generative techniques in helping researchers understand people\'s perspectives and values, as well as their relationships with products or services.\n\nOverall, "Book 2" appears to be a comprehensive guide for user research, covering topics such as observational studies, interview techniques, focus group moderation, and data analysis.'

In [14]:
chain.invoke("when should i do a diary study?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.03it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15.73it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.04it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15.86it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.18it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.76it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.47it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.96it/s]
OllamaEmbeddings: 100%|█████████████████

"A great question!\n\nYou should consider doing a diary study:\n\n1. **When you want to collect rich, qualitative data**: Diary studies are excellent for gathering detailed, descriptive information about people's experiences, thoughts, and behaviors.\n2. **To gain insights into long-term or complex behaviors**: Diary studies can help you understand how people's behaviors change over time or how they interact with a product or service in their daily lives.\n3. **When you need to collect data on a specific aspect of behavior**: Diary studies are well-suited for collecting data on specific behaviors, such as searching habits or social media usage.\n4. **To identify patterns and themes**: Diary studies can help you identify patterns and themes in people's experiences that might not be apparent through other research methods.\n\nIn general, diary studies are a good choice when you want to gather detailed, qualitative data about people's experiences over time."

In [15]:
chain.invoke("How do i do a nano usability study?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.31it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.91it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.40it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.97it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.58it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.94it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13.30it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.09it/s]
OllamaEmbeddings: 100%|█████████████████

'A "nano" usability study! That\'s a great question.\n\nIn essence, a nano usability study is an extremely condensed and informal version of a traditional usability test. It\'s designed to provide quick insights into how users interact with your product or application without the need for elaborate equipment, extensive resources, or lengthy testing periods.\n\nHere are some general guidelines to help you conduct a nano usability study:\n\n1. **Identify a specific goal**: Define what you want to learn from your nano usability study. What\'s the most important aspect of your product or application that you want users\' feedback on?\n2. **Choose a small, representative sample**: Select 3-5 participants who are familiar with your product or application and can provide feedback within a short time frame (e.g., 30 minutes to 1 hour).\n3. **Keep it simple**: Use a basic setup, such as asking participants to complete a simple task using your product or application on their own devices (e.g., p

In [16]:
# Delete the "local-rag" collection backing this vector store
vector_db.delete_collection()