# OLMO RAG DEMO

Now it's your turn to apply your data and specific domain knowledge.

You can use this notebook as a starting point and adapt it to your needs.
You will need to develop the pre-processing stage for a RAG system.
This includes document retrieval, cleaning, chunking,
and ingestion into the vector database using an embedding model.

To help you, we've provided a few example code snippets in Jupyter notebooks found in the 
[`appendix`](../appendix/index.md).

In [1]:
from testcontainers.qdrant import QdrantContainer

In [2]:
qdrant = QdrantContainer()

In [3]:
qdrant.start()

Pulling image testcontainers/ryuk:0.8.1
Container started: ec5e308cc66e
Waiting for container <Container: ec5e308cc66e> with image testcontainers/ryuk:0.8.1 to be ready ...
Pulling image qdrant/qdrant:v1.8.3
Container started: dab23a1f4954
Waiting for container <Container: dab23a1f4954> with image qdrant/qdrant:v1.8.3 to be ready ...


<testcontainers.qdrant.QdrantContainer at 0x1164b7b50>

In [12]:
client = qdrant.get_client()

Waiting for container <Container: dab23a1f4954> with image qdrant/qdrant:v1.8.3 to be ready ...
Waiting for container <Container: dab23a1f4954> with image qdrant/qdrant:v1.8.3 to be ready ...


## Utility Functions

A section for whatever utility functions you need. We have packaged up our utility functions in a Python package called `ssec_tutorials`. You can find the source code in this [GitHub repository](https://github.com/uw-ssec/ssec_tutorials).

In [13]:
# Write your code here for whatever utility functions you need. This can be anything such as
# cleaning up document format, setting up prompt templates, etc.


# A simple document formatting function that joins the page contents with blank lines
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
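To see what `format_docs` produces without loading any files, here is a small sketch. It uses `types.SimpleNamespace` objects as stand-ins for LangChain `Document` objects, which is an assumption made purely for illustration; real documents also carry a `metadata` dictionary.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-in objects (hypothetical): only the `page_content` attribute is needed here.
sample_docs = [
    SimpleNamespace(page_content="First page of text."),
    SimpleNamespace(page_content="Second page of text."),
]

formatted = format_docs(sample_docs)
print(formatted)
```

The resulting string is what you will later pass to the prompt as context.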

## Retrieve documents

A section for document retrieval. This simply means fetching your documents from their sources,
whether on your local computer or on the internet. See the [Document Loaders](https://python.langchain.com/v0.2/docs/integrations/document_loaders/) integration list from LangChain for an extensive list of what's possible.

For the purpose of this tutorial, we recommend a simple example: loading text from a file such as a PDF. If you have a large piece of text, you can also split it into smaller chunks using LangChain's [RecursiveCharacterTextSplitter](https://python.langchain.com/v0.2/docs/how_to/recursive_text_splitter/).
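If chunking is new to you, the following minimal sketch shows the basic idea in plain Python: fixed-size character windows that overlap, so text near a boundary appears in both neighboring chunks. The function name `split_with_overlap` and its parameters are hypothetical. This is not how LangChain's splitter is implemented; the library additionally tries to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=50):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample_text = "abcdefghij" * 5  # 50 characters of stand-in text
chunks = split_with_overlap(sample_text, chunk_size=20, chunk_overlap=5)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit, so the chunk size is worth experimenting with.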

If you don't have any data of your own, you can try this [Algorithms textbook by Jeff Erickson](http://jeffe.cs.illinois.edu/teaching/algorithms/book/Algorithms-JeffE.pdf). Jeff Erickson has generously made the textbook available under the [Creative Commons Attribution 4.0 International license](http://creativecommons.org/licenses/by/4.0/). You can find more information about the textbook at [https://jeffe.cs.illinois.edu/teaching/algorithms/](https://jeffe.cs.illinois.edu/teaching/algorithms/).

```{note}
If you're running things in a Codespace, [refer to this link](https://stackoverflow.com/questions/62284623/how-can-i-upload-a-file-to-a-github-codespaces-environment) and upload your data to the `resources/` folder.
```

In [14]:
# Write your code here for your retrieval step,
# see the documentation on PyMuPDF for more information:
# https://python.langchain.com/v0.2/docs/how_to/document_loader_pdf/#using-pymupdf

# Download the textbook if it is not already present
import os
from urllib.request import urlretrieve

url = "http://jeffe.cs.illinois.edu/teaching/algorithms/book/Algorithms-JeffE.pdf"
filename = os.path.basename(url)

if not os.path.exists(filename):
    # Download if file doesn't exist
    pdf_path, headers = urlretrieve(url, filename)

In [19]:
import os
from langchain_community.document_loaders import PyMuPDFLoader

pdf_folder_path = "."  # update path to point to the relevant directory
documents = []
for file in os.listdir(pdf_folder_path):
    if file.endswith(".pdf"):
        pdf_path = os.path.join(pdf_folder_path, file)
        loader = PyMuPDFLoader(pdf_path)
        documents.extend(loader.load())

# for each in documents:
#     # print(each.page_content) # Uncomment this line to see the individual page_content
#     print(each.metadata)

In [None]:
# Write your code here to load the PDF document as LangChain Document objects

## Document Embeddings to Qdrant Vector Database

Once you've figured out how to retrieve your documents and load them as LangChain Document objects, you can proceed to load these documents into a Qdrant vector database collection.

See the following documentation for some guidance on [Langchain Qdrant integration](https://python.langchain.com/v0.2/docs/integrations/vectorstores/qdrant/).
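The collection in this section is configured with cosine distance (`models.Distance.COSINE`). As a rough illustration of what a similarity search does with the stored embeddings, here is a brute-force sketch in plain Python. The three-dimensional vectors and their labels are made up for the example; real embeddings have hundreds of dimensions, and a vector database uses an index rather than comparing against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored_vectors, k=2):
    """Return the ids of the `k` stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in stored_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values, for illustration only)
toy_vectors = {
    "multiplication": [0.9, 0.1, 0.0],
    "graph search": [0.0, 0.2, 0.9],
    "addition": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, toy_vectors, k=2)
print(results)
```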

In [15]:
from langchain_huggingface import HuggingFaceEmbeddings

In [16]:
# Set up the embedding model; we are using a MiniLM model here
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L12-v2")



### Setup Vector DB

In [52]:
from qdrant_client import models
from langchain_qdrant import Qdrant

In [20]:
# Write your code here to load your data into the database

# Uncomment below to set the Qdrant path and collection name
# for "local mode" on-disk storage
# See https://python.langchain.com/v0.2/docs/integrations/vectorstores/qdrant/#on-disk-storage
# qdrant_path = "./my_qdrant_database"
qdrant_collection = "algorithms_book"

In [62]:
# Create the collection only if it does not exist yet
is_new_collection = not client.collection_exists(qdrant_collection)
if is_new_collection:
    print("Creating collection:", qdrant_collection)
    client.create_collection(
        qdrant_collection,
        vectors_config=models.VectorParams(
            size=embedding.client.get_sentence_embedding_dimension(),
            distance=models.Distance.COSINE,
        ),
    )

# Wrap the collection in a LangChain vector store
lcqdrant = Qdrant(
    client=client, collection_name=qdrant_collection, embeddings=embedding
)

# Embed and upload the documents only when the collection was just created
if is_new_collection:
    uuids = lcqdrant.add_documents(documents=documents)

Creating collection: algorithms_book


### Test out the Qdrant collection

At this step, you should have a Qdrant object (`langchain_qdrant.vectorstores.Qdrant`) with your documents loaded into a collection. You can test the collection by querying for documents and checking whether the results are as expected.

To do this, you'll need to create a [`VectorStoreRetriever`](https://python.langchain.com/v0.2/docs/how_to/vectorstore_retriever/).

```{note}
A sample question to ask about the document is `"What is the most familiar method for multiplying large numbers?"`.
An answer to this question can be found on page 3, section 0.2 Multiplication, Lattice Multiplication.
```

```{tip}
You'll probably need to tweak the arguments used to create the `VectorStoreRetriever` object to find the best search type and number of returned documents. This part involves some trial and error, so don't be afraid to experiment. Retrieving the right documents for the question is a critical part of a RAG system, because those documents are what the LLM uses to generate the answer.
```
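One of the search types you can try is `"mmr"` (maximal marginal relevance), which trades off relevance to the query against redundancy among the documents already picked. The sketch below illustrates the selection rule in plain Python with made-up two-dimensional vectors. The function `mmr_select` is hypothetical and is not LangChain's implementation; the parameter name `lambda_mult` is borrowed from the retriever's search arguments, where 1.0 means pure relevance and lower values favor diversity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query_vector, candidates, k=2, lambda_mult=0.5):
    """Greedily pick `k` ids, balancing relevance against redundancy."""
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        best_id, best_score = None, -math.inf
        for doc_id, vector in remaining.items():
            relevance = cosine_similarity(query_vector, vector)
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max(
                (cosine_similarity(vector, candidates[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_id, best_score = doc_id, score
        selected.append(best_id)
        del remaining[best_id]
    return selected


# Toy vectors (hypothetical): "x1" and "x2" are near-duplicates, "y1" is different
toy_candidates = {"x1": [1.0, 0.2], "x2": [1.0, 0.1], "y1": [0.2, 1.0]}
toy_query = [1.0, 0.8]

diverse = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate `"x2"` is skipped in favor of `"y1"`, while `lambda_mult=1.0` falls back to a plain relevance ranking.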

In [63]:
# Write your code here to try out the vector database retrieval with a question query
retriever = lcqdrant.as_retriever(search_type="mmr", search_kwargs={"k": 2})

In [64]:
retriever.invoke("What is the most familiar method for multiplying large numbers?")

[Document(metadata={'author': '', 'creationDate': "D:20190613170427-05'00'", 'creator': 'LaTeX with hyperref', 'file_path': './Algorithms-JeffE.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': "D:20190613170427-05'00'", 'page': 20, 'producer': 'pdfTeX-1.40.19', 'source': './Algorithms-JeffE.pdf', 'subject': '', 'title': '', 'total_pages': 472, 'trapped': '', '_id': 'ac1baae8-80c1-421e-b283-5e17150e1337', '_collection_name': 'algorithms_book'}, page_content='0.2. Multiplication\nto mechanical techniques for place-value arithmetic using “Arabic” numerals.\nPeople trained in the fast and reliable execution of these procedures were called\nalgorists or computators, or more simply, computers.\n0.2\nMultiplication\nAlthough they have been a topic of formal academic study for only a few decades,\nalgorithms have been with us since the dawn of civilization. Descriptions of\nstep-by-step arithmetic computation are among the earliest examples of written\nhuman language, long predating the e

## Setup OLMo Model

At this stage, we have the Retrieval-Augmented (RA) part of the RAG system. Let's now set up the Generation (G) part with the OLMo model.

In [65]:
from ssec_tutorials import download_olmo_model

# This will download the OLMo model to the cache directory
OLMO_MODEL = download_olmo_model()

In [66]:
# Uncomment this line to understand your available options for LlamaCpp Class
# LlamaCpp?

In [72]:
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import StreamingStdOutCallbackHandler

# Here we've set up the LlamaCpp model,
# but you'll need to add additional arguments to `LlamaCpp`
# to make it work for your specific use case
olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    callbacks=[StreamingStdOutCallbackHandler()],
    verbose=False,
    n_ctx=2048,
)

```{tip}
Try asking OLMo some questions about the content of the document you've loaded into the Qdrant collection.
You will find that the OLMo model is not trained on your specific domain, so it might not give you the best results.
```

In [68]:
_ = olmo.invoke(input="What is the most familiar method for multiplying large numbers?")

 The one most people still use, despite being taught otherwise.
The most familiar method for multiplying large numbers is the long hand method. In this method, you write out the multiplication problem as a story, using words to represent each digit and then writing the equal sign once all the words are in place. For example: 2 3 4 5 6 7 8 * 9 = ? 4800 The answer would be 4800.
The most familiar method for multiplying large numbers is still used today because it's simple, easy to understand, and requires little mental arithmetic. It also allows you to visualize the problem and make sure you have everything in the right order before attempting to write an equal sign.
What is the traditional way of writing down the product of two or more numbers?
The traditional way of writing down the product of two or more numbers is to use a comma-separated list of numbers, with each number followed by five dots (..) and then another set of numbers separated by commas. For example: 2, 3, 4, 5, 6 The co

## Prompt Engineering

Rather than just a simple question, we'll need to refine the prompt to include an instruction and context for the model to use when generating the answer. To do this, we'll need to set up the proper string [PromptTemplate](https://python.langchain.com/v0.2/docs/concepts/#string-prompttemplates).

In [69]:
from langchain_core.prompts import PromptTemplate

# Create the initial prompt template using OLMo's tokenizer chat template we saw in module 1.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

Set the question for the prompt.

In [70]:
question = "What is the most familiar method for multiplying large numbers?"

Set the context for the prompt.
This is where you'll need to use the `VectorStoreRetriever` and format the document object with `format_docs`
or simply add your own text to the variable.

In [None]:
# Uncomment variable below to set the context
# context = <Enter code or string here>

Set the instruction for the prompt.

In [None]:
instruction = """You are a computer science professor.
Please answer the following question based on the given context."""

The original OLMo chat template takes in multiple messages with a `role` and `content` key. You can use this template to ask questions to the model. For simplicity, we'll just use a single message.

In [None]:
# Uncomment below to set the input text template
# input_text_template = f"""\
# {instruction}

# Context: {context}

# Question: {question}
# """

In [None]:
# Uncomment below to set the message dictionary
# message = {
#     "role": "user",
#     "content": input_text_template,
# }

In [None]:
# Uncomment below to try out the prompt template
# print(prompt_template.format(
#     messages=[message]
# ))

You can see above what the final prompt looks like. There are tags like `<|user|>` that signal to the model that this is user input, and so on. This final string is sent to the model to generate the answer.
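If you'd like to see how such a tagged string is assembled without loading the model, here is a simplified stand-in for a chat template written in plain Python. The function `render_chat` and the exact layout (including the `<|assistant|>` tag) are assumptions for illustration; the real template stored in the model metadata may differ, so always check the output of `prompt_template.format(...)` for the actual format.

```python
def render_chat(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Render a list of {"role", "content"} messages into one tagged string.

    This mimics the general shape of a chat template; it is not the
    template shipped with the model.
    """
    parts = [eos_token]
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}\n")
    if add_generation_prompt:
        # Ends the prompt with an open assistant turn for the model to complete
        parts.append("<|assistant|>\n")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "What is the most familiar method for multiplying large numbers?"}
]
rendered = render_chat(example_messages)
print(rendered)
```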

## RAG

At this point, you have all the parts of a RAG system set up. Now let's chain the prompt engineering, the OLMo model, and the Qdrant collection to get a more accurate answer.

In [73]:
# 1. Set the question
question = "What is the most familiar method for multiplying large numbers?"

# 2. Set the context
context = format_docs(retriever.invoke(question))

# 3. Set the instruction
instruction = """You are a computer science professor.
Please answer the following question based on the given context."""

# 4. Set the input text template
input_text_template = f"""\
{instruction}

Context: {context}

Question: {question}
"""

# 5. Set the message dictionary
message = {
    "role": "user",
    "content": input_text_template,
}

# 6. Chain the prompt template and olmo model
llm_chain = prompt_template | olmo

# 7. Invoke the chain
llm_chain.invoke(input={"messages": [message]})

The most familiar method for multiplying large numbers is the lattice algorithm. This algorithm was popularized by Fibonacci in "Liber Abaci" and is still widely used today, especially in American education. This method reduces multiplication to addition and subtraction, making it more accessible and efficient for larger numbers. The lattice algorithm has its roots in ancient civilizations such as China, India, and Egypt, where they used the technique of lattice division or "lattice method," based on place-value notation.

'The most familiar method for multiplying large numbers is the lattice algorithm. This algorithm was popularized by Fibonacci in "Liber Abaci" and is still widely used today, especially in American education. This method reduces multiplication to addition and subtraction, making it more accessible and efficient for larger numbers. The lattice algorithm has its roots in ancient civilizations such as China, India, and Egypt, where they used the technique of lattice division or "lattice method," based on place-value notation.'
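When you want to try several questions, it can help to wrap the prompt-building steps in a small function. The sketch below uses a hypothetical helper named `build_rag_message` and a hard-coded stand-in context so that it runs on its own; in the notebook you would pass `format_docs(retriever.invoke(question))` as the context instead.

```python
def build_rag_message(question, context, instruction):
    """Combine instruction, context, and question into a single user message."""
    content = f"{instruction}\n\nContext: {context}\n\nQuestion: {question}\n"
    return {"role": "user", "content": content}


instruction = (
    "You are a computer science professor.\n"
    "Please answer the following question based on the given context."
)
question = "What is the most familiar method for multiplying large numbers?"
stand_in_context = "Placeholder text standing in for the retrieved pages."

message = build_rag_message(question, stand_in_context, instruction)
print(message["content"])

# In the notebook, with `llm_chain = prompt_template | olmo` defined:
# llm_chain.invoke(input={"messages": [message]})
```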

In [None]:
# Write your code here to create the retrieval chain

````{admonition} Answer Example Code
:class: hint dropdown

```{code-block} python
# 1. Set the question
question = "What is the most familiar method for multiplying large numbers?"

# 2. Set the context
context = format_docs(retriever.invoke(question))

# 3. Set the instruction
instruction = """You are a computer science professor.
Please answer the following question based on the given context."""

# 4. Set the input text template
input_text_template = f"""\
{instruction}

Context: {context}

Question: {question}
"""

# 5. Set the message dictionary
message = {
    "role": "user",
    "content": input_text_template,
}

# 6. Chain the prompt template and olmo model
llm_chain = prompt_template | olmo

# 7. Invoke the chain
llm_chain.invoke(input={"messages": [message]})
```
````

In [74]:
qdrant.stop()

**Bonus: Try creating a simple chat app by adapting the [1-olmo-chat-rag.ipynb](./1-olmo-chat-rag.ipynb) notebook to your use case.**

Please fill out the [survey feedback form](https://tinyurl.com/ssecfeedback) to help us improve the tutorial.