# Build a RAG Chatbot from scratch

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval, here backed by a vector database, with a generative language model. It is used for open-domain question answering, dialogue systems, and other text generation tasks. The retrieval step finds relevant information in a large corpus, while the generative model writes a new response based on the input. RAG improves the answers by first retrieving relevant documents and then generating a response grounded in the retrieved information.

👩‍🍳 Here are the ingredients of a RAG Pipeline:

1. Unstructured data (the movie reviews will suffice).
2. A pre-trained sentence encoder.
3. A vector database to store the encoded vectors, configured with an appropriate similarity metric (e.g. cosine similarity).
4. A large language model.
5. A system prompt that instructs an LLM to use the context to answer a query.
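For item 3, a common choice for sentence embeddings is cosine similarity, which is also the metric this notebook configures in Qdrant. The sketch below computes it in plain Python (the helper names are illustrative, not a library API) and shows two properties worth knowing: vector length does not affect the score, and on unit-length vectors cosine similarity reduces to a plain dot product.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, twice as long
w = [3.0, 0.0, -1.0]  # orthogonal to u: 1*3 + 2*0 + 3*(-1) = 0

print(round(cosine_similarity(u, v), 6))  # 1.0: length does not matter
print(round(cosine_similarity(u, w), 6))  # 0.0
# On unit-length vectors, cosine similarity equals the dot product.
print(math.isclose(cosine_similarity(u, w), dot(normalize(u), normalize(w)), abs_tol=1e-12))
```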

Finally, you should set up a collection and process the data into records:

1. Clean and preprocess the data (e.g. by removing irrelevant information, special characters, and applying tokenization, stemming, or lemmatization).
2. Use the sentence encoder model to convert text documents into fixed-length dense vectors.
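Sentence-transformer models apply their own tokenizer, so for step 1 a light cleanup is often enough; stemming or lemmatization may even remove cues the model could use. A minimal sketch of such a cleanup (the `clean_text` helper is hypothetical and is not used elsewhere in this notebook):

```python
import html
import re
import unicodedata


def clean_text(text):
    """Light cleanup before encoding. Casing and punctuation are kept on purpose,
    because the sentence encoder tokenizes the text itself."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> regular space
    # Replace control characters (newlines, tabs, NUL, ...) with spaces.
    text = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip()


raw = "  A space&amp;time   adventure.\n\n\tThe crew\u00a0must survive!\x00 "
print(clean_text(raw))  # A space&time adventure. The crew must survive!
```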

We covered these steps in the previous sections. Now we can actually code our own movie assistant! The required steps will be the following:

1. For a given input query, encode the query using the same sentence encoder model used for encoding the documents.
2. Perform a similarity search in the vector database to retrieve the top-k most relevant documents based on their vector representation.
3. Format the retrieved documents and the query to make a single prompt.
4. Send the prompt to the generative model.
5. Package this into a nice function, or user interface.
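The five steps above can be sketched end to end without any external service. This is a toy illustration only: the bag-of-words `encode` stands in for the sentence encoder, a Python list stands in for the vector database, and `fake_llm` stands in for the LLM call. All names and documents are hypothetical.

```python
import heapq
import math
import re

# Hypothetical mini-corpus standing in for the movie dataset.
DOCS = [
    {"title": "Star Voyage", "overview": "A crew explores distant space and alien worlds."},
    {"title": "Kitchen Love", "overview": "Two chefs fall in love in a busy restaurant."},
    {"title": "Robot Dawn", "overview": "An android questions its creators in a future city."},
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


VOCAB = sorted({tok for doc in DOCS for tok in tokenize(doc["overview"])})


def encode(text):
    """Step 1: map text to a fixed-length vector (word counts over VOCAB)."""
    tokens = tokenize(text)
    return [tokens.count(word) for word in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # an all-zero vector matches nothing


INDEX = [(encode(doc["overview"]), doc) for doc in DOCS]  # "ingestion"


def retrieve(query, k=2):
    """Step 2: return the k documents most similar to the query."""
    q = encode(query)
    scored = ((cosine(q, vec), doc) for vec, doc in INDEX)
    return [doc for _, doc in heapq.nlargest(k, scored, key=lambda pair: pair[0])]


def build_prompt(query, docs):
    """Step 3: merge the retrieved documents and the query into one prompt."""
    context = "\n".join(f"- {d['title']}: {d['overview']}" for d in docs)
    return f"Given this context:\n{context}\n\nAnswer this query: {query}"


def fake_llm(prompt):
    """Step 4 placeholder: a real pipeline would send `prompt` to an LLM here."""
    return f"[an LLM would now answer a {len(prompt.splitlines())}-line prompt]"


def answer(query, k=2):
    """Step 5: package everything into a single function."""
    return fake_llm(build_prompt(query, retrieve(query, k)))


query = "space aliens and a future android"
print([d["title"] for d in retrieve(query)])  # ['Star Voyage', 'Robot Dawn']
print(answer(query))
```

Swapping the toy parts for real ones (sentence-transformers, Qdrant, OpenAI) is what the rest of this notebook does.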

# Prerequisite: Get the OpenAI API Key

```python
import os
import getpass


def get_pass():
    return getpass.getpass("Enter your OpenAI API Key: ")


os.environ["OPENAI_API_KEY"] = get_pass()
```
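Re-running the cell above asks for the key every time. A small variation (a sketch; `ensure_openai_key` is a hypothetical helper name) only prompts when the variable is missing, so a key already exported in your shell is reused. The `openai` client created later reads `OPENAI_API_KEY` from the environment, which is why setting the variable is enough.

```python
import getpass
import os


def ensure_openai_key(prompt=getpass.getpass, env=os.environ):
    """Prompt for the OpenAI key only if OPENAI_API_KEY is not already set."""
    if not env.get("OPENAI_API_KEY"):
        # Nothing in the environment yet: ask the user interactively.
        env["OPENAI_API_KEY"] = prompt("Enter your OpenAI API Key: ")
    return env["OPENAI_API_KEY"]


# In the notebook, call it without arguments:
# ensure_openai_key()
```

The `prompt` and `env` parameters exist only so the helper can be exercised without a real terminal or environment.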

# Preparation 1: Load Data

Load the data using the built-in JSON module.

```python
import json

with open("../data/movie_data.json") as f:
    movies = json.load(f)
```
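Before encoding, it may be worth checking that every record has the fields the later cells read (`title`, `overview`, `release_date`, `runtime`); a missing or empty overview would likely still be encoded into a meaningless vector. A minimal sketch with a hypothetical helper and a hypothetical two-record sample:

```python
import json

# Fields the later cells read from each movie record.
REQUIRED_FIELDS = ("title", "overview", "release_date", "runtime")


def find_incomplete(movies, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for each record lacking a field or having it empty."""
    problems = []
    for idx, movie in enumerate(movies):
        missing = [name for name in required if movie.get(name) in (None, "")]
        if missing:
            problems.append((idx, missing))
    return problems


# Hypothetical sample standing in for movie_data.json.
sample = json.loads(
    '[{"title": "Example", "overview": "A robot heist.", "release_date": "2000-01-01", "runtime": 100},'
    ' {"title": "No details"}]'
)
print(find_incomplete(sample))  # [(1, ['overview', 'release_date', 'runtime'])]
# With the real data: find_incomplete(movies[:300])
```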

# Preparation 2: Create the Encoder Model

Create an `encoder` with `all-MiniLM-L6-v2`. Use the `sentence-transformers` library.

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
```

# Preparation 3: Create the Qdrant Collection

Create a `qdrant` collection using the dimensions of the encoder model and cosine similarity as the distance metric.

```python
from qdrant_client import models, QdrantClient

qdrant = QdrantClient(":memory:")

COLLECTION_NAME = "movies"

qdrant.recreate_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), distance=models.Distance.COSINE
    ),
)
```

# Preparation 4: Populate the collection

Encode the first 300 movies in the dataset and upload them to the Qdrant collection.

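```python
RECORDS_RANGE = slice(0, 300)
subset = movies[RECORDS_RANGE]

# Encode all overviews in one batched call (faster than one call per movie).
vectors = encoder.encode([mov["overview"] for mov in subset], show_progress_bar=True)

# upload_points expects PointStruct objects: an id, a vector and a payload.
points = [
    models.PointStruct(id=idx, vector=vec.tolist(), payload=mov)
    for idx, (vec, mov) in enumerate(zip(vectors, subset))
]

qdrant.upload_points(collection_name=COLLECTION_NAME, points=points)
```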

# Preparation 5: Write a function to retrieve results

Write a function that accepts a query and a number of items to retrieve, and returns that many of the most similar items from the vector database.

```python
def get_records(
    query, *, encoder=encoder, client=qdrant, collection=COLLECTION_NAME, max_results=10
):
    query_vector = encoder.encode(query).tolist()
    return client.search(
        collection_name=collection, query_vector=query_vector, limit=max_results
    )


question = "What should I see tonight? I love Sci-Fi movies but I have seen most of the classics, such as Star Wars."

docs = get_records(question, max_results=5)
results = [doc.payload for doc in docs]
```

Now we are finally ready to brew our movie assistant! Beware, the exercises will require a bit more coding skill.

## Exercise 1: Prompt Engineering

Most chat LLMs accept a *system prompt* (`system`), i.e. a string that sets the model's behavior. It is sent with every request, before the user's message, and is used to give the model a tone or persona. For example:

```python
system_1 = """You are a helpful assistant."""

system_2 = """You are a distracted poet who always answers in rhyme."""
```
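With OpenAI-style chat APIs, like the one used in Exercise 3, the system prompt travels as its own message with role `system`, placed before the user's message. A minimal sketch of that message list (`build_messages` is a hypothetical helper name):

```python
def build_messages(system, user_prompt):
    """Chat request body: the system prompt is a separate message sent before the user turn."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


system_2 = """You are a distracted poet who always answers in rhyme."""
messages = build_messages(system_2, "Recommend a sci-fi movie.")
for message in messages:
    print(f"{message['role']:>6}: {message['content']}")
```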

Your first task is to craft a system prompt that will give a unique personality to your movie buddy.

```python
GEEK_SYSTEM = """
  You are a DVD store assistant and your goal is to recommend a good movie to the user.

  You are a movie expert and a real geek: you love sci-fi movies and tend to get excited when you talk about them.
  Nevertheless, no matter what, you always want to make your customers happy.
"""
```

## Intermezzo: Python String Formatting

Before going further, however, we need to review a bit of string formatting. If you already know what that is and how to use the `.format` method, you can move on. For all the others, it's time for an intermezzo.

Here is a refresher. There are multiple ways of formatting a string in Python. The most common is the f-string:

```python
name = "Rob"
role = "Movie Buddy"

print(f"Hi! I am {name} and I am a great {role}!")
```

However, there is another, equivalent way: the `.format` method. It works like this:

```python
print("Hi! I am {name} and I am a great {role}!".format(name="Rob", role="Movie Buddy"))
```

Note: no more `f` at the beginning of the string! This notation is more verbose and achieves the same result, so for everyday code f-strings are usually preferred. However, `.format` lets you build a "lazy" string: a template whose placeholders are filled in later. Try running the following code:

```python
message = "Hi! I am {name} and I am a great {role}!"
print(message)
```

No substitution was performed: the placeholders are still there. We can use this to our advantage. Try running the following code:

```python
print(message.format(name="Rob", role="Movie Buddy"))
```

Do you see where we are headed? We can use this `.format` method to "populate" the prompt template with the bits we obtain from the user query and the vector database.
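Two details of `.format` matter when filling prompt templates: literal braces in the template must be doubled, and every placeholder must receive a value. Braces inside the substituted values (say, an overview) are safe, and extra keyword arguments are silently ignored. A short demonstration with made-up strings:

```python
template = "Answer as JSON like {{\"title\": ...}}. Query: {question}"

# Doubled braces survive .format() as literal braces; single braces are placeholders.
print(template.format(question="A good sci-fi movie?"))

# Braces inside substituted values are not parsed again.
print("Overview: {overview}".format(overview="Uses {curly} braces"))

# Every placeholder must be filled, otherwise .format() raises KeyError.
try:
    "Context: {context}\nQuery: {question}".format(question="Anything")
except KeyError as err:
    print("missing placeholder:", err)

# Extra keyword arguments are ignored, so one record dict can feed several templates.
print("Title: {title}".format(title="Example", runtime=120))
```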

Now we are finally ready to move on!

## Exercise 2: Prompt Templates

You will need to create two *templates* to build a skeleton out of the records retrieved from the vector database and the user query.

1. A *prompt template* that instructs the LLM to use the context to answer the query. It is similar in spirit to the system prompt and might look like this:

```
"""Given this context: {context}

Answer this query: {prompt}
"""
```

2. A *context template* to format the items retrieved from the vector DB into a textual form. A bit like this:

```
"""
Movie 1 - some details - some more details
Movie 2 - some details - some more details
...
"""
```

Inspect the results of your Qdrant query: what information would you like to keep in your context?

Beware: this is a bit trickier, because you have to parse every result into a string and then merge every string into a single one. You might want to write this into a function, so that you can use it later more easily.
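One possible shape for such a function, sketched on plain dicts: it numbers the records by rank and falls back to placeholders when a field is missing. The template text, defaults and sample records are hypothetical; Qdrant search results wrap each movie dict in `.payload`, so you would pass `[doc.payload for doc in docs]`.

```python
CONTEXT_TEMPLATE = "{rank}. {title} (released {release_date}, runtime {runtime})\n   {overview}\n"


def format_context(records, template=CONTEXT_TEMPLATE):
    """Render each record (a dict) with `template` and join them, numbered by rank."""
    parts = []
    for rank, rec in enumerate(records, start=1):  # records are assumed sorted by relevance
        parts.append(
            template.format(
                rank=rank,
                title=rec.get("title", "Unknown title"),
                release_date=rec.get("release_date", "n/a"),
                runtime=rec.get("runtime", "?"),
                overview=rec.get("overview", ""),
            )
        )
    return "".join(parts)


sample = [  # hypothetical records shaped like the Qdrant payloads
    {"title": "Example One", "release_date": "2001-01-01", "runtime": 120, "overview": "A heist on Mars."},
    {"title": "Example Two", "overview": "A robot learns to paint."},
]
print(format_context(sample))
```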

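```python
prompt_template = """
  Here are some suggested movies (ranked by relevance) to help you with your choice.
  {context}

  Use these suggestions to answer this question:
  {question}
"""

context_template = """
Title: {title}
Overview: {overview}
Release date: {release_date}
Runtime: {runtime}
"""


def format_records_into_context(records, *, template=context_template):
    """Render every retrieved record with `template` and join the results into one string."""
    parts = []
    for rec in records:
        # Qdrant search results keep the movie dict in `.payload`; plain dicts are used as-is.
        payload = getattr(rec, "payload", rec)
        parts.append(
            template.format(
                title=payload["title"],
                overview=payload["overview"],
                release_date=payload["release_date"],
                runtime=payload["runtime"],
            )
        )
    return "".join(parts)
```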

## Exercise 3: Use the AI!

It's finally time to unleash the artificial intelligence. Create an instance of the OpenAI client and write a function that:

1. Takes a `question` and a `max_results` parameter. The first is the movie query, the second is the number of items to retrieve from the vector database.
2. Takes a `system` prompt, a `prompt_template` and a `context_template`.
3. Calls the `get_records` function to get the `max_results` most similar records from the vector DB.
4. Formats the records using the `context_template`.
5. Builds a prompt using the `prompt_template`.
6. Queries GPT-3.5 with the system prompt and the resulting prompt!

```python
import openai

# The client reads the key from the OPENAI_API_KEY environment variable set earlier.
client = openai.OpenAI()


def ask(
    question,
    *,
    max_results=10,
    system=GEEK_SYSTEM,
    prompt_template=prompt_template,
    context_template=context_template,
    qdrant=qdrant,
    collection=COLLECTION_NAME,
):
    # Retrieve the most similar movies from the vector database.
    records = get_records(
        question, max_results=max_results, client=qdrant, collection=collection
    )
    # Fill the templates with the retrieved context and the user question.
    context = format_records_into_context(records, template=context_template)
    prompt = prompt_template.format(question=question, context=context)

    # Send the system prompt and the filled-in prompt to the chat model.
    chat_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    )

    print(chat_completion.choices[0].message.content)
    return chat_completion


answer = ask(question=question)
```

# Challenge 1: Use different embedding models!

# Challenge 2: To Production!

It's time to get Object Oriented! Refactor the code above into a `VectorDBService` class. It should satisfy the following requirements:

1. The `__init__` method accepts a collection name, an encoder model and a distance metric. During initialization, the collection is (re)created with that name and distance metric, and the model weights are downloaded.
2. A `process` method that accepts a dataset and ingests it into the collection.
3. A `get_top_k` method that returns the top K elements by similarity given a `query` vector.

Then, write a `MovieBuddy` class:

1. The `__init__` method takes a `system` prompt, a `prompt_template` and a `context_template`, plus a `VectorDBService`.
2. An `apply_template` method that, given a query and a list of records, formats them using the `context_template` and the `prompt_template` and returns the resulting prompt.
3. An `ask` method that takes a `question` and a `client` object of type `OpenAI`. It uses `VectorDBService.get_top_k` and `apply_template` to build the prompt, then asks the client to generate a completion.
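A minimal sketch of the `MovieBuddy` part, under two assumptions: `get_top_k` accepts the query *text* (the service owns the encoder, so it can encode internally) and returns payload dicts, and the client follows the `client.chat.completions.create(...)` interface used in Exercise 3. `FakeVectorDB` and the fake client are hypothetical stand-ins so the sketch runs offline; pass a real `VectorDBService` and `openai.OpenAI()` instead.

```python
from types import SimpleNamespace


class MovieBuddy:
    def __init__(self, system, prompt_template, context_template, vector_db, model="gpt-3.5-turbo"):
        self.system = system
        self.prompt_template = prompt_template
        self.context_template = context_template
        self.vector_db = vector_db
        self.model = model

    def apply_template(self, query, records):
        """Format records with context_template, then wrap them and the query with prompt_template."""
        context = "".join(self.context_template.format(**rec) for rec in records)
        return self.prompt_template.format(context=context, question=query)

    def ask(self, question, client, k=5):
        records = self.vector_db.get_top_k(question, k)  # assumed: text in, payload dicts out
        prompt = self.apply_template(question, records)
        completion = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system},
                {"role": "user", "content": prompt},
            ],
        )
        return completion.choices[0].message.content


# Hypothetical fakes so the sketch runs without Qdrant or OpenAI.
class FakeVectorDB:
    def __init__(self, records):
        self.records = records

    def get_top_k(self, query, k):
        return self.records[:k]


class FakeCompletions:
    def __init__(self):
        self.last_messages = None

    def create(self, model, messages):
        self.last_messages = messages  # remember what was sent, for inspection
        reply = SimpleNamespace(message=SimpleNamespace(content=f"{model} saw {len(messages)} messages"))
        return SimpleNamespace(choices=[reply])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
buddy = MovieBuddy(
    system="You are a movie geek.",
    prompt_template="Movies:\n{context}\nQuestion: {question}",
    context_template="- {title}\n",
    vector_db=FakeVectorDB([{"title": "Example One"}, {"title": "Example Two"}]),
)
print(buddy.ask("Something with robots?", client=fake_client, k=1))
```

Injecting the service and the client, instead of creating them inside the class, is what makes this kind of offline check possible.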

# Challenge 3: Interactive Movie Buddy

Use `ipywidgets` to create interactive elements to perform your query.

# Appendix 1: A more modern pipeline with `llama-index`