# Chapter 1

**Set up a basic RAG pipeline (BM25/TFIDF + simple QA model)**

In this chapter, we will build a simple RAG pipeline. We will focus on how to preprocess and chunk the data, and then build a simple retrieval engine without using a dedicated vector index. The goal is to show the inner workings of a retrieval pipeline and to walk through the workflow from a user query to a response generated by an LLM.

For this chapter you will need a Cohere API key.

Create an `.env` file in the `notebooks` directory. The file's content is:

```
CO_API_KEY="<your-cohere-api-key>"
```
`.env` combined with `dotenv` allows for better API key management in notebooks: the key stays out of the notebook itself and is loaded into the environment at runtime.
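
A small sanity check can catch a missing key before any API call is made. The sketch below is an illustration only: `require_env` is a hypothetical helper, not part of `dotenv` or the Cohere SDK, and it accepts the environment as a mapping so that it can be tried without real credentials.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise a clear error if it is missing or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demo with a stand-in mapping; in the notebook you would call
# require_env("CO_API_KEY") after dotenv.load_dotenv().
fake_env = {"CO_API_KEY": "dummy-key-for-illustration"}
print(len(require_env("CO_API_KEY", fake_env)))
```

Printing only the length avoids leaking the key into the notebook output.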

In [1]:
import json
import os
import pathlib
from datetime import datetime
from typing import Dict, List

import dotenv
import numpy as np
import wandb
import cohere
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer


dotenv.load_dotenv()

True

Here, we will start a Weights and Biases (W&B) run.

Throughout this notebook, W&B will be used to store, version, and download text files, creating a lineage. We will begin with an already uploaded raw data file as a [W&B Artifact](https://docs.wandb.ai/guides/artifacts), download it, inspect it, and devise a chunking strategy. Finally, we will upload the processed documents back to W&B as an Artifact, establishing a lineage (DAG) between the raw and processed data.

In [2]:
WANDB_ENTITY = "rag-course"
WANDB_PROJECT = "dev"

wandb.require("core")

run = wandb.init(
    entity=WANDB_ENTITY,
    project=WANDB_PROJECT,
    group="Chapter 1",
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: parambharat (rag-course). Use `wandb login --relogin` to force relogin


In [3]:
# TODO: Remove this once we move to the final project
# documents_artifact = wandb.Artifact(
#     name="wandb_docs",
#     type="dataset",
#     description="W&B Documentation in Markdown format",
#     metadata={
#         "total_files": 380,
#         "date_processed": datetime.now().strftime("%Y-%m-%d"),
#     },
# )

# documents_artifact.add_dir("../data/wandb_docs")
# run.log_artifact(documents_artifact)

## Data ingestion

### Loading the data

Use [W&B Artifacts](https://docs.wandb.ai/guides/artifacts) to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. W&B Artifacts combine object storage with rich UI functionality.

Below we download an artifact named `wandb_docs`, which places 380 markdown files in your `../data/wandb_docs` directory. This is our data source.

In [4]:
documents_artifact = run.use_artifact(
    f"{WANDB_ENTITY}/{WANDB_PROJECT}/wandb_docs:latest", type="dataset"
)
data_dir = "../data/wandb_docs"

docs_dir = documents_artifact.download(data_dir)

2024/07/08 12:36:30 [DEBUG] GET https://storage.googleapis.com/wandb-production.appspot.com/rag-course/dev/0z2t11h3/artifact/936064166/wandb_manifest.json?Expires=1720425990&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=GOjDRxHCR6SW%2FDPTIBZwZiH%2Frwi%2FXwsWw4YgrIbYprSNj2Cc%2FImfPK0kbkGgm6XVQMg96G02gF3p%2FDykJfu8%2FfV3rJARpwem9uxOufltVI%2B2wP2V3vkBf8QLz%2BSAWYSLK%2Ftwl4KHoVirr4iIrgXJT7bLXfy%2FMUwRJCvmDGG8YQlHoTmwpmuOC1ClwFJo8a1MV9Qd6MUz5CYOpW1SrVr8G%2FjVEbwhMFhCFYFQ%2B0h98NRsDty3Sae16%2FrgWvixiMA8kP84tEtYdrwUrAzy%2B8cGR0ddF36Urgn1OPtvseGswi8%2F0xivECf1z54ZhfJfwxYFZTWwH9ad8EETSZSghT0Y5w%3D%3D


Upon inspecting the `../data/wandb_docs` directory below, we see that we have downloaded 380 files. The first 5 files are all in markdown (`.md` file format).

In [5]:
docs_dir = pathlib.Path(docs_dir)
docs_files = sorted(docs_dir.rglob("*.md"))

print(f"Number of files: {len(docs_files)}\n")
print("First 5 files:\n{files}".format(files="\n".join(map(str, docs_files[:5]))))

Number of files: 380

First 5 files:
../data/wandb_docs/guides/app/features/anon.md
../data/wandb_docs/guides/app/features/custom-charts/intro.md
../data/wandb_docs/guides/app/features/custom-charts/walkthrough.md
../data/wandb_docs/guides/app/features/intro.md
../data/wandb_docs/guides/app/features/notes.md


Let's look at an example file below. We take the first element of the list (`docs_files`) and use the convenient `Path.read_text` method, which returns the decoded contents of the pointed-to file as a string.

💡 Looking at the example, we see that it follows a format. While building an ingestion pipeline, it is good practice to look through a few examples to see if there is any pattern in your data source. This helps you come up with better preprocessing steps and chunking strategies.

In [6]:
print(docs_files[0].read_text())

---
description: Log and visualize data without a W&B account
displayed_sidebar: default
---

# Anonymous Mode

Are you publishing code that you want anyone to be able to run easily? Use Anonymous Mode to let someone run your code, see a W&B dashboard, and visualize results without needing to create a W&B account first.

Allow results to be logged in Anonymous Mode with `wandb.init(`**`anonymous="allow"`**`)`

:::info
**Publishing a paper?** Please [cite W&B](https://docs.wandb.ai/company/academics#bibtex-citation), and if you have questions about how to make your code accessible while using W&B, reach out to us at support@wandb.com.
:::

### How does someone without an account see results?

If someone runs your script and you have to set `anonymous="allow"`:

1. **Auto-create temporary account:** W&B checks for an account that's already signed in. If there's no account, we automatically create a new anonymous account and save that API key for the session.
2. **Log results quickly:** T

Below, we are storing files as dictionaries with content (raw text) and metadata. Metadata is extra information for that data point which can be used to group together similar data points, or filter out a few data points. We will see in future chapters the importance of metadata and why it should not be ignored while building the ingestion pipeline.

Metadata can be derived from the content (`raw_tokens`) or be inherent to the data point (`source`).

Note that we are simply counting whitespace-separated words and calling the result `raw_tokens`. In practice we would use the [tiktoken tokenizer](https://github.com/openai/tiktoken) to calculate token counts, but this naive calculation is an acceptable approximation for now.
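
To make the two kinds of metadata concrete, here is a minimal sketch on made-up records. The sample texts and the `guides/` and `ref/` paths are invented for illustration; only the record layout follows this chapter.

```python
from typing import Any, Dict, List

# Hypothetical sample documents (path -> text), standing in for files on disk
samples = {
    "guides/track/log.md": "Log metrics with wandb.log inside your training loop",
    "ref/python/init.md": "wandb.init starts a new run",
}

records: List[Dict[str, Any]] = [
    {
        "content": text,
        "metadata": {
            "source": path,                   # inherent: where the text came from
            "raw_tokens": len(text.split()),  # derived: whitespace word count
        },
    }
    for path, text in samples.items()
]

# Metadata lets us filter data points without reading their content
guides_only = [r for r in records if r["metadata"]["source"].startswith("guides/")]
print([r["metadata"] for r in guides_only])
```

The same pattern extends to any field you may want to group or filter on later, which is why it pays to record metadata at ingestion time.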

In [7]:
# We'll store the files as dictionaries with some content and metadata
data = []
for file in docs_files:
    content = file.read_text()
    data.append(
        {
            "content": content,
            "metadata": {
                "source": str(file.relative_to(docs_dir)),
                "raw_tokens": len(content.split()),
            },
        }
    )
data[:2]

  'metadata': {'source': 'guides/app/features/anon.md', 'raw_tokens': 470}},
 {'content': '---\nslug: /guides/app/features/custom-charts\ndisplayed_sidebar: default\n---\n\nimport Tabs from \'@theme/Tabs\';\nimport TabItem from \'@theme/TabItem\';\n\n# Custom Charts\n\nUse **Custom Charts** to create charts that aren\'t possible right now in the default UI. Log arbitrary tables of data and visualize them exactly how you want. Control details of fonts, colors, and tooltips with the power of [Vega](https://vega.github.io/vega/).\n\n* **What\'s possible**: Read the[ launch announcement →](https://wandb.ai/wandb/posts/reports/Announcing-the-W-B-Machine-Learning-Visualization-IDE--VmlldzoyNjk3Nzg)\n* **Code**: Try a live example in a[ hosted notebook →](https://tiny.cc/custom-charts)\n* **Video**: Watch a quick [walkthrough video →](https://www.youtube.com/watch?v=3-N9OV6bkSM)\n* **Example**: Quick Keras and Sklearn [demo notebook →](https://colab.research.google.com/drive/1g-gNGokP

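Checking the total number of tokens in your data source is good practice. In this case, the total is more than 200k whitespace-separated tokens. That is more than many LLM context windows can accept in a single request, and sending the whole corpus with every question would be slow and costly even where it fits. Building a RAG pipeline is justified in such cases.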

In [8]:
total_tokens = sum(map(lambda x: x["metadata"]["raw_tokens"], data))
print(f"Total Tokens in dataset: {total_tokens}")

Total Tokens in dataset: 246998


The newly created list of dictionaries with `content` and `metadata` will now be logged as a W&B Artifact called `raw_data`. We started with the `wandb_docs` artifact and are now logging the processed data as the `raw_data` artifact.

In [9]:
# Let's store the raw data in an artifact for future use and reproducibility
raw_artifact = wandb.Artifact(
    name="raw_data",
    type="dataset",
    description="Raw wandb documentation",
    metadata={
        "total_files": len(data),
        "date_processed": datetime.now().strftime("%Y-%m-%d"),
        "total_raw_tokens": total_tokens,
    },
)
with raw_artifact.new_file("documents.jsonl", mode="w") as f:
    for item in data:
        f.write(json.dumps(item) + "\n")
run.log_artifact(raw_artifact)

<Artifact raw_data>
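
The `documents.jsonl` file uses the JSON Lines layout: one JSON object per line. The following sketch round-trips two made-up records through an in-memory buffer (standing in for the artifact file) to show that writing with `json.dumps` and reading each line with `json.loads` recovers the original list.

```python
import io
import json

items = [
    {"content": "first document\nwith a newline", "metadata": {"source": "a.md"}},
    {"content": "second document", "metadata": {"source": "b.md"}},
]

# Write: one JSON object per line. json.dumps escapes newlines inside strings,
# so every record stays on a single physical line.
buffer = io.StringIO()
for item in items:
    buffer.write(json.dumps(item) + "\n")

# Read: parse each line independently
restored = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(restored == items)
```

Because each line is parsed on its own, a large file can also be processed one record at a time instead of being loaded whole.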

### Chunking the data

Each document contains a large number of tokens, so we need to split it into smaller chunks to manage the token count per chunk. This approach serves three main purposes:

* Most embedding models have a limit of 512 tokens per input (based on the tokenizer used during their training).

* Chunking allows us to retrieve and send only the most relevant portions to our LLM, significantly reducing the total token count. This helps keep the LLM's cost and processing time manageable.

* Embedding models tend to produce better vectors for shorter text, since a single vector can then capture finer-grained details and nuances, resulting in more accurate representations.

Below we chunk each document's content (text) to a maximum length of 300 tokens (`CHUNK_SIZE`). We do not overlap the content of one chunk with the next (`CHUNK_OVERLAP = 0`).


In [10]:
# These are hyperparameters of our ingestion pipeline

CHUNK_SIZE = 300
CHUNK_OVERLAP = 0


def split_into_chunks(
    text: str, chunk_size: int = CHUNK_SIZE, chunk_overlap: int = CHUNK_OVERLAP
) -> List[str]:
    """Split the text into chunks of at most `chunk_size` tokens.

    Consecutive chunks share `chunk_overlap` tokens. Tokens are approximated
    by whitespace-separated words (`str.split`), not by a model tokenizer.
    """
    if not 0 <= chunk_overlap < chunk_size:
        # Otherwise `start` would never advance and the loop would not end
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + chunk_size
        chunk = tokens[start:end]
        chunks.append(" ".join(chunk))
        # Step forward, keeping the last `chunk_overlap` tokens for the next chunk
        start = end - chunk_overlap
    return chunks
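
To see what the two hyperparameters do, the sketch below runs the same word-based logic on a tiny string. `split_words` is a compact copy written for this demonstration, and the chunk size of 3 is chosen only to keep the output readable.

```python
from typing import List


def split_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Compact copy of the word-based chunking logic, for demonstration."""
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        end = start + chunk_size
        chunks.append(" ".join(tokens[start:end]))
        start = end - chunk_overlap
    return chunks


text = "a b c d e f g"
no_overlap = split_words(text, chunk_size=3)
with_overlap = split_words(text, chunk_size=3, chunk_overlap=1)
print(no_overlap)    # ['a b c', 'd e f', 'g']
print(with_overlap)  # ['a b c', 'c d e', 'e f g', 'g']
```

Note that with overlap, the final chunk can consist entirely of tokens already present in the previous chunk, so overlap increases both the number of chunks and the total token count.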

We will use the `raw_data` artifact we logged earlier. Let's download it. By calling the `use_artifact` method, we start a lineage from the `raw_data` artifact to the new artifact we will create further below.

In [11]:
raw_artifact = run.use_artifact(
    f"{WANDB_ENTITY}/{WANDB_PROJECT}/raw_data:latest", type="dataset"
)
artifact_dir = raw_artifact.download()
raw_data_file = pathlib.Path(f"{artifact_dir}/documents.jsonl")
raw_data = list(map(json.loads, raw_data_file.read_text().splitlines()))
raw_data[:2]

2024/07/08 12:36:35 [DEBUG] GET https://storage.googleapis.com/wandb-production.appspot.com/rag-course/dev/0z2t11h3/artifact/936065098/wandb_manifest.json?Expires=1720425995&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=ilzIBCFPR5buvy4PTWk8PnFcJ2%2BDIR08OLxXl4BTU1HZ%2BkDJfUKg%2FCxW7biIVYJch0UTFe65xA8ktm%2FMFsEKuQ3ai6iG0WTbr4ZDOyhAVB5QO%2FFbwtofzhzwQ2v8fUO7Ox9cJUPTxl7dONaS7xoJzfHjfNNsvCY8c1dRl1xf7z95ZRj3WYazkYvfskrIX9%2B9YshfuYLJlKfbIGRqujQ2zUFQaDUpBiau2pbcobe%2BNkNziSpxK2P3hduf7omfJOHdhm6cZU1QCwjOdj%2F5W%2FWFew0YDwOCjCRNHZk%2B1eJbvs8b2IIq0LMcGhO8d%2FcWnuF%2FSVA5%2B%2BRnQKVLUzPU%2B9Z8rA%3D%3D


  'metadata': {'source': 'guides/app/features/anon.md', 'raw_tokens': 470}},
 {'content': '---\nslug: /guides/app/features/custom-charts\ndisplayed_sidebar: default\n---\n\nimport Tabs from \'@theme/Tabs\';\nimport TabItem from \'@theme/TabItem\';\n\n# Custom Charts\n\nUse **Custom Charts** to create charts that aren\'t possible right now in the default UI. Log arbitrary tables of data and visualize them exactly how you want. Control details of fonts, colors, and tooltips with the power of [Vega](https://vega.github.io/vega/).\n\n* **What\'s possible**: Read the[ launch announcement →](https://wandb.ai/wandb/posts/reports/Announcing-the-W-B-Machine-Learning-Visualization-IDE--VmlldzoyNjk3Nzg)\n* **Code**: Try a live example in a[ hosted notebook →](https://tiny.cc/custom-charts)\n* **Video**: Watch a quick [walkthrough video →](https://www.youtube.com/watch?v=3-N9OV6bkSM)\n* **Example**: Quick Keras and Sklearn [demo notebook →](https://colab.research.google.com/drive/1g-gNGokP

Let us chunk each document in the raw data Artifact. We create a new list of dictionaries with the chunked text (`content`) and its `metadata`.

In [12]:
chunked_data = []
for doc in raw_data:
    chunks = split_into_chunks(doc["content"])
    for chunk in chunks:
        chunked_data.append(
            {
                "content": chunk,
                "metadata": {
                    "source": doc["metadata"]["source"],
                    "raw_tokens": len(chunk.split()),
                },
            }
        )

### Cleaning the data

We clean the chunks of special tokens that we found break OpenAI's `client.chat.completions` API. Data cleaning is a crucial step in most ML pipelines, and RAG and agentic pipelines are no exception. Usually, providing higher-quality chunks to an LLM leads to a higher-quality response.


In [13]:
def make_text_tokenization_safe(content: str) -> str:
    special_tokens_set = {
        "<|endofprompt|>",
        "<|endoftext|>",
        "<|fim_middle|>",
        "<|fim_prefix|>",
        "<|fim_suffix|>",
    }

    def remove_special_tokens(text: str) -> str:
        """Removes special tokens from the given text.

        Args:
            text: A string representing the text.

        Returns:
            The text with special tokens removed.
        """
        for token in special_tokens_set:
            text = text.replace(token, "")
        return text

    cleaned_content = remove_special_tokens(content)
    return cleaned_content
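
As a quick check of what this cleaning step does, the sketch below applies the same replace-based logic to a made-up sentence. `strip_special_tokens` is a flattened copy written for this demonstration.

```python
SPECIAL_TOKENS = (
    "<|endofprompt|>",
    "<|endoftext|>",
    "<|fim_middle|>",
    "<|fim_prefix|>",
    "<|fim_suffix|>",
)


def strip_special_tokens(text: str) -> str:
    """Remove every occurrence of the listed special tokens."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return text


sample = "Anonymous Mode<|endoftext|> lets anyone run your code."
cleaned = strip_special_tokens(sample)
print(cleaned)  # Anonymous Mode lets anyone run your code.
```

Text without any of these markers passes through unchanged, so the step is safe to apply to every chunk.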

In [14]:
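cleaned_data = []
for doc in chunked_data:
    # Copy the nested metadata dict too; `doc.copy()` alone is shallow and
    # would add `cleaned_tokens` to the entries of `chunked_data` as well
    cleaned_doc = {**doc, "metadata": dict(doc["metadata"])}
    cleaned_doc["cleaned_content"] = make_text_tokenization_safe(doc["content"])
    cleaned_doc["metadata"]["cleaned_tokens"] = len(
        cleaned_doc["cleaned_content"].split()
    )
    cleaned_data.append(cleaned_doc)
cleaned_data[:2]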

[{'content': '--- description: Log and visualize data without a W&B account displayed_sidebar: default --- # Anonymous Mode Are you publishing code that you want anyone to be able to run easily? Use Anonymous Mode to let someone run your code, see a W&B dashboard, and visualize results without needing to create a W&B account first. Allow results to be logged in Anonymous Mode with `wandb.init(`**`anonymous="allow"`**`)` :::info **Publishing a paper?** Please [cite W&B](https://docs.wandb.ai/company/academics#bibtex-citation), and if you have questions about how to make your code accessible while using W&B, reach out to us at support@wandb.com. ::: ### How does someone without an account see results? If someone runs your script and you have to set `anonymous="allow"`: 1. **Auto-create temporary account:** W&B checks for an account that\'s already signed in. If there\'s no account, we automatically create a new anonymous account and save that API key for the session. 2. **Log results qui

Again, we will store the cleaned data as an Artifact, this time named `chunked_data`. The metadata that we log along with the artifact allows us to go back to it later and reproduce the chunking and cleaning steps. It also allows us to pick the version of `chunked_data` that we want to experiment with.

In [15]:
total_raw_tokens = sum(map(lambda x: x["metadata"]["raw_tokens"], cleaned_data))
total_cleaned_tokens = sum(map(lambda x: x["metadata"]["cleaned_tokens"], cleaned_data))

chunked_artifact = wandb.Artifact(
    name="chunked_data",
    type="dataset",
    description="Chunked wandb documentation",
    metadata={
        "total_files": len(cleaned_data),
        "date_processed": datetime.now().strftime("%Y-%m-%d"),
        "total_raw_tokens": total_raw_tokens,
        "total_cleaned_tokens": total_cleaned_tokens,
        "chunk_size": CHUNK_SIZE,
        "chunk_overlap": CHUNK_OVERLAP,
    },
)
with chunked_artifact.new_file("documents.jsonl", mode="w") as f:
    for item in cleaned_data:
        f.write(json.dumps(item) + "\n")
run.log_artifact(chunked_artifact)

<Artifact chunked_data>

## Vectorizing the data

One of the key ingredients of most retrieval systems is representing the given modality (text in our case) as a vector. This vector is a numerical representation of the "content" of that modality (text).

Text vectorization (text to vector) can be done using various techniques such as [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) and [TF-IDF](https://en.wikipedia.org/wiki/Tf–idf) (Term Frequency-Inverse Document Frequency), or with embeddings such as [Word2Vec](https://en.wikipedia.org/wiki/Word2vec), [GloVe](https://nlp.stanford.edu/projects/glove/), and transformer-based architectures like BERT, which capture the semantic meaning and relationships between words or sentences.
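
To give a feel for TF-IDF, here is a minimal pure-Python sketch on three made-up sentences. It assumes the smoothed variant `idf(t) = ln((1 + n) / (1 + df(t))) + 1` followed by L2 normalization, which matches scikit-learn's default `TfidfVectorizer` settings, but it tokenizes on whitespace, so its numbers can differ from what `TfidfVectorizer` produces on real text.

```python
import math
from collections import Counter
from typing import Dict, List

docs = ["log metrics with wandb", "log images with wandb", "artifacts version data"]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each term appear?
df = Counter(term for doc in tokenized for term in set(doc))


def idf(term: str) -> float:
    """Smoothed inverse document frequency: rarer terms get larger weights."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1


def tfidf(doc: List[str]) -> Dict[str, float]:
    """Term count times idf, scaled so the vector has unit L2 norm."""
    weights = {term: count * idf(term) for term, count in Counter(doc).items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


vec = tfidf(tokenized[0])
print(sorted(vec, key=vec.get, reverse=True))
```

In the first sentence, `metrics` appears in only one document while `log` appears in two, so `metrics` receives the larger weight: TF-IDF emphasizes the terms that distinguish a document from the rest of the corpus.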

Below, we are downloading the `chunked_data` artifact.

In [16]:
chunked_artifact = run.use_artifact(
    f"{WANDB_ENTITY}/{WANDB_PROJECT}/chunked_data:latest", type="dataset"
)
artifact_dir = chunked_artifact.download()
chunked_data_file = pathlib.Path(f"{artifact_dir}/documents.jsonl")
chunked_data = list(map(json.loads, chunked_data_file.read_text().splitlines()))
chunked_data[:2]

2024/07/08 12:36:39 [DEBUG] GET https://storage.googleapis.com/wandb-production.appspot.com/rag-course/dev/vr8n8v06/artifact/942570916/wandb_manifest.json?Expires=1720425999&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=byvrxs3xky2kWaAkrlcGjhCPzdkNLw04EkXxHJfpF041RI%2B%2BhbEIX%2F3x1VsM8%2FK6t5xQsyBrro8Hp4MtnZZbOka%2BmTWrJfE13pOjiY2mU57zJ6du%2FCf6DN671ErceT4jL58W%2BIVKR9fYcvZLxEFRjOWu%2BNZWKIyywDirMK8MqxhSn5BwyWxeo%2FTvTtyWvgW4%2BmKU3ie5b7AYhHRxSWq0NpX6nfN2gf39EM6TL96jkC%2FVmBsNp1NYG9aMURt6Lmrjm282bSq12ZbonokQSzvPk0Zvp3LaYoq9bmayPHcIjfLhY46EsuUN3o4RfM%2BpM1EVPsS4aRw572b%2B1yt04aFIfA%3D%3D


[{'content': '--- description: Log and visualize data without a W&B account displayed_sidebar: default --- # Anonymous Mode Are you publishing code that you want anyone to be able to run easily? Use Anonymous Mode to let someone run your code, see a W&B dashboard, and visualize results without needing to create a W&B account first. Allow results to be logged in Anonymous Mode with `wandb.init(`**`anonymous="allow"`**`)` :::info **Publishing a paper?** Please [cite W&B](https://docs.wandb.ai/company/academics#bibtex-citation), and if you have questions about how to make your code accessible while using W&B, reach out to us at support@wandb.com. ::: ### How does someone without an account see results? If someone runs your script and you have to set `anonymous="allow"`: 1. **Auto-create temporary account:** W&B checks for an account that\'s already signed in. If there\'s no account, we automatically create a new anonymous account and save that API key for the session. 2. **Log results qui

![01_lineage_artifacts_data](../images/01_lineage_artifacts_data.png)

Below we create a simple `Retriever` class. It vectorizes the chunks with the `index_data` method and provides a convenient `search` method for querying the resulting index using cosine distance.

- `index_data` takes a list of chunks, vectorizes them using TF-IDF, and stores the result as `index`.

- `search` takes a `query` (question) and vectorizes it with the same fitted TF-IDF vectorizer. It then computes the cosine distance between the query vector and every vector in the index and picks the `k` vectors with the smallest distance. These top `k` vectors represent the chunks that are closest (most relevant) to the `query`.
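
The ranking step can be sketched without any libraries. The three-dimensional vectors below are invented for illustration, and `cosine_similarity` and `top_k` are hypothetical helper names; the sketch ranks by similarity, which is one minus the cosine distance.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: List[Sequence[float]], k: int) -> List[int]:
    """Return the positions of the k index vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


index = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.2, 0.0]
ranked = top_k(query_vec, index, k=2)
print(ranked)  # [0, 1]
```

Cosine similarity is mathematically undefined for an all-zero vector, which is what a query made only of out-of-vocabulary words would produce; the sketch returns 0.0 in that case as a design choice.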

---

Note that the `Retriever` class inherits from `weave.Model`, which builds on pydantic's `BaseModel`.

A Model is a combination of data (which can include configuration, trained model weights, or other information) and code that defines how the model operates. By structuring your code to be compatible with this API, you benefit from a structured way to version your application so you can more systematically keep track of your experiments.

To create a model in Weave, you need the following:

- a class that inherits from weave.Model
- type definitions on all attributes
- a typed `predict`, `infer` or `forward` method with `@weave.op()` decorator.

Think of `weave.op()` as a more capable stand-in for scattering `print` statements through your code. If you have not initialized Weave by calling `weave.init`, the code works as it is, without any tracking.

The `predict` method decorated with `weave.op()` tracks the model settings along with the inputs and outputs every time you call it.
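
To build intuition for what a tracking decorator does, here is a toy stand-in written with the standard library only. It is not how Weave is implemented: `traced` and `CALL_LOG` are hypothetical names, and the sketch merely records each call in a list.

```python
import functools
from typing import Any, Callable, Dict, List

CALL_LOG: List[Dict[str, Any]] = []  # hypothetical in-memory trace store


def traced(fn: Callable) -> Callable:
    """Record the inputs and output of every call to `fn`."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CALL_LOG.append(
            {"op": fn.__name__, "args": args, "kwargs": kwargs, "output": result}
        )
        return result

    return wrapper


@traced
def add_prefix(text: str, prefix: str = "Q: ") -> str:
    return prefix + text


add_prefix("How do I log metrics?")
print(CALL_LOG[0]["op"], "->", CALL_LOG[0]["output"])
```

The decorated function behaves exactly as before for its caller; the record of inputs and outputs is a side effect, which is the same reason decorated code keeps working when no tracking backend is initialized.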

In [17]:
import weave

In [18]:
class Retriever(weave.Model):
    vectorizer: TfidfVectorizer = TfidfVectorizer()
    index: list = None
    data: list = None

    @weave.op()
    def index_data(self, data):
        self.data = data
        docs = [doc["cleaned_content"] for doc in data]
        self.index = self.vectorizer.fit_transform(docs)

    @weave.op()
    def search(self, query, k=5):
        query_vec = self.vectorizer.transform([query])
        # `cdist` needs dense arrays; this is fine at this scale but
        # memory-hungry for large corpora
        cosine_distances = cdist(
            query_vec.toarray(), self.index.toarray(), metric="cosine"
        )[0]
        # Smallest distance first, i.e. most similar chunks first
        top_k_indices = cosine_distances.argsort()[:k]
        output = []
        for idx in top_k_indices:
            output.append(
                {
                    "source": self.data[idx]["metadata"]["source"],
                    "text": self.data[idx]["cleaned_content"],
                    # Convert the distance back to a cosine similarity
                    "score": 1 - cosine_distances[idx],
                }
            )
        return output

    @weave.op()
    def predict(self, query: str, k: int):
        return self.search(query, k)

Let's see our `Retriever` in action. We will index our chunked data and then ask a question to retrieve related chunks.

We have not called `weave.init` yet, which shows that the code works as it is: Weave is a lightweight wrapper whose benefits we will see in later sections.

In [19]:
retriever = Retriever()
retriever.index_data(chunked_data)

In [20]:
query = "How do I use W&B to log metrics in my training script?"
search_results = retriever.search(query)
for result in search_results:
    print(result)

{'source': 'guides/technical-faq/general.md', 'text': '--- displayed_sidebar: default --- # General ### What does `wandb.init` do to my training process? When `wandb.init()` is called from your training script an API call is made to create a run object on our servers. A new process is started to stream and collect metrics, thereby keeping all threads and logic out of your primary process. Your script runs normally and writes to local files, while the separate process streams them to our servers along with system metrics. You can always turn off streaming by running `wandb off` from your training directory, or setting the `WANDB_MODE` environment variable to `offline`. ### Does your tool track or store training data? You can pass a SHA or other unique identifier to `wandb.config.update(...)` to associate a dataset with a training run. W&B does not store any data unless `wandb.save` is called with the local file name. ### What formula do you use for your smoothing algorithm? We use the s

## Generating a response

A RAG pipeline has two core components: a `Retriever` and a `ResponseGenerator`. Earlier, we designed a simple retriever. Here we design a simple `ResponseGenerator`.

The `generate_response` method takes the user question along with the retrieved context (chunks) as inputs and makes an LLM call using the `model` and `prompt` (system prompt). This way the generated answer is grounded in the documentation (our use case). In this course we are using Cohere's `command-r` model.

As before, the `ResponseGenerator` class inherits from `weave.Model` so that its inputs and outputs can be tracked.

In [21]:
from typing import Any


class ResponseGenerator(weave.Model):
    model: str
    prompt: str
    client: cohere.Client = None

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Reads the key loaded from `.env` at the top of the notebook
        self.client = cohere.Client(api_key=os.environ["CO_API_KEY"])

    @weave.op()
    def generate_context(self, context: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        # Keep only the fields passed to the model as documents (drops `score`)
        return [{"source": item["source"], "text": item["text"]} for item in context]

    @weave.op()
    def generate_response(self, query: str, context: List[Dict[str, Any]]) -> str:
        contexts = self.generate_context(context)
        response = self.client.chat(
            preamble=self.prompt,
            message=query,
            model=self.model,
            documents=contexts,
            temperature=0.1,
            max_tokens=2000,
        )
        return response.text

    @weave.op()
    def predict(self, query: str, context: List[Dict[str, Any]]) -> str:
        return self.generate_response(query, context)

Below is the system prompt. Consider it a set of instructions on what to do with the user question and the retrieved contexts. In practice, the system prompt can be very detailed and involved (depending on the use case), but here we show a simple prompt. Later we will iterate on it and show how improving the system prompt improves the quality of the generated response.

In [22]:
PROMPT = """
Answer the following question about W&B. Provide a helpful and complete answer based only on the provided documents.
"""

Let's generate the response for the question "How do I use W&B to log metrics in my training script?". We already retrieved the context in the previous section, so we pass both the question and the context to the `generate_response` method.

In [23]:
response_generator = ResponseGenerator(model="command-r", prompt=PROMPT)
answer = response_generator.generate_response(query, search_results)
print(answer)

You can use the W&B to log metrics in your training script by following these steps:
1. First, import wandb at the top of your training script:
```python
import wandb
```
2. Then, call `wandb.init()` at the beginning of your training script. This creates a run object on W&B servers and starts a new process to stream and collect metrics.
3. Your script runs as normal and writes to local files, while the separate process streams them to the W&B servers along with system metrics. 
4. To log a metric, call `wandb.log()` at the relevant point in your script. For example, you might call this every time the accuracy changes.

Here is an example of logging two metrics, 'val/loss' and 'val/acc':
```python
wandb.log({"val/loss": 1.1, "val/acc": 0.3})
```
You can also log images, videos and data tables using specific W&B functions such as `wandb.Image()`, `wandb.Video()` and `wandb.Table().()`

Make sure to call `wandb.finish()` at the end of your training script to conclude the W&B run. You don'

## Simple Retrieval Augmented Generation (RAG) Pipeline

Below we bring everything together. As stated, a simple RAG pipeline consists of a retriever and a response generator.

The `predict` method, decorated with `weave.op()`, takes the user query, retrieves relevant context using the retriever, and finally synthesizes a response grounded in the data source (documentation in our case).

In [24]:
class RAGPipeline(weave.Model):
    retriever: Retriever = None
    response_generator: ResponseGenerator = None
    top_k: int = 5

    @weave.op()
    def predict(self, query: str):
        context = self.retriever.predict(query, self.top_k)
        return self.response_generator.predict(query, context)
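
The data flow of this pipeline can be tried offline with two stand-in components. Both classes below are hypothetical stubs written for illustration: `KeywordRetriever` ranks by shared words instead of TF-IDF, and `EchoGenerator` makes no LLM call. They only mirror the `predict` interfaces used above.

```python
from typing import Dict, List


class KeywordRetriever:
    """Hypothetical stand-in retriever: ranks chunks by words shared with the query."""

    def __init__(self, chunks: List[str]):
        self.chunks = chunks

    def predict(self, query: str, k: int) -> List[Dict[str, str]]:
        query_words = set(query.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda chunk: len(query_words & set(chunk.lower().split())),
            reverse=True,
        )
        return [{"text": chunk} for chunk in ranked[:k]]


class EchoGenerator:
    """Hypothetical stand-in generator: no LLM call, just quotes the context."""

    def predict(self, query: str, context: List[Dict[str, str]]) -> str:
        joined = " | ".join(item["text"] for item in context)
        return f"Q: {query} / context: {joined}"


retriever_stub = KeywordRetriever(
    ["call wandb.log to log metrics", "artifacts version datasets"]
)
stub_context = retriever_stub.predict("how to log metrics", k=1)
stub_answer = EchoGenerator().predict("how to log metrics", stub_context)
print(stub_answer)
```

Because the pipeline only relies on each component's `predict` method, either part can be swapped for a better implementation without touching the other.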

Let us initialize the `RAGPipeline`.

In [25]:
# Initialize retriever
retriever = Retriever()
retriever.index_data(chunked_data)

# Initialize the response generator
response_generator = ResponseGenerator(model="command-r", prompt=PROMPT)

# Bring them together as a RAG pipeline
rag_pipeline = RAGPipeline(
    retriever=retriever,
    response_generator=response_generator,
    top_k=5
)

We are finally ready to use `weave.init(<project-name>)` to see what we get from using W&B Weave. Once initialized, Weave starts tracking the inputs and outputs along with the underlying attributes (model name, `top_k`, etc.).

In [26]:
weave.init(f"{WANDB_ENTITY}/{WANDB_PROJECT}")

weave version 0.50.7 is available!  To upgrade, please run:
 $ pip install weave --upgrade
Logged in as Weights & Biases user: parambharat.
View Weave data at https://wandb.ai/rag-course/dev/weave


<weave.weave_client.WeaveClient at 0x7d70fd3ff190>

In [27]:

response = rag_pipeline.predict("How do I get started with wandb?")
print(response, sep="\n")

🍩 https://wandb.ai/rag-course/dev/r/call/c1bdd7af-8ec2-46c9-837e-7350d030121c
To get started with wandb.init(), first import wandb into your training script and then call `wandb.init()` which <co: 3>creates a run object on W&B servers and streams metrics from a separate process so as not to slow down your primary process. Your script will run as normal while the separate process streams the metrics to the servers. 

You can also use W&B with Jupyter Notebooks. To get started, first install W&B and link your account in your notebook with the following code:
```notebook 
!pip install wandb -qqq
import wandb
wandb.login()
```
Then, set up your experiment and save your hyperparameters using `wandb.init()`. For example:
```python
wandb.init(
    project="jupyter-projo",
    config={
        "batch_size": 128,
        "learning_rate": 0.01,
        "dataset": "CIFAR-100",
    },
)
```


Click on the link starting with a 🍩. This is the trace timeline for all the executions that happened in our simple RAG application. Go to the link and drill down to find everything that got tracked.

![weave trace timeline](../images/01_weave_trace_timeline.png)

In [28]:
# TODO: Add exercise for chapter 1.