# Chapter 4

<a target="_blank" href="https://colab.research.google.com/github/wandb/edu/blob/e98981ef9d934b10a0c6b4855d8eb6bfc7f56f1a/rag-advanced/notebooks/Chapter04.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<!--- @wandbcode{rag-advance-chapter-04} -->

## Query Enhancement

Improving the quality of the data helps improve the quality of the generated response. Another way is to improve the quality of the query seen by the LLM.

We cannot expect the user to phrase the query in the best way possible. Often the user is not sure exactly what to ask. Query enhancement, as the name suggests, is an intermediate step that uses an LLM to improve the quality of the query. The enhancement can include:
- making the query grammatically correct
- breaking down a complex query into relevant sub-queries
- extracting the intent of the query (this can be used to return a formatted answer for nefarious queries)
- if you have a chat history, augmenting the query with past queries and generated answers or retrieved contexts
- extracting keywords (about your product or anything else related to your application) and passing them along with the query to your LLM

One can imagine many other ways to enhance a query or to extract useful information from it.
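Two of the enhancements listed above, keyword extraction and chat-history augmentation, can be sketched without any LLM call. This is only a minimal illustration: `PRODUCT_KEYWORDS`, `extract_keywords` and `augment_with_history` are hypothetical names and are not part of the course scripts, which rely on an LLM for the enhancement step.

```python
from typing import List, Tuple

# Hypothetical product keywords; a real application would supply its own list.
PRODUCT_KEYWORDS = ["wandb", "weave", "sweeps", "artifacts"]


def extract_keywords(query: str, vocabulary: List[str]) -> List[str]:
    """Return the vocabulary terms that appear in the query (case-insensitive)."""
    lowered = query.lower()
    return [term for term in vocabulary if term in lowered]


def augment_with_history(
    query: str, history: List[Tuple[str, str]], max_turns: int = 2
) -> str:
    """Prefix the query with the most recent (question, answer) turns."""
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for past_query, past_answer in recent:
        lines.append(f"Previous question: {past_query}")
        lines.append(f"Previous answer: {past_answer}")
    lines.append(f"Current question: {query}")
    return "\n".join(lines)


history = [("What is wandb?", "A platform for tracking ML experiments.")]
query = "How do I log images with wandb?"

keywords = extract_keywords(query, PRODUCT_KEYWORDS)
augmented_query = augment_with_history(query, history)
```

The augmented text and the keywords would then be passed to the LLM together with, or in place of, the raw query.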

To begin, execute the following cell to clone the repository and install dependencies:

In [None]:
!git clone https://github.com/wandb/edu.git
%cd edu
!git checkout rag-irl

%cd rag-advanced
!pip install -qqq -U uv
!uv pip install --system -r requirements.txt
%cd notebooks

With the setup complete, we can now proceed with the chapter content.

Initial steps:
1. Log in to Weights & Biases (W&B)
2. Configure environment variables for API access

To obtain your Cohere API key, visit the [Cohere API dashboard](https://dashboard.cohere.com/api-keys).

In [None]:
import getpass
import os

import wandb

os.environ["COHERE_API_KEY"] = getpass.getpass("Please enter your COHERE_API_KEY")
wandb.login()

In [None]:
%load_ext autoreload
%autoreload 2

import asyncio
import cohere
import weave

In [None]:
WANDB_ENTITY = "rag-course"
WANDB_PROJECT = "dev"

weave_client = weave.init(f"{WANDB_ENTITY}/{WANDB_PROJECT}")

We will load the chunked data from Chapter 3. The chunks were produced with a semantic chunking strategy.

In [None]:
# Reload the data from Chapter 3
chunked_data = weave.ref("chunked_data:v0").get()

chunked_data.rows[:2]

In our use case, we will use the query enhancement stage to:
- identify the language of the query (our documentation is in English, Japanese and Korean, and we want to answer in the language of the query)
- identify the intent of the query (a user might ask something that is not related to our documentation)
- generate sub-queries (break down the main query into smaller queries) to retrieve more context for our LLM

This additional information will be used to inform the response generator and improve the retrieval process.
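To make the three outputs concrete, here is one possible way to hold them in a single object. This is a sketch under assumptions: the `EnhancedQuery` dataclass, its field layout and the example intent label are hypothetical and may differ from what `QueryEnhancer` actually returns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnhancedQuery:
    """Hypothetical container for what the enhancement stage extracts."""

    query: str  # the original user query
    language: str  # e.g. "en", "ja" or "ko"
    intents: List[str] = field(default_factory=list)
    search_queries: List[str] = field(default_factory=list)

    def retrieval_queries(self) -> List[str]:
        """Sub-queries to retrieve for, falling back to the original query."""
        return self.search_queries or [self.query]


example = EnhancedQuery(
    query="How do I log images in lightning with wandb?",
    language="en",
    intents=["integrations"],  # hypothetical intent label
    search_queries=[
        "log images with wandb",
        "wandb pytorch lightning integration",
    ],
)
```

Falling back to the original query when no sub-queries were generated keeps the retrieval step usable even if the enhancement returns nothing.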

In [None]:
from scripts.query_enhancer import QueryEnhancer
from scripts.utils import display_source

query_enhancer = QueryEnhancer()

In [None]:
response = await query_enhancer.predict("How do I log images in lightning with wandb?")

Look at the response below:

- We identified the query to be in English.
- We derived a few sub-queries that make sense.
- We classified the intent based on our intent classification "prompt/guides"

In [None]:
response

Our retriever will remain the same. We now have 5 sub-queries to retrieve for, but we can simply handle them one by one.

Let us use our BM25-based retriever from Chapter 2 and index our chunked data.

In [None]:
from scripts.retriever import BM25Retriever

retriever = BM25Retriever()
retriever.index_data(chunked_data.rows)

Since we have extracted more information from our query, such as its language and intent, we write `QueryEnhanedResponseGenerator`, which uses a new system prompt augmented with language and intent information.

Look at line 24 of the source displayed below.

In [None]:
from scripts.response_generator import QueryEnhanedResponseGenerator

display_source(QueryEnhanedResponseGenerator)
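The idea of augmenting the system prompt with language and intent information can be sketched as plain string formatting. The template text and the `build_system_prompt` helper below are assumptions made for illustration; the prompt actually used in this chapter is stored in `prompts/query_enhanced_system.txt` and may be structured differently.

```python
from typing import List

# Hypothetical template, not the course's actual system prompt.
SYSTEM_PROMPT_TEMPLATE = (
    "You answer questions about the product documentation.\n"
    "Respond in this language: {language}\n"
    "Detected intents: {intents}\n"
)


def build_system_prompt(language: str, intents: List[str]) -> str:
    """Fill the template with the language and intents extracted from the query."""
    intent_text = ", ".join(intents) if intents else "none detected"
    return SYSTEM_PROMPT_TEMPLATE.format(language=language, intents=intent_text)


system_prompt = build_system_prompt("ja", ["product_features"])
```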

The `QueryEnhancedRAGPipeline` runs through the different `search_queries` (sub-queries) and retrieves chunks for each of them. It also deduplicates the chunks so that we don't end up sending the same chunk twice.
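The retrieve-then-deduplicate loop can be sketched as follows. This is a toy version under stated assumptions: chunks are dictionaries with a `"text"` key, duplicates are detected by exact text match, and `toy_retrieve` is a hypothetical keyword-overlap stand-in for the BM25 retriever. The pipeline's own implementation may differ.

```python
from typing import Callable, Dict, List

Chunk = Dict[str, str]


def retrieve_deduplicated(
    search_queries: List[str],
    retrieve: Callable[[str, int], List[Chunk]],
    top_k: int = 5,
) -> List[Chunk]:
    """Retrieve for each sub-query in turn and keep the first copy of every chunk."""
    seen_texts = set()
    unique_chunks: List[Chunk] = []
    for search_query in search_queries:
        for chunk in retrieve(search_query, top_k):
            if chunk["text"] not in seen_texts:
                seen_texts.add(chunk["text"])
                unique_chunks.append(chunk)
    return unique_chunks


# Toy corpus and keyword-overlap retriever, standing in for the BM25 retriever.
CORPUS = [
    {"text": "Use wandb.Image to log images."},
    {"text": "WandbLogger integrates wandb with lightning."},
    {"text": "Sweeps automate hyperparameter search."},
]


def toy_retrieve(query: str, top_k: int) -> List[Chunk]:
    """Return up to top_k corpus chunks sharing at least one word with the query."""
    words = set(query.lower().split())
    hits = [c for c in CORPUS if words & set(c["text"].lower().rstrip(".").split())]
    return hits[:top_k]


chunks = retrieve_deduplicated(["log images wandb", "wandb lightning"], toy_retrieve)
```

Both sub-queries match the `WandbLogger` chunk, but it is kept only once in `chunks`.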

Note lines 23-27 of the source displayed below. We check whether the extracted intent is in a list of intents to avoid. If it is, we skip retrieval and can return a formatted answer such as "This query is not related to Weights and Biases. Can you please ask again?"

In [None]:
from scripts.rag_pipeline import QueryEnhancedRAGPipeline

display_source(QueryEnhancedRAGPipeline)
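The intent check described above amounts to a small gate in front of retrieval. The sketch below is illustrative only: the labels in `AVOID_INTENTS` and the helper names are hypothetical, and the actual list of intents to avoid is the one defined in `QueryEnhancedRAGPipeline`.

```python
from typing import List

# Hypothetical intent labels; the real list is defined in the pipeline source.
AVOID_INTENTS = {"unrelated", "nefarious_query"}
FALLBACK_ANSWER = (
    "This query is not related to Weights and Biases. Can you please ask again?"
)


def should_skip_retrieval(intents: List[str]) -> bool:
    """Return True when any extracted intent is one we do not want to answer."""
    return any(intent in AVOID_INTENTS for intent in intents)


def answer(query: str, intents: List[str]) -> str:
    """Return the canned answer for avoided intents, otherwise a placeholder."""
    if should_skip_retrieval(intents):
        return FALLBACK_ANSWER
    # Placeholder for the retrieval and generation steps.
    return f"(retrieve and generate an answer for: {query})"
```

Skipping retrieval for such queries saves a retriever call and avoids grounding the answer in irrelevant chunks.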

Let us initialize the response generator and our RAG pipeline and run it on one query.

In [None]:
# Let's add the new prompt
QUERY_ENHANCED_PROMPT = open("prompts/query_enhanced_system.txt").read()

response_generator = QueryEnhanedResponseGenerator(
    model="command-r", prompt=QUERY_ENHANCED_PROMPT, client=cohere.AsyncClient()
)

In [None]:
query_enhanced_rag_pipeline = QueryEnhancedRAGPipeline(
    query_enhancer=query_enhancer,
    retriever=retriever,
    response_generator=response_generator,
    top_k=5,
)

response = await query_enhanced_rag_pipeline.predict(
    "How do I log images in lightning with wandb?"
)
from IPython.display import Markdown

Markdown(response)

## Evaluate and Compare

In [None]:
eval_dataset = weave.ref(
    "weave:///rag-course/dev/object/Dataset:Qj4IFICc2EbdXu5A5UuhkPiWgxM1GvJMIvXEyv1DYnM"
).get()

print(eval_dataset.rows[:2])

In [None]:
# Let's also initialize the baseline RAG pipeline from Chapter 3

from scripts.rag_pipeline import SimpleRAGPipeline
from scripts.response_generator import SimpleResponseGenerator

INITIAL_PROMPT = open("prompts/initial_system.txt", "r").read()
response_generator = SimpleResponseGenerator(model="command-r", prompt=INITIAL_PROMPT)
simple_rag_pipeline = SimpleRAGPipeline(
    retriever=retriever, response_generator=response_generator, top_k=5
)

In [None]:
# Here we are primarily interested in evaluating the response quality, since we use the same retriever in both pipelines.
# We will use LLM metrics to evaluate the response quality.

In [None]:
from scripts.response_metrics import LLM_METRICS

response_evaluations = weave.Evaluation(
    name="Response_Evaluation",
    dataset=eval_dataset,
    scorers=LLM_METRICS,
    preprocess_model_input=lambda x: {"query": x["question"]},
)

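# The notebook already runs inside an event loop, so we await the coroutines
# directly, as in the earlier cells, instead of calling asyncio.run().
baseline_response_scores = await response_evaluations.evaluate(simple_rag_pipeline)

query_enhanced_response_scores = await response_evaluations.evaluate(
    query_enhanced_rag_pipeline
)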
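The `preprocess_model_input` argument used above maps each dataset row to the keyword arguments of the pipeline's `predict` method. A minimal sketch of that mapping, using a stand-in `predict` function and a hypothetical evaluation row (only the `"question"` key is taken from the notebook):

```python
from typing import Dict


def preprocess_model_input(row: Dict[str, str]) -> Dict[str, str]:
    """Rename the dataset's "question" field to the "query" argument of predict."""
    return {"query": row["question"]}


def predict(query: str) -> str:
    """Stand-in for a pipeline's predict method."""
    return f"answer to: {query}"


# Hypothetical evaluation row; the "answer" field is made up for illustration.
row = {"question": "How do I log images?", "answer": "Use wandb.Image."}
model_input = preprocess_model_input(row)
prediction = predict(**model_input)
```

Because both pipelines expose `predict(query=...)`, the same evaluation object can score them without any change to the dataset.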

![compare_retriever_responses](../images/04_compare_query_enhanced_responses.png)