## Document Q&A with RAG

### Background information
#### Why RAG?
Two major limitations of LLMs are that 1) they only know the information they were trained on, and 2) they have limited input context windows. RAG addresses both by keeping extra information outside the model and retrieving only the relevant pieces when a query needs them.
#### What is RAG?
RAG stands for Retrieval Augmented Generation. It has three stages: indexing, retrieval, and generation.
- Indexing happens ahead of time and lets you quickly look up relevant information at query time.
- Retrieval happens when a query arrives: you fetch the documents most relevant to it.
- Generation combines the retrieved documents with your instructions and the user's query, and has the LLM generate a tailored answer in natural language. This lets you provide information that the model has not seen before.
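
The three stages can be sketched end to end without any external service. The snippet below is a toy illustration only, not the Gemini or Chroma API: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `index`, `retrieve`, and `build_prompt` are illustrative names chosen for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def index(docs):
    """Indexing: embed every document ahead of time."""
    return [(doc, embed(doc)) for doc in docs]

def retrieve(query, indexed, k=1):
    """Retrieval: rank stored documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(indexed, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query, passages):
    """Generation input: combine instructions, the query, and retrieved text."""
    context = "\n".join(f"PASSAGE: {p}" for p in passages)
    return f"Answer using the passages below.\nQUESTION: {query}\n{context}"

toy_docs = [
    "Turn the temperature knob clockwise to make the car warmer.",
    "Touch the Music icon on the touchscreen to play songs.",
    "Move the shift lever to Drive to go forward.",
]
toy_query = "How do I play music on the touchscreen?"

toy_index = index(toy_docs)            # stage 1: indexing
top = retrieve(toy_query, toy_index)   # stage 2: retrieval
rag_prompt = build_prompt(toy_query, top)  # stage 3: prompt sent to the LLM
```

In a real pipeline the word-count vectors would be replaced by learned embeddings, which can match passages by meaning and not only by shared words.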

### Goal of this notebook
You will use the Gemini API to create a vector database, retrieve the passages most relevant to a question from that database, and generate a final answer. The database is [Chroma](https://docs.trychroma.com/), an open-source vector database. With Chroma, you can store embeddings alongside metadata, embed documents and queries, and search your documents.

### Step 1: set up

In [None]:
from google import genai
from google.genai import types

from IPython.display import Markdown

genai.__version__

In [None]:
## Set up API key
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

### Explore available models


In [None]:
client = genai.Client(api_key=GOOGLE_API_KEY)

for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

### Data

In [None]:
DOCUMENT1 = "Operating the Climate Control System  Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."
DOCUMENT2 = 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.'
DOCUMENT3 = "Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

### Create the embedding database with ChromaDB
Create a custom function to generate embeddings with the Gemini API.

In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types

# Define a helper to retry when per-minute quota is reached
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    document_mode = True
    
    @retry.Retry(predicate=is_retriable)
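    def __call__(self, input: Documents) -> Embeddings:
        # Documents and queries are embedded with different task types.
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        # Return one embedding vector per input text.
        return [e.values for e in response.embeddings]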
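
The `retry.Retry(predicate=is_retriable)` decorator calls the method again when the predicate accepts the raised exception. As a rough illustration of that idea, here is a standard-library-only sketch. It is not how `google.api_core.retry` is implemented, and its backoff schedule is an assumption; `FakeAPIError`, `retry_on`, `is_quota_error`, and `flaky_embed` are hypothetical names used only here.

```python
import time
from functools import wraps

class FakeAPIError(Exception):
    """Hypothetical error carrying an HTTP-style status code."""
    def __init__(self, code):
        super().__init__(f"status {code}")
        self.code = code

def retry_on(predicate, max_attempts=5, initial_delay=0.01):
    """Retry the wrapped function while predicate(exception) is true."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-retriable errors or when attempts run out.
                    if not predicate(exc) or attempt == max_attempts:
                        raise
                    time.sleep(delay)
                    delay *= 2  # exponential backoff between attempts
        return wrapper
    return decorator

# Same shape as the notebook's predicate: retry only on quota/unavailable codes.
is_quota_error = lambda e: isinstance(e, FakeAPIError) and e.code in {429, 503}

calls = {"count": 0}

@retry_on(is_quota_error)
def flaky_embed(text):
    """Simulated call that fails twice with 429, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return [0.1, 0.2, 0.3]

vector = flaky_embed("hello")
```

Errors that the predicate rejects (for example a 400 from a malformed request) are raised immediately, since retrying them would not help.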

In [None]:
### Create a Chroma database client that uses the Gemini embedding function
import chromadb

DB_NAME = "googlecardb"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])

### Retrieval: Find relevant documents

In [None]:
# Switch to query mode when generating embeddings
embed_fn.document_mode = False

# search the Chroma DB using the specified query
query = "How do I use the touchscreen to play music?"

result = db.query(query_texts=[query], n_results=1)
[all_passages] = result["documents"]
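
The destructuring `[all_passages] = result["documents"]` works because the query result holds one inner list per query text. The sketch below uses a hand-written dictionary that mimics that shape; it is not real query output, and the ids, passages, and distance values are made up for illustration.

```python
# Hand-written stand-in for a query result with one query text and n_results=2.
# Each field holds one inner list per query text.
mock_result = {
    "ids": [["1", "0"]],
    "documents": [["Passage about the touchscreen.", "Passage about climate control."]],
    "distances": [[0.42, 0.87]],  # made-up values
}

# Single-element unpacking fails loudly if more than one query was sent.
[passages] = mock_result["documents"]
[ids] = mock_result["ids"]
[distances] = mock_result["distances"]

# Pair each passage with its id and distance (a smaller distance is a closer match).
ranked = list(zip(ids, distances, passages))
best_id, best_distance, best_passage = ranked[0]
```

With `n_results=1`, as in the cell above, `all_passages` is a list containing a single passage string.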


### Augmented generation: answer the question
Now that you have retrieved a relevant passage from the set of documents, you can assemble a generation prompt and have the Gemini API generate a final answer.

In [None]:
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
"""

# Add the retrieved documents to the prompt: 
for passage in all_passages: 
    passage_oneline = passage.replace("\n", " ")
    prompt += f"\n\nPASSAGE: {passage_oneline}"

print(prompt)
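
If you want to reuse this assembly step for several queries, it can be wrapped in a function. This is a minimal sketch under assumptions: `build_rag_prompt` is a hypothetical helper name, and the instruction text is shortened for the example.

```python
def build_rag_prompt(question, passages, instructions):
    """Join instructions, the question, and each retrieved passage into one prompt."""
    # Collapse newlines so each field stays on a single line.
    question_oneline = question.replace("\n", " ")
    parts = [instructions.strip(), "", f"QUESTION: {question_oneline}"]
    for passage in passages:
        passage_oneline = passage.replace("\n", " ")
        # The leading newline leaves a blank line before each passage.
        parts.append(f"\nPASSAGE: {passage_oneline}")
    return "\n".join(parts)

example_prompt = build_rag_prompt(
    question="How do I play\nmusic?",
    passages=["Touch the Music icon\nto play songs."],
    instructions="Answer using the reference passage below.",
)
```

Keeping the instructions as a parameter makes it easy to try different tone or scope guidance without touching the assembly logic.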

### Generate the answer

In [None]:
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt
)

# Render the model's reply as Markdown in the notebook.
Markdown(answer.text)