***
<span style="font-size:32px; color:rgba(0, 0, 255, 0.5);">Day 2 - Embeddings & Vector Stores/Databases</span>

---


<span style="font-size:24px; color:rgba(0, 0, 0, 0.5);">Document Q&A with RAG using Chroma</span>

---
"Modern machine learning thrives on diverse data—images, text, audio, and more. This whitepaper explores the power of embeddings, which transform this heterogeneous data into a unified vector representation for seamless use in various applications.

**Why embeddings are important**<br>
In essence, embeddings are numerical representations of real-world data such as text, speech, image, or videos. They are expressed as low-dimensional vectors where the geometric distances of two vectors in the vector space is a projection of the relationships between the two real-world objects that the vectors represent. In other words they help you with providing compact representations of data of different types, while simultaneously also allowing you to compare two different data objects and tell how similar or different they are on a numerical scale: for example: The word ‘computer’ has a similar meaning to the picture of a computer, as well as the word ‘laptop’ but not to the word ‘car’. These low-dimensional numerical representations of real-world data significantly helps efficient large-scale data processing and storage by acting as means of lossy compression of the original data while retaining its important properties."

<b>Authors:</b><br>
Anant Nawalgaria and Xiaoqi Ren

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Resources</span>

---
**Whitepaper**<br>
https://www.kaggle.com/whitepaper-embeddings-and-vector-stores

**Embedding and Vector Stores Podcast**<br>
https://www.youtube.com/watch?v=1CC39K76Nqs

**Embedding and Vector Databases Livestream**<br>
https://www.youtube.com/watch?v=kpRyiJUUFxY

**Get your API key from**<br>
https://aistudio.google.com/app/apikey

**Kaggle**<br>
https://www.kaggle.com/code/markishere/day-2-document-q-a-with-rag

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Retrieval Augmented Generation (RAG) - Use case example</span>

---
<b>Imagine this:</b> Employees no longer have to call, email, or hunt through HR handbooks to get answers about policies, benefits, or procedures. Instead, they can ask a chatbot and instantly get accurate, helpful responses. This would free up an HR team’s time and allow them to focus on bigger priorities.

Here’s how it works:

1. Understanding Your Documents: All your HR materials—handbooks, policies, guides—are analyzed and broken down into smaller, easy-to-digest sections (like bite-sized chunks).<br><br>
2. Making the Chatbot Smart: These chunks are converted into a special, searchable format that helps the AI understand the meaning of each section, not just the words. Think of it like creating a quick-reference map for all your documents.<br><br>
3. Fast, Accurate Answers: When an employee asks the chatbot a question, the system compares the question to the document map and finds the best, most relevant answer in seconds.<br><br>
4. Keeping Things Updated: Since HR documents change frequently, the system stays up-to-date by:<br><br>
   <ul>
       <li>
           <b>Scheduling Regular Updates:</b> Reprocessing all documents periodically.
       </li><br>
       <li>
           <b>Real-Time Monitoring:</b> Automatically updating the system when changes are detected.
       </li><br>
       <li>
           <b>Tagging Versions:</b> Keeping both current and historical versions of information if needed.
       </li><br>
   </ul>
This approach ensures employees always get accurate answers, even as policies or procedures evolve. It’s like having an HR assistant on call 24/7—efficient, accurate, and stress-free for everyone involved.
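
Steps 1 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notebook: the function names, the character-based chunk size and overlap, and the use of SHA-256 fingerprints to detect changed documents are all assumptions chosen for clarity. A production system might chunk by sentence or heading instead.

```python
import hashlib


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def content_hash(text: str) -> str:
    """Fingerprint a document so that edits can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_documents(current: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Return ids of documents that are new or differ from the indexed version."""
    return [
        doc_id
        for doc_id, text in current.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]


# Made-up example data: one unchanged policy, one edited policy, one new policy.
indexed = {
    "leave": content_hash("Employees get 20 days of leave."),
    "travel": content_hash("Book travel through the portal."),
}
current = {
    "leave": "Employees get 20 days of leave.",
    "travel": "Book travel through the new portal.",
    "remote": "Remote work requires manager approval.",
}
to_reprocess = changed_documents(current, indexed)
print(to_reprocess)
```

Only the documents returned by `changed_documents` would need to be re-chunked and re-embedded, which is one way to implement the real-time monitoring option without reprocessing everything.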

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">How are vectors compared?</span>

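- Vectors are arrays of numbers, like [0.1, 0.3, 0.5]. Each value is one dimension of the embedding; in learned embeddings, a single dimension does not necessarily map to a human-readable feature.
- To assess how related two items are, you measure the distance between their vectors (for example, Euclidean distance) or the angle between them (for example, cosine similarity). A smaller distance or angle means the items are more similar.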
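
As a rough illustration of the two comparisons, the sketch below computes cosine similarity (angle) and Euclidean distance in plain Python. The three-dimensional vectors are made-up toy values, not real model output; actual embeddings have hundreds of dimensions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between a and b: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors (assumed values for illustration only).
computer = [0.9, 0.1, 0.0]
laptop = [0.8, 0.2, 0.1]
car = [0.1, 0.9, 0.2]

print("computer vs laptop:", cosine_similarity(computer, laptop))
print("computer vs car:   ", cosine_similarity(computer, car))
print("distance computer-laptop:", euclidean_distance(computer, laptop))
print("distance computer-car:   ", euclidean_distance(computer, car))
```

With these toy values, "computer" scores closer to "laptop" than to "car" under both measures, which mirrors the example in the whitepaper excerpt.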

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Simple RAG Diagram - HR Example</span>
![Simple RAG diagram for the HR example](./images/hr_rag_dia.png)

In [1]:
# %pip install google-generativeai chromadb python-dotenv

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Libraries</span>

---

In [2]:
import os

import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from dotenv import load_dotenv
import google.generativeai as genai
from google.api_core import retry
from IPython.display import Markdown

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Initialize the API</span>

---

In [3]:
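# Load API key from .env file
load_dotenv()
api_key = os.getenv("GAI_API_KEY")

# Fail early with a clear message if the key is missing
if not api_key:
    raise RuntimeError("GAI_API_KEY is not set; add it to your .env file")

# Set up the API key for the genai library
genai.configure(api_key=api_key)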

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Explore available models</span>

---

In [4]:
for m in genai.list_models():
    if "embedContent" in m.supported_generation_methods:
        print(m.name)

models/embedding-001
models/text-embedding-004


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Data</span>

---

Here is a small set of documents you will use to create an embedding database.

In [5]:
DOCUMENT1 = "Operating the Climate Control System  Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."
DOCUMENT2 = 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.'
DOCUMENT3 = "Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Creating the embedding database with ChromaDB</span>

---
Create a custom function to generate embeddings with the Gemini API. In this task, you are implementing a retrieval system, so the `task_type` for generating the document embeddings is `retrieval_document`. Later, you will use `retrieval_query` for the query embeddings. Check out the API reference for the full list of supported tasks.

In [6]:
class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        retry_policy = {"retry": retry.Retry(predicate=retry.if_transient_error)}

        response = genai.embed_content(
            model="models/text-embedding-004",
            content=input,
            task_type=embedding_task,
            request_options=retry_policy,
        )
        return response["embedding"]

Now create a Chroma database client that uses the GeminiEmbeddingFunction and populate the database with the documents you defined above.

In [7]:
DB_NAME = "googlecardb"
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])

Confirm that the data was inserted by looking at the database.

In [8]:
db.count()
# You can peek at the data too.
# db.peek(1)

3

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Retrieval: Find relevant documents</span>

---
To search the Chroma database, call the query method. Note that you also switch to the retrieval_query mode of embedding generation.

In [9]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "How do you use the touchscreen to play music?"

result = db.query(query_texts=[query], n_results=1)
[[passage]] = result["documents"]

Markdown(passage)

Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.
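
To make the retrieval step less of a black box, here is a brute-force sketch of what a similarity query does conceptually: embed the query, score it against every stored document, and return the top `k`. The `embed` function below is a toy word-count stand-in and not the Gemini embedding model, and the short documents are made up. Real vector stores typically use approximate nearest-neighbor indexes rather than scoring every document.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, docs: list[str], k: int = 1) -> list[tuple[float, str]]:
    """Score every document against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up miniature documents for illustration.
docs = [
    "turn the knob to change the temperature",
    "touch the music icon on the touchscreen to play songs",
    "move the shift lever to drive",
]
best = top_k("how do I play music on the touchscreen", docs, k=1)
print(best[0][1])
```

Unlike this word-count toy, a learned embedding can also match passages that share meaning but not vocabulary, which is the main reason to use a model such as `text-embedding-004` here.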

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Augmented generation: Answer the question</span>

---
Now that you have found a relevant passage from the set of documents (the retrieval step), you can assemble a generation prompt and have the Gemini API generate a final answer. Note that in this example only a single passage was retrieved. In practice, especially when the underlying data is large, you will want to retrieve more than one result and let the Gemini model determine which passages are relevant to the question. For this reason, it's OK if some retrieved passages are not directly related to the question; the generation step should ignore them.

In [10]:
passage_oneline = passage.replace("\n", " ")
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
PASSAGE: {passage_oneline}
"""
print(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: How do you use the touchscreen to play music?
PASSAGE: Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.
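
When more than one passage is retrieved, the same prompt pattern extends naturally. The sketch below is one possible way to do it, assuming a hypothetical `build_prompt` helper and a shortened instruction text; neither is part of the original notebook. With Chroma, the list of passages would come from a query with `n_results` greater than 1, where `result["documents"][0]` holds the passages for the first query.

```python
def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a prompt that numbers each retrieved passage (hypothetical helper)."""
    query_oneline = query.replace("\n", " ")

    numbered = []
    for i, passage in enumerate(passages, start=1):
        passage_oneline = passage.replace("\n", " ")
        numbered.append(f"PASSAGE {i}: {passage_oneline}")

    instructions = (
        "You are a helpful and informative bot that answers questions using text "
        "from the reference passages included below. "
        "If a passage is irrelevant to the answer, you may ignore it."
    )
    return f"{instructions}\n\nQUESTION: {query_oneline}\n" + "\n".join(numbered) + "\n"


# Made-up passages standing in for the results of a multi-result query.
example_prompt = build_prompt(
    "How do you use the touchscreen to play music?",
    [
        'Touch the "Music" icon to play your favorite songs.',
        "To shift gears, simply move the shift lever to the desired position.",
    ],
)
print(example_prompt)
```

Numbering the passages keeps them distinguishable in the prompt, so the model can skip the irrelevant ones.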



Now use the `generate_content` method to generate an answer to the question.

In [11]:
model = genai.GenerativeModel("gemini-1.5-flash-latest")
answer = model.generate_content(prompt)
Markdown(answer.text)

To play music on your Googlecar's touchscreen, simply touch the "Music" icon; it's that easy!  The touchscreen is the large display that shows all the car's features, including navigation and climate control, so you just tap the icon to start playing your music.


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Next Steps</span>

---
To learn more about using embeddings in the Gemini API, check out the Intro to embeddings. To learn more of the fundamentals, study the embeddings chapter of the Machine Learning Crash Course.

For a hosted RAG system, check out the Semantic Retrieval service in the Gemini API. You can implement question-answering on your own documents in a single request, or host a database for even faster responses.