# 2. Building your first RAG

This notebook offers a hands-on introduction to **Retrieval-Augmented Generation (RAG)** using the `google-genai` Python SDK, configured for **Vertex AI**.

**Why RAG?**
*   **Reduces Hallucinations:** Provides factual context.
*   **Uses Up-to-Date Information:** Accesses newer data.
*   **Domain-Specific Knowledge:** Handles private/specialized docs.

**The Core RAG Process:**
1.  **Query:** User asks a question.
2.  **Retrieve:** System finds relevant documents.
3.  **Augment:** Query + Retrieved Docs = New Prompt.
4.  **Generate:** LLM answers using the augmented prompt.
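
The four steps above can be sketched without any API calls. This toy version is an illustration only: it ranks documents by word overlap instead of embeddings, and `fake_llm` is a hypothetical stand-in for a real model call. All names in it are made up for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Retrieve: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def augment(query: str, context_docs: list[str]) -> str:
    """Augment: combine the retrieved documents and the query into one prompt."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Generate: hypothetical placeholder for a real LLM call."""
    return f"(model output for a prompt of {len(prompt)} characters)"


toy_docs = [
    "Saturn is famous for its prominent ring system.",
    "Mars is often called the Red Planet.",
]
toy_query = "Which planet has a ring system?"      # 1. Query
toy_context = retrieve(toy_query, toy_docs, k=1)   # 2. Retrieve
toy_prompt = augment(toy_query, toy_context)       # 3. Augment
toy_answer = fake_llm(toy_prompt)                  # 4. Generate
print(toy_prompt)
print(toy_answer)
```

The rest of the notebook replaces the word-overlap ranking with embedding similarity and the placeholder with a real generative model.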

Let's build a basic RAG flow!

## Setup


In [None]:
from google import genai
from google.genai import types
import numpy as np
import textwrap 

print("google-genai SDK imported successfully.")

Next, configure the client for Vertex AI. 

**Note:** This assumes your environment (e.g., a Vertex AI Notebook, or a local environment where you've run `gcloud auth application-default login`) is already authenticated to use Google Cloud services. Execution will fail below if the Project ID or Location is incorrect, or if authentication is missing.
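
A placeholder left as `"..."` would otherwise only surface later as an API error, so it can help to check the values before creating the client. This is a minimal sketch: `check_vertex_config` is a hypothetical helper, not part of the SDK, and it only catches unfilled placeholders, not wrong project IDs or missing credentials.

```python
def check_vertex_config(project_id: str, location: str) -> None:
    """Raise ValueError if either setting is empty or still the "..." placeholder."""
    for name, value in (("PROJECT_ID", project_id), ("LOCATION", location)):
        if not value or value.strip() in ("", "..."):
            raise ValueError(f"{name} is not set; edit the configuration cell first.")


# Example with made-up values; replace them with your own settings.
check_vertex_config("my-example-project", "us-central1")
```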

In [None]:
# --- TODO: Configure these values for your environment --- 
PROJECT_ID = "..." # Your Google Cloud Project ID
LOCATION = "..."   # The region for Vertex AI services, e.g., "us-central1"
# -----------------------------------------------------

# Attempt to initialize the client - this will fail if config is invalid/missing
print(f"Configuring client for Project: {PROJECT_ID}, Location: {LOCATION}")
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
print("✅ google-genai client configured for Vertex AI.")


## Step 1: Data Preparation & Indexing

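We need a knowledge base: here, a small set of texts about the planets. We'll convert these texts into **embeddings**, numeric vectors that can be compared with the embedding of a query.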

> #### Exercise 📝
> Choose an appropriate embedding model name available on Vertex AI. See [Vertex AI documentation for text embeddings](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings#text_embedding_models). If the name is invalid, the next cell will fail.

In [None]:
# Our simple knowledge base
documents = [
    "Mercury is the smallest planet in our Solar System and nearest to the Sun.",
    "Venus is the second planet from the Sun and is Earth's closest planetary neighbor.",
    "Earth is the third planet from the Sun and the only astronomical object known to harbor life.",
    "Mars is the fourth planet from the Sun and the second-smallest planet, often called the 'Red Planet'.",
    "Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant.",
    "Saturn is the sixth planet from the Sun, famous for its prominent ring system. It's another gas giant.",
    "Uranus is the seventh planet from the Sun. It has a unique tilt, making it rotate nearly on its side.",
    "Neptune is the eighth and farthest known planet from the Sun. It's a dark, cold world."
]

# --- TODO: Specify the embedding model name --- 
embedding_model_name = "..."
# --------------------------------------------

print(f"Using embedding model: {embedding_model_name}")

Generate embeddings. This cell will fail if the client or model is not set up correctly, if the API call fails, or if the response has no `embeddings` attribute.
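
The eight documents here fit in one request. For a larger corpus you would probably need to send the texts in batches, since embedding endpoints typically cap the number of inputs per call (the exact limit depends on the model, so check its documentation). A minimal sketch of the batching step only; `batched` and the batch size are illustrative assumptions, and no API is called.

```python
from typing import Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sample_texts = [f"document {i}" for i in range(5)]
for batch in batched(sample_texts, batch_size=2):
    # In the notebook, each batch would be passed as `contents=` to embed_content.
    print(len(batch), batch)
```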

In [None]:
document_embeddings = [] # To store {text:, embedding:} dictionaries

print(f"Generating embeddings for {len(documents)} documents...")
    
# Call embed_content; any error halts execution and is displayed here
response: types.EmbedContentResponse = client.models.embed_content(
    model=embedding_model_name, 
    contents=documents 
)

# Each item in response.embeddings is a types.ContentEmbedding whose vector is in .values
embeddings_list: list[types.ContentEmbedding] = response.embeddings

# Store embeddings along with the original text
for doc, embedding in zip(documents, embeddings_list):
    document_embeddings.append({
        "text": doc,
        # Convert to numpy array for easier math later
        "embedding": np.array(embedding.values) 
    })
    
print(f"✅ Successfully generated and processed {len(document_embeddings)} embeddings.")
# print(f"Sample embedding dimension: {document_embeddings[0]['embedding'].shape}")


Our 'index' is now ready: a plain Python list of texts and their embeddings. Real systems use specialized **vector databases** (like Vertex AI Vector Search, formerly Matching Engine) for efficiency.

## Step 2: Retrieval

To find relevant documents for a user query:
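1.  Embed the query with the same model used for the documents.
2.  Calculate the similarity between the query embedding and each document embedding.
3.  Select the k most similar documents (top-k).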
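
Step 3 does not strictly require sorting every score. As a sketch of an alternative to a full sort, `heapq.nlargest` from the standard library picks the k best `(score, text)` pairs directly. The scores and shortened texts below are made up for illustration.

```python
import heapq

# Made-up similarity scores paired with document texts (illustration only).
scored_docs = [
    (0.41, "Jupiter is a gas giant."),
    (0.87, "Saturn is famous for its ring system."),
    (0.12, "Mercury is the smallest planet."),
    (0.55, "Uranus rotates nearly on its side."),
]

# Keep the two highest-scoring pairs, ordered from best to worst.
top_two = heapq.nlargest(2, scored_docs, key=lambda pair: pair[0])
for score, text in top_two:
    print(f"{score:.2f}  {text}")
```

For a handful of documents the difference is negligible; it only starts to matter when the number of documents is much larger than k.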

> #### Exercise 📝
> Complete the `calculate_cosine_similarity` function using numpy operations (`np.dot`, `np.linalg.norm`).
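
For reference, the cosine similarity of two vectors $a$ and $b$ is their dot product divided by the product of their magnitudes, which is the quantity the last line of the function computes:

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
$$

As a small worked example, take $a = (1, 0)$ and $b = (1, 1)$. Then $a \cdot b = 1$, $\lVert a \rVert = 1$ and $\lVert b \rVert = \sqrt{2}$, so the similarity is $1/\sqrt{2} \approx 0.707$. Values near 1 mean the vectors point in almost the same direction; values near 0 mean they are close to orthogonal.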

In [None]:
from typing import List

def embed_query(query_text: str) -> np.ndarray:
    """Generates an embedding for a single query string."""
    # Call embed_content for the query - will fail if client/model invalid
    response: types.EmbedContentResponse = client.models.embed_content(
        model=embedding_model_name,
        contents=query_text 
    )
    # A single input yields a single embedding; its vector is in .values
    return np.array(response.embeddings[0].values)  # NumPy array for easier math later

def calculate_cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Calculates the cosine similarity between two numpy vectors."""
    # Convert just in case they are not arrays yet
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    
    # --- TODO: Implement Cosine Similarity --- 
    dot_product: float = ... # Dot product of vec1 and vec2
    norm_vec1: float = ... # Magnitude (norm) of vec1
    norm_vec2: float = ... # Magnitude (norm) of vec2
    # ---------------------------------------
    
    # Check for zero vectors to avoid division by zero
    if norm_vec1 == 0 or norm_vec2 == 0:
        return 0.0 
    
    similarity: float = dot_product / (norm_vec1 * norm_vec2)
    return similarity

def retrieve_documents(query: str, num_documents_to_retrieve: int = 3) -> List[str]:
    """Retrieves the `num_documents_to_retrieve` most relevant documents for a given query."""
    # This will fail if document_embeddings list is empty or not defined
    # This will fail if embed_query fails
    print(f"\nRetrieving documents for query: '{query}'")
    query_embedding: np.ndarray = embed_query(query)

    similarities: List[tuple[float, str]] = []
    # This will fail if calculate_cosine_similarity fails (e.g., if query_embedding was None - though embed_query should have failed first)
    for doc_data in document_embeddings:
        similarity: float = calculate_cosine_similarity(query_embedding, doc_data['embedding'])
        similarities.append((similarity, doc_data['text']))

    # Sort by similarity 
    similarities.sort(key=lambda item: item[0], reverse=True)

    # Slice for top k - a non-integer raises TypeError; 0 or a negative value silently gives unexpected results
    retrieved_texts: List[str] = [text for similarity, text in similarities[:num_documents_to_retrieve]]
    print(f"Retrieved {len(retrieved_texts)} documents.")
    return retrieved_texts

Let's test retrieval.

> #### Exercise 📝
> Choose how many documents (`k`) you want to retrieve. If `k` is not a positive integer, the slicing in `retrieve_documents` will fail or behave unexpectedly.

In [None]:
user_query = "Which planet is known for its rings?"
# --- TODO: Set the number of documents to retrieve --- 
k = ... # e.g., 2 or 3
retrieve_documents(user_query, num_documents_to_retrieve=k)

That was cool! We just retrieved the most relevant documents for our query.

Let's try a couple more queries:

In [None]:
retrieve_documents("chocolate", num_documents_to_retrieve=2)

In [None]:
retrieve_documents("water", num_documents_to_retrieve=2)

In [None]:
retrieve_documents("What is the smallest planet to orbit the sun in our solar system?", num_documents_to_retrieve=2)

In [None]:
retrieve_documents("<your question>", num_documents_to_retrieve=2)

> #### Exercise 📝
> Why does each query return what it returns?
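
One way to investigate is to look at the similarity scores themselves, since `retrieve_documents` always returns the requested number of texts even when none is a good match. A minimal sketch, assuming hand-made scores: `format_ranking` and `min_score` are hypothetical names, and the threshold value is arbitrary.

```python
def format_ranking(scored: list[tuple[float, str]], min_score: float = 0.0) -> list[str]:
    """Return one line per document, best first, flagging scores below min_score."""
    lines = []
    for score, text in sorted(scored, key=lambda pair: pair[0], reverse=True):
        flag = "" if score >= min_score else "  (below threshold)"
        lines.append(f"{score:.3f}  {text}{flag}")
    return lines


# Made-up scores for an off-topic query such as "chocolate" (illustration only).
example_scores = [
    (0.29, "Venus is the second planet from the Sun."),
    (0.31, "Mars is the fourth planet from the Sun."),
]
for line in format_ranking(example_scores, min_score=0.5):
    print(line)
```

In the notebook, the `(similarity, text)` pairs built inside `retrieve_documents` could be passed to a helper like this to see how close the top results really are.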

## Step 3: Augmentation & Generation

Combine query and retrieved documents into an augmented prompt.
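
A minimal sketch of this step as a reusable function. The template wording, the source numbering, and the name `build_augmented_prompt` are illustrative assumptions, not a required format.

```python
def build_augmented_prompt(query: str, context_docs: list[str]) -> str:
    """Combine retrieved documents and the user query into a single prompt."""
    # Number each document so the sources are easy to tell apart in the prompt.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


example_prompt = build_augmented_prompt(
    "Which planet is known for its rings?",
    ["Saturn is the sixth planet from the Sun, famous for its prominent ring system."],
)
print(example_prompt)
```

Keeping the context clearly delimited from the question makes it easier to see, when debugging, exactly what the model was given.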

> #### Exercise 📝
> Choose a generative model available on Vertex AI. See [Vertex AI documentation for Gemini models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini_models).

In [None]:
# Reuse user_query so the retrieved context always matches the question in the prompt below
retrieved = retrieve_documents(
    user_query,
    num_documents_to_retrieve=2
)

In [None]:
# --- TODO: Specify the generation model name --- 
generation_model_name = "..." # e.g., "gemini-1.5-flash-001"
# -----------------------------------------------

print(f"Using generation model: {generation_model_name}")

# Construct the augmented prompt - Assumes 'retrieved' is a list of strings from previous cell
context = "\n".join(retrieved) 
augmented_prompt = f"Based ONLY on the following information:\n--- START CONTEXT ---\n{context}\n--- END CONTEXT ---\n\nAnswer the question: {user_query}"
    
print("\n--- Augmented Prompt ---")
print(textwrap.fill(augmented_prompt, width=80))
print("\n--- Generating Response ---")

# Call generate_content - Will fail if client/model invalid or API error
response = client.models.generate_content(
    model=generation_model_name,
    contents=augmented_prompt,
    # Optional config: 
    # config=types.GenerateContentConfig(temperature=0.2) 
)
        
# Directly access the response text - Assumes response has .text attribute
print("\nModel Response:")
print(textwrap.fill(response.text, width=80))


## Step 4: Putting It All Together

Let's wrap the RAG process in a single function (still with minimal checks).

In [None]:
def perform_rag(query: str, num_docs: int = 3) -> str:
    """Performs the full RAG process: Retrieve -> Augment -> Generate."""
    
    # 1. Retrieve - Will fail if retrieval fails internally
    retrieved_docs = retrieve_documents(query, num_documents_to_retrieve=num_docs)
            
    print("--- Retrieved for RAG Function ---")
    for i, doc in enumerate(retrieved_docs):
        print(f"{i+1}. {textwrap.fill(doc, width=80)}")
        
    # 2. Augment
    context = "\n".join(retrieved_docs)
    augmented_prompt = (
        f"Use the following pieces of context to answer the question at the end. "
        f"If you don't know the answer from the context, just say that you don't know, don't try to make up an answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\nAnswer:"
    )
    
    print("\n--- Augmented Prompt (RAG Function) ---")
    print(textwrap.fill(augmented_prompt, width=80))
    
    # 3. Generate - Will fail if generation fails
    print("\n--- Generating Response (RAG Function) ---")
    response = client.models.generate_content(
        model=generation_model_name,
        contents=augmented_prompt,
    )
    
    # Directly access text - Assumes .text attribute exists
    final_answer = response.text
    print("\nModel Response:")
    print(textwrap.fill(final_answer, width=80))
    return final_answer

> #### Exercise 📝
> Ask the RAG system a question. If the query is invalid or causes issues, the `perform_rag` function will likely fail.
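
Since `perform_rag` has minimal checks, one option is to validate its inputs before calling it. A sketch under stated assumptions: `validate_rag_inputs` is a hypothetical helper, and what counts as an "invalid" query here (empty, or the unfilled `"..."` placeholder) is just one possible choice.

```python
def validate_rag_inputs(query: str, num_docs: int) -> None:
    """Raise ValueError for inputs that would make the RAG call meaningless."""
    if not isinstance(query, str) or not query.strip() or query.strip() == "...":
        raise ValueError("query must be a non-empty question, not a placeholder")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(num_docs, bool) or not isinstance(num_docs, int) or num_docs < 1:
        raise ValueError("num_docs must be a positive integer")


# Passes silently for a well-formed question and a positive document count.
validate_rag_inputs("Describe the largest planet mentioned.", num_docs=2)
```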

In [None]:
# --- TODO: Ask a question --- 
final_query = "..." # e.g., "Describe the largest planet mentioned."
# ---------------------------

final_answer = perform_rag(final_query, num_docs=2) 
print("\n--- RAG Process Completed ---")



> #### 🎁 Bonus exercises 📝
> - **Different Data:** Replace the planet documents with your own text data.
> - **Vector Databases:** Learn about [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview), Pinecone, ChromaDB, or FAISS. Why are they needed for scale?
> - **Generation:** Explore different generative models. How do they differ in performance? Select different models in the `perform_rag` function.
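
For the **Different Data** exercise, your own text first has to be turned into a list of short documents like `documents`. A minimal sketch, assuming the source is plain text with paragraphs separated by blank lines: `split_into_paragraphs` and its `min_chars` cutoff are hypothetical, and real projects often need more careful splitting.

```python
def split_into_paragraphs(raw_text: str, min_chars: int = 20) -> list[str]:
    """Split text on blank lines and drop fragments shorter than min_chars."""
    paragraphs = []
    for block in raw_text.split("\n\n"):
        cleaned = " ".join(block.split())  # Collapse internal whitespace and newlines.
        if len(cleaned) >= min_chars:
            paragraphs.append(cleaned)
    return paragraphs


sample_text = """The Moon is Earth's only natural satellite.

It orbits Earth about once every 27.3 days.

Short."""
my_documents = split_into_paragraphs(sample_text)
print(my_documents)
```

The resulting list could be assigned to `documents` before re-running the embedding cell.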