# Capstone Overview


## Setup

First, install ChromaDB and the Gemini API Python SDK for RAG and LLM access.
Then set up the API key.

In [6]:
!pip uninstall -qqy jupyterlab kfp  # Remove unused conflicting packages
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3"

In [7]:
from google import genai
from google.genai import types

from IPython.display import Markdown

genai.__version__

'1.7.0'

In [8]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

Optionally, set up a retry helper so that "Run all" does not fail when you hit the per-minute quota. The cell below is commented out; uncomment it to enable retries.

In [None]:
#from google.api_core import retry


#is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

#genai.models.Models.generate_content = retry.Retry(
#    predicate=is_retriable)(genai.models.Models.generate_content)
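
The commented-out cell above relies on `google.api_core.retry`. As a rough illustration of what such a helper does, here is a minimal standard-library sketch of retrying with exponential backoff. `FakeAPIError`, `retry_on`, and `flaky_call` are hypothetical names used only for this example, and the delay values are arbitrary.

```python
import time
from functools import wraps


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc):
    # Retry only on quota (429) and service-unavailable (503) errors.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def retry_on(predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    """Retry the wrapped function with exponential backoff while predicate(exc) is true."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-retriable errors or when attempts run out.
                    if not predicate(exc) or attempt == max_attempts:
                        raise
                    time.sleep(delay)
                    delay *= multiplier

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on(is_retriable)
def flaky_call():
    # Fails twice with a quota error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


outcome = flaky_call()
```

The predicate decides which errors are worth retrying, so non-quota failures still surface immediately.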

## Explore available models

### LLM Model


### Embedding Model

We will use the [`embedContent`](https://ai.google.dev/api/embeddings#method:-models.embedcontent) API method to compute document embeddings before saving them to the database.

- `text-embedding-004` is the most recent generally available embedding model.
- `gemini-embedding-exp-03-07` is an experimental model.
- For more information, see [`models.list`](https://ai.google.dev/api/models#method:-models.list) or [the models page](https://ai.google.dev/gemini-api/docs/models/gemini#text-embedding).

In [9]:
client = genai.Client(api_key=GOOGLE_API_KEY)

for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

models/embedding-001
models/text-embedding-004
models/gemini-embedding-exp-03-07
models/gemini-embedding-exp


## Data

Load a list of documents that we can use to create an embedding database.

### Public Gensler Thought Leadership
Markdown articles sourced from the Gensler.com Dialogue blog (2025)<br>
https://www.kaggle.com/datasets/junwangzero/public-gensler-thought-leadership

In [10]:
import os

input_path = "/kaggle/input"
all_files = []

# Collect all file paths from all attached datasets
for dataset in os.listdir(input_path):
    dataset_path = os.path.join(input_path, dataset)
    for root, dirs, files in os.walk(dataset_path):
        for file in files:
            all_files.append(os.path.join(root, file))

# Print total count and the list
print(f"Total number of files: {len(all_files)}\n")
for file_path in all_files:
    print(file_path)

Total number of files: 20

/kaggle/input/public-gensler-thought-leadership/The New Experiential Hybrid.md
/kaggle/input/public-gensler-thought-leadership/The Biggest Challenge to Office Conversions Isnt Design  Its the Status Quo.md
/kaggle/input/public-gensler-thought-leadership/Realizing the Value of Artificial Intelligence When Planning Tomorrows Healthcare Facilities.md
/kaggle/input/public-gensler-thought-leadership/Prototyping the Hospital of the Future.md
/kaggle/input/public-gensler-thought-leadership/Severe Weather The New Design Imperative.md
/kaggle/input/public-gensler-thought-leadership/The Uncertainty of Electric Power A View from Houston.md
/kaggle/input/public-gensler-thought-leadership/Biodiversity as the New Frontier for Achieving Resilience in Real Estate.md
/kaggle/input/public-gensler-thought-leadership/Avoiding Fear-Based Decision-Making to Foster Positive In-Store Retail Experiences.md
/kaggle/input/public-gensler-thought-leadership/Live Event Amenities The Next 

In [11]:
# helper function to parse input markdown files
def read_markdown(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

# Read all markdown files into a list of strings
documents = [read_markdown(path) for path in all_files]
print(documents[0][:500])  # Preview first 500 characters of the first doc

---
title: "The New Experiential Hybrid"
source: "https://www.gensler.com/blog/the-new-experiential-hybrid"
author:
  - "[[Gensler]]"
published:
created: 2025-04-09
description: "Gensler is a global architecture, design, and planning firm with 57 offices and 6,000+ professionals across the Americas, Europe, Greater China, and APME."
tags:
  - "clippings"
---
![A street with cars and buildings.](https://static2.gensler.com/uploads/image/96476/Sportsmens_Lodge_N7_1738969737.jpg)

The Sportsmen’s L
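
The preview shows that each article begins with a YAML front-matter block delimited by `---` lines. If you would rather embed only the article body, a sketch like the following separates the two parts. `split_front_matter` is a hypothetical helper, and it assumes the block starts on the first line and ends at the next `---` line.

```python
def split_front_matter(text):
    """Return (front_matter, body); front_matter is '' if no block is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    # Look for the closing delimiter after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front_matter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front_matter, body
    # No closing delimiter: treat the whole text as body.
    return "", text


sample = '---\ntitle: "The New Experiential Hybrid"\ntags:\n  - "clippings"\n---\nBody text here.\n'
meta, body = split_front_matter(sample)
```

The front matter is returned as raw text; parsing it into fields would need a YAML parser, which is outside this sketch.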


## Creating the embedding database with ChromaDB

Next, we create a [custom function](https://docs.trychroma.com/guides/embeddings#custom-embedding-functions) to generate embeddings with the Gemini API.

In our first step, we are implementing a retrieval system, so the `task_type` for generating the *document* embeddings is `retrieval_document`. Later, we will use `retrieval_query` for the *query* embeddings. 

More supported tasks: [API reference](https://ai.google.dev/api/embeddings#v1beta.TaskType) 

In [15]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types


# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True
    # Specify embedding model
    embedding_model = "models/text-embedding-004"

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model=self.embedding_model,
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

Now we create a [Chroma database client](https://docs.trychroma.com/getting-started) that uses the `GeminiEmbeddingFunction` and populate the database with the documents we loaded above.

In [16]:
import chromadb

DB_NAME = "genslerthoughtdb"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
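
Here each article is added as a single document. Embedding models accept a limited amount of input, so long articles may not be fully represented by one embedding. One possible alternative, sketched below, is to split each article into overlapping chunks and give every chunk its own ID. `chunk_text` is a hypothetical helper, and the sizes used are arbitrary illustration values rather than limits taken from the API.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into chunks of at most max_chars characters, overlapping by `overlap`."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    if not text:
        return []
    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once the current chunk reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks


# Toy documents standing in for the loaded articles.
docs = ["abcdefghij", "xyz"]
chunk_texts, chunk_ids = [], []
for doc_index, doc in enumerate(docs):
    for chunk_index, chunk in enumerate(chunk_text(doc, max_chars=4, overlap=1)):
        chunk_texts.append(chunk)
        # IDs combine the document index and the chunk index.
        chunk_ids.append(f"{doc_index}-{chunk_index}")
```

The resulting lists could then be passed to `db.add(documents=chunk_texts, ids=chunk_ids)` in place of the whole articles.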

Confirm that the data was inserted by looking at the database.

In [19]:
db.count()

20

In [20]:
# We can peek at the data too.
db.peek(1)

{'ids': ['0'],
 'embeddings': array([[ 2.51599457e-02, -1.66053530e-02,  1.25226416e-02,
         -2.94276178e-02, -1.31394491e-02,  1.30509231e-02,
          2.03051642e-02,  1.77661597e-03, -2.76266667e-03,
          3.34638134e-02, -1.37427039e-02,  7.26467222e-02,
          7.90741220e-02,  7.74888648e-03, -2.14855261e-02,
         -6.70931637e-02,  3.99118066e-02, -1.91834513e-02,
         -1.12698048e-01, -5.35109788e-02,  1.76604073e-02,
          6.83069788e-03, -8.07008427e-03, -8.32299814e-02,
         -2.95806453e-02, -1.82858482e-02, -6.11117855e-03,
          3.67460549e-02, -2.55634990e-02, -7.18067214e-03,
          2.92435233e-02,  4.47172066e-03,  1.34281162e-02,
          5.81531525e-02,  8.08267287e-05, -2.28525009e-02,
         -2.39055213e-02, -6.15793951e-02, -3.97902215e-03,
         -3.60448658e-02,  1.69669576e-02,  3.00527401e-02,
         -8.91205296e-03,  5.23243099e-02, -2.27126423e-02,
         -4.66264039e-02,  1.80544537e-02,  4.39945944e-02,
         -7

## Retrieval: Find relevant documents

To search the Chroma database, call the `query` method. Note that you also switch to the `retrieval_query` mode of embedding generation.


In [None]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "How do you use the touchscreen to play music?"

result = db.query(query_texts=[query], n_results=1)
[all_passages] = result["documents"]

Markdown(all_passages[0])

In [None]:
print(result)
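
Conceptually, the query step embeds the question and returns the stored documents whose vectors are closest to it. The sketch below illustrates the idea with cosine similarity on tiny hand-written vectors. `cosine_similarity` and `top_k` are hypothetical helpers, and Chroma's actual index and default distance metric may differ from this.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return indices of the k most similar document vectors, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D vectors standing in for real embeddings.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [0.9, 0.1]
best = top_k(query_vec, doc_vecs, k=2)
```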

## Augmented generation: Answer the question

Now that you have found a relevant passage from the set of documents (the *retrieval* step), you can assemble a generation prompt and have the Gemini API *generate* a final answer. Note that in this example only a single passage was retrieved. In practice, especially when the underlying data is large, you will want to retrieve more than one result and let the Gemini model determine which passages are relevant to the question. For this reason, it's OK if some retrieved passages are not directly related to the question; the generation step should ignore them.

In [None]:
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below. 
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. 
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
"""

# Add the retrieved documents to the prompt.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    prompt += f"PASSAGE: {passage_oneline}\n"

print(prompt)
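
When several passages are retrieved, the same assembly can be wrapped in a small function. The following is a minimal sketch, not the notebook's own code: `build_prompt` is a hypothetical helper, and the example question, passages, and instructions are made up for illustration.

```python
def build_prompt(question, passages, instructions):
    """Assemble a RAG prompt from instructions, a question, and retrieved passages."""
    # Flatten line breaks so each item occupies one line of the prompt.
    question_oneline = question.replace("\n", " ")
    parts = [instructions.strip(), "", f"QUESTION: {question_oneline}"]
    for passage in passages:
        passage_oneline = passage.replace("\n", " ")
        parts.append(f"PASSAGE: {passage_oneline}")
    return "\n".join(parts) + "\n"


example = build_prompt(
    "What is\nadaptive reuse?",
    ["First passage\nwith a line break.", "Second passage."],
    "Answer using the reference passages below.",
)
```

With Chroma, raising `n_results` in `db.query` would supply more passages to such a function.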

Now use the `generate_content` method to generate an answer to the question.

In [None]:
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt)

Markdown(answer.text)

## Next steps

Congrats on building a Retrieval-Augmented Generation app!

To learn more about using embeddings in the Gemini API, check out the [Intro to embeddings](https://ai.google.dev/gemini-api/docs/embeddings) or to learn more fundamentals, study the [embeddings chapter](https://developers.google.com/machine-learning/crash-course/embeddings) of the Machine Learning Crash Course.

For a hosted RAG system, check out the [Semantic Retrieval service](https://ai.google.dev/gemini-api/docs/semantic_retrieval) in the Gemini API. You can implement question-answering on your own documents in a single request, or host a database for even faster responses.

*- [Mark McD](https://linktr.ee/markmcd)*