# [Solution] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is in the `project/games` folder. Each JSON file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
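Each JSON record maps to one Chroma document with three parts: an ID, the text that gets embedded, and the raw fields stored as metadata. The sketch below shows that mapping using only the standard library. The helper name `record_to_document` and the file name `"001.json"` are hypothetical, and this text layout is just one reasonable choice (the notebook's own loop formats the text differently).

```python
import json
import os


def record_to_document(file_name: str, raw_json: str) -> tuple[str, str, dict]:
    """Turn one game file into (id, text to embed, metadata)."""
    game = json.loads(raw_json)
    # The file name without its extension (e.g. "001") serves as a stable ID
    doc_id = os.path.splitext(file_name)[0]
    # One "Field: value" line per field that should influence semantic search
    text = "\n".join(
        f"{key}: {game[key]}"
        for key in ("Name", "Platform", "Genre", "Publisher", "YearOfRelease", "Description")
    )
    return doc_id, text, game


example = """{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}"""

doc_id, text, metadata = record_to_document("001.json", example)
print(doc_id)                # 001
print(text.splitlines()[0])  # Name: Gran Turismo
```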


### Setup

```python
# Only needed for Udacity workspace: replace the system sqlite3 module with the
# newer pysqlite3 build. This must run before chromadb is imported.

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
```
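The swap above exists because Chroma needs a reasonably recent SQLite (its troubleshooting notes cite 3.35.0 or newer, as far as I know; check the docs for the Chroma version you install), while some hosted environments ship an older system library. A minimal standard-library check follows; the helper `sqlite_is_recent_enough` and the `MIN_SQLITE` threshold are assumptions. Run after the swap cell, `import sqlite3` resolves to the pysqlite3 build, so it reports the version Chroma will actually use.

```python
import sqlite3

# Assumed minimum SQLite version for Chroma; verify against the Chroma docs
MIN_SQLITE = (3, 35, 0)


def sqlite_is_recent_enough(version_info: tuple = sqlite3.sqlite_version_info) -> bool:
    """Return True if the given SQLite version meets MIN_SQLITE."""
    return tuple(version_info) >= MIN_SQLITE


print(sqlite3.sqlite_version, sqlite_is_recent_enough())
```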

```python
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv
```

```python
# TODO: Create a .env file with the following variables
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"
```

```python
# TODO: Load environment variables
load_dotenv()
```

True
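`load_dotenv()` only reads the `.env` file into the environment; it does not confirm that the specific keys listed above are set, and a missing key tends to surface much later as an authentication error. A small fail-fast check, assuming the three key names from the TODO cell (the helper `missing_env_vars` is hypothetical):

```python
import os

# Key names taken from the TODO cell; adjust if your setup differs
REQUIRED_KEYS = ("OPENAI_API_KEY", "CHROMA_OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_env_vars(keys=REQUIRED_KEYS, env=os.environ) -> list[str]:
    """Return the required keys that are unset or empty."""
    return [key for key in keys if not env.get(key)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```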

### VectorDB Instance

```python
# TODO: Instantiate your ChromaDB Client
# Choose any path you want
chroma_client = chromadb.PersistentClient(path="chromadb")
```

### Collection

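```python
# TODO: Pick one embedding function
# If you pick something other than OpenAI, use the same
# embedding function when loading the collection later
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    # OpenAI-compatible endpoint of the Udacity (Vocareum) workspace;
    # remove this argument to call the standard OpenAI API
    api_base="https://openai.vocareum.com/v1"
)
```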
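The same embedding function must be used for indexing and for querying because vectors from different models live in different spaces (and often have different dimensions), so comparing them is meaningless or fails outright. The toy class below is hypothetical and only illustrates the callable shape an embedding function has (a list of texts in, a list of vectors out, with the argument named `input` as in Chroma's interface); it does not subclass Chroma's `EmbeddingFunction` and captures word overlap, not meaning.

```python
import hashlib
import math


class ToyHashEmbedding:
    """Toy embedding function: maps a list of texts to a list of vectors.

    Each lowercase word is hashed into one of `dim` buckets, so the vectors
    capture word overlap only, not meaning. For illustration, not production.
    """

    def __init__(self, dim: int = 64):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for word in text.lower().split():
                # md5 gives a hash that is stable across runs (built-in hash() is salted)
                bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.dim
                vec[bucket] += 1.0
            # Scale to unit length; empty texts stay all-zero
            norm = math.sqrt(sum(v * v for v in vec))
            vectors.append([v / norm for v in vec] if norm else vec)
        return vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


embed = ToyHashEmbedding()
docs = embed(["Super Mario 64 platformer", "Gran Turismo racing simulator"])
query = embed(["super mario"])[0]
print([round(cosine(query, d), 3) for d in docs])
```

Swapping `ToyHashEmbedding` for a different `dim` or hashing scheme after indexing would silently break every comparison, which is the same failure mode as loading an OpenAI-embedded collection with a different model.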

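```python
# TODO: Create a collection
# Choose any name you want
COLLECTION_NAME = "udaplay"
collection_created = False

try:
    # Pass the same embedding function used at creation time so queries are
    # embedded with the same model as the stored documents
    collection = chroma_client.get_collection(COLLECTION_NAME, embedding_function=embedding_fn)
    print(f"Loaded collection {COLLECTION_NAME}.")
except Exception:
    # The collection does not exist yet (the exact exception type varies across Chroma versions)
    collection = chroma_client.create_collection(name=COLLECTION_NAME, embedding_function=embedding_fn)
    collection_created = True
    print(f"Created collection {COLLECTION_NAME}.")
```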

Loaded collection udaplay.


### Add documents

```python
# Make sure you have a directory "project/starter/games"
if collection_created:
    data_dir = "games"

    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue

        file_path = os.path.join(data_dir, file_name)
        with open(file_path, "r", encoding="utf-8") as f:
            game = json.load(f)

        # You can change what text you want to index
        content = f"""Platform: [{game['Platform']}]
    Genre: {game['Genre']}
    Name: {game['Name']}
    Release Date: ({game['YearOfRelease']})
    Publisher: {game['Publisher']}
    Description: {game['Description']}
        """

        # Use file name (like 001) as ID
        doc_id = os.path.splitext(file_name)[0]

        collection.add(
            ids=[doc_id],
            documents=[content],
            metadatas=[game]
        )
    print(f"Added documents to collection {COLLECTION_NAME}")
else:
    print(f"Collection {COLLECTION_NAME} already exists, skip adding documents.")
```

Collection udaplay already exists, skip adding documents.
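The loop above calls `collection.add` once per file, and only when the collection was just created, so a collection left half-populated by an interrupted run is never topped up. One alternative, sketched here under assumptions, is to gather `(id, text, metadata)` records and add them in batches while skipping IDs already stored (for example, taken from `collection.get()["ids"]`). The helper `batched_payloads` is hypothetical and the batch size of 100 is arbitrary.

```python
from typing import Iterable, Iterator


def batched_payloads(
    records: Iterable[tuple[str, str, dict]],
    existing_ids: set[str],
    batch_size: int = 100,
) -> Iterator[dict]:
    """Yield keyword-argument dicts for collection.add(), skipping IDs already stored."""
    batch = {"ids": [], "documents": [], "metadatas": []}
    for doc_id, text, metadata in records:
        if doc_id in existing_ids:
            continue  # already indexed; skip rather than re-adding
        batch["ids"].append(doc_id)
        batch["documents"].append(text)
        batch["metadatas"].append(metadata)
        if len(batch["ids"]) == batch_size:
            yield batch
            batch = {"ids": [], "documents": [], "metadatas": []}
    if batch["ids"]:
        yield batch  # final partial batch


records = [(f"{i:03d}", f"game {i}", {"Name": f"Game {i}"}) for i in range(1, 6)]
batches = list(batched_payloads(records, existing_ids={"002"}, batch_size=2))
print([b["ids"] for b in batches])  # [['001', '003'], ['004', '005']]
```

In the notebook, each payload would then be passed as `collection.add(**payload)`, which uses the same `ids`, `documents`, and `metadatas` arguments as the original loop.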


### Execute a semantic search on the DB
Search for a game in the vector DB using a query.
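`collection.query` embeds the query text with the collection's embedding function and returns the stored documents whose vectors are closest. Chroma answers this with an approximate HNSW index, using squared L2 as its default distance as far as I know; the exhaustive scan below computes the exact answer such an index approximates. The vectors, IDs, and the `top_k` helper are made up for illustration.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exhaustive nearest-neighbour search: the k (id, distance) pairs closest to query."""
    scored = ((squared_l2(query, vec), doc_id) for doc_id, vec in index.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


# Made-up 2-D vectors standing in for real embeddings
index = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.3],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.0], index, k=3)])  # ['a', 'b', 'c']
```

Smaller distances mean closer matches, which is why the IDs in a Chroma result come back ordered from most to least similar.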

```python
query_texts = ["Super Mario"]

results = collection.query(query_texts=query_texts, n_results=3)
results
```

{'ids': [['009', '008', '012']],
 'embeddings': None,
 'documents': [["Platform: [Nintendo 64]\n    Genre: Platformer\n    Name: Super Mario 64\n    Release Date: (1996)\n    Publisher: Nintendo\n    Description: A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.\n        ",
   'Platform: [Super Nintendo Entertainment System (SNES)]\n    Genre: Platformer\n    Name: Super Mario World\n    Release Date: (1990)\n    Publisher: Nintendo\n    Description: A classic platformer where Mario embarks on a quest to save Princess Toadstool and Dinosaur Land from Bowser.\n        ',
   'Platform: [Nintendo Switch]\n    Genre: Racing\n    Name: Mario Kart 8 Deluxe\n    Release Date: (2017)\n    Publisher: Nintendo\n    Description: An enhanced version of Mario Kart 8, featuring new characters, tracks, and improved gameplay mechanics.\n        ']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 