# Project: Market Research Agent - Phase 1

### Criteria
Prepare and process a local dataset of video game information for use in a vector database and RAG pipeline.

- The processed data is added to a persistent vector database (e.g., ChromaDB) with appropriate embeddings.
- The notebook or script demonstrates that the vector database can be queried for semantic search.

In [1]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from lib.vector_db import VectorStoreManager, GamesLoaderService

In [2]:
# Load environment variables
load_dotenv()

True

In [3]:
# Define API keys
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

### Criterion met: "The processed data is added to a persistent vector database (e.g., ChromaDB) with appropriate embeddings."

In [4]:
# Initialize the vector store manager with OpenAI API key
vector_store_manager = VectorStoreManager(openai_api_key=OPENAI_API_KEY)

In [5]:
# Initialize the GamesLoaderService with the vector store manager
games_loader_service = GamesLoaderService(vector_store_manager=vector_store_manager)

In [6]:
# Load the game data from the JSON file into the vector store
vector_store = games_loader_service.load_json(store_name="game_data", json_path="games.json")

VectorStore `game_data` ready!
GameInfos from `games.json` added!
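
The loader's internals are not shown in this notebook. Judging from the stored record returned by the queries further down, each game appears to be flattened into a newline-separated document plus a metadata dictionary in which the platform list is joined into one comma-separated string. The following is a minimal sketch of that step, not the actual `GamesLoaderService` code: the helper name `to_document_and_metadata` and the JSON field names (`title`, `developer`, `release_date`, `platforms`, `genre`, `description`) are assumptions made for illustration.

```python
from typing import Any


def to_document_and_metadata(game: dict[str, Any]) -> tuple[str, dict[str, str]]:
    """Flatten one game record into (document text, metadata).

    Hypothetical helper: the field names are assumed, not taken from games.json.
    """
    # Document text: one field per line; the platform list keeps its Python repr,
    # which matches the stored document shown in the query output.
    document = "\n".join(
        [
            game["title"],
            game["developer"],
            game["release_date"],
            str(game["platforms"]),
            game["genre"],
            game["description"],
        ]
    )
    # Metadata: plain string values only, so the platform list is comma-joined.
    metadata = {
        "title": game["title"],
        "developer": game["developer"],
        "release_date": game["release_date"],
        "platforms": ",".join(game["platforms"]),
        "genre": game["genre"],
    }
    return document, metadata


example = {
    "title": "Hogwarts Legacy",
    "developer": "Portkey Games",
    "release_date": "2023-02-10",
    "platforms": ["PC", "PS5", "Xbox Series X/S", "PS4", "Switch"],
    "genre": "Action RPG",
    "description": (
        "An open-world action RPG set in the Harry Potter universe, "
        "100 years before the books."
    ),
}
document, metadata = to_document_and_metadata(example)
print(document)
print(metadata)
```

Keeping the title, genre, and description in the embedded text is presumably what lets a natural-language question match the right record, while the metadata carries the structured fields used to answer it.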


In [7]:
# Helper: compare the top hit's title with the expected title
def print_result(result, expected):
    """Print whether the top hit's title matches the expected title."""
    # Results are nested per query: [0] is the first query, [0][0] its top hit
    title = result['metadatas'][0][0]['title']
    if title == expected:
        print(f"Game found: {title} ✅")
    else:
        print(f"Game not found ❌, expected '{expected}' but got '{title}'")

### Criterion met: "The notebook or script demonstrates that the vector database can be queried for semantic search."

In [8]:
# Query the vector store for a specific game
query = "What is the release date of Hogwarts Legacy?"
results = vector_store.query(query_texts=[query], n_results=1)
results

{'ids': [['8']],
 'embeddings': None,
 'documents': [["Hogwarts Legacy\nPortkey Games\n2023-02-10\n['PC', 'PS5', 'Xbox Series X/S', 'PS4', 'Switch']\nAction RPG\nAn open-world action RPG set in the Harry Potter universe, 100 years before the books."]],
 'uris': None,
 'included': ['documents', 'distances', 'metadatas'],
 'data': None,
 'metadatas': [[{'genre': 'Action RPG',
    'release_date': '2023-02-10',
    'title': 'Hogwarts Legacy',
    'platforms': 'PC,PS5,Xbox Series X/S,PS4,Switch',
    'developer': 'Portkey Games'}]],
 'distances': [[0.21652083098888397]]}

In [9]:
# Check if the game is found in the results by verifying the title
print_result(results, "Hogwarts Legacy")

Game found: Hogwarts Legacy ✅
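
The result dictionary is nested one level per query text: `results['metadatas'][0]` holds the hits for the first query, and `results['metadatas'][0][0]` is its top hit. As a rough sketch, the structure can be flattened into one record per hit so that fields such as the release date are easier to read out. The helper name `hits_for_query` is hypothetical, and `sample` is a hand-copied subset of the output above.

```python
from typing import Any


def hits_for_query(result: dict[str, Any], query_index: int = 0) -> list[dict[str, Any]]:
    """Return one dict per hit (id, distance, metadata fields) for a single query."""
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    metadatas = result["metadatas"][query_index]
    # Merge the parallel lists into one record per hit
    return [
        {"id": id_, "distance": dist, **meta}
        for id_, dist, meta in zip(ids, distances, metadatas)
    ]


# Subset of the query output shown above
sample = {
    "ids": [["8"]],
    "metadatas": [[{"title": "Hogwarts Legacy", "release_date": "2023-02-10"}]],
    "distances": [[0.21652083098888397]],
}
top = hits_for_query(sample)[0]
print(f"{top['title']} was released on {top['release_date']}")
```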


In [14]:
# Query the vector store for a game that is not in the dataset
query_not_found = "What is the release date of Mario Cart?"
results_not_found = vector_store.query(query_texts=[query_not_found], n_results=1)
results_not_found

{'ids': [['2']],
 'embeddings': None,
 'documents': [["The Legend of Zelda: Breath of the Wild\nNintendo EPD\n2017-03-03\n['Nintendo Switch', 'Wii U']\nAction-Adventure\nAn open-world Zelda game set in the kingdom of Hyrule after a great calamity."]],
 'uris': None,
 'included': ['documents', 'distances', 'metadatas'],
 'data': None,
 'metadatas': [[{'genre': 'Action-Adventure',
    'title': 'The Legend of Zelda: Breath of the Wild',
    'developer': 'Nintendo EPD',
    'release_date': '2017-03-03',
    'platforms': 'Nintendo Switch,Wii U'}]],
 'distances': [[0.39781779050827026]]}

In [17]:
# Check if the game is found in the results by verifying the title
print_result(results_not_found, "Mario Cart")

Game not found ❌, expected 'Mario Cart' but got 'The Legend of Zelda: Breath of the Wild'
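
A nearest-neighbor query always returns the closest stored item, even when nothing relevant exists, so the title comparison above detects the miss only because the expected title is known in advance. One common way to flag such cases without that knowledge is a distance cutoff. The sketch below assumes a hypothetical helper `is_confident_match` and an illustrative threshold of 0.3, chosen only because it separates the two distances recorded in this notebook (about 0.217 for the hit and about 0.398 for the miss). A usable value would need tuning on many more queries.

```python
from typing import Any

# Illustrative cutoff only: it separates the two distances seen in this notebook
MAX_DISTANCE = 0.3


def is_confident_match(result: dict[str, Any], max_distance: float = MAX_DISTANCE) -> bool:
    """Return True if the top hit of the first query is within max_distance."""
    distances = result.get("distances") or [[]]
    first_query = distances[0]
    # No hits at all counts as "no confident match"
    return bool(first_query) and first_query[0] <= max_distance


# Distances copied from the two query outputs above
hit = {"distances": [[0.21652083098888397]]}
miss = {"distances": [[0.39781779050827026]]}
print(is_confident_match(hit))
print(is_confident_match(miss))
```

A check like this could let a later pipeline stage fall back to another source when the local dataset has no good answer.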
