# VIDEO GAME Project

## Part 02 - Agent

UdaPlay is an AI research agent for the video game industry. The agent is built on StateMachine[AgentState] and runs the following steps:
1. message_prep – builds the message list (system + user + history).
2. llm_processor – calls LLM.invoke(). OpenAI may respond with plain text or with tool_calls (e.g. retrieve_game, search_research_memory).
3. tool_executor – runs the requested tool functions. For each ToolCall it:
   - finds the matching Tool object by name
   - calls it with the parsed arguments
   - appends a ToolMessage to messages
4. The agent then loops back to llm_processor until no more tools are requested.
5. Finally, it moves to termination.

The agent does not create tools dynamically; on each pass through the state machine loop it chooses among the fixed set of tools described in the Tools section.
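The loop described above can be sketched in plain Python. This is a minimal illustration, not the `lib.agents` implementation: `ToolCall`, `Reply`, `run_agent_loop`, `fake_llm`, and the `max_rounds` guard are all hypothetical names introduced here, and the fake LLM stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Reply:
    content: str = ""
    tool_calls: List[ToolCall] = field(default_factory=list)


def run_agent_loop(
    user_message: str,
    llm: Callable[[List[dict]], Reply],
    tools: Dict[str, Callable[..., object]],
    max_rounds: int = 5,  # assumption: a safety cap, not taken from the notebook
) -> str:
    # Step 1 (message_prep): system prompt plus the user message.
    messages = [
        {"role": "system", "content": "You are a research agent."},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_rounds):
        # Step 2 (llm_processor): ask the model what to do next.
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply.content})
        if not reply.tool_calls:
            # Step 5 (termination): no tools requested, so this is the answer.
            return reply.content
        # Step 3 (tool_executor): look up each tool by name and call it.
        for call in reply.tool_calls:
            result = tools[call.name](**call.arguments)
            messages.append({"role": "tool", "name": call.name, "content": str(result)})
        # Step 4: fall through and loop back to the model.
    return "Stopped: tool-call limit reached."


def fake_llm(messages: List[dict]) -> Reply:
    # Hypothetical stand-in: request one tool call, then answer from its result.
    if messages[-1]["role"] == "tool":
        return Reply(content=f"Answer based on: {messages[-1]['content']}")
    return Reply(tool_calls=[ToolCall("lookup", {"title": "Gran Turismo"})])


answer = run_agent_loop(
    "Tell me about Gran Turismo.",
    fake_llm,
    {"lookup": lambda title: f"{title} (1997)"},
)
print(answer)  # Answer based on: Gran Turismo (1997)
```

The tool registry is a plain dictionary, which mirrors the point that the agent only selects from tools registered up front.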

### Setup

In [2]:
# Only needed for Udacity workspace
import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [3]:
# Import libs
from pathlib import Path

from dotenv import load_dotenv
import os

import chromadb
from typing import List, Dict, Any, Optional
from pydantic import BaseModel, Field

from chromadb.utils import embedding_functions

from lib.agents import Agent
from lib.llm import LLM
from lib.messages import UserMessage, SystemMessage, ToolMessage, AIMessage
from lib.tooling import tool
from lib.documents import Document
from lib.loaders import PDFLoader
from lib.memory import SessionNotFoundError, ShortTermMemory, MemoryFragment, MemorySearchResult, LongTermMemory
from lib.parsers import OutputParser, StrOutputParser, ToolOutputParser, JsonOutputParser, PydanticOutputParser
from lib.rag import RAGState, RAG
from lib.state_machine import Resource, Step, EntryPoint, Termination, Transition, Snapshot, Run, StateMachine
from lib.vector_db import VectorStore, VectorStoreManager, CorpusLoaderService
from lib.evaluation import TaskCompletionMetrics, QualityControlMetrics, ToolInteractionMetrics, SystemMetrics, EvaluationResult, TestCase, JudgeEvaluation, AgentEvaluator


In [4]:
# Load environment variables and keys
load_dotenv(override=True)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

print("OPENAI_API_KEY loaded:", OPENAI_API_KEY is not None)
print("TAVILY_API_KEY loaded:", TAVILY_API_KEY is not None)

print("KEY PREFIX:", str(OPENAI_API_KEY)[:8])


OPENAI_API_KEY loaded: True
TAVILY_API_KEY loaded: True
KEY PREFIX: sk-proj-


### Tools

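- retrieve_game: searches the vector DB
- evaluate_retrieval: assesses whether the retrieved documents are sufficient to answer the question
- game_web_search: searches the web when the retrieved documents are not sufficient
- long_term_memory: stores and searches context, preferences, and knowledge for future conversations (implemented as store_research_insight and search_research_memory)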
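The intended order of the first three tools (retrieve, evaluate, then fall back to the web) can be sketched as a small function. In the notebook this decision is made by the LLM following its instructions, not by hard-coded control flow, so the code below is only an illustration; `answer_with_fallback` and the three stub functions are hypothetical.

```python
from typing import Callable, Dict, List


def answer_with_fallback(
    question: str,
    retrieve: Callable[[str], List[dict]],
    evaluate: Callable[[str, List[dict]], Dict[str, object]],
    web_search: Callable[[str], dict],
) -> Dict[str, object]:
    # 1) Try the internal vector DB first.
    docs = retrieve(question)
    # 2) Ask the judge whether those documents are enough.
    report = evaluate(question, docs)
    if report["useful"]:
        return {"source": "vector_db", "evidence": docs}
    # 3) Otherwise fall back to the web.
    return {"source": "web", "evidence": web_search(question).get("results", [])}


# Hypothetical stubs so the sketch runs without a database or API key.
def stub_retrieve(question: str) -> List[dict]:
    return [{"Name": "Gran Turismo", "Platform": "PlayStation 1"}]


def stub_evaluate(question: str, docs: List[dict]) -> Dict[str, object]:
    useful = any(doc["Name"].lower() in question.lower() for doc in docs)
    return {"useful": useful, "description": "stub judgement"}


def stub_web_search(question: str) -> dict:
    return {"results": [{"title": "stub result", "url": "https://example.com"}]}


hit = answer_with_fallback("Tell me about Gran Turismo.", stub_retrieve, stub_evaluate, stub_web_search)
miss = answer_with_fallback("Was Mortal Kombat X released for PlayStation 5?", stub_retrieve, stub_evaluate, stub_web_search)
print(hit["source"], miss["source"])  # vector_db web
```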

#### Retrieve Game Tool

In [5]:
# TODO: Create retrieve_game tool
# It should use chroma client and collection you created
# chroma_client = chromadb.PersistentClient(path="chromadb")
# collection = chroma_client.get_collection("udaplay")
# Tool Docstring:
#    Semantic search: Finds the most relevant results in the vector DB
#    args:
#    - query: a question about the game industry.
#
#    You'll receive results as a list. Each element contains:
#    - Platform: e.g. Game Boy, PlayStation 5, Xbox 360
#    - Name: Name of the Game
#    - YearOfRelease: Year when that game was released for that platform
#    - Description: Additional details about the game

# Vector DB connection
CHROMA_PATH = "udaplay_db"

_chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)
_embedding_fn = embedding_functions.DefaultEmbeddingFunction()

_collection = _chroma_client.get_collection(
    name="udaplay",
    embedding_function=_embedding_fn,
)

@tool(
    name="retrieve_game",
    description="Semantic search in the internal UdaPlay game vector database."
)
def retrieve_game(query: str, top_k: int = 5) -> List[Dict[str, Any]]:
    """
    Semantic search: Finds most relevant games in the internal Vector DB.
    Args:
        query: Question or description about games / game industry.
        top_k: Number of results to return.
    Returns:
        List of dicts with fields like Name, Platform, YearOfRelease, Description, Genre, Publisher, score.
    """
    results = _collection.query(
        query_texts=[query],
        n_results=top_k,
        include=["metadatas", "documents", "distances"],
    )

    items: List[Dict[str, Any]] = []

    metas = results.get("metadatas", [[]])[0]
    docs = results.get("documents", [[]])[0]
    dists = results.get("distances", [[]])[0]

    for meta, doc, dist in zip(metas, docs, dists):
        items.append({
            "Name": meta.get("Name"),
            "Platform": meta.get("Platform"),
            "YearOfRelease": meta.get("YearOfRelease"),
            "Description": meta.get("Description"),
            "Genre": meta.get("Genre"),
            "Publisher": meta.get("Publisher"),
            # "score" holds the Chroma distance: a smaller value means a closer match.
            "score": float(dist),
            "raw_document": doc,
        })

    return items
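Because the `score` field carries a distance, smaller values indicate closer matches. A minimal sketch of how a caller might drop weak matches before passing results on is shown below; `filter_by_distance` and the `0.78` cut-off are hypothetical, and a sensible threshold depends on the embedding function and distance metric in use.

```python
from typing import Any, Dict, List


def filter_by_distance(items: List[Dict[str, Any]], max_distance: float) -> List[Dict[str, Any]]:
    """Keep items whose distance is at most max_distance, closest first."""
    kept = [item for item in items if item["score"] <= max_distance]
    return sorted(kept, key=lambda item: item["score"])


# Distances in the same shape as retrieve_game results.
sample = [
    {"Name": "Gran Turismo 5", "score": 0.8049542307853699},
    {"Name": "Gran Turismo", "score": 0.7438501715660095},
]

close_matches = filter_by_distance(sample, max_distance=0.78)  # 0.78 is an illustrative value
print([item["Name"] for item in close_matches])  # ['Gran Turismo']
```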

### Testing and showcasing the Retrieve Game tool

In [6]:
retrieve_game("realistic racing games on PlayStation 1", top_k=3)

[{'Name': 'Gran Turismo',
  'Platform': 'PlayStation 1',
  'YearOfRelease': 1997,
  'Description': 'A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.',
  'Genre': 'Racing',
  'Publisher': 'Sony Computer Entertainment',
  'score': 0.7438501715660095,
  'raw_document': '[PlayStation 1] Gran Turismo (1997) - A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.'},
 {'Name': 'Gran Turismo 5',
  'Platform': 'PlayStation 3',
  'YearOfRelease': 2010,
  'Description': 'A comprehensive racing simulator featuring a vast selection of vehicles and tracks, with realistic driving physics.',
  'Genre': 'Racing',
  'Publisher': 'Sony Computer Entertainment',
  'score': 0.8049542307853699,
  'raw_document': '[PlayStation 3] Gran Turismo 5 (2010) - A comprehensive racing simulator featuring a vast selection of vehicles and tracks, with realistic driving physics.'},
 {'Name': 'Grand Theft A

#### Evaluate Retrieval Tool

In [7]:
# TODO: Create evaluate_retrieval tool
# You might use an LLM as judge in this tool to evaluate the performance
# You need to prompt that LLM with something like:
# "Your task is to evaluate if the documents are enough to respond the query. "
# "Give a detailed explanation, so it's possible to take an action to accept it or not."
# Use EvaluationReport to parse the result
# Tool Docstring:
#    Based on the user's question and on the list of retrieved documents, 
#    it will analyze the usability of the documents to respond to that question. 
#    args: 
#    - question: original question from user
#    - retrieved_docs: retrieved documents most similar to the user query in the Vector Database
#    The result includes:
#    - useful: whether the documents are useful to answer the question
#    - description: description about the evaluation result

class EvaluationReport(BaseModel):
    """Evaluation of whether retrieved docs are sufficient to answer a question."""
    useful: bool = Field(
        description="Whether the retrieved documents are sufficient and relevant to answer the question."
    )
    description: str = Field(
        description="Detailed reasoning why the docs are or are not useful, with suggestions what might be missing."
    )


@tool(
    name="evaluate_retrieval",
    description=(
        "Based on the user's question and the list of retrieved documents, "
        "analyze if the documents are sufficient and useful to answer the question."
    ),
)
def evaluate_retrieval(question: str, retrieved_docs: Optional[List[str]] = None) -> Dict[str, Any]:
    """
    Uses an LLM as judge to decide if the retrieved docs are good enough.
    Args:
        question: original user question
        retrieved_docs: list of documents returned by retrieve_game
    Returns:
        dict with keys: useful (bool), description (str)
    """
    # Build a clear prompt for the judge
    judge_prompt = f"""
You are an expert evaluator for information retrieval quality.

Your task is to evaluate if the retrieved documents are enough to properly answer the user question.

USER QUESTION:
{question}

RETRIEVED DOCUMENTS (JSON list):
{retrieved_docs}

Instructions:
- Analyze if these documents are relevant and sufficient to answer the question.
- If they are sufficient, set useful = true.
- If they are missing important information, or irrelevant, set useful = false.
- In description, explain briefly why, and suggest what is missing if anything.

Return JSON matching this schema:
- useful: boolean
- description: string
"""

    llm = LLM(model="gpt-4o-mini", temperature=0.0, api_key=os.getenv("OPENAI_API_KEY"))
    ai_msg = llm.invoke(judge_prompt, response_format=EvaluationReport)

    report = EvaluationReport.model_validate_json(ai_msg.content)
    return report.model_dump()
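The judge prompt labels the documents as a JSON list, but interpolating a Python list into an f-string produces its `repr` (single quotes, `None`, `True`) and not JSON. One possible way to serialize the documents first is sketched below; `format_docs_for_prompt` and its default field selection are hypothetical and not part of the notebook.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def format_docs_for_prompt(
    retrieved_docs: Optional[List[Dict[str, Any]]],
    fields: Sequence[str] = ("Name", "Platform", "YearOfRelease", "Description"),
) -> str:
    """Serialize retrieved documents as real JSON, keeping only the listed fields."""
    trimmed = [{key: doc.get(key) for key in fields} for doc in (retrieved_docs or [])]
    # ensure_ascii=False keeps names such as "Pokémon" readable in the prompt.
    return json.dumps(trimmed, ensure_ascii=False, indent=2)


sample_docs = [
    {
        "Name": "Pokémon Gold and Silver",
        "Platform": "Game Boy Color",
        "YearOfRelease": 1999,
        "Description": "Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.",
        "score": 0.6876085996627808,
    }
]

docs_json = format_docs_for_prompt(sample_docs)
print(docs_json)
```

The returned string could then be placed in the prompt where `{retrieved_docs}` is interpolated. Dropping `score` and `raw_document` also shortens the prompt, at the cost of hiding the distance from the judge.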

### Testing and showcasing the Evaluate Retrieval tool

In [8]:
docs = retrieve_game("realistic racing games on PlayStation 1", top_k=3)
evaluate_retrieval("Tell me about realistic racing games on PlayStation 1.", docs)

{'useful': False,
 'description': "The retrieved documents do not provide sufficient information to answer the user question about realistic racing games on PlayStation 1. Only one relevant game, 'Gran Turismo', is mentioned, which is indeed a realistic racing simulator for the PlayStation 1. However, the other two documents refer to games on different platforms (PlayStation 2 and PlayStation 3) and are not relevant to the user's request. Additionally, there is no mention of other potential racing games on the PlayStation 1, such as 'Ridge Racer' or 'Need for Speed'. To improve the response, more documents specifically about other realistic racing games on PlayStation 1 should be included."}

#### Game Web Search Tool

In [9]:
# TODO: Create game_web_search tool
# Please use Tavily client to search the web
# Tool Docstring:
#    Web search: Finds relevant results on the web
#    args:
#    - question: a question about game industry. 

from tavily import TavilyClient

_tavily = TavilyClient(api_key=TAVILY_API_KEY) if TAVILY_API_KEY else None

@tool(
    name="game_web_search",
    description="Web search about the video game industry, platforms, releases, and related facts."
)
def game_web_search(question: str) -> Dict[str, Any]:
    """
    Uses Tavily to search the web about the game industry when internal DB is not enough.
    Args:
        question: a question about the game industry.
    Returns:
        dict with raw Tavily response (summary + sources if available).
    """
    if _tavily is None:
        return {
            "summary": "Web search client is not configured (missing TAVILY_API_KEY).",
            "results": []
        }

    resp = _tavily.search(
        query=question,
        search_depth="advanced",
        max_results=5,
    )

    # Tavily usually returns 'results' with 'title', 'url', and 'content'
    return resp
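The raw response can be long, since each result carries a full `content` field. A minimal sketch of reducing it to titles, URLs, and short snippets before handing it back to the LLM follows; `compact_search_results` and the `max_chars` limit are hypothetical, and the sample dictionary only mirrors the keys visible in the response printed in this notebook.

```python
from typing import Any, Dict, List


def compact_search_results(response: Dict[str, Any], max_chars: int = 300) -> List[Dict[str, Any]]:
    """Reduce a search response to title, URL, and a truncated snippet per result."""
    compact = []
    for hit in response.get("results", []):
        content = hit.get("content") or ""
        compact.append({
            "title": hit.get("title"),
            "url": hit.get("url"),
            "snippet": content[:max_chars],
        })
    return compact


sample_response = {
    "query": "When were Pokémon Gold and Silver released?",
    "answer": None,
    "results": [
        {
            "url": "https://www.pokemon.com/us/pokemon-video-games/pokemon-gold-version-and-pokemon-silver-version",
            "title": "Pokémon Gold Version and Pokémon Silver Version - Pokemon.com",
            "content": "Pokémon Gold and Pokémon Silver were released in Japan on November 21, 1999, as the second set of titles in the Pokémon series.",
        }
    ],
}

hits = compact_search_results(sample_response, max_chars=60)
print(hits[0]["snippet"])
```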


### Testing and showcasing the Game Web Search tool

In [12]:
game_web_search("When were Pokémon Gold and Silver released?")


{'query': 'When were Pokémon Gold and Silver released?',
 'follow_up_questions': None,
 'answer': None,
 'images': [],
 'results': [{'url': 'https://www.pokemon.com/us/pokemon-video-games/pokemon-gold-version-and-pokemon-silver-version',
   'title': 'Pokémon Gold Version and Pokémon Silver Version - Pokemon.com',
   'content': 'Pokémon Gold and Pokémon Silver were released in Japan on November 21, 1999, as the second set of titles in the Pokémon series. These games, which were the first titles in the Pokémon series to be designed for the Game Boy Color, will be recreated in their Virtual Console versions so their screens appear just as they did on the Game Boy Color. Players will be able to enjoy playing them just as they remember playing them in earlier days. [...] |  |  |\n --- |\n| Release Date: | October 15, 2000 |\n| Genre: | RPG |\n| Platform: | Game Boy Color |\n| Players: | 1 |\n|  |\n\nView More Games\n\n|  |  |\n --- |\n| Release Date: | October 15, 2000 |\n| Genre: | RPG |\n

### Long-term Memory 

In [20]:
import time

from lib.memory import SessionNotFoundError, ShortTermMemory, MemoryFragment, MemorySearchResult, LongTermMemory, TimestampFilter

# Create a VectorStoreManager for memory
memory_vector_manager = VectorStoreManager(openai_api_key=OPENAI_API_KEY)

# Create a LongTermMemory store (this will create/use a Chroma collection "long_term_memory")
long_term_memory = LongTermMemory(db=memory_vector_manager)


# Tool the agent can use to write something worth remembering into long-term memory
@tool(
    name="store_research_insight",
    description="Store an important user- or research-related insight in long-term memory."
)
def store_research_insight(
    content: str,
    owner: str = "default_user",
    namespace: str = "research",
    tag: str | None = None,
) -> dict:
    """
    Store a research insight or important fact in the long-term memory vector store.

    args:
        content: The text you want the system to remember.
        owner: Identifier for the user (or session).
        namespace: Logical group for memories (e.g. 'research', 'preferences').
        tag: Optional tag to categorize the memory (e.g. 'racing_games', 'Nintendo').
    """
    fragment = MemoryFragment(
        content=content,
        owner=owner,
        namespace=namespace,
    )

    extra_meta = {}
    if tag:
        extra_meta["tag"] = tag

    long_term_memory.register(fragment, metadata=extra_meta)

    return {
        "status": "stored",
        "owner": owner,
        "namespace": namespace,
        "tag": tag,
        "timestamp": fragment.timestamp,
    }


# Tool the agent can use to search stored memories
@tool(
    name="search_research_memory",
    description="Search long-term memory for relevant stored insights."
)
def search_research_memory(
    query: str,
    owner: str = "default_user",
    namespace: str = "research",
    limit: int = 3,
    recent_seconds: int | None = None,
) -> dict:
    """
    Search long-term memory for relevant stored insights.

    args:
        query: Natural language query describing what to retrieve.
        owner: Identifier for the user (or session).
        namespace: Memory namespace (e.g. 'research', 'preferences').
        limit: Maximum number of memory fragments to return.
        recent_seconds: If provided, only return memories newer than now - recent_seconds.
    """
    ts_filter = None
    if recent_seconds is not None:
        now_ts = int(time.time())
        ts_filter = TimestampFilter(
            greater_than_value=now_ts - recent_seconds
        )

    result: MemorySearchResult = long_term_memory.search(
        query_text=query,
        owner=owner,
        limit=limit,
        timestamp_filter=ts_filter,
        namespace=namespace,
    )

    fragments_out = [
        {
            "content": f.content,
            "owner": f.owner,
            "namespace": f.namespace,
            "timestamp": f.timestamp,
        }
        for f in result.fragments
    ]

    return {
        "fragments": fragments_out,
        "distances": result.metadata.get("distances", []),
    }
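The `recent_seconds` argument turns into a cut-off timestamp of `now - recent_seconds`. The effect of such a filter can be sketched without the vector store; `Fragment` and `filter_recent` are hypothetical stand-ins, and the strict "newer than" comparison is an assumption about how `TimestampFilter(greater_than_value=...)` behaves.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    content: str
    timestamp: int  # Unix time in seconds


def filter_recent(
    fragments: List[Fragment],
    recent_seconds: Optional[int],
    now: Optional[int] = None,
) -> List[Fragment]:
    """Return fragments newer than now - recent_seconds; no filtering if None."""
    if recent_seconds is None:
        return list(fragments)
    now_ts = int(time.time()) if now is None else now
    cutoff = now_ts - recent_seconds
    # Assumption: strictly greater than the cut-off counts as recent.
    return [fragment for fragment in fragments if fragment.timestamp > cutoff]


memories = [
    Fragment("User likes racing games", timestamp=900),
    Fragment("User asked about Game Boy titles", timestamp=400),
]

# With now=1000 and a 300-second window, the cut-off is 700.
recent = filter_recent(memories, recent_seconds=300, now=1000)
print([fragment.content for fragment in recent])  # ['User likes racing games']
```

Passing `now` explicitly keeps the example deterministic; the tool itself reads the current time with `time.time()`.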

### Agent

In [21]:
# TODO: Create your Agent abstraction using StateMachine
# Equip with an appropriate model
# Craft a good set of instructions 
# Plug all Tools you developed

from lib.agents import Agent

UDAPLAY_INSTRUCTIONS = """
You are UdaPlay, an AI Research Agent for the video game industry.

Your goals:
- Answer questions about video games, platforms, genres, publishers, and history.
- Prefer internal knowledge (VectorDB) when possible.
- Fall back to web search when internal knowledge is not sufficient.
- Explain your reasoning clearly in natural language to the user.
- Store important insights and user preferences in long-term memory
  so that future answers can be more personalized and consistent.
- Retrieve relevant memories when they may help answer the question.

You have access to five tools:

1) retrieve_game
   - Use this FIRST for questions about specific games, platforms, release years, genres, etc.
   - It performs semantic search over an internal vector database of games.

2) evaluate_retrieval
   - After using retrieve_game, use this tool to evaluate whether the retrieved documents
     are good enough to answer the user's question.
   - It returns:
       - useful: true/false
       - description: explanation of the evaluation
   - If useful == true:
       - Use the retrieved documents to answer the question.
   - If useful == false:
       - Consider calling game_web_search.

3) game_web_search
   - Use this when:
       - evaluate_retrieval says the internal docs are not useful enough, OR
       - the question is clearly about very recent events, news, or things not in the internal DB.
   - It searches the web for relevant information about games and the game industry.
   - When you rely on game_web_search results, mention the most important sources and include their URLs where appropriate.

4) store_research_insight
   - Use this when the user expresses a strong preference or shares an important fact
     ("I love Fifa 2000", "I'm researching game history").
   - Consider calling it with a short summary of that preference or fact.

5) search_research_memory
   - When a new query might benefit from previous context, consider calling this tool to recall past insights.

General behavior:
- Always try to use tools when they are helpful.
- Do not expose raw tool schemas or internal JSON structures to the user.
- Summarize tool results into a clear, helpful answer.
- If you do not know the answer, be honest and say so.
"""

udaplay_agent = Agent(
    model_name="gpt-4o-mini",   # or another OpenAI chat model you use
    instructions=UDAPLAY_INSTRUCTIONS,
    tools=[retrieve_game, evaluate_retrieval, game_web_search, store_research_insight, search_research_memory],
    temperature=0.3,
)


### Agent-Run-Conversation

In [22]:
# Function for prettification of the agent conversation
def print_agent_run(run):
    """Pretty-print a run: shows user messages, tool calls, tool results, and final answer."""
    final_state = run.get_final_state()
    messages = final_state["messages"]

    print("=== Conversation ===")
    for m in messages:
        # User
        if m.role == "user":
            print("\nUSER:", m.content)

        # Assistant normal reply (no tool call)
        elif m.role == "assistant" and not m.tool_calls:
            print("\nASSISTANT:", m.content)

        # Assistant requesting tools
        elif m.role == "assistant" and m.tool_calls:
            print("\nASSISTANT (tool calls):")
            for tc in m.tool_calls:
                print(f"  -> {tc.function.name}({tc.function.arguments})")

        # Tool messages
        elif m.role == "tool":
            print(f"\nTOOL [{m.name}] RESULT:")
            # The content is a JSON string, so print a truncated version
            print(m.content[:500], "..." if len(m.content) > 500 else "")

    # Highlight last assistant answer
    final_answers = [m for m in messages if m.role == "assistant" and m.content and not m.tool_calls]
    if final_answers:
        print("\n=== FINAL ANSWER ===")
        print(final_answers[-1].content)

#### Demo user input and output 

In [16]:
questions = [
    "When were Pokémon Gold and Silver released?",
    "Which one was the first 3D platformer Mario game?",
    "Was Mortal Kombat X released for PlayStation 5?"
]

for q in questions:
    print("\n==========================")
    print("USER QUESTION:", q)
    print("==========================")

    run = udaplay_agent.invoke(q, session_id="qa_test")
    print_agent_run(run)
    print("\n" + "=" * 60 + "\n")



USER QUESTION: When were Pokémon Gold and Silver released?
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
=== Conversation ===

USER: When were Pokémon Gold and Silver released?

ASSISTANT (tool calls):
  -> retrieve_game({"query":"Pokémon Gold and Silver release date"})

TOOL [retrieve_game] RESULT:
"[{'Name': 'Pok\u00e9mon Gold and Silver', 'Platform': 'Game Boy Color', 'YearOfRelease': 1999, 'Description': 'Second-generation Pok\u00e9mon games introducing new regions, Pok\u00e9mon, and gameplay mechanics.', 'Genre': 'Role-playing', 'Publisher': 'Nintendo', 'score': 0.6876085996627808, 'raw_document

#### Multi-turn conversation demo with same session_id

In [17]:
session_id = "dialog_demo"

print("\n########## MULTI-TURN DIALOG DEMO ##########\n")

# Phase 1: Start conversation 
run1 = udaplay_agent.invoke(
    "Tell me about Gran Turismo.",
    session_id=session_id,
)
print("\n--- TURN 1 ---")
print_agent_run(run1)

# Phase 2: Follow-up-question
run2 = udaplay_agent.invoke(
    "On which platform was it released?",
    session_id=session_id,
)
print("\n--- TURN 2 ---")
print_agent_run(run2)

# Phase 3: Follow-up-question
run3 = udaplay_agent.invoke(
    "Was it considered realistic for its time?",
    session_id=session_id,
)
print("\n--- TURN 3 ---")
print_agent_run(run3)



########## MULTI-TURN DIALOG DEMO ##########

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__

--- TURN 1 ---
=== Conversation ===

USER: Tell me about Gran Turismo.

ASSISTANT (tool calls):
  -> retrieve_game({"query":"Gran Turismo"})

TOOL [retrieve_game] RESULT:
"[{'Name': 'Gran Turismo', 'Platform': 'PlayStation 1', 'YearOfRelease': 1997, 'Description': 'A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.', 'Genre': 'Racing', 'Publisher': 'Sony Computer Entertainment', 'score': 0.8414762020111084, 'raw_document': '[PlayStation 1] Gran Turism