<a href="https://colab.research.google.com/github/wjleece/AI-Agents/blob/main/AI_Agents_w_Evals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install anthropic
%pip install openai
%pip install -q -U google-generativeai
%pip install fuzzywuzzy

Collecting anthropic
  Downloading anthropic-0.51.0-py3-none-any.whl.metadata (25 kB)
Downloading anthropic-0.51.0-py3-none-any.whl (263 kB)
Installing collected packages: anthropic
Successfully installed anthropic-0.51.0
Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl.metadata (4.9 kB)
Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


In [5]:
#Setup and Imports
import anthropic
import google.generativeai as gemini
import re
import json
import time
import os
import glob # For finding files matching a pattern
import uuid # For generating unique learning IDs in RAG
from google.colab import userdata
from openai import OpenAI
from google.colab import drive # For Google Drive mounting
from datetime import datetime
from typing import Dict, List, Any, Optional, Union, Tuple
from fuzzywuzzy import process, fuzz

# LLM API keys, read from Colab's secrets store
ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

anthropic_client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
openai_client = OpenAI(api_key=OPENAI_API_KEY)
gemini.configure(api_key=GOOGLE_API_KEY)

ANTHROPIC_MODEL_NAME = "claude-3-5-sonnet-latest"
OPENAI_MODEL_NAME = "gpt-4.1" # Or your preferred GPT-4 class model
EVAL_MODEL_NAME = "gemini-2.5-pro-preview-05-06" # Or your preferred Gemini model

# --- Configuration for the Google Drive RAG store ---
# The user is prompted for this path if it is not found; it can also be set here.
# It is the path *after* /content/drive/
DEFAULT_LEARNINGS_DRIVE_SUBPATH = "My Drive/AI/Knowledgebases" # Author's path; yours may differ
LEARNINGS_DRIVE_BASE_PATH = "" # Will be set dynamically or from default

DRIVE_MOUNT_PATH = '/content/drive'

print("Imports and LLM clients initialized. Drive RAG configuration variables set.")

Imports and LLM clients initialized. Drive RAG configuration variables set.
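
The two Drive settings above combine into one absolute path once Drive is mounted. A minimal sketch of that resolution, assuming an empty override should fall back to the default subpath. `resolve_learnings_path` is a hypothetical helper, not part of the notebook, and `posixpath` is used because Colab paths are POSIX-style.

```python
import posixpath

DRIVE_MOUNT_PATH = "/content/drive"
DEFAULT_LEARNINGS_DRIVE_SUBPATH = "My Drive/AI/Knowledgebases"

def resolve_learnings_path(subpath: str = "") -> str:
    """Hypothetical helper: join the mount point with a subpath (default when empty)."""
    # Strip whitespace and stray slashes so the join never discards the mount point.
    chosen = subpath.strip().strip("/") or DEFAULT_LEARNINGS_DRIVE_SUBPATH
    return posixpath.join(DRIVE_MOUNT_PATH, chosen)

print(resolve_learnings_path())
print(resolve_learnings_path("/My Drive/Other/KB/"))
```

Stripping the leading slash matters: `posixpath.join` would otherwise treat the subpath as absolute and drop `/content/drive`.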


In [6]:
worker_system_prompt = """
You are a helpful customer service assistant for an e-commerce system.

When responding to the user, use the conversation context to maintain continuity.
- If a user refers to "my order" or similar, use the context to determine which order they're talking about.
- If they mention "that product" or use other references, check the context to determine what they're referring to.
- Always prioritize recent context over older context when resolving references.

The conversation context will be provided to you with each message. This includes:
- Previous questions and answers
- Recently viewed customers, products, and orders
- Recent actions taken (like creating orders, updating products, etc.)

BEHAVIOR FOR TARGETED REQUESTS:
If the user's query explicitly names the *other* AI assistant for a task (e.g., "OpenAI, do X" when you are Anthropic, or "Anthropic, do Y" when you are OpenAI), you MUST follow these steps:
1. Identify that the request is specifically for the other assistant.
2. Your *only* action should be to output a brief, polite acknowledgment. For example:
   - "Understood. I'll let OpenAI handle that."
   - "Okay, that request is for Anthropic."
   - Or simply: "Acknowledged."
3. You MUST NOT call any tools or attempt to perform the core task mentioned in the user's query. Your role in this specific instance is to defer.
Failure to defer when the other agent is explicitly named for a task will be considered incorrect behavior.

Keep responses friendly, concise, and helpful. If you're not sure what a user is referring to, ask for clarification.
"""

evaluator_system_prompt = """
You are an impartial evaluator assessing the quality of responses from two AI assistants (Anthropic Claude and OpenAI GPT) to customer service queries.

For each interaction, evaluate both responses based on:
1. Accuracy: How correct and factual is the response based on the available information?
2. Efficiency: Did the assistant get to the correct answer with minimal clarifying questions?
3. Context Awareness: Did the assistant correctly use the conversation context to understand references?
4. Helpfulness: How well did the assistant address the user's needs?

Score each response on a scale of 1-10 for each criterion, and provide an overall score.

If you identify ambiguity in the user's query that neither assistant could reasonably resolve without additional information:
1. ALWAYS begin your clarification request with the exact phrase "CLARIFICATION NEEDED:" followed by a specific question
2. Format your request clearly and precisely as "CLARIFICATION NEEDED: [your specific question here]"
3. Make your question answerable with a straightforward response
4. If multiple clarifications are needed, number them clearly

After receiving human clarification, continue your evaluation incorporating this new information.
Store this feedback as a "learning" so similar situations can be handled better in the future.

If multiple data stores are provided representing the state after each assistant's actions, you will be asked to compare them for consistency as a final step and comment on whether this comparison affects your initial scoring.

For testing purposes, you may be asked to identify which model you are. You should realize that type of question likely comes from
a human user and not from an AI assistant. Therefore you should properly identify yourself by stating which model you are, and,
if specifically asked, your key tasks.
"""

In [7]:
# The GenerativeModel instance for evaluation will be created with the system instruction.
eval_model_instance = gemini.GenerativeModel(
    model_name=EVAL_MODEL_NAME,
    system_instruction=evaluator_system_prompt
)
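
The evaluator prompt passed as `system_instruction` above asks the model to flag ambiguity with the exact phrase "CLARIFICATION NEEDED:" and to number multiple questions. A minimal sketch of how calling code might detect and split such requests. `extract_clarification_questions` is a hypothetical helper, and the numbered-list pattern (`1.` or `1)`) is an assumption about how the model formats its list.

```python
import re
from typing import List

CLARIFICATION_PREFIX = "CLARIFICATION NEEDED:"

def extract_clarification_questions(evaluator_text: str) -> List[str]:
    """Hypothetical helper: return the questions after the prefix, or [] if absent."""
    idx = evaluator_text.find(CLARIFICATION_PREFIX)
    if idx == -1:
        return []
    remainder = evaluator_text[idx + len(CLARIFICATION_PREFIX):].strip()
    # Split on list markers such as "1. " or "2) " that start the text or follow whitespace.
    parts = re.split(r"(?:^|\s)\d+[.)]\s+", remainder)
    return [part.strip() for part in parts if part.strip()]

print(extract_clarification_questions(
    "CLARIFICATION NEEDED: 1. Which order did the user mean? 2. Which product?"
))
```

An empty list means the evaluation can proceed without human input.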

In [8]:
# Global data stores: initial values only. The Storage class (defined below) copies and manages them.
initial_customers = {
    "C1": {"name": "John Doe", "email": "john@example.com", "phone": "123-456-7890"},
    "C2": {"name": "Jane Smith", "email": "jane@example.com", "phone": "987-654-3210"}
}

initial_products = {
    "P1": {"name": "Widget A", "description": "A simple widget. Very compact.", "price": 19.99, "inventory_count": 999},
    "P2": {"name": "Gadget B", "description": "A powerful gadget. It spins.", "price": 49.99, "inventory_count": 200},
    "P3": {"name": "Perplexinator", "description": "A perplexing perfunctator", "price": 79.99, "inventory_count": 1483}
}

initial_orders = {
    "O1": {"id": "O1", "product_id": "P1", "product_name": "Widget A", "quantity": 2, "price": 19.99, "status": "Shipped"},
    "O2": {"id": "O2", "product_id": "P2", "product_name": "Gadget B", "quantity": 1, "price": 49.99, "status": "Processing"}
}


In [9]:
#Knowledge base and Global Tools Placeholder
human_feedback_learnings = {}
tools_schemas_list = []

In [7]:
# Standalone Anthropic Completion Function (for basic tests)
def get_completion_anthropic_standalone(prompt: str):
    message = anthropic_client.messages.create(
        model=ANTHROPIC_MODEL_NAME,
        max_tokens=2000,
        temperature=0.0,
        system=worker_system_prompt,
        tools=tools_schemas_list,
        messages=[
          {"role": "user", "content": prompt}
        ]
    )
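    # Join all text blocks; content[0] may be a tool_use block once tools are defined.
    return "".join(block.text for block in message.content if block.type == "text")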

In [8]:
prompt_test_anthropic = "Hey there, which AI model do you use for answering questions?"
print(f"Anthropic Standalone Test: {get_completion_anthropic_standalone(prompt_test_anthropic)}")

Anthropic Standalone Test: I am Claude, created by Anthropic. I aim to be direct and honest about my identity while focusing on providing helpful customer service assistance for the e-commerce system.


In [9]:
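def get_completion_openai_standalone(prompt: str):
    # tools_schemas_list uses Anthropic's layout ("input_schema"); the Chat Completions API
    # expects each tool wrapped as {"type": "function", "function": {..., "parameters": ...}}.
    openai_tools = [
        {"type": "function",
         "function": {"name": t["name"], "description": t["description"], "parameters": t["input_schema"]}}
        for t in tools_schemas_list
    ]
    # Omit the tools argument while the list is still the empty placeholder.
    tool_kwargs = {"tools": openai_tools} if openai_tools else {}
    response = openai_client.chat.completions.create(
        model=OPENAI_MODEL_NAME,
        max_tokens=2000,
        temperature=0.0,
        messages=[
            {"role": "system", "content": worker_system_prompt},
            {"role": "user", "content": prompt}
        ],
        **tool_kwargs
    )
    # content is None when the model replies with tool calls only
    return response.choices[0].message.content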

In [10]:
prompt_test_openai = "Hey there, which AI model do you use for answering questions?"
print(f"OpenAI Standalone Test: {get_completion_openai_standalone(prompt_test_openai)}")

OpenAI Standalone Test: Hello! I’m powered by Anthropic’s AI technology to assist you with your questions and help you with your e-commerce needs. If you have any specific questions or need help with your orders or products, just let me know!
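
The shared `worker_system_prompt` mentions "when you are Anthropic" and "when you are OpenAI" but never states which assistant is reading it, which may be one reason the OpenAI test above associates itself with Anthropic. One possible adjustment, shown only as a sketch, prepends an explicit identity line per agent. `build_worker_prompt` and the wording of the identity line are assumptions, not part of the notebook.

```python
def build_worker_prompt(base_prompt: str, agent_name: str, other_agent_name: str) -> str:
    """Hypothetical helper: prefix the shared prompt with an explicit identity statement."""
    identity = (
        f"You are the {agent_name} assistant. "
        f"The other assistant in this system is {other_agent_name}."
    )
    return f"{identity}\n\n{base_prompt.strip()}"

# Trimmed stand-in for the notebook's worker_system_prompt.
base = "You are a helpful customer service assistant for an e-commerce system."
print(build_worker_prompt(base, "OpenAI", "Anthropic"))
```

Each agent would then receive its own prompt string while the deferral rules stay shared.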


In [11]:
def get_completion_eval_standalone(prompt: str):
    # Uses eval_model_instance (defined above), which already carries the evaluator system prompt
    response = eval_model_instance.generate_content(prompt)
    return response.text

In [12]:
prompt_test_eval = "Hey there, can you tell me which AI you are and what your key tasks are?"
print(f"Gemini Eval Standalone Test:\n{get_completion_eval_standalone(prompt_test_eval)}")

Gemini Eval Standalone Test:
I am an AI model, and my key tasks are to act as an impartial evaluator, assessing the quality of responses from AI assistants to customer service queries based on accuracy, efficiency, context awareness, and helpfulness.


In [10]:
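# Storage Class Definition
import copy  # deep copies keep each Storage instance's nested records independent

class Storage:
    """Storage class for global e-commerce data access"""
    def __init__(self):
        # Each Storage instance gets its own deep copy of the initial data;
        # a shallow .copy() would share the nested per-record dicts between instances.
        self.customers = copy.deepcopy(initial_customers)
        self.products = copy.deepcopy(initial_products)
        self.orders = copy.deepcopy(initial_orders)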
        # Note: human_feedback_learnings is still a shared global dictionary
        self.human_feedback_learnings = human_feedback_learnings

# This global instance is for legacy/standalone tool testing if any.
# The DualAgentEvaluator will create its own instances for Anthropic and OpenAI.
storage_global_for_standalone_tests = Storage()
print("Storage class defined. Note: DualAgentEvaluator will use its own Storage instances.")

Storage class defined. Note: DualAgentEvaluator will use its own Storage instances.
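
Because the seed data is a dict of dicts, how it is copied determines whether two stores can affect each other. The sketch below uses a trimmed-down stand-in for `initial_products` to contrast `dict.copy()` with `copy.deepcopy()`.

```python
import copy

# Trimmed stand-in for the notebook's initial_products.
initial_products = {"P1": {"name": "Widget A", "price": 19.99}}

shallow = initial_products.copy()          # new outer dict, same inner dicts
deep = copy.deepcopy(initial_products)     # new outer dict and new inner dicts

shallow["P1"]["price"] = 24.99  # also changes initial_products["P1"]
deep["P1"]["price"] = 9.99      # changes only the deep copy

print(initial_products["P1"]["price"])  # 24.99
print(deep["P1"]["price"])              # 9.99
```

For a side-by-side comparison of two agents' final data stores, only the deep copy guarantees that one agent's `update_product` call cannot leak into the other's store.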


In [11]:
#Definitive list of tool schemas.
tools_schemas_list = [
    {
        "name": "create_customer",
        "description": "Adds a new customer to the database. Includes customer name, email, and (optional) phone number.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the customer."},
                "email": {"type": "string", "description": "The email address of the customer."},
                "phone": {"type": "string", "description": "The phone number of the customer (optional)."}
            },
            "required": ["name", "email"]
        }
    },
    {
        "name": "get_customer_info",
        "description": "Retrieves customer information based on their customer ID. Returns the customer's name, email, and (optional) phone number.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "The unique identifier for the customer."}
            },
            "required": ["customer_id"]
        }
    },
    {
        "name": "create_product",
        "description": "Adds a new product to the product database. Includes name, description, price, and initial inventory count.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the product."},
                "description": {"type": "string", "description": "A description of the product."},
                "price": {"type": "number", "description": "The price of the product."},
                "inventory_count": {"type": "integer", "description": "The amount of the product that is currently in inventory."}
            },
            "required": ["name", "description", "price", "inventory_count"]
        }
    },
    {
        "name": "update_product",
        "description": "Updates an existing product with new information. Only fields that are provided will be updated; other fields remain unchanged.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string", "description": "The unique identifier for the product to update."},
                "name": {"type": "string", "description": "The new name for the product (optional)."},
                "description": {"type": "string", "description": "The new description for the product (optional)."},
                "price": {"type": "number", "description": "The new price for the product (optional)."},
                "inventory_count": {"type": "integer", "description": "The new inventory count for the product (optional)."}
            },
            "required": ["product_id"]
        }
    },
    {
        "name": "get_product_info",
        "description": "Retrieves product information based on product ID or product name (with fuzzy matching for misspellings). Returns product details including name, description, price, and inventory count.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id_or_name": {"type": "string", "description": "The product ID or name (can be approximate)."}
            },
            "required": ["product_id_or_name"]
        }
    },
    {
        "name": "list_all_products",
        "description": "Lists all available products in the inventory.",
        "input_schema": { "type": "object", "properties": {}, "required": [] }
    },
    {
        "name": "create_order",
        "description": "Creates an order using the product's current price. If requested quantity exceeds available inventory, no order is created and available quantity is returned. Orders can only be created for products that are in stock. Supports specifying products by either ID or name with fuzzy matching for misspellings.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id_or_name": {"type": "string", "description": "The ID or name of the product to order (supports fuzzy matching)."},
                "quantity": {"type": "integer", "description": "The quantity of the product in the order."},
                "status": {"type": "string", "description": "The initial status of the order (e.g., 'Processing', 'Shipped')."}
            },
            "required": ["product_id_or_name", "quantity", "status"]
        }
    },
    {
        "name": "get_order_details",
        "description": "Retrieves the details of a specific order based on the order ID. Returns the order ID, product name, quantity, price, and order status.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The unique identifier for the order."}
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "update_order_status",
        "description": "Updates the status of an order and adjusts inventory accordingly. Changing to \"Shipped\" decreases inventory. Changing to \"Returned\" or \"Canceled\" from \"Shipped\" increases inventory. Status can be \"Processing\", \"Shipped\", \"Delivered\", \"Returned\", or \"Canceled\".",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The unique identifier for the order."},
                "new_status": {
                    "type": "string",
                    "description": "The new status to set for the order.",
                    "enum": ["Processing", "Shipped", "Delivered", "Returned", "Canceled"]
                }
            },
            "required": ["order_id", "new_status"]
        }
    }
]
print(f"Defined {len(tools_schemas_list)} tool schemas.")


Defined 9 tool schemas.
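
Several schemas above promise fuzzy matching for misspelled product names. As a rough illustration of the idea, here is a standard-library approximation of a token-sort similarity score. It uses `difflib.SequenceMatcher` rather than fuzzywuzzy, so scores may differ from the notebook's matcher, and `token_sort_score` and `best_product_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

def token_sort_score(a: str, b: str) -> int:
    """Approximate token-sort similarity (0-100): lowercase, sort tokens, compare."""
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, norm_a, norm_b).ratio())

def best_product_match(
    query: str, names_by_id: Dict[str, str], min_similarity: int = 70
) -> Optional[Tuple[str, int]]:
    """Return (product_id, score) for the best-scoring name, or None below the threshold."""
    scored = [(pid, token_sort_score(query, name)) for pid, name in names_by_id.items()]
    if not scored:
        return None
    pid, score = max(scored, key=lambda item: item[1])
    return (pid, score) if score >= min_similarity else None

names = {"P1": "Widget A", "P2": "Gadget B", "P3": "Perplexinator"}
print(best_product_match("widgit a", names))
print(best_product_match("B Gadget", names))
print(best_product_match("zzzz", names))
```

Sorting tokens first makes word order irrelevant, so "B Gadget" and "Gadget B" compare as identical, while the threshold keeps unrelated strings from matching anything.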


In [12]:
# Tool Function Definitions
# Each tool function takes a 'current_storage' argument so that it operates on a specific Storage instance.

# Customer functions
def create_customer(current_storage: Storage, name: str, email: str, phone: Optional[str] = None) -> Dict[str, Any]:
    """Creates a new customer and adds them to the customer database."""
    new_id = f"C{len(current_storage.customers) + 1}"
    current_storage.customers[new_id] = {"name": name, "email": email, "phone": phone}
    print(f"[Tool Executed] create_customer: ID {new_id}, Name: {name} (in {type(current_storage).__name__})")
    return {"status": "success", "customer_id": new_id, "customer": current_storage.customers[new_id]}

def get_customer_info(current_storage: Storage, customer_id: str) -> Dict[str, Any]:
    """Retrieves information about a customer based on their ID."""
    customer = current_storage.customers.get(customer_id)
    if customer:
        print(f"[Tool Executed] get_customer_info: ID {customer_id} found (in {type(current_storage).__name__}).")
        return {"status": "success", "customer_id": customer_id, "customer": customer}
    print(f"[Tool Executed] get_customer_info: ID {customer_id} not found (in {type(current_storage).__name__}).")
    return {"status": "error", "message": "Customer not found"}

# Product functions
def create_product(current_storage: Storage, name: str, description: str, price: float, inventory_count: int) -> Dict[str, Any]:
    """Creates a new product and adds it to the product database."""
    new_id = f"P{len(current_storage.products) + 1}"
    current_storage.products[new_id] = {
        "name": name,
        "description": description,
        "price": float(price),
        "inventory_count": int(inventory_count)
    }
    print(f"[Tool Executed] create_product: ID {new_id}, Name: {name} (in {type(current_storage).__name__})")
    return {"status": "success", "product_id": new_id, "product": current_storage.products[new_id]}

def update_product(current_storage: Storage, product_id: str, name: Optional[str] = None, description: Optional[str] = None,
                   price: Optional[float] = None, inventory_count: Optional[int] = None) -> Dict[str, Any]:
    """Updates a product with the provided parameters."""
    if product_id not in current_storage.products:
        print(f"[Tool Executed] update_product: ID {product_id} not found (in {type(current_storage).__name__}).")
        return {"status": "error", "message": f"Product {product_id} not found"}

    product = current_storage.products[product_id]
    updated_fields = []

    if name is not None:
        product["name"] = name
        updated_fields.append("name")
    if description is not None:
        product["description"] = description
        updated_fields.append("description")
    if price is not None:
        product["price"] = float(price)
        updated_fields.append("price")
    if inventory_count is not None:
        product["inventory_count"] = int(inventory_count)
        updated_fields.append("inventory_count")

    if not updated_fields:
        print(f"[Tool Executed] update_product: ID {product_id}, no fields updated (in {type(current_storage).__name__}).")
        return {"status": "warning", "message": "No fields were updated.", "product": product}

    print(f"[Tool Executed] update_product: ID {product_id}, Updated fields: {', '.join(updated_fields)} (in {type(current_storage).__name__})")
    return {
        "status": "success",
        "message": f"Product {product_id} updated. Fields: {', '.join(updated_fields)}",
        "product_id": product_id,
        "updated_fields": updated_fields,
        "product": product
    }

def find_product_by_name(current_storage: Storage, product_name: str, min_similarity: int = 70) -> Tuple[Optional[str], Optional[Dict[str, Any]]]:
    """Find a product by name using fuzzy string matching."""
    if not product_name: return None, None

    name_id_list = [(p_data["name"], p_id) for p_id, p_data in current_storage.products.items()]
    if not name_id_list: return None, None

    best_match_name_score = process.extractOne(
        product_name,
        [item[0] for item in name_id_list],
        scorer=fuzz.token_sort_ratio
    )

    if best_match_name_score and best_match_name_score[1] >= min_similarity:
        matched_name = best_match_name_score[0]
        for name, pid_val in name_id_list:
            if name == matched_name:
                print(f"[Tool Helper] find_product_by_name: Matched '{product_name}' to '{matched_name}' (ID: {pid_val}) with score {best_match_name_score[1]} (in {type(current_storage).__name__})")
                return pid_val, current_storage.products[pid_val]

    print(f"[Tool Helper] find_product_by_name: No good match for '{product_name}' (min_similarity: {min_similarity}, Best match: {best_match_name_score}) (in {type(current_storage).__name__})")
    return None, None


def get_product_id(current_storage: Storage, product_identifier: str) -> Optional[str]:
    """Get product ID either directly or by fuzzy matching the name."""
    if product_identifier in current_storage.products:
        return product_identifier
    product_id, _ = find_product_by_name(current_storage, product_identifier)
    return product_id

def get_product_info(current_storage: Storage, product_id_or_name: str) -> Dict[str, Any]:
    """Get information about a product by its ID or name."""
    if product_id_or_name in current_storage.products:
        product = current_storage.products[product_id_or_name]
        print(f"[Tool Executed] get_product_info: Found by ID '{product_id_or_name}' (in {type(current_storage).__name__}).")
        return {"status": "success", "product_id": product_id_or_name, "product": product}

    # Not a known ID: fall back to fuzzy name matching within this Storage instance
    product_id_found, product_data = find_product_by_name(current_storage, product_id_or_name)
    if product_id_found and product_data:
        print(f"[Tool Executed] get_product_info: Found by name (fuzzy) '{product_id_or_name}' as ID '{product_id_found}' (in {type(current_storage).__name__}).")
        return {"status": "success", "message": f"Found product matching '{product_id_or_name}'", "product_id": product_id_found, "product": product_data}

    print(f"[Tool Executed] get_product_info: No product found for '{product_id_or_name}' (in {type(current_storage).__name__}).")
    return {"status": "error", "message": f"No product found matching '{product_id_or_name}'"}


def list_all_products(current_storage: Storage) -> Dict[str, Any]:
    """List all available products in the inventory."""
    print(f"[Tool Executed] list_all_products: Found {len(current_storage.products)} products (in {type(current_storage).__name__}).")
    return {"status": "success", "count": len(current_storage.products), "products": dict(current_storage.products)}

# Order functions
def create_order(current_storage: Storage, product_id_or_name: str, quantity: int, status: str) -> Dict[str, Any]:
    """Creates an order using the product's stored price."""
    actual_product_id = get_product_id(current_storage, product_id_or_name) # Pass current_storage

    if not actual_product_id:
        print(f"[Tool Executed] create_order: Product '{product_id_or_name}' not found (in {type(current_storage).__name__}).")
        return {"status": "error", "message": f"Product '{product_id_or_name}' not found."}

    product = current_storage.products[actual_product_id]
    price = product["price"]

    if product["inventory_count"] == 0:
        print(f"[Tool Executed] create_order: Product ID {actual_product_id} is out of stock (in {type(current_storage).__name__}).")
        return {"status": "error", "message": f"{product['name']} is out of stock."}
    if quantity <= 0:
        print(f"[Tool Executed] create_order: Quantity must be positive. Requested: {quantity} (in {type(current_storage).__name__})")
        return {"status": "error", "message": "Quantity must be a positive number."}
    if quantity > product["inventory_count"]:
        print(f"[Tool Executed] create_order: Insufficient inventory for {product['name']} (ID: {actual_product_id}). Available: {product['inventory_count']}, Requested: {quantity} (in {type(current_storage).__name__})")
        return {
            "status": "partial_availability",
            "message": f"Insufficient inventory. Only {product['inventory_count']} units of {product['name']} are available.",
            "available_quantity": product["inventory_count"],
            "requested_quantity": quantity,
            "product_name": product['name']
        }

    if status == "Shipped":
        product["inventory_count"] -= quantity
        print(f"[Tool Executed] create_order: Inventory for {product['name']} (ID: {actual_product_id}) reduced by {quantity} due to 'Shipped' status on creation (in {type(current_storage).__name__}).")

    new_id = f"O{len(current_storage.orders) + 1}"
    current_storage.orders[new_id] = {
        "id": new_id,
        "product_id": actual_product_id,
        "product_name": product["name"],
        "quantity": quantity,
        "price": price,
        "status": status
    }
    print(f"[Tool Executed] create_order: Order {new_id} created for {quantity} of {product['name']} (ID: {actual_product_id}). Status: {status}. Remaining inv: {product['inventory_count']} (in {type(current_storage).__name__})")
    return {
        "status": "success",
        "order_id": new_id,
        "order_details": current_storage.orders[new_id],
        "remaining_inventory": product["inventory_count"]
    }

def get_order_details(current_storage: Storage, order_id: str) -> Dict[str, Any]:
    """Get details of a specific order."""
    order = current_storage.orders.get(order_id)
    if order:
        print(f"[Tool Executed] get_order_details: Order {order_id} found (in {type(current_storage).__name__}).")
        return {"status": "success", "order_id": order_id, "order_details": dict(order)}
    print(f"[Tool Executed] get_order_details: Order {order_id} not found (in {type(current_storage).__name__}).")
    return {"status": "error", "message": "Order not found"}

def update_order_status(current_storage: Storage, order_id: str, new_status: str) -> Dict[str, Any]:
    """Updates the status of an order and adjusts inventory accordingly."""
    if order_id not in current_storage.orders:
        print(f"[Tool Executed] update_order_status: Order {order_id} not found (in {type(current_storage).__name__}).")
        return {"status": "error", "message": "Order not found"}

    order = current_storage.orders[order_id]
    old_status = order["status"]
    product_id = order["product_id"]
    quantity = order["quantity"]

    if old_status == new_status:
        print(f"[Tool Executed] update_order_status: Order {order_id} status unchanged ({old_status}) (in {type(current_storage).__name__}).")
        return {"status": "unchanged", "message": f"Order {order_id} status is already {old_status}", "order_details": dict(order)}

    inventory_adjusted = False
    current_inventory_val = "unknown" # Default if product not found (should not happen if order is valid)

    if product_id in current_storage.products:
        product = current_storage.products[product_id]
        current_inventory_val = product["inventory_count"]

        if new_status == "Shipped" and old_status not in ["Shipped", "Delivered"]:
            if current_inventory_val < quantity:
                print(f"[Tool Executed] update_order_status: Insufficient inventory to ship order {order_id}. Have {current_inventory_val}, need {quantity} (in {type(current_storage).__name__}).")
                return {"status": "error", "message": f"Insufficient inventory to ship. Available: {current_inventory_val}, Required: {quantity}"}
            product["inventory_count"] -= quantity
            inventory_adjusted = True
            current_inventory_val = product["inventory_count"]
            print(f"[Tool Executed] update_order_status: Order {order_id} Shipped. Inv for {product_id} reduced by {quantity} to {current_inventory_val} (in {type(current_storage).__name__}).")
        elif new_status in ["Returned", "Canceled"] and old_status in ["Shipped", "Delivered"]:
            product["inventory_count"] += quantity
            inventory_adjusted = True
            current_inventory_val = product["inventory_count"]
            print(f"[Tool Executed] update_order_status: Order {order_id} {new_status}. Inv for {product_id} increased by {quantity} to {current_inventory_val} (in {type(current_storage).__name__}).")
    else:
        print(f"[Tool Executed] update_order_status: Product {product_id} for order {order_id} not found for inventory adjustment (in {type(current_storage).__name__}).")

    order["status"] = new_status
    print(f"[Tool Executed] update_order_status: Order {order_id} status updated from {old_status} to {new_status} (in {type(current_storage).__name__}).")
    return {
        "status": "success",
        "message": f"Order {order_id} status updated from {old_status} to {new_status}.",
        "order_id": order_id,
        "product_id": product_id,
        "old_status": old_status,
        "new_status": new_status,
        "inventory_adjusted": inventory_adjusted,
        "current_inventory": current_inventory_val,
        "order_details": dict(order)
    }

print("Tool functions defined.")

Tool functions defined.
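
Since every tool function takes the target store as its first argument, a caller can route a model's tool request to the right function and the right per-agent store in one place. A minimal sketch of such a dispatcher, with stand-ins so it runs on its own. `DemoStorage`, `TOOL_REGISTRY`, and `execute_tool` are hypothetical names, and the notebook's own dispatch logic may differ.

```python
from typing import Any, Callable, Dict

class DemoStorage:
    """Stand-in for the notebook's Storage class, holding only products."""
    def __init__(self) -> None:
        self.products: Dict[str, Dict[str, Any]] = {"P1": {"name": "Widget A", "price": 19.99}}

def list_all_products(current_storage: DemoStorage) -> Dict[str, Any]:
    """Simplified stand-in for the tool of the same name."""
    return {"status": "success", "count": len(current_storage.products)}

# Map tool names (as they appear in the schemas) to the functions that implement them.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "list_all_products": list_all_products,
}

def execute_tool(tool_name: str, tool_input: Dict[str, Any], current_storage: DemoStorage) -> Dict[str, Any]:
    """Look up the tool and call it against the given store, returning errors as data."""
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return {"status": "error", "message": f"Unknown tool '{tool_name}'"}
    try:
        return func(current_storage, **tool_input)
    except TypeError as exc:  # missing or unexpected arguments from the model
        return {"status": "error", "message": f"Bad arguments for '{tool_name}': {exc}"}

print(execute_tool("list_all_products", {}, DemoStorage()))
print(execute_tool("delete_everything", {}, DemoStorage()))
```

Returning errors as dictionaries keeps the result shape consistent with the tool functions, so a failed call can be passed back to the model like any other tool result.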


In [13]:
class ConversationContext:
    def __init__(self):
        self.messages: List[Dict[str, Any]] = []
        self.context_data: Dict[str, Any] = {
            "customers": {}, "products": {}, "orders": {}, "last_action": None
        }
        self.session_start_time = datetime.now()

    def add_user_message(self, message: str) -> None:
        self.messages.append({"role": "user", "content": message})

    def add_assistant_message(self, message_content: Union[str, List[Dict[str, Any]]]) -> None:
        self.messages.append({"role": "assistant", "content": message_content})

    def update_entity_in_context(self, entity_type: str, entity_id: str, data: Any) -> None:
        if entity_type in self.context_data:
            self.context_data[entity_type][entity_id] = data # Store the actual data
            print(f"[Context Updated] Entity: {entity_type}, ID: {entity_id}, Data (type): {type(data)}")

    def set_last_action(self, action_type: str, action_details: Any) -> None:
        self.context_data["last_action"] = {
            "type": action_type,
            "details": action_details,
            "timestamp": datetime.now().isoformat()
        }
        print(f"[Context Updated] Last Action: {action_type}, Details: {json.dumps(action_details, default=str)}")


    def get_full_conversation_for_api(self) -> List[Dict[str, Any]]:
        return self.messages.copy()

    def get_context_summary(self) -> str:
        summary_parts = []
        if self.context_data["customers"]:
            customers_str = ", ".join([f"ID: {cid} (Name: {c.get('name', 'N/A') if isinstance(c, dict) else 'N/A'})" for cid, c in self.context_data["customers"].items()])
            summary_parts.append(f"Recent customers: {customers_str}")
        if self.context_data["products"]:
            products_str = ", ".join([f"ID: {pid} (Name: {p.get('name', 'N/A') if isinstance(p, dict) else 'N/A'})" for pid, p in self.context_data["products"].items()])
            summary_parts.append(f"Recent products: {products_str}")
        if self.context_data["orders"]:
            orders_str = ", ".join([f"ID: {oid} (Product: {o.get('product_name', 'N/A') if isinstance(o, dict) else 'N/A'}, Status: {o.get('status', 'N/A') if isinstance(o, dict) else 'N/A'})" for oid, o in self.context_data["orders"].items()])
            summary_parts.append(f"Recent orders: {orders_str}")

        last_action = self.context_data["last_action"]
        if last_action:
            action_type = last_action['type']
            action_details_summary = "..." # Default summary
            if isinstance(last_action.get('details'), dict):
                action_input = last_action['details'].get('input', {})
                action_result_status = last_action['details'].get('result', {}).get('status')
                action_details_summary = f"Input: {action_input}, Result Status: {action_result_status}"
                if action_result_status == "success":
                    if "order_id" in last_action['details'].get('result', {}):
                         action_details_summary += f", OrderID: {last_action['details']['result']['order_id']}"
                    elif "product_id" in last_action['details'].get('result', {}):
                         action_details_summary += f", ProductID: {last_action['details']['result']['product_id']}"


            summary_parts.append(f"Last action: {action_type} at {last_action['timestamp']} ({action_details_summary})")

        if not summary_parts: return "No specific context items set yet."
        return "\n".join(summary_parts)

    def clear(self) -> None:
        self.messages = []
        self.context_data = {"customers": {}, "products": {}, "orders": {}, "last_action": None}
        self.session_start_time = datetime.now()
        print("[Context Cleared]")

print("ConversationContext class defined.")


ConversationContext class defined.
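
The evaluator class in the next cell persists learnings as whole-list snapshots: every accepted change is written to a new `learnings_<timestamp>.json` file, and reads load the most recent file. The following is a minimal sketch of that pattern, not the notebook's implementation. It assumes a temporary local directory in place of Google Drive, and the helper names `write_snapshot` and `read_latest_snapshot` are hypothetical. It also assumes the newest snapshot can be picked by file name, which works here because the timestamp format is fixed-width.

```python
import glob
import json
import os
import tempfile
from datetime import datetime
from typing import Dict, List


def write_snapshot(directory: str, learnings: List[Dict]) -> str:
    """Write the full learnings list to a new timestamped file and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    path = os.path.join(directory, f"learnings_{stamp}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(learnings, f, indent=4)
    return path


def read_latest_snapshot(directory: str) -> List[Dict]:
    """Return the learnings stored in the newest snapshot, or [] if none exist."""
    files = glob.glob(os.path.join(directory, "learnings_*.json"))
    if not files:
        return []
    # Fixed-width timestamps sort lexicographically in chronological order.
    with open(max(files), encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else []


with tempfile.TemporaryDirectory() as tmp:
    # No snapshot yet: the reader falls back to an empty list.
    empty_result = read_latest_snapshot(tmp)

    # Each change appends to a copy of the current state and writes a new file.
    first = [{"final_learning_statement": "Confirm stock before creating an order."}]
    write_snapshot(tmp, first)
    second = first + [{"final_learning_statement": "Quote prices with currency."}]
    write_snapshot(tmp, second)

    latest = read_latest_snapshot(tmp)

print(f"Learnings in latest snapshot: {len(latest)}")
```

Because each file in this sketch holds the complete list, earlier states stay on disk untouched, which makes it easy to inspect how the set of learnings changed over time.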


In [14]:
#DualAgentEvaluator Class Definition (with Drive Mount RAG)


class DualAgentEvaluator:
    def __init__(self):
        self.conversation_context = ConversationContext()
        self.evaluation_results = []
        self.anthropic_storage = Storage() # For agent's e-commerce data
        self.openai_storage = Storage()   # For agent's e-commerce data
        print("DualAgentEvaluator initialized with separate Storage for Anthropic and OpenAI.")

        self.anthropic_tools_schemas = tools_schemas_list
        self.openai_tools_formatted = [{"type": "function", "function": tool_def} for tool_def in tools_schemas_list]
        self.available_tool_functions = {
            "create_customer": create_customer, "get_customer_info": get_customer_info,
            "create_product": create_product, "update_product": update_product,
            "get_product_info": get_product_info, "list_all_products": list_all_products,
            "create_order": create_order, "get_order_details": get_order_details,
            "update_order_status": update_order_status,
        }

        self._mount_drive_if_needed()
        self._initialize_learnings_path()

        print(f"DualAgentEvaluator initialized. OpenAI tools formatted. Learnings path: {LEARNINGS_DRIVE_BASE_PATH}")

    def _mount_drive_if_needed(self):
        """Mounts Google Drive if not already mounted."""
        if not os.path.exists(DRIVE_MOUNT_PATH) or not os.listdir(DRIVE_MOUNT_PATH): # Heuristic to check if mounted
            try:
                drive.mount(DRIVE_MOUNT_PATH)
                print(f"Google Drive mounted successfully at {DRIVE_MOUNT_PATH}.")
            except Exception as e:
                print(f"Error mounting Google Drive: {e}. RAG features will not work.")
                # Potentially raise an error or set a flag to disable RAG
        else:
            print(f"Google Drive already mounted at {DRIVE_MOUNT_PATH}.")


    def _initialize_learnings_path(self):
        """Initializes and ensures the learnings directory exists."""
        global LEARNINGS_DRIVE_BASE_PATH
        if not LEARNINGS_DRIVE_BASE_PATH: # If not set by user directly
            learnings_path_input = input(f"Enter the path within your Google Drive for storing learnings (e.g., 'My Drive/AI_Learnings') or press Enter to use default '{DEFAULT_LEARNINGS_DRIVE_SUBPATH}': ").strip()
            if not learnings_path_input:
                LEARNINGS_DRIVE_BASE_PATH = os.path.join(DRIVE_MOUNT_PATH, DEFAULT_LEARNINGS_DRIVE_SUBPATH)
            else:
                # Check if the input path starts with 'My Drive' or similar, if not, prepend it.
                # This is a common user error.
                if not learnings_path_input.lower().startswith('my drive') and not learnings_path_input.startswith('/'):
                    LEARNINGS_DRIVE_BASE_PATH = os.path.join(DRIVE_MOUNT_PATH, "My Drive", learnings_path_input)
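                elif not learnings_path_input.startswith(DRIVE_MOUNT_PATH):
                    # Strip leading slashes: os.path.join discards DRIVE_MOUNT_PATH when the second argument is absolute.
                    LEARNINGS_DRIVE_BASE_PATH = os.path.join(DRIVE_MOUNT_PATH, learnings_path_input.lstrip('/'))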
                else: # Assumes full path from /content/drive/ was given
                    LEARNINGS_DRIVE_BASE_PATH = learnings_path_input

        if not os.path.exists(LEARNINGS_DRIVE_BASE_PATH):
            try:
                os.makedirs(LEARNINGS_DRIVE_BASE_PATH)
                print(f"Created learnings directory: {LEARNINGS_DRIVE_BASE_PATH}")
            except Exception as e:
                print(f"Error creating learnings directory {LEARNINGS_DRIVE_BASE_PATH}: {e}")
                print("Please ensure the base path (e.g., '/content/drive/My Drive/') is accessible.")
                # Fallback or error
        else:
            print(f"Learnings directory found: {LEARNINGS_DRIVE_BASE_PATH}")


    def _get_latest_learnings_filepath(self) -> Optional[str]:
        """Finds the most recent learnings JSON file in the directory."""
        if not LEARNINGS_DRIVE_BASE_PATH or not os.path.isdir(LEARNINGS_DRIVE_BASE_PATH):
            print(f"Learnings directory not found or not set: {LEARNINGS_DRIVE_BASE_PATH}")
            return None

        list_of_files = glob.glob(os.path.join(LEARNINGS_DRIVE_BASE_PATH, 'learnings_*.json'))
        if not list_of_files:
            return None
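        # File names embed a fixed-width timestamp, so the lexicographically largest name is the newest snapshot.
        latest_file = max(list_of_files, key=os.path.basename)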
        return latest_file

    def _read_learnings_from_file(self, filepath: str) -> List[Dict]:
        """Reads and parses a list of learning JSON objects from a given file."""
        if not filepath or not os.path.exists(filepath):
            return []
        try:
            with open(filepath, 'r') as f:
                learnings_list = json.load(f) # Assumes the file contains a single JSON list
            if not isinstance(learnings_list, list):
                print(f"Warning: Learnings file {filepath} does not contain a valid JSON list.")
                return []
            return learnings_list
        except json.JSONDecodeError:
            print(f"Error decoding JSON from learnings file: {filepath}")
            return []
        except Exception as e:
            print(f"Error reading learnings file {filepath}: {e}")
            return []

    def _write_learnings_to_new_timestamped_file(self, learnings_list: List[Dict]):
        """Saves a list of learning objects to a new timestamped JSON file."""
        if not LEARNINGS_DRIVE_BASE_PATH or not os.path.isdir(LEARNINGS_DRIVE_BASE_PATH):
            print(f"Cannot write learnings: directory not found or not set: {LEARNINGS_DRIVE_BASE_PATH}")
            self._initialize_learnings_path() # Try to create it if it was missing
            if not os.path.isdir(LEARNINGS_DRIVE_BASE_PATH):
                 print("Failed to create learnings directory. Aborting write.")
                 return

        timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        new_filepath = os.path.join(LEARNINGS_DRIVE_BASE_PATH, f'learnings_{timestamp_str}.json')
        try:
            with open(new_filepath, 'w') as f:
                json.dump(learnings_list, f, indent=4) # Save as a JSON list, indented for readability
            print(f"[RAG] Successfully wrote {len(learnings_list)} learnings to new file: {new_filepath}")
        except Exception as e:
            print(f"Error writing new learnings file {new_filepath}: {e}")

    def retrieve_active_learnings(self) -> List[Dict]:
        """Retrieves the current set of active learnings from the latest file."""
        print("[RAG] Retrieving active learnings from Drive...")
        latest_filepath = self._get_latest_learnings_filepath()
        if latest_filepath:
            print(f"[RAG] Loading learnings from: {latest_filepath}")
            return self._read_learnings_from_file(latest_filepath)
        print("[RAG] No existing learnings file found.")
        return []

    def check_relevant_learnings(self, query: str, count: int = 5) -> Optional[str]:
        """Retrieves relevant active learnings and formats them for prompts."""
        active_learnings = self.retrieve_active_learnings()
        if not active_learnings:
            return None

        keywords_from_query = self.extract_keywords(query)
        relevant_learning_objects = []

        for learning_entry in active_learnings:
            # Ensure 'final_learning_statement' exists, or search in 'original_human_input'
            text_to_search = learning_entry.get("final_learning_statement", "") + " " + \
                             learning_entry.get("original_human_input", "") + " " + \
                             " ".join(learning_entry.get("keywords", []))
            if any(kw.lower() in text_to_search.lower() for kw in keywords_from_query):
                relevant_learning_objects.append(learning_entry)

        # Sort by timestamp_created (descending) to get newest first, then take 'count'
        relevant_learning_objects.sort(key=lambda x: x.get('timestamp_created', ''), reverse=True)

        formatted_learnings = []
        for entry in relevant_learning_objects[:count]:
            statement = entry.get('final_learning_statement', entry.get('original_human_input', str(entry)))
            timestamp = entry.get('timestamp_created', 'N/A')
            formatted_learnings.append(f"- Learning (from {timestamp}): {statement}")

        return "\nRelevant Learnings from Knowledge Base:\n" + "\n".join(formatted_learnings) if formatted_learnings else None

    def process_and_store_new_learning(self, human_feedback_text: str, user_query_context: str, turn_context_summary: str):
        """Orchestrates checking new human feedback for conflicts and storing it by writing a new state file."""
        print(f"\n--- Processing New Candidate Learning from Human Feedback ---")
        print(f"Candidate Human Feedback: \"{human_feedback_text}\"")
        print(f"In context of User Query: \"{user_query_context}\"")

        active_learnings_list = self.retrieve_active_learnings() # Current state

        # Prompt Evaluator to check for conflicts and synthesize
        evaluator_task_prompt_parts = [
            "You are an AI assistant helping to maintain a knowledge base of 'learnings' for customer service AI agents.",
            "A human user has provided new feedback, which is a candidate for a new learning.",
            f"The New Human Feedback is: \"{human_feedback_text}\"",
            f"This feedback was given in the context of the user query: \"{user_query_context}\"",
            # f"The general conversation context at that time was: \"{turn_context_summary}\"", # Can be too verbose
            "\nHere are some existing ACTIVE learnings from our knowledge base that might be relevant (if any):"
        ]
        if active_learnings_list:
            # For conflict checking, provide statements of existing learnings
            for i, el_entry in enumerate(active_learnings_list[-10:]): # Check against last 10 for brevity
                stmt = el_entry.get('final_learning_statement', el_entry.get('original_human_input', 'N/A'))
                orig_human_input_for_existing = el_entry.get('original_human_input', 'N/A')
                evaluator_task_prompt_parts.append(f"  Existing Learning {i+1} (Original human input: '{orig_human_input_for_existing}'): \"{stmt}\"")
        else:
            evaluator_task_prompt_parts.append("  (No existing active learnings found.)")

        evaluator_task_prompt_parts.extend([
            "\nYour Tasks:",
            "1. Analyze the New Human Feedback.",
            "2. Compare it with the Existing ACTIVE Learnings provided. Identify any direct CONFLICTS or if the new feedback is essentially a DUPLICATE/REDUNDANT.",
            "3. If a clear CONFLICT is found with an existing learning:",
            "   - Output the specific tag: `CONFLICT_DETECTED:`",
            "   - Clearly explain the conflict, citing the new feedback and the specific existing learning statement(s) it conflicts with (and their original human input if provided above).",
            "   - Ask the human user specific questions to help them resolve this conflict (e.g., 'Which learning should take precedence?' or 'How can these be reconciled?').",
            "   - DO NOT provide a 'FINALIZED_LEARNING' in this case.",
            "4. If the New Human Feedback is a DUPLICATE/REDUNDANT of an existing learning and adds no new value:",
            "   - Output the specific tag: `REDUNDANT_LEARNING:`",
            "   - State which existing learning it is similar to.",
            "   - DO NOT provide a 'FINALIZED_LEARNING'.",
            "5. If the New Human Feedback provides a new, actionable insight, or significantly refines/clarifies an existing one (and is not a major conflict that needs resolution first):",
            "   - Synthesize a concise, clear, and actionable 'Finalized Learning Statement'. This statement should be phrased as a directive or principle for the worker AIs.",
            "   - Prefix this statement with the specific tag: `FINALIZED_LEARNING:`",
            "6. If the New Human Feedback is too vague, not actionable, or not suitable as a learning:",
            "   - Output the specific tag: `NOT_ACTIONABLE:`",
            "   - Explain why.",
            "   - DO NOT provide a 'FINALIZED_LEARNING'."
        ])
        evaluator_conflict_check_prompt = "\n".join(evaluator_task_prompt_parts)

        print("\n[RAG] Sending context to Evaluator for learning synthesis/conflict check...")
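        try:
            synthesis_response_obj = eval_model_instance.generate_content(evaluator_conflict_check_prompt)
            evaluator_synthesis_text = synthesis_response_obj.text
        except Exception as e:
            # The API call can fail, and `.text` can raise when the response carries no usable text.
            # An empty string falls through to the raw-feedback fallback branch below.
            print(f"[RAG] Evaluator call failed during learning processing: {e}")
            evaluator_synthesis_text = ""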
        print(f"\n[RAG] Evaluator response on learning processing:\n{evaluator_synthesis_text}")

        # Prepare the new list of learnings for the next state
        next_active_learnings_list = list(active_learnings_list) # Start with a copy of current learnings
        made_change_to_learnings = False

        if "CONFLICT_DETECTED:" in evaluator_synthesis_text:
            conflict_explanation = evaluator_synthesis_text.split("CONFLICT_DETECTED:", 1)[-1].strip()
            print(f"\n🛑 CONFLICT DETECTED BY EVALUATOR 🛑\n{conflict_explanation}")

            resolution_instruction = ("Please review the conflict. How do you want to resolve this?\n"
                                  "  - Type 'use new' to add the new feedback as a learning (potentially superseding others implicitly).\n"
                                  "  - Type 'discard new' to not add this new feedback.\n"
                                  "  - Type 'merge: [your new merged learning statement]' to provide a reconciled version.\n"
                                  "  - Type 'keep existing' if the current learnings are preferred and the new one should be ignored.\n"
                                  "Your input: ")
            human_resolution = input(resolution_instruction).strip()

            if human_resolution.lower() == "use new":
                final_learning_statement_to_store = human_feedback_text # Use raw human feedback as the learning
                print("Resolution: Adding new human feedback as a learning.")
                made_change_to_learnings = True
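            elif human_resolution.lower().startswith("merge:"):
                # Slice by prefix length so differently cased input such as "Merge:" is parsed correctly.
                final_learning_statement_to_store = human_resolution[len("merge:"):].strip()
                print(f"Resolution: Using merged learning statement: {final_learning_statement_to_store}")
                # Only flag a change when the merged statement is non-empty.
                made_change_to_learnings = bool(final_learning_statement_to_store)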
            elif human_resolution.lower() == "discard new" or human_resolution.lower() == "keep existing":
                print("Resolution: New candidate learning discarded or existing kept. No changes to active learnings based on this feedback.")
                final_learning_statement_to_store = None # No new learning to add
            else:
                print("Resolution input unclear. New candidate learning discarded for safety.")
                final_learning_statement_to_store = None

            if final_learning_statement_to_store: # If a new/merged learning is to be added
                 new_learning_entry = {
                    "learning_id": str(uuid.uuid4()),
                    "timestamp_created": datetime.now().isoformat(),
                    "user_query_context_at_feedback": user_query_context,
                    "original_human_input": human_feedback_text, # The feedback that started this
                    "final_learning_statement": final_learning_statement_to_store,
                    "keywords": self.extract_keywords(final_learning_statement_to_store + " " + human_feedback_text),
                    "status": "active"
                }
                 next_active_learnings_list.append(new_learning_entry)


        elif "FINALIZED_LEARNING:" in evaluator_synthesis_text:
            final_learning_statement_from_evaluator = evaluator_synthesis_text.split("FINALIZED_LEARNING:", 1)[-1].strip()
            print(f"\n✅ Finalized Learning by Evaluator: \"{final_learning_statement_from_evaluator}\"")
            new_learning_entry = {
                "learning_id": str(uuid.uuid4()),
                "timestamp_created": datetime.now().isoformat(),
                "user_query_context_at_feedback": user_query_context,
                "original_human_input": human_feedback_text,
                "final_learning_statement": final_learning_statement_from_evaluator,
                "keywords": self.extract_keywords(final_learning_statement_from_evaluator + " " + human_feedback_text),
                "status": "active"
            }
            next_active_learnings_list.append(new_learning_entry)
            made_change_to_learnings = True
        elif "REDUNDANT_LEARNING:" in evaluator_synthesis_text or "NOT_ACTIONABLE:" in evaluator_synthesis_text:
            print(f"\nℹ️ Evaluator: {evaluator_synthesis_text.strip()}")
            # No change to learnings list
        else:
            print("\n⚠️ Evaluator response format for learning processing was unexpected. Storing raw human feedback as a precaution.")
            new_learning_entry = {
                "learning_id": str(uuid.uuid4()),
                "timestamp_created": datetime.now().isoformat(),
                "user_query_context_at_feedback": user_query_context,
                "original_human_input": human_feedback_text,
                "final_learning_statement": human_feedback_text, # Fallback to raw input
                "keywords": self.extract_keywords(human_feedback_text),
                "status": "active"
            }
            next_active_learnings_list.append(new_learning_entry)
            made_change_to_learnings = True

        if made_change_to_learnings:
            self._write_learnings_to_new_timestamped_file(next_active_learnings_list)
        else:
            print("[RAG] No changes to active learnings. No new learnings file written.")

    # --- Tool dispatch, worker-response, and evaluation methods ---
    # Their core logic is independent of how learnings are stored or retrieved;
    # the only change is that check_relevant_learnings now reads from the Drive RAG store.
    # All learning storage goes through `process_and_store_new_learning`;
    # `evaluate_responses` must not write to the old `self.human_feedback_learnings` dictionary.

    def _update_context_from_tool_results(self, tool_name: str, tool_input: Dict, tool_result: Dict, agent_name: str):
        if not isinstance(tool_result, dict):
            print(f"[Context Update Error] Tool result for {tool_name} ({agent_name}) is not a dict: {tool_result}")
            self.conversation_context.set_last_action(f"{tool_name}_{agent_name}", {"input": tool_input, "result": {"status": "error", "message": "Tool result was not a dictionary."}})
            return
        if tool_result.get("status") == "success":
            if "customer_id" in tool_result and "customer" in tool_result and isinstance(tool_result["customer"], dict):
                self.conversation_context.update_entity_in_context("customers", tool_result["customer_id"], tool_result["customer"])
            elif "product_id" in tool_result and "product" in tool_result and isinstance(tool_result["product"], dict):
                self.conversation_context.update_entity_in_context("products", tool_result["product_id"], tool_result["product"])
            elif "order_id" in tool_result and "order_details" in tool_result and isinstance(tool_result["order_details"], dict):
                self.conversation_context.update_entity_in_context("orders", tool_result["order_id"], tool_result["order_details"])
            elif tool_name == "list_all_products" and "products" in tool_result and isinstance(tool_result["products"], dict):
                 for pid, pdata in tool_result["products"].items():
                     self.conversation_context.update_entity_in_context("products", pid, pdata)
        self.conversation_context.set_last_action(f"{tool_name}_{agent_name}", {"input": tool_input, "result": tool_result})

    def process_tool_call(self, tool_name: str, tool_input: Dict[str, Any], target_storage_instance: Storage) -> Dict[str, Any]:
        print(f"--- [Tool Dispatcher] Attempting tool: {tool_name} with input: {json.dumps(tool_input, default=str)} for storage: {type(target_storage_instance).__name__} ---")
        if tool_name in self.available_tool_functions:
            function_to_call = self.available_tool_functions[tool_name]
            try:
                result = function_to_call(target_storage_instance, **tool_input)
                print(f"--- [Tool Dispatcher] Result for {tool_name} on {type(target_storage_instance).__name__}: {json.dumps(result, indent=2, default=str)} ---")
                return result
            except TypeError as te:
                print(f"--- [Tool Dispatcher] TypeError for {tool_name} on {type(target_storage_instance).__name__}: {te}. Input: {tool_input} ---")
                return {"status": "error", "message": f"TypeError calling {tool_name}: {str(te)}. Check arguments."}
            except Exception as e:
                print(f"--- [Tool Dispatcher] Exception for {tool_name} on {type(target_storage_instance).__name__}: {e} ---")
                return {"status": "error", "message": f"Error executing {tool_name}: {str(e)}"}
        else:
            print(f"--- [Tool Dispatcher] Tool {tool_name} not found. ---")
            return {"status": "error", "message": f"Tool {tool_name} not found."}

    def get_anthropic_response(self, current_worker_system_prompt: str, conversation_history: List[Dict[str, Any]]) -> str:
        messages_for_api = conversation_history.copy()
        try:
            for i in range(5):
                system_prompt_snippet = current_worker_system_prompt[:60].replace('\n', ' ')
                print(f"\nAnthropic API Call #{i+1}. System: '{system_prompt_snippet}...', Messages count: {len(messages_for_api)}")
                if messages_for_api: print(f"Last message role: {messages_for_api[-1]['role']}")
                response = anthropic_client.messages.create(
                    model=ANTHROPIC_MODEL_NAME, max_tokens=4000,
                    system=current_worker_system_prompt,
                    tools=self.anthropic_tools_schemas,
                    messages=messages_for_api
                )
                assistant_response_blocks = response.content
                messages_for_api.append({"role": "assistant", "content": assistant_response_blocks})
                tool_calls_to_process = [block for block in assistant_response_blocks if block.type == "tool_use"]
                text_blocks = [block.text for block in assistant_response_blocks if block.type == "text"]
                if not tool_calls_to_process:
                    final_text = " ".join(text_blocks).strip()
                    print(f"Anthropic Final Text (no tool use this turn): {final_text}")
                    return final_text if final_text else "No text content in final Anthropic response."
                tool_results_for_next_call = []
                for tool_use_block in tool_calls_to_process:
                    tool_name, tool_input, tool_use_id = tool_use_block.name, tool_use_block.input, tool_use_block.id
                    print(f"Anthropic Tool Call: {tool_name}, Input: {tool_input}")
                    tool_result_data = self.process_tool_call(tool_name, tool_input, self.anthropic_storage)
                    self._update_context_from_tool_results(tool_name, tool_input, tool_result_data, "Anthropic")
                    tool_results_for_next_call.append({
                        "type": "tool_result", "tool_use_id": tool_use_id,
                        # default=str keeps serialization from failing on values such as datetimes
                        "content": json.dumps(tool_result_data, default=str)
                    })
                messages_for_api.append({"role": "user", "content": tool_results_for_next_call})
            return "Max tool iterations reached for Anthropic."
        except Exception as e:
            print(f"Error in get_anthropic_response: {str(e)}")
            import traceback; traceback.print_exc()
            return f"Error getting Anthropic response: {str(e)}"

    def get_openai_response(self, current_worker_system_prompt: str, conversation_history: List[Dict[str, Any]]) -> str:
        messages_for_api = [{"role": "system", "content": current_worker_system_prompt}]
        for msg in conversation_history:
            if msg["role"] == "user": messages_for_api.append(msg)
            elif msg["role"] == "assistant":
                if isinstance(msg["content"], str): messages_for_api.append(msg)
                elif isinstance(msg["content"], dict) and "tool_calls" in msg["content"]: messages_for_api.append(msg)
        try:
            for i in range(5):
                print(f"\nOpenAI API Call #{i+1}. Messages count: {len(messages_for_api)}")
                if messages_for_api and isinstance(messages_for_api[-1], dict):
                    print(f"Last message role: {messages_for_api[-1].get('role', 'N/A')}")
                response = openai_client.chat.completions.create(
                    model=OPENAI_MODEL_NAME, messages=messages_for_api,
                    tools=self.openai_tools_formatted, tool_choice="auto"
                )
                response_message = response.choices[0].message
                messages_for_api.append(response_message.model_dump())
                if not response_message.tool_calls:
                    final_text = response_message.content if response_message.content else "No text content in final OpenAI response."
                    print(f"OpenAI Final Text (no tool use this turn): {final_text}")
                    return final_text
                tool_results_for_next_api_call = []
                for tool_call in response_message.tool_calls:
                    tool_name = tool_call.function.name
                    tool_input_str = tool_call.function.arguments
                    tool_call_id = tool_call.id
                    try:
                        tool_input = json.loads(tool_input_str)
                    except json.JSONDecodeError:
                        print(f"OpenAI Tool Call JSON Error for {tool_name}: {tool_input_str}")
                        # Keep tool_input defined so the context update below does not raise NameError
                        tool_input = {"raw_arguments": tool_input_str}
                        tool_result_data = {"status": "error", "message": "Invalid JSON arguments from model."}
                    else:
                        print(f"OpenAI Tool Call: {tool_name}, Input: {tool_input}")
                        tool_result_data = self.process_tool_call(tool_name, tool_input, self.openai_storage)
                    self._update_context_from_tool_results(tool_name, tool_input, tool_result_data, "OpenAI")
                    tool_results_for_next_api_call.append({
                        "tool_call_id": tool_call_id, "role": "tool", "name": tool_name,
                        # default=str keeps serialization from failing on values such as datetimes
                        "content": json.dumps(tool_result_data, default=str)
                    })
                messages_for_api.extend(tool_results_for_next_api_call)
            return "Max tool iterations reached for OpenAI."
        except Exception as e:
            print(f"Error in get_openai_response: {str(e)}")
            import traceback; traceback.print_exc()
            return f"Error getting OpenAI response: {str(e)}"

    def process_user_request(self, user_message: str) -> Dict[str, Any]:
        print(f"\n\n{'='*60}\nUser Message: {user_message}\n{'='*60}")
        self.conversation_context.add_user_message(user_message)
        context_summary = self.conversation_context.get_context_summary()
        print(f"Current Context Summary for Models:\n{context_summary}\n{'-'*60}")

        learnings_for_prompt = self.check_relevant_learnings(user_message) # Uses new RAG
        if learnings_for_prompt:
            print(f"Relevant Learnings for this turn (from Drive RAG):\n{learnings_for_prompt}")
        else:
            print("No specific relevant learnings found from Drive RAG for this turn.")

        current_worker_prompt_with_context_and_learnings = f"{worker_system_prompt}\n\nConversation Context:\n{context_summary}\n\n{learnings_for_prompt if learnings_for_prompt else 'No specific past learnings provided for this query.'}"

        # Get responses from worker AIs. Both workers receive the same history snapshot,
        # so neither sees the other's answer for the current turn.
        history_for_workers = self.conversation_context.get_full_conversation_for_api()
        anthropic_response_text = self.get_anthropic_response(current_worker_prompt_with_context_and_learnings, history_for_workers)
        openai_response_text = self.get_openai_response(current_worker_prompt_with_context_and_learnings, history_for_workers)
        self.conversation_context.add_assistant_message(f"[Anthropic Final Text]: {anthropic_response_text}")
        self.conversation_context.add_assistant_message(f"[OpenAI Final Text]: {openai_response_text}")
        print(f"\n--- Anthropic Final Response Text ---\n{anthropic_response_text}")
        print(f"--- OpenAI Final Response Text ---\n{openai_response_text}")

        # Evaluate responses
        evaluation = self.evaluate_responses(user_message, anthropic_response_text, openai_response_text, context_summary, learnings_for_prompt or "")
        self.evaluation_results.append(evaluation)

        # Process human feedback for learnings
        human_feedback_candidate = ""
        clarif_details = evaluation.get("clarification_details", {})
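        provided_input = clarif_details.get("provided_input")
        # Require non-empty input so a missing "provided_input" is never stored as a learning
        if clarif_details.get("used") and provided_input and provided_input not in ["Skipped by user", "Skipped (non-interactive)"]: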
            # If human provided input during evaluator clarification, consider it a candidate learning
            human_feedback_candidate = clarif_details.get("provided_input")
            print(f"[INFO] Candidate learning from evaluator clarification: '{human_feedback_candidate}'")
            # Process this immediately
            self.process_and_store_new_learning(
                human_feedback_text=human_feedback_candidate,
                user_query_context=user_message, # or a more specific context if available
                turn_context_summary=context_summary
            )
            human_feedback_candidate = "" # Reset after processing

        # Explicit general learnings prompt
        try:
            human_general_learning = input("Do you want to add any general learnings from this turn? (Type your learning or 'skip'): ")
            if human_general_learning.lower() != 'skip' and human_general_learning.strip():
                human_feedback_candidate = human_general_learning
        except EOFError:
            print("EOFError: Skipping general learning input (non-interactive).")

        if human_feedback_candidate: # If general learning was provided
            self.process_and_store_new_learning(
                human_feedback_text=human_feedback_candidate,
                user_query_context=user_message,
                turn_context_summary=context_summary
            )
        return {"user_message": user_message, "anthropic_response": anthropic_response_text,
                "openai_response": openai_response_text, "evaluation": evaluation}

    def evaluate_responses(self, user_message: str, anthropic_response: str, openai_response: str, context_summary_for_eval: str, learnings_for_eval: str) -> Dict[str, Any]:
        print("\n--- Starting Evaluation by Gemini ---")
        try:
            eval_prompt_parts = [
                f"User query: {user_message}",
                f"Current context provided to assistants:\n{context_summary_for_eval}",
                f"Anthropic Claude response:\n{anthropic_response}",
                f"OpenAI GPT response:\n{openai_response}",
                learnings_for_eval,
                "Please evaluate both responses based on accuracy, efficiency, context awareness, and helpfulness. Provide an overall score (1-10) for each and detailed reasoning."
            ]
            eval_prompt = "\n\n".join(filter(None, eval_prompt_parts))
            gemini_response_obj = eval_model_instance.generate_content(eval_prompt)
            evaluation_text = gemini_response_obj.text
            print(f"Gemini Raw Initial Evaluation:\n{evaluation_text}")

            clarification_details = {"used": False, "needed": "", "provided_input": "", "action_summary": ""}
            if "CLARIFICATION NEEDED:" in evaluation_text.upper():
                clarification_details["used"] = True
                clarification_details["needed"] = self.extract_clarification_needed(evaluation_text)
                print(f"--- Human Clarification Indicated by Evaluator ---\nClarification needed: {clarification_details['needed']}")
                try:
                    human_input_for_eval = input("Enter human clarification (or 'skip'/'quit'): ")
                    if human_input_for_eval.lower() in ['quit', 'exit', 'stop', 'q']:
                        raise SystemExit("User requested exit during evaluation")
                    if human_input_for_eval.lower() != 'skip' and human_input_for_eval.strip():
                        clarification_details["provided_input"] = human_input_for_eval
                        target_storage_for_action = None
                        if any(cmd in human_input_for_eval.lower() for cmd in ["update order", "create product"]):
                            store_choice = input("Apply data change to (A)nthropic's store, (O)penAI's store, or (S)kip data change? [A/O/S]: ").lower()
                            if store_choice == 'a': target_storage_for_action = self.anthropic_storage
                            elif store_choice == 'o': target_storage_for_action = self.openai_storage
                        action_summary = self.process_human_feedback_actions(human_input_for_eval, target_storage_for_action)
                        clarification_details["action_summary"] = action_summary

                        # Re-evaluate with clarification
                        updated_eval_prompt = f"{eval_prompt}\n\nHuman clarification provided: {human_input_for_eval}\nAction taken based on feedback: {action_summary}\nPlease re-evaluate incorporating this."
                        updated_gemini_response = eval_model_instance.generate_content(updated_eval_prompt)
                        evaluation_text = updated_gemini_response.text
                        print(f"Gemini Raw Re-Evaluation after human input:\n{evaluation_text}")
                    else:
                        clarification_details["provided_input"] = "Skipped by user"
                except EOFError: clarification_details["provided_input"] = "Skipped (non-interactive)"

            anthropic_data_state = { "customers": self.anthropic_storage.customers, "products": self.anthropic_storage.products, "orders": self.anthropic_storage.orders }
            openai_data_state = { "customers": self.openai_storage.customers, "products": self.openai_storage.products, "orders": self.openai_storage.orders }
            comparison_prompt_parts = [
                evaluation_text, "\n\n--- Data Store Comparison Task ---",
                "As a final step, please compare the following data store states from Anthropic and OpenAI.",
                f"Anthropic's Data Store State:\n{json.dumps(anthropic_data_state, indent=2, default=str)}",
                f"OpenAI's Data Store State:\n{json.dumps(openai_data_state, indent=2, default=str)}",
                "1. Identify any key differences between the two data stores.",
                "2. Explain plausible reasons for these differences based on the agents' actions during this turn (if known).",
                "3. State whether these differences, now explicitly reviewed, cause you to update your previous scores or assessment for either agent. If so, provide the updated scores and rationale."
            ]
            comparison_prompt = "\n\n".join(comparison_prompt_parts)
            comparison_response_obj = eval_model_instance.generate_content(comparison_prompt)
            final_evaluation_text = comparison_response_obj.text
            print(f"Gemini Full Evaluation (including Data Store Comparison):\n{final_evaluation_text}")

            anthropic_score = self.extract_score(final_evaluation_text, "Anthropic")
            openai_score = self.extract_score(final_evaluation_text, "OpenAI")
            return {"anthropic_score": anthropic_score, "openai_score": openai_score,
                    "full_evaluation": final_evaluation_text, "clarification_details": clarification_details}
        except Exception as e:
            print(f"Error in evaluation: {str(e)}")
            import traceback; traceback.print_exc()
            return {"error": f"Error in evaluation: {str(e)}", "anthropic_score": 0, "openai_score": 0,
                    "full_evaluation": f"Evaluation failed: {str(e)}", "clarification_details": {"used": False, "action_summary": ""}}

    def extract_clarification_needed(self, evaluation_text: str) -> str:
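        # In a raw string, \s and \n must use single backslashes; "\\s" would match a literal backslash followed by "s".
        clarification_match = re.search(r"CLARIFICATION NEEDED:\s*(.*?)(?:\n|$)", evaluation_text, re.IGNORECASE | re.DOTALL)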
        if clarification_match and clarification_match.group(1).strip():
            return clarification_match.group(1).strip()
        lines = evaluation_text.splitlines()
        for i, line in enumerate(lines):
            if "CLARIFICATION NEEDED:" in line.upper():
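                # Split case-insensitively so the marker is removed even when the evaluator writes it in lower or mixed case.
                return re.split(r"CLARIFICATION NEEDED:", line, maxsplit=1, flags=re.IGNORECASE)[-1].strip() + "\n" + "\n".join(lines[i+1:i+3])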
        return "Evaluator indicated clarification needed, but specific question not formatted as expected. Review raw evaluation."

    def extract_keywords(self, text: str) -> List[str]:
        if not text: return ["general"]
        words = re.findall(r'\b\w{4,}\b', text.lower())
        stop_words = {"the", "and", "is", "in", "to", "a", "of", "for", "with", "on", "at", "what", "how", "show", "tell", "please", "what's", "i'd", "like", "user", "query", "this", "that", "context"}
        extracted = list(set(word for word in words if word not in stop_words))
        return extracted if extracted else ["generic"]

    def extract_score(self, evaluation_text: str, model_name_pattern: str) -> int:
        comparison_section_start = evaluation_text.upper().rfind("--- DATA STORE COMPARISON TASK ---")
        search_text = evaluation_text
        if comparison_section_start != -1:
            update_score_marker = re.search(r"update your previous scores|updated scores and rationale", evaluation_text[comparison_section_start:], re.IGNORECASE)
            if update_score_marker:
                search_text = evaluation_text[comparison_section_start:]
        patterns = [
            rf"{model_name_pattern}.*?Overall Score.*?(\d+)/10", rf"{model_name_pattern}.*?Overall Score:\s*(\d+)",
            rf"Overall Score.*?{model_name_pattern}.*?:\s*(\d+)", rf"{model_name_pattern}.*?score.*?:.*?(\d+)",
            rf"{model_name_pattern}.*?\bscore\b.*?(\d+)", rf"{model_name_pattern}.*?rating.*?:.*?(\d+)",
            rf"{model_name_pattern}.*?\b(\d+)/10", rf"Updated score for {model_name_pattern}.*?:.*?(\d+)",
            rf"{model_name_pattern}.*?updated overall score.*?:.*?(\d+)"
        ]
        for p_str in reversed(patterns):
            matches = list(re.finditer(p_str, search_text, re.IGNORECASE | re.DOTALL))
            if matches:
                last_match = matches[-1]
                if last_match.group(1):
                    try:
                        score = int(last_match.group(1))
                        print(f"Extracted score {score} for '{model_name_pattern}' using pattern: {p_str}")
                        return score
                    except ValueError: continue
        print(f"Could not extract score for '{model_name_pattern}' from eval text snippet (tried specific patterns):\n{search_text[-500:]}...")
        return 0

    def process_human_feedback_actions(self, feedback: str, target_storage_for_action: Optional[Storage]) -> str:
        action_result_summary = "No specific data action taken based on feedback."
        if not target_storage_for_action:
            action_result_summary = "Skipping data action: No target storage specified for feedback."
            print(f"[Human Feedback Action] {action_result_summary}")
            return action_result_summary
        order_update_match = re.search(r"update\s+order\s+(\w+)\s+status\s+to\s+(\w+)", feedback, re.IGNORECASE)
        if order_update_match:
            order_id, new_status = order_update_match.groups()
            try:
                result = self.process_tool_call("update_order_status", {"order_id": order_id, "new_status": new_status}, target_storage_for_action)
                action_result_summary = f"Action executed on {type(target_storage_for_action).__name__}: Updated order {order_id} status to {new_status}. Result: {result.get('status', 'N/A')}"
                if result.get("status") == "success" and "order_details" in result:
                     self.conversation_context.update_entity_in_context("orders", order_id, result["order_details"])
                     self.conversation_context.set_last_action(f"human_feedback_update_order_{type(target_storage_for_action).__name__}", {"input": {"order_id": order_id, "new_status": new_status}, "result": result})
            except Exception as e: action_result_summary = f"Failed to update order {order_id} on {type(target_storage_for_action).__name__}: {str(e)}"
            print(f"[Human Feedback Action] {action_result_summary}")
            return action_result_summary
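        # Single backslashes inside the raw string: the doubled form ("\\s", "\\d") matched literal backslashes and never fired.
        product_create_match = re.search(r"create\s+(?:new\s+)?product:\s*(.*?),\s*description:\s*(.*?),\s*price:\s*(\d+\.?\d*),\s*inventory:\s*(\d+)", feedback, re.IGNORECASE)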
        if product_create_match:
            name, desc, price, inventory = product_create_match.groups()
            try:
                tool_input = {"name": name.strip(), "description": desc.strip(), "price": float(price), "inventory_count": int(inventory)}
                result = self.process_tool_call("create_product", tool_input, target_storage_for_action)
                action_result_summary = f"Action executed on {type(target_storage_for_action).__name__}: Created product '{name}'. Result: {result.get('status', 'N/A')}"
                if result.get("status") == "success" and "product" in result:
                     self.conversation_context.update_entity_in_context("products", result["product_id"], result["product"])
                     self.conversation_context.set_last_action(f"human_feedback_create_product_{type(target_storage_for_action).__name__}", {"input": tool_input, "result": result})
            except Exception as e: action_result_summary = f"Failed to create product '{name}' on {type(target_storage_for_action).__name__}: {str(e)}"
            print(f"[Human Feedback Action] {action_result_summary}")
            return action_result_summary
        print(f"[Human Feedback Action] {action_result_summary}")
        return action_result_summary

print("DualAgentEvaluator class defined with Drive Mount RAG logic.")


DualAgentEvaluator class defined with Drive Mount RAG logic.
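
The command parsing inside `process_human_feedback_actions` can be exercised on its own, without any storage or API clients. The following is a minimal sketch, not part of the notebook's class: `parse_feedback_command` is a hypothetical helper name, and it only assumes the two command shapes and tool-input keys used in the method above.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Patterns for the two feedback commands handled by the class above.
ORDER_UPDATE_RE = re.compile(
    r"update\s+order\s+(\w+)\s+status\s+to\s+(\w+)", re.IGNORECASE
)
PRODUCT_CREATE_RE = re.compile(
    r"create\s+(?:new\s+)?product:\s*(.*?),\s*description:\s*(.*?),"
    r"\s*price:\s*(\d+\.?\d*),\s*inventory:\s*(\d+)",
    re.IGNORECASE,
)


def parse_feedback_command(feedback: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: return (tool_name, tool_input), or None if no command is found."""
    order_match = ORDER_UPDATE_RE.search(feedback)
    if order_match:
        order_id, new_status = order_match.groups()
        return "update_order_status", {"order_id": order_id, "new_status": new_status}

    product_match = PRODUCT_CREATE_RE.search(feedback)
    if product_match:
        name, desc, price, inventory = product_match.groups()
        return "create_product", {
            "name": name.strip(),
            "description": desc.strip(),
            "price": float(price),
            "inventory_count": int(inventory),
        }

    # Free-form feedback with no recognizable data command.
    return None


if __name__ == "__main__":
    print(parse_feedback_command("Please update order O2 status to Shipped"))
    print(parse_feedback_command(
        "create new product: Gizmo X, description: A fancy gizmo, price: 29.99, inventory: 50"
    ))
    print(parse_feedback_command("Both answers looked fine to me."))
```

Keeping the parsing separate from the tool call makes it easy to check which feedback strings trigger a data change before any store is modified.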


In [15]:
# Cell for main() function

def main():
    print("\nStarting Main Execution with Drive Mount RAG...\n")

    # DualAgentEvaluator's __init__ now handles drive mounting and path setup.
    # It will prompt the user for the learnings path if needed.
    try:
        agent = DualAgentEvaluator()
    except Exception as e:
        print(f"Failed to initialize DualAgentEvaluator: {e}")
        print("Please ensure Google Drive can be mounted and the learnings path is valid.")
        return

    results_log = []

    while True:
        try:
            user_query = input("\nEnter your query (or 'quit', 'exit', 'stop', 'q' to end): ")
            if user_query.lower() in ['quit', 'exit', 'stop', 'q']:
                print("Exiting the system. Goodbye!")
                break
            if not user_query.strip():
                print("Empty query, please enter something.")
                continue

            result = agent.process_user_request(user_query)
            results_log.append(result)
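        except SystemExit as se:
            print(f"System exit requested: {se}")
            break
        except EOFError:
            # In a non-interactive session input() raises EOFError on every call;
            # stop here instead of letting the generic handler loop forever.
            print("EOFError: no interactive input available. Exiting.")
            break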
        except Exception as e:
            print(f"CRITICAL ERROR processing query '{user_query}': {e}")
            import traceback
            traceback.print_exc()
            results_log.append({
                "user_message": user_query, "anthropic_response": "ERROR", "openai_response": "ERROR",
                "evaluation": {"anthropic_score": 0, "openai_score": 0, "full_evaluation": f"Critical error: {e}",
                               "clarification_details": {"used": False, "action_summary": ""}}
            })

    print("\n\n===== EVALUATION SUMMARY =====")
    total_anthropic, total_openai, num_q = 0, 0, 0
    for i, res in enumerate(results_log):
        if not res:
            print(f"\nQuery {i+1}: Skipped (empty result).")
            continue
        num_q +=1
        print(f"\nQuery {i+1}: {res.get('user_message', 'N/A')}")
        print(f"  Anthropic Resp: {str(res.get('anthropic_response', 'N/A'))[:100]}...")
        print(f"  OpenAI Resp: {str(res.get('openai_response', 'N/A'))[:100]}...")
        eval_data = res.get('evaluation', {})
        anth_s = eval_data.get('anthropic_score',0)
        open_s = eval_data.get('openai_score',0)
        total_anthropic += anth_s
        total_openai += open_s
        print(f"  Scores - Anthropic: {anth_s}, OpenAI: {open_s}")
        clarif_details = eval_data.get('clarification_details',{})
        if clarif_details.get('used'):
            print(f"    Clarification: Needed='{clarif_details.get('needed', 'N/A')}', Provided='{clarif_details.get('provided_input', 'N/A')}'")
            if clarif_details.get('action_summary'):
                 print(f"    Action from Clarification: {clarif_details['action_summary']}")
        winner = "Tie"
        if anth_s is not None and open_s is not None:
            if anth_s > open_s: winner = "Anthropic"
            elif open_s > anth_s: winner = "OpenAI"
        print(f"  Query Winner: {winner}")

    print(f"\n----- Overall Performance -----")
    if num_q > 0:
        print(f"Avg Anthropic: {total_anthropic/num_q:.2f}, Avg OpenAI: {total_openai/num_q:.2f}")
    else:
        print("No queries processed to calculate average scores.")
    print(f"Total Anthropic: {total_anthropic}, Total OpenAI: {total_openai}")
    overall_winner = "Tie"
    if total_anthropic > total_openai: overall_winner = "Anthropic"
    elif total_openai > total_anthropic: overall_winner = "OpenAI"
    print(f"Overall Winner: {overall_winner}")

    print(f"\nLearnings are stored as timestamped JSON files in your Google Drive at: {LEARNINGS_DRIVE_BASE_PATH}")
    print("The latest file in that directory represents the current active set of learnings.")
    latest_learnings_file = agent._get_latest_learnings_filepath() # Accessing protected member for summary
    if latest_learnings_file:
        print(f"Most recent learnings file: {os.path.basename(latest_learnings_file)}")
    else:
        print("No learnings files found in the directory.")

    print("\nExecution Finished.")

# To run:
# main()
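
The summary arithmetic at the end of `main()` can also be checked in isolation. This is a rough sketch under the assumption that each log entry has the same shape as the dictionaries returned by `process_user_request`; `summarize_scores` is a hypothetical name and does not exist in the notebook.

```python
from typing import Any, Dict, List


def summarize_scores(results_log: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: total and average the per-query scores and pick a winner."""
    total_anthropic = 0
    total_openai = 0
    counted = 0

    for res in results_log:
        if not res:
            # Empty entries are skipped, as in the summary loop of main().
            continue
        evaluation = res.get("evaluation", {})
        total_anthropic += evaluation.get("anthropic_score", 0) or 0
        total_openai += evaluation.get("openai_score", 0) or 0
        counted += 1

    if total_anthropic > total_openai:
        winner = "Anthropic"
    elif total_openai > total_anthropic:
        winner = "OpenAI"
    else:
        winner = "Tie"

    return {
        "queries": counted,
        "total_anthropic": total_anthropic,
        "total_openai": total_openai,
        # Guard against division by zero when nothing was processed.
        "avg_anthropic": total_anthropic / counted if counted else 0.0,
        "avg_openai": total_openai / counted if counted else 0.0,
        "winner": winner,
    }


if __name__ == "__main__":
    sample_log = [
        {"user_message": "q1", "evaluation": {"anthropic_score": 8, "openai_score": 6}},
        {},
        {"user_message": "q2", "evaluation": {"anthropic_score": 5, "openai_score": 9}},
    ]
    print(summarize_scores(sample_log))
```

Note that a failed evaluation is recorded with a score of 0 for both agents, so errors pull both averages down rather than being excluded.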

In [16]:
""" Sample queries:
* Show me all the products available
* I'd like to order 25 Perplexinators, please
* Show me the status of my order
* (If the order is not in Shipped state, then) Please ship my order now
* How many Perplexinators are now left in stock?
* Add a new customer: Bill Leece, bill.leece@mail.com, +1.222.333.4444
* Add a new product: Gizmo X, description: A fancy gizmo, price: 29.99, inventory: 50
* Update Gizzmo's price to 99.99
* I need to update our insurance policy, so I need to know the total value of everything that I have in our inventory. Please tell me this amount.
* Summarize your learnings from our recent interactions.
"""

main()


Starting Main Execution with Drive Mount RAG...

DualAgentEvaluator initialized with separate Storage for Anthropic and OpenAI.
Mounted at /content/drive
Google Drive mounted successfully at /content/drive.
Enter the path within your Google Drive for storing learnings (e.g., 'My Drive/AI_Learnings') or press Enter to use default 'My Drive/AI/Knowledgebases': 
Learnings directory found: /content/drive/My Drive/AI/Knowledgebases
DualAgentEvaluator initialized. OpenAI tools formatted. Learnings path: /content/drive/My Drive/AI/Knowledgebases

Enter your query (or 'quit', 'exit', 'stop', 'q' to end): Show me all the products available


User Message: Show me all the products available
Current Context Summary for Models:
No specific context items set yet.
------------------------------------------------------------
[RAG] Retrieving active learnings from Drive...
[RAG] No existing learnings file found.
No specific relevant learnings found from Drive RAG for this turn.

Anthropic API Call #1