# One-Line Observability for Computer Use Agents with Maxim

This notebook demonstrates how to add observability to a Computer Use Agent (CUA) built with LangGraph, using Maxim's one-line integration. The result is visibility into the agent's actions, decisions, and performance.

## What You'll Learn
- How to add observability to a LangGraph CUA with just one line of code
- Understanding agent behavior through Maxim's tracing capabilities
- Monitoring agent performance and debugging issues effectively
- Capturing and analyzing agent interactions and decisions

## Why Maxim for Agent Observability?
- **Minimal Setup**: One-line integration with existing agent code
- **Comprehensive Tracing**: Automatically capture all agent actions and decisions
- **Debug with Ease**: Visualize agent workflow and identify issues quickly
- **Performance Insights**: Monitor agent performance and behavior patterns

## Prerequisites
Before running this notebook, make sure you have:
1. Set up your environment variables (see `.env.example`)
   - `MAXIM_API_KEY`: Your Maxim API key
   - `MAXIM_LOG_REPO_ID`: Your Maxim log repository ID
   - `OPENAI_API_KEY`: Your OpenAI API key
   - `SCRAPYBARA_API_KEY`: Your Scrapybara API key
2. Installed required dependencies (see `requirements.txt`)
3. Access to the OpenAI API (for the agent's underlying model)
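
As a quick sanity check on the prerequisites above, the following minimal sketch reports which of the four environment variables are unset or blank. The helper name `missing_env_vars` is hypothetical and not part of Maxim or LangGraph; it only reads the process environment, so it is most useful once the `.env` file has been loaded.

```python
import os
from typing import List, Mapping, Sequence

# Variable names taken from the prerequisites list.
REQUIRED_VARS = (
    "MAXIM_API_KEY",
    "MAXIM_LOG_REPO_ID",
    "OPENAI_API_KEY",
    "SCRAPYBARA_API_KEY",
)


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.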


## Setting Up Maxim Observability

First, let's initialize the Maxim logger. Apart from the two decorators and the callback added in the Agent Interface section, this is the only setup needed to trace and monitor your agent.


In [None]:
# Import required libraries
from typing import List, Literal
import os
from dotenv import load_dotenv

# Import LangGraph and LangChain components
from langchain_core.messages import AnyMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from pydantic import BaseModel, Field

# Import the Computer Use Agent components
from langgraph_cua import create_cua
from langgraph_cua.types import CUAState

# The key imports for Maxim observability
from maxim import Maxim
from maxim.decorators.langchain import langchain_callback, langgraph_agent
from maxim.decorators import trace, current_trace

# Load environment variables
load_dotenv()

# Initialize Maxim logger - This one line enables comprehensive observability
logger = Maxim(
    {"api_key": os.getenv("MAXIM_API_KEY")}
).logger({"id": os.getenv("MAXIM_LOG_REPO_ID")})



## Using the Computer Use Agent

Now that we have Maxim set up, let's use the pre-built Computer Use Agent from LangGraph. The key here is that we'll wrap it with Maxim's decorators to get automatic tracing and observability.


## What's Being Traced?

When you run this notebook, Maxim automatically captures:

1. **High-Level Operation**
   - The entire agent interaction from start to finish
   - Total execution time and status

2. **LangGraph Operations**
   - Node transitions in the agent workflow
   - Routing decisions and their rationale
   - Computer use actions and results

3. **LangChain Operations**
   - All LLM calls and their parameters
   - Prompt templates and their rendered versions
   - Token usage and costs

4. **Custom Metrics**
   - Input/output pairs at each step
   - Any errors or exceptions
   - Performance metrics

All of this is available in your Maxim dashboard, with just the one-line logger setup plus the two decorators and the callback configured in the Agent Interface section.
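
To make the idea of decorator-based tracing concrete, here is a toy sketch using only the standard library. It is not Maxim's implementation: `toy_trace` and `TRACE_LOG` are hypothetical names, and the record fields are chosen only to mirror the items listed above (input/output pairs, errors, and timing).

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store; a real integration would ship records to a backend.
TRACE_LOG: List[Dict[str, Any]] = []


def toy_trace(name: str) -> Callable:
    """Record input, output, status, and duration of each call (illustration only)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "name": name,
                "input": {"args": args, "kwargs": kwargs},
            }
            started = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                record["status"] = "ok"
                return record["output"]
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call is logged.
                record["duration_s"] = time.perf_counter() - started
                TRACE_LOG.append(record)

        return wrapper

    return decorator


@toy_trace(name="add")
def add(a: int, b: int) -> int:
    return a + b


add(2, 3)
```

After the call, `TRACE_LOG` holds one record for `add`, with its arguments, result, status, and elapsed time.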


## State Definition

We start by defining a state class that extends the base CUA state with a `route` field. This field records the routing decision for each request.


In [None]:
class ResearchState(CUAState):
    """State class for the research agent workflow, extending the CUA state.
    This state tracks whether to use the computer for research or respond directly."""
    route: Literal["respond", "computer_use_agent"]
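
The pattern above can be tried without installing `langgraph_cua`. The sketch below assumes the state behaves like a dictionary (as the `state.get(...)` calls in later cells suggest) and uses a hypothetical `BaseStateSketch` in place of `CUAState`. It also shows that `Literal` annotations are not enforced at runtime, so an explicit membership check is needed if you want validation.

```python
from typing import Literal, TypedDict, get_args

Route = Literal["respond", "computer_use_agent"]


class BaseStateSketch(TypedDict, total=False):
    """Hypothetical stand-in for CUAState; the real class defines more fields."""

    messages: list


class ResearchStateSketch(BaseStateSketch, total=False):
    """Adds the routing decision on top of the base state."""

    route: Route


def is_valid_route(value: object) -> bool:
    """Check a value against the allowed routes, since Literal is not checked at runtime."""
    return value in get_args(Route)


state: ResearchStateSketch = {"messages": [], "route": "respond"}
```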


## Input Processing

Next, we'll define the function that processes user input and determines whether to route to the computer use agent or generate a direct response. This function uses GPT-4o with structured output to make the routing decision.


In [None]:
def process_input(state: ResearchState):
    """
    Analyzes the user's latest message and determines whether to route to the
    computer use agent or to generate a direct response.
    """
    system_message = {
        "role": "system",
        "content": (
            "You're an advanced AI assistant tasked with routing the user's query to the appropriate node. "
            "Your options are: computer use or respond. You should pick computer use if the user's request requires "
            "using a computer (e.g. looking up a price on a website, or doing a web search), and pick respond for ANY other inputs."
        ),
    }

    class RoutingToolSchema(BaseModel):
        """Route the user's request to the appropriate node."""
        route: Literal["respond", "computer_use_agent"] = Field(
            ...,
            description="The node to route to, either 'computer_use_agent' for any input which might require using a computer to assist the user, or 'respond' for any other input",
        )

    model = ChatOpenAI(model="gpt-4o", temperature=0)
    model_with_tools = model.with_structured_output(RoutingToolSchema)

    user_messages = state.get("messages", [])
    if not user_messages:
        return {"route": "respond"}  # Default to respond if no messages

    messages = [system_message, {"role": "user", "content": user_messages[-1].content}]
    response = model_with_tools.invoke(messages)
    return {"route": response.route}
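
The routing logic can be exercised without calling OpenAI by injecting the classifier. The following is a minimal sketch under that assumption: `choose_route` and `keyword_classifier` are hypothetical names, messages are plain dictionaries rather than LangChain message objects, and the fallback for an unexpected route value is an addition that the original function does not have.

```python
from typing import Callable, Dict, Sequence

VALID_ROUTES = ("respond", "computer_use_agent")


def choose_route(
    messages: Sequence[Dict[str, str]],
    classify: Callable[[str], str],
) -> Dict[str, str]:
    """Route on the latest message, defaulting to 'respond' when in doubt."""
    if not messages:
        return {"route": "respond"}  # Same default as process_input
    route = classify(messages[-1]["content"])
    if route not in VALID_ROUTES:
        route = "respond"  # Extra guard, not present in the original function
    return {"route": route}


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for the LLM call, based on simple keywords."""
    lowered = text.lower()
    needs_computer = any(word in lowered for word in ("website", "search", "look up"))
    return "computer_use_agent" if needs_computer else "respond"
```

In a unit test, `classify` can be any callable, so the router's edge cases can be checked deterministically.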


## Response Generation

Now we'll define the function that generates responses when the routing decision is to respond directly (without using the computer).


In [None]:
def respond(state: ResearchState):
    """
    Generates a general response to the user based on the entire conversation history.
    """
    def format_messages(messages: List[AnyMessage]) -> str:
        """Formats a list of messages into a single string with type and content."""
        return "\n".join([f"{message.type}: {message.content}" for message in messages])

    system_message = {
        "role": "system",
        "content": (
            "You're an advanced AI assistant tasked with responding to the user's input. "
            "You're provided with the full conversation between the user and the AI assistant. "
            "This conversation may include messages from a computer use agent, along with "
            "general user inputs and AI responses. \n\n"
            "Given all of this, please RESPOND to the user. If there is nothing to respond to, you may return something like 'Let me know if you have any other questions.'"
        ),
    }
    human_message = {
        "role": "user",
        "content": "Here are all of the messages in the conversation:\n\n"
        + format_messages(state.get("messages")),
    }

    model = ChatOpenAI(model="gpt-4o", temperature=0)
    response = model.invoke([system_message, human_message])
    
    return {"response": response}
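
To see what the model actually receives as the conversation transcript, here is a small sketch of the `format_messages` helper run on stand-in messages. `FakeMessage` is a hypothetical class that exposes only the `type` and `content` attributes the helper reads; real LangChain messages carry more fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeMessage:
    """Hypothetical stand-in exposing the two attributes format_messages reads."""

    type: str
    content: str


def format_messages(messages: List[FakeMessage]) -> str:
    """Join messages into one 'type: content' line per message."""
    return "\n".join(f"{message.type}: {message.content}" for message in messages)


transcript = format_messages(
    [FakeMessage("human", "What is 2 + 2?"), FakeMessage("ai", "4")]
)
print(transcript)
# human: What is 2 + 2?
# ai: 4
```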


## Graph Construction

Now we'll construct the LangGraph workflow that ties everything together. This includes:
1. Creating the state graph
2. Adding nodes for input processing, response generation, and computer use
3. Setting up the edges between nodes
4. Compiling the graph


In [None]:
# Create the CUA graph
cua_graph = create_cua()

def route_after_processing_input(state: ResearchState):
    """Conditional router that returns the route determined by process_input."""
    return state.get("route")

# Create and configure the workflow
workflow = StateGraph(ResearchState)
workflow.add_node("process_input", process_input)
workflow.add_node("respond", respond)
workflow.add_node("computer_use_agent", cua_graph)

# Add edges
workflow.add_edge(START, "process_input")
workflow.add_conditional_edges("process_input", route_after_processing_input)
workflow.add_edge("respond", END)
workflow.add_edge("computer_use_agent", END)

# Compile the graph
graph = workflow.compile()
graph.name = "Research Agent"
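
The control flow this graph encodes can be sketched in plain Python. The toy below is not how LangGraph executes a graph: all names are hypothetical, messages are plain strings, and state updates simply overwrite keys, whereas LangGraph merges updates according to the state's reducers. It only illustrates the path START -> `process_input` -> (`respond` or `computer_use_agent`) -> END.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def toy_process_input(state: State) -> State:
    """Toy router: pick the computer route when the last message mentions a website."""
    messages = state.get("messages", [])
    if messages and "website" in messages[-1].lower():
        return {"route": "computer_use_agent"}
    return {"route": "respond"}


def toy_respond(state: State) -> State:
    return {"response": "direct answer"}


def toy_computer_use_agent(state: State) -> State:
    return {"messages": state.get("messages", []) + ["browser result"]}


NODES: Dict[str, Node] = {
    "process_input": toy_process_input,
    "respond": toy_respond,
    "computer_use_agent": toy_computer_use_agent,
}


def run_toy_graph(state: State) -> List[str]:
    """Run the two-step flow and return the names of the visited nodes."""
    visited = ["process_input"]
    state.update(NODES["process_input"](state))
    target = state["route"]  # Plays the role of route_after_processing_input
    state.update(NODES[target](state))
    visited.append(target)
    return visited
```

Because the conditional edge returns a node name, adding a third route only requires a new node and a new allowed value for `route`.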


## Agent Interface

Finally, we'll create the interface for interacting with our agent. This includes:
1. A decorated function for handling agent requests
2. Proper integration with Maxim for observability
3. Asynchronous execution support


In [None]:
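@langgraph_agent(name="research-agent-v1")
async def ask_agent(messages):
    """Stream the graph and return the messages from the last computer_use_agent update."""
    # The callback forwards LangChain and LangGraph events to Maxim.
    config = {"recursion_limit": 50, "callbacks": [langchain_callback()]}
    stream = graph.astream({"messages": messages}, subgraphs=True, stream_mode="updates", config=config)
    last_update = None
    async for update in stream:
        # With subgraphs=True, each item is a (namespace, update) tuple.
        if "computer_use_agent" in update[1]:
            last_update = update[1]["computer_use_agent"].get("messages", {})
    # Stays None if the computer_use_agent node never ran (for example, on the "respond" route).
    return last_update

@trace(logger=logger, name="research-agent-v1")
async def handle(messages) -> str:
    """Run the agent inside a Maxim trace; string responses are recorded as the trace output."""
    response = await ask_agent(messages)
    if isinstance(response, str):
        current_trace().set_output(response)
    return response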
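
The streaming loop in `ask_agent` can be tried in isolation with a fake stream. This sketch assumes each streamed item is a `(namespace, update)` tuple, as the `update[1]` indexing above implies; the node name `call_model` and the namespace string are made up for illustration.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Optional, Tuple

Update = Tuple[Tuple[str, ...], Dict[str, Dict[str, Any]]]


async def fake_stream() -> AsyncIterator[Update]:
    """Hypothetical updates: parent-graph nodes use an empty namespace."""
    yield (), {"process_input": {"route": "computer_use_agent"}}
    yield ("computer_use_agent:abc",), {"call_model": {"messages": ["step 1"]}}
    yield (), {"computer_use_agent": {"messages": ["step 1", "final answer"]}}


async def last_agent_messages(stream: AsyncIterator[Update]) -> Optional[Any]:
    """Keep the messages from the most recent computer_use_agent update."""
    last = None
    async for _namespace, update in stream:
        if "computer_use_agent" in update:
            last = update["computer_use_agent"].get("messages")
    return last


result = asyncio.run(last_agent_messages(fake_stream()))
print(result)
# ['step 1', 'final answer']
```

If no update mentions `computer_use_agent`, the function returns `None`, which is worth handling in the caller.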

## Example Usage

Let's try out our Computer Use Agent with a simple example: finding today's top song on Billboard's charts.


In [None]:
async def main():
    """Run the agent workflow."""
    messages = [
        {
            "role": "system",
            "content": (
                "You're an advanced AI computer use assistant. The browser you are using "
                "is already initialized, and visiting google.com."
            ),
        },
        {
            "role": "user",
            "content": "find today's top 1 song on billboard's charts.",
        },
    ]

    await handle(messages)

# Run the example
await main()
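
Top-level `await` works in a notebook because the kernel already runs an event loop. In a plain Python script, the usual entry point is `asyncio.run`. The sketch below uses a hypothetical placeholder coroutine, `main_sketch`, in place of the notebook's `main`.

```python
import asyncio


async def main_sketch() -> str:
    """Placeholder for the notebook's main(); swap the body for `await handle(messages)`."""
    await asyncio.sleep(0)
    return "done"


def run_from_script() -> str:
    """Start an event loop, run the coroutine to completion, and return its result."""
    return asyncio.run(main_sketch())


if __name__ == "__main__":
    print(run_from_script())
```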


## Observability in Action

Let's look at what Maxim's observability provides when running our Research Agent. Below are some key visualizations from the Maxim dashboard:

### 1. Full Trace Overview
![Trace Overview](./assets/image_1.png)
This view shows the complete trace of the agent's execution, from the initial request to the final response, along with the associated cost and latency.

### 2. Precise Metadata Extraction
![Metadata](./assets/image_2.png)
Here you can see the metadata Maxim captures for each call, including token usage and the model being called.

### 3. Computer Use Actions
![Computer Use](./assets/image_3.png)
This view shows the computer use actions the agent performs while researching. In the captured instant, the action is a click.

### 4. Performance Metrics
![Performance](./assets/image_4.png)
Screenshots at every step of the agent's trajectory are logged as well.

These visualizations help us:
- Debug agent behavior and decision-making
- Monitor performance and resource usage
- Track successful vs failed interactions
- Identify optimization opportunities

Remember, all of this observability came from just:
1. One line of logger initialization
2. Two decorators (`@langgraph_agent` and `@trace`)
3. A callback configuration

No manual instrumentation or complex setup required!
