In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

In [4]:
import sys
sys.path.append("../")
from prompts.single_agent_prompt import baseline_instructions

In [6]:
from deepagents import create_deep_agent
from langchain_ollama import ChatOllama
from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient(
    {
        "mcp_zero_and_alita_rag": {
            "url": "http://localhost:8000/mcp",
            "transport": "streamable_http"
        }
    }
)
tools = await client.get_tools()

model = ChatOllama(
    model="gpt-oss:20b",
    temperature=0,
)

In [6]:
from langchain_core.messages import HumanMessage, SystemMessage

## Baseline

In [None]:
from langgraph.prebuilt import create_react_agent
response = ""

react_agent = create_react_agent(
    model=model,
    tools=tools,
)

messages = [
    SystemMessage(content=baseline_instructions),
    HumanMessage(content="How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?")
]

async for chunk in react_agent.astream(
    {"messages": messages},
    stream_mode="values"
):
    if "messages" in chunk:
        chunk['messages'][-1].pretty_print()
        # Track the newest message; once the stream ends this holds the final answer.
        response = chunk['messages'][-1].content

# Persist the final answer with a trailing newline.
with open("../reports/react_report.txt", "w") as f:
    f.write(response + "\n")


How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?
Tool Calls:
  alita_documentation (497f9b6c-c2d3-401b-ad37-8fba2898192f)
 Call ID: 497f9b6c-c2d3-401b-ad37-8fba2898192f
  Args:
    query: Alita enable agents to create and deploy MCP tools if the tool does not exist
Name: alita_documentation

<think>
Okay, let's tackle this query. The user is asking whether Alita enables agents to create and deploy MCP tools if the tool doesn't exist, and they want a detailed and exhaustive answer based on the provided context.

First, I need to recall the context information. The paper describes Alita as a generalist agent that uses MCPs (Model Context Protocols) to dynamically generate tools. The key points from the context are:

1. **MCP Creation**: Alita has a component called MCP Brainstorming that assesses the agent's capabilities and identifies gaps. If the existing tools can't handle a task, it suggests generating new tools. The Script
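
The streaming loops in this notebook depend on how `stream_mode="values"` behaves: every chunk is a snapshot of the full graph state, so the last message of the last chunk is the final answer. Below is a minimal sketch of that pattern. `fake_values_stream` is a hypothetical stand-in for `agent.astream(...)`, and plain dicts stand in for LangChain message objects (which expose `.content` rather than `["content"]`).

```python
import asyncio


async def fake_values_stream():
    """Hypothetical stand-in for agent.astream(..., stream_mode="values").

    Each yielded chunk carries the whole message history so far.
    """
    messages = [{"role": "user", "content": "question"}]
    yield {"messages": list(messages)}
    messages.append({"role": "tool", "content": "retrieved context"})
    yield {"messages": list(messages)}
    messages.append({"role": "assistant", "content": "final answer"})
    yield {"messages": list(messages)}


async def collect_final_answer(stream):
    """Return the content of the newest message in the last snapshot."""
    final = ""
    async for chunk in stream:
        if "messages" in chunk:
            # Overwrite on every step; only the last assignment survives.
            final = chunk["messages"][-1]["content"]
    return final


answer = asyncio.run(collect_final_answer(fake_values_stream()))
print(answer)
```

Inside a notebook cell, `await collect_final_answer(...)` can replace `asyncio.run(...)`.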

## Running our first deep agent without sub agents

In [7]:
base_deep_agent = create_deep_agent(
    model=model,
    tools=tools,
    instructions=baseline_instructions
)

In [8]:
response = ""

async for chunk in base_deep_agent.astream(
    {
        "messages": [
            {"role": "user", "content": "How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?"}
        ]
    },
    stream_mode="values"
):
    if "messages" in chunk:
        chunk['messages'][-1].pretty_print()
        # Track the newest message; once the stream ends this holds the final answer.
        response = chunk['messages'][-1].content

# Persist the final answer with a trailing newline.
with open("../reports/single_deep_agent_report.txt", "w") as f:
    f.write(response + "\n")


How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?
Tool Calls:
  alita_documentation (67d689e8-d5ee-469b-9960-96679abd62eb)
 Call ID: 67d689e8-d5ee-469b-9960-96679abd62eb
  Args:
    query: Alita enable agents to create and deploy MCP tools if the tool does not exist
Name: alita_documentation

<think>
Okay, let's tackle this query. The user is asking whether Alita enables agents to create and deploy MCP tools if the tool doesn't exist, and they want a detailed and exhaustive answer based on the provided context.

First, I need to recall the context information. The paper describes Alita as a generalist agent that uses MCPs (Model Context Protocols) to dynamically generate tools. The key points from the context are:

1. **MCP Creation**: Alita has a component called MCP Brainstorming that assesses the agent's capabilities and identifies gaps. If the existing tools can't handle a task, it suggests generating new tools. The Script

## Running with sub agents

In [12]:
from prompts.multi_agent_prompt import (
    crew_instructions,
    alita_mcp_zero_prompt,
    report_improvement_prompt
)

In [8]:
from langchain_core.tools import tool

@tool
def write_report_to_file(report: str):
    """Use this tool to write the final report to a file!"""
    
    splits = report.split("\n")
    with open("../reports/multi_agent_report.txt", "w") as f:
        for split in splits:
            f.write(split + "\n")
    return "Report written!"

@tool
def enhance_report(report: str):
    """Use this tool to enhance writing of the final report"""
    
    messages = [
        SystemMessage(content=report_improvement_prompt),
        HumanMessage(content=f"Report: {report}")
    ]
    response = model.invoke(messages)
    
    return response.content
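
The file-writing tool above assumes that `../reports` already exists. Below is a minimal sketch of a more defensive writer that creates the directory first. The helper name `write_report` and the temporary path are illustrative assumptions, not part of the notebook. Unlike the original, it normalises the output to exactly one trailing newline.

```python
import tempfile
from pathlib import Path


def write_report(report: str, path: Path) -> Path:
    """Write `report` to `path`, creating missing parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Strip trailing newlines, then add one back, so the file ends cleanly.
    path.write_text(report.rstrip("\n") + "\n", encoding="utf-8")
    return path


# Demonstrate against a throwaway directory instead of ../reports.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "reports" / "demo_report.txt"
    write_report("line one\nline two", target)
    written = target.read_text(encoding="utf-8")

print(repr(written))
```

The same function body could sit inside a `@tool`-decorated wrapper if the agent should call it directly.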

In [9]:
# Index the MCP tools by name so sub-agents can reference them.
tools_by_name = {tool_.name: tool_ for tool_ in tools}

alita_mcp_zero_sub_agent = {
    "name": "alita-mcp-zero-agent",
    "description": "Used to answer questions about MCP Zero and Alita.",
    "prompt": alita_mcp_zero_prompt,
    "tools": tools_by_name,
    "model_settings": {
        "model": "ollama:qwen3:latest",
        "temperature":0,
    } 
}

In [10]:
tools_final = tools + [enhance_report, write_report_to_file]

In [11]:
report_enhancement_sub_agent = {
    "name": "report-enhancement-agent",
    "description": "Used to enhance the writing of the final report and write report to a file.",
    "prompt": report_improvement_prompt,
    "tools": {
        "write_report_to_file": write_report_to_file,
        "enhance_report": enhance_report},
    "model_settings": {
        "model": "ollama:qwen3:latest",
        "temperature":0,
    } 
}

In [13]:
agent = create_deep_agent(
    model = model,
    tools = tools_final,
    instructions = crew_instructions,
    subagents = [alita_mcp_zero_sub_agent, report_enhancement_sub_agent],
).with_config({"recursion_limit": 1000})

In [14]:
response = ""

async for chunk in agent.astream(
    {
        "messages": [
            {"role": "user", "content": "How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?"}
        ]
    },
    stream_mode="values"
):
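    if "messages" in chunk:
        chunk['messages'][-1].pretty_print()
        # Track the newest message; once the stream ends this holds the final answer.
        response = chunk['messages'][-1].content

# Persist the final answer. Note that this overwrites anything the
# write_report_to_file tool saved to the same path during the run.
with open("../reports/multi_agent_report.txt", "w") as f:
    f.write(response + "\n")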


How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?
Tool Calls:
  alita_documentation (fe98deba-756c-4918-bbd0-b3146cd2c105)
 Call ID: fe98deba-756c-4918-bbd0-b3146cd2c105
  Args:
    query: Alita agent tool creation process when tool does not exist
Name: alita_documentation

<think>
Okay, let's tackle this query about Alita's tool creation process when a tool doesn't exist. The user wants a detailed and exhaustive answer based on the provided context. First, I need to recall the relevant sections from the context.

Looking at the context, there's a section on "Tool Usage" in section 3.3 and 3.4, especially the MCP Creation Component. The key parts are the MCP Brainstorming, ScriptGeneratingTool, and CodeRunningTool. Also, the Manager Agent's role is mentioned in section 3.2. 

So, the process starts with the Manager Agent identifying a need for a new tool. The MCP Brainstorming step is crucial here. It assesses the current capa

## LLM as a judge

In [7]:
from pydantic import BaseModel, Field

class ResultCard(BaseModel):
    score: float = Field(..., description="Score from 1-10 where 10 is the highest possible score.")
    reason_for_score: str = Field(..., description="The reason that this score was given. Share why the result didn't score 10, or what went well and didn't.")
    
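# Only grade the text reports; skip stray files such as notebook checkpoints.
reports = [name for name in os.listdir("../reports") if name.endswith(".txt")]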

In [12]:
from tqdm.notebook import tqdm
from collections import defaultdict

# model_with_structure = model.with_structured_output(ResultCard)
model_with_tools = model.bind_tools([ResultCard])
requirements = baseline_instructions
res = defaultdict(dict)
query = "How does Alita enable agents to create and deploy MCP tools if the agent deems that the tool does not exist?"

for report in tqdm(reports):
    report_name=report.split(".txt")[0]
    with open(os.path.join("../reports", report), "r") as file:
        text = file.read()
        result = model_with_tools.invoke(
            f"""
            You are a teacher grading 3 reports.
            
            This is the report's focus: {query}
            
            These are the requirements for the report: {requirements}
            
            This is the final report: {text}
            
            Grade the final report on a score of 1-10 and give reasons for your score. Your score should
            include the report's structure, the report's depth and how well the report is written.
            Be critical but fair. Share why the result didn't score 10, or what went well and didn't.
            
            Use the tool to return the final result.
            """
        )
        # res[report_name] = result
        res['score'][report_name] = result.tool_calls[-1]['args']['score']
        res['reason'][report_name] = result.tool_calls[-1]['args']['reason_for_score']
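
The grading loop above indexes `result.tool_calls[-1]` directly. That raises an `IndexError` if the model replies in plain text instead of calling the tool, and it accepts any score the model returns. Below is a minimal sketch of a guard. It assumes each tool call is a dict with an `args` mapping, as the loop itself does. The helper name `extract_result_card` is hypothetical.

```python
from typing import Optional


def extract_result_card(tool_calls: list) -> Optional[dict]:
    """Return {'score', 'reason_for_score'} from the last tool call, or None."""
    if not tool_calls:
        return None  # the model did not call the tool at all
    args = tool_calls[-1].get("args", {})
    try:
        score = float(args["score"])
        reason = str(args["reason_for_score"])
    except (KeyError, TypeError, ValueError):
        return None  # a field is missing or the score is not numeric
    if not 1 <= score <= 10:
        return None  # outside the 1-10 scale described in ResultCard
    return {"score": score, "reason_for_score": reason}


good = extract_result_card(
    [{"name": "ResultCard", "args": {"score": 9, "reason_for_score": "Clear structure."}}]
)
missing = extract_result_card([])
out_of_range = extract_result_card(
    [{"name": "ResultCard", "args": {"score": 42, "reason_for_score": "?"}}]
)
print(good, missing, out_of_range)
```

A `None` result could be used to retry the judge call or to skip the report rather than abort the whole loop.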


In [13]:
import pandas as pd

df = pd.DataFrame(res)

In [14]:
df

                          score                                             reason
single_deep_agent_report      9  The report is well‑structured with clear headi...
multi_agent_report            8  The report is well‑structured with clear headi...
react_report                  8  The report is well‑structured with a clear tit...


In [18]:
from IPython.display import display, Markdown

display(Markdown(df['reason'].iloc[0]))

The report is well‑structured with clear headings and bullet points, and it covers all key stages of Alita’s on‑demand MCP tool creation pipeline: gap detection, code synthesis, sandboxed validation, environment isolation, and registration in the MCP Box. It provides concrete examples (e.g., subtitle extraction) and empirical results that demonstrate effectiveness. The language is concise, professional, and free of self‑referential remarks, meeting the brief’s requirements. Minor improvements could include a slightly deeper explanation of the criteria used by the MCP Brainstorming component to flag a missing tool and a brief mention of how the agent handles failures beyond simple refinement. These small gaps prevent a perfect score, but the report is otherwise comprehensive and well‑written.

In [19]:
display(Markdown(df['reason'].iloc[1]))

The report is well‑structured with clear headings and a logical flow from problem definition to tool creation and validation. It covers all required phases—gap detection, resource search, script generation, execution, packaging, and refinement—providing concrete examples and code snippets. The language is concise yet thorough, and the use of bullet points and code blocks enhances readability. However, it lacks a brief introduction and conclusion, which would frame the context and summarize key takeaways. Additionally, while the report references specific tools (e.g., GoogleSearchTool, GithubSearchTool), it does not cite sources or provide links to documentation, limiting verifiability. The depth is appropriate for a technical audience, but some sections could benefit from more detailed error‑handling scenarios and performance considerations. Overall, the report meets the brief but falls short of a perfect score due to minor omissions and lack of source citations.

In [20]:
display(Markdown(df['reason'].iloc[2]))

The report is well‑structured with a clear title and numbered sections that cover the entire lifecycle of tool creation in Alita. It demonstrates depth by discussing brainstorming, script generation, validation, environment management, integration, scalability, empirical results, and limitations. The writing is professional, concise, and free of self‑reference, meeting the style guidelines. However, it falls short of a perfect score because it lacks explicit citations or references to the Alita and MCP Zero documentation, does not fully elaborate on the decision‑making process when a tool is deemed nonexistent, and could benefit from more concrete examples or code snippets. Additionally, the use of bullet points is minimal, and some sections could be broken into subsections for greater clarity. These omissions prevent it from achieving a 10/10.