In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
query = "What is Alita? How does it work?"

## Querying documents ingested into LlamaCloud

In [3]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
from dotenv import load_dotenv, find_dotenv
import os

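_ = load_dotenv(find_dotenv())  # read local .env file

# Connect to an existing LlamaCloud index (the paper was already uploaded to it)
alita_index = LlamaCloudIndex(
    name="alita-index",
    project_name="Default",
    organization_id="bf9b425c-54cb-4182-a93f-8ac6aed04348",
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),
)

# Retrieve the chunks most relevant to the query (retrieval only, no LLM call)
nodes = alita_index.as_retriever().retrieve(query)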

In [4]:
nodes

[NodeWithScore(node=TextNode(id_='2c753523-8829-4ad7-b85e-b648f422f5ae', embedding=None, metadata={'id': 'alita_paper.pdf', 'file_size': 1113373, 'last_modified_at': '2025-08-24T05:14:27', 'file_path': 'alita_paper.pdf', 'file_name': 'alita_paper.pdf', 'external_file_id': 'alita_paper.pdf', 'file_id': 'bcee7eb9-6479-46a6-9122-fd5b454699ef', 'pipeline_file_id': 'd149ed00-0d38-4815-a641-d388a45063aa', 'pipeline_id': 'e4c3c794-6d8a-450a-b3b3-2d1dee470c2e', 'page_label': 1, 'start_page_index': 0, 'start_page_label': 1, 'end_page_index': 0, 'end_page_label': 1, 'document_id': 'c9f76edb0a0d2f097f23ba53edef10f99a6924dfccb183a929', 'start_char_idx': 1, 'end_char_idx': 2846}, excluded_embed_metadata_keys=['file_id', 'pipeline_file_id', 'id', 'file_size', 'last_modified_at', 'file_id', 'pipeline_file_id', 'id', 'file_size', 'last_modified_at', 'start_page_index', 'start_page_label', 'page_label', 'end_page_index', 'end_page_label', 'document_id', 'file_id', 'pipeline_file_id'], excluded_llm_meta
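The raw `NodeWithScore` repr above is hard to scan. A small helper can print one line per node instead. This is a sketch that assumes each item exposes `.score`, `.metadata` and `.text` (as llama_index's `NodeWithScore` does); `summarize_nodes` is a hypothetical name, and the metadata keys it reads (`file_name`, `page_label`) are the ones visible in the output above.

```python
from types import SimpleNamespace
from typing import Iterable, List


def summarize_nodes(nodes: Iterable, preview_chars: int = 80) -> List[str]:
    """Return one line per retrieved node: score, source file, page, text preview.

    Assumes each item exposes `.score`, `.metadata` and `.text`, as
    llama_index's NodeWithScore does.
    """
    lines = []
    for n in nodes:
        meta = n.metadata or {}
        score = n.score if n.score is not None else float("nan")
        # Collapse whitespace and newlines so each node fits on one line.
        preview = " ".join(n.text.split())[:preview_chars]
        source = f"{meta.get('file_name', '?')} p.{meta.get('page_label', '?')}"
        lines.append(f"{score:.3f} | {source} | {preview}")
    return lines


# Stand-in for a NodeWithScore so the helper can be tried without LlamaCloud.
fake = SimpleNamespace(
    score=0.5,
    metadata={"file_name": "alita_paper.pdf", "page_label": 1},
    text="Alita is   a\ngeneralist agent",
)
print(summarize_nodes([fake])[0])
# → 0.500 | alita_paper.pdf p.1 | Alita is a generalist agent
```

In the notebook the call would be `print("\n".join(summarize_nodes(nodes)))`.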

In [5]:
from llama_index.llms.openai_like import OpenAILike
from IPython.display import display, Markdown

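# Qwen3-8B served from an OpenAI-compatible endpoint
llm = OpenAILike(
    model="Qwen/Qwen3-8B",
    api_key=os.getenv("BENTO_CLOUD_API_KEY"),
    api_base=f'{os.getenv("qwen3_endpoint_url")}/v1',
    is_chat_model=True,  # use the chat completions endpoint
    is_function_calling_model=True,
    temperature=0,
    timeout=600,  # seconds; long answers can take minutes
)
# Query engine: retrieve from the Alita index, then answer with the Qwen model
alita_query_engine = alita_index.as_query_engine(llm=llm)
response = alita_query_engine.query(query)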

display(Markdown(f"**Response:** {response}"))
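`os.getenv` returns `None` for a missing variable, so if `qwen3_endpoint_url` is not set, the f-string in the cell above quietly builds the URL `'None/v1'` and the problem only surfaces later as a less obvious error when the first query is sent. A minimal guard, sketched below with hypothetical helper names, fails early with a clear message and also avoids a double slash when the endpoint already ends in `/`.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set (check your .env file)")
    return value


def openai_compatible_base(endpoint: str) -> str:
    """Append the /v1 prefix used by OpenAI-compatible servers, avoiding '//v1'."""
    return endpoint.rstrip("/") + "/v1"


# Hypothetical usage in the OpenAILike(...) call above:
#   api_key=require_env("BENTO_CLOUD_API_KEY"),
#   api_base=openai_compatible_base(require_env("qwen3_endpoint_url")),
print(openai_compatible_base("https://example.invalid/"))  # → https://example.invalid/v1
```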

**Response:** 

Alita is a generalist agent designed to enable scalable agentic reasoning through minimal reliance on predefined tools and workflows, prioritizing simplicity and autonomous capability development. It functions by utilizing a Manager Agent to orchestrate task execution, which leverages a Web Agent to identify and dynamically generate task-specific tools. These tools are systematically encapsulated as Model Context Protocols (MCPs), creating reusable components that can be adapted for diverse applications. The process involves iterative refinement of these MCPs, allowing Alita to autonomously expand its functional scope while ensuring compatibility across different environments. This approach minimizes manual intervention and enhances adaptability, enabling efficient problem-solving across complex tasks.

In [6]:
# Second existing index, holding the MCP-Zero paper
mcp_zero_index = LlamaCloudIndex(
    name="mcp-zero-index",
    project_name="Default",
    organization_id="bf9b425c-54cb-4182-a93f-8ac6aed04348",
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),
)

In [7]:
mcp_zero_engine = mcp_zero_index.as_query_engine(llm=llm)

In [8]:
%%time
response = mcp_zero_engine.query("What is MCP zero and how does it work?")

CPU times: user 82 ms, sys: 5.1 ms, total: 87.1 ms
Wall time: 2min 10s
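`%%time` only works in IPython. Outside a notebook, wall-clock time can be measured with `time.perf_counter`, as in this small sketch (`timed` is a hypothetical helper). The gap between CPU time (87 ms) and wall time (2 min 10 s) above indicates that the process spent almost all of that time waiting on the remote retrieval and LLM calls rather than computing locally.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# In the notebook this would be, e.g.:
#   response, seconds = timed(mcp_zero_engine.query, "What is MCP zero and how does it work?")
result, seconds = timed(sum, [1, 2, 3])
print(result, f"{seconds:.6f}s")
```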


In [9]:
display(Markdown(f"**Response:** {response}"))

**Response:** 

MCP-Zero is a framework designed to enhance the autonomy of large language models (LLMs) by enabling them to dynamically identify and request tools based on task requirements, rather than relying on pre-defined schemas. This approach reduces context overhead and promotes active decision-making through a structured, iterative process.  

The framework operates through three core mechanisms:  
1. **Active Tool Request**: LLMs are prompted with system instructions to explicitly declare missing capabilities, such as emitting a `<tool_assistant>` block specifying required server domains and tool operations. This ensures alignment between task needs and available resources.  
2. **Hierarchical Semantic Routing**: A lightweight tool index, curated with metadata and vectorized embeddings (e.g., using text-embedding-3-large), enables precise matching of tool descriptions to model-generated requests. This involves filtering candidate servers and ranking tools by semantic similarity to optimize retrieval efficiency.  
3. **Iterative Capability Extension**: During task execution, agents dynamically refine tool selections by leveraging retrieved JSON-schemas, adapting to incomplete or insufficient initial choices. This allows for adaptive, fault-tolerant problem-solving across domains.  

By integrating these steps, MCP-Zero minimizes context overhead, improves scalability, and maintains high accuracy in multi-turn interactions, empowering LLMs to autonomously shape their operational environment.
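The two-stage "hierarchical semantic routing" described in the response (filter candidate servers first, then rank their tools by semantic similarity) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not MCP-Zero's actual implementation: the `TOOL_INDEX` contents are hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model such as text-embedding-3-large.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical tool index: server description -> {tool name: tool description}.
TOOL_INDEX: Dict[str, Dict[str, str]] = {
    "filesystem server: read and write local files": {
        "read_file": "read the contents of a file",
        "write_file": "write text to a file",
    },
    "web search server: search the internet for pages": {
        "search": "search the web for a query",
        "fetch_page": "download a web page",
    },
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(server_request: str, tool_request: str,
          top_servers: int = 1, top_tools: int = 1) -> List[Tuple[str, str, float]]:
    """Stage 1: keep the best-matching servers. Stage 2: rank only their tools."""
    s_vec = embed(server_request)
    servers = sorted(TOOL_INDEX, key=lambda s: cosine(s_vec, embed(s)), reverse=True)
    t_vec = embed(tool_request)
    ranked = sorted(
        ((cosine(t_vec, embed(desc)), server, name)
         for server in servers[:top_servers]
         for name, desc in TOOL_INDEX[server].items()),
        reverse=True,
    )
    return [(server, name, score) for score, server, name in ranked[:top_tools]]


print(route("search the web", "search the web for a query"))
```

Filtering servers first means only the tools of the surviving servers are scored, which is the efficiency argument the response makes.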

## Composite retrieval
Not recommended. It is better to break the question into sub-parts and query the appropriate index with each part.
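The alternative suggested above can be sketched in plain Python: split the question into sub-questions by hand, send each one to the index that covers it, and collect the answers. The `plan`, the stub engines and `query_by_subquestions` are hypothetical placeholders; in the notebook you would pass `lambda q: str(alita_query_engine.query(q))` and the equivalent for `mcp_zero_engine`. LlamaIndex also provides a `SubQuestionQueryEngine` that uses an LLM to generate and route the sub-questions automatically.

```python
from typing import Callable, Dict, List, Tuple

# Anything that maps a question string to an answer string.
QueryFn = Callable[[str], str]


def query_by_subquestions(
    sub_questions: List[Tuple[str, str]],
    engines: Dict[str, QueryFn],
) -> List[Tuple[str, str]]:
    """Send each (index_name, sub_question) pair to its own engine.

    Returns (sub_question, answer) pairs in the original order, ready to be
    combined or passed to an LLM for a final synthesis step.
    """
    results = []
    for index_name, sub_question in sub_questions:
        if index_name not in engines:
            raise KeyError(f"No query engine registered for {index_name!r}")
        results.append((sub_question, engines[index_name](sub_question)))
    return results


# Stub engines stand in for the real LlamaCloud query engines (hypothetical).
stub_engines: Dict[str, QueryFn] = {
    "alita-index": lambda q: f"[alita] {q}",
    "mcp-zero-index": lambda q: f"[mcp-zero] {q}",
}

plan = [
    ("alita-index", "What is Alita?"),
    ("mcp-zero-index", "What is MCP Zero?"),
]

for question, answer in query_by_subquestions(plan, stub_engines):
    print(question, "->", answer)
```

A cross-cutting part such as "Can Alita and MCP Zero work together?" does not belong to either index; it fits naturally in the final synthesis step over the collected answers.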

In [10]:
from llama_cloud import CompositeRetrievalMode
from llama_index.indices.managed.llama_cloud import (
    LlamaCloudCompositeRetriever,
)

retriever = LlamaCloudCompositeRetriever(
    name="Alita and MCP Zero Retriever",
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),
    create_if_not_exists=True,
    mode=CompositeRetrievalMode.FULL,
    rerank_top_n=6,
)
retriever.add_index(
    alita_index, description="Knowledge base for the Alita paradigm for agents"
)
retriever.add_index(
    mcp_zero_index, description="Knowledge base of the (model context protocol) MCP zero paradigm"
)

Retriever(name='Alita and MCP Zero Retriever', pipelines=[RetrieverPipeline(name='alita-index', description='Knowledge base for the Alita paradigm for agents', pipeline_id='e4c3c794-6d8a-450a-b3b3-2d1dee470c2e', preset_retrieval_parameters=PresetRetrievalParams(dense_similarity_top_k=30, dense_similarity_cutoff=0.0, sparse_similarity_top_k=30, enable_reranking=True, rerank_top_n=6, alpha=0.5, search_filters=None, search_filters_inference_schema=None, files_top_k=1, retrieval_mode=<RetrievalMode.CHUNKS: 'chunks'>, retrieve_image_nodes=False, retrieve_page_screenshot_nodes=False, retrieve_page_figure_nodes=False, class_name='base_component')), RetrieverPipeline(name='mcp-zero-index', description='Knowledge base of the (model context protocol) MCP zero paradigm', pipeline_id='a8f66184-b5e1-487d-a65f-3a78c5b4ae1d', preset_retrieval_parameters=PresetRetrievalParams(dense_similarity_top_k=30, dense_similarity_cutoff=0.0, sparse_similarity_top_k=30, enable_reranking=True, rerank_top_n=6, alph
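With `mode=CompositeRetrievalMode.FULL`, our understanding is that the composite retriever queries every attached index and then reranks the pooled chunks down to `rerank_top_n` (6 here). The pooling step can be sketched as follows. The reranker model itself is not modeled: the sketch assumes the scores are already comparable across indexes, and `merge_and_rerank` plus the sample hits are made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def merge_and_rerank(
    per_index: Dict[str, List[Tuple[float, str]]], top_n: int
) -> List[Tuple[float, str, str]]:
    """Pool (score, text) hits from every index and keep the top_n overall.

    Assumes the scores are comparable across indexes (e.g. produced by one
    reranker over the same query); raw per-index similarity scores may not be.
    """
    pooled = [
        (score, index_name, text)
        for index_name, hits in per_index.items()
        for score, text in hits
    ]
    return heapq.nlargest(top_n, pooled, key=lambda item: item[0])


# Made-up scores, for illustration only.
hits = {
    "alita-index": [(0.9, "Alita chunk A"), (0.2, "Alita chunk B")],
    "mcp-zero-index": [(0.8, "MCP-Zero chunk A"), (0.5, "MCP-Zero chunk B")],
}
for score, index_name, text in merge_and_rerank(hits, top_n=3):
    print(f"{score:.1f}  {index_name:15}  {text}")
```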

In [11]:
nodes = retriever.retrieve(
    "What is Alita and what is MCP Zero? Can Alita and MCP zero work together?"
)
nodes

[NodeWithScore(node=TextNode(id_='5d413f8a-3005-45cc-b098-42fd1d4d1179', embedding=None, metadata={'id': 'mcp_zero_paper.pdf', 'file_size': 975244, 'last_modified_at': '2025-08-24T05:20:20', 'file_path': 'mcp_zero_paper.pdf', 'file_name': 'mcp_zero_paper.pdf', 'external_file_id': 'mcp_zero_paper.pdf', 'file_id': '99275b01-66e2-4f31-9b1c-914c3916ba5d', 'pipeline_file_id': '7e975483-54a7-4e46-ac11-74045fc733ee', 'pipeline_id': 'a8f66184-b5e1-487d-a65f-3a78c5b4ae1d', 'page_label': 10, 'start_page_index': 9, 'start_page_label': 10, 'end_page_index': 9, 'end_page_label': 10, 'document_id': '2e13a37a379d317a1af0a61b9d831caed3d655a04d32d55bc9', 'start_char_idx': 89466, 'end_char_idx': 93609, 'retriever_id': 'd2c4add1-f4cb-41a0-bcfd-58dac5293769', 'retriever_pipeline_name': 'mcp-zero-index'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='2. **Semantic grounding.** The example also clarifies t
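Each node returned by the composite retriever carries a `retriever_pipeline_name` metadata key (visible in the output above), so it is easy to check which index the chunks came from. A minimal sketch, with `count_by_index` as a hypothetical helper and `SimpleNamespace` objects standing in for real nodes:

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_by_index(nodes: Iterable) -> Counter:
    """Count retrieved nodes per source index via the 'retriever_pipeline_name' key."""
    return Counter(
        (n.metadata or {}).get("retriever_pipeline_name", "unknown") for n in nodes
    )


demo = [
    SimpleNamespace(metadata={"retriever_pipeline_name": "mcp-zero-index"}),
    SimpleNamespace(metadata={"retriever_pipeline_name": "alita-index"}),
    SimpleNamespace(metadata={"retriever_pipeline_name": "mcp-zero-index"}),
]
print(count_by_index(demo))  # Counter({'mcp-zero-index': 2, 'alita-index': 1})
```

In the notebook, `count_by_index(nodes)` shows whether both indexes contributed to the answer.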

In [12]:
%%time
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

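# LLM step that turns the retrieved chunks into a single answer
response_synthesizer = get_response_synthesizer(llm=llm)

# Composite retriever (both indexes) + synthesizer = query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)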

response = query_engine.query(
    "What is Alita and what is MCP Zero? Can Alita and MCP zero work together?"
)

CPU times: user 108 ms, sys: 7.47 ms, total: 116 ms
Wall time: 2min 58s


This query took almost 3 minutes (wall time 2 min 58 s).

In [13]:
display(Markdown(f"**Response:** {response}"))

**Response:** 

Alita is a generalist agent designed for scalable agentic reasoning, capable of automatically generating tools and workflows to solve diverse tasks without relying on complex predefined structures. It excels at creating new tools on-the-fly when existing ones are insufficient, enhancing its adaptability. MCP-Zero focuses on efficiently discovering and invoking pre-existing tools through semantic alignment and active requests, optimizing tool retrieval accuracy and reducing context usage.  

Alita and MCP-Zero can work synergistically. MCP-Zero's strength in locating existing tools complements Alita's ability to synthesize new tools when needed. Together, they form a self-reinforcing loop: MCP-Zero prioritizes tool discovery, while Alita handles tool creation for gaps, with the newly built tools being registered for future use. This integration enables a more robust, self-evolving system that balances efficiency and adaptability in complex tasks.

In [14]:
response = query_engine.query(
    "How can I use Alita and MCP Zero in an agent? Are they separate tools?"
)
display(Markdown(f"**Response:** {response}"))

**Response:** 

Alita and MCP-Zero are complementary components designed to work together in an agent system. MCP-Zero focuses on efficiently discovering and invoking existing tools by actively requesting them through semantically aligned queries, while Alita specializes in automatically building tools from scratch when no suitable existing tool is found. They are not separate tools but rather two sides of the same problem: MCP-Zero maximizes tool discovery, and Alita maximizes tool creation.  

To integrate them into an agent:  
1. **Prompt the model** to request tools explicitly (e.g., using a structured format like `<tool_assistant>`).  
2. **Curate a lightweight tool index** with semantic descriptions for existing tools.  
3. **Match the model's requests** against the tool index for retrieval. If no match is found, trigger Alita's workflow to synthesize a new tool.  
4. **Register newly created tools** back into the shared pool for future use.  

This integration creates a self-evolving loop where MCP-Zero handles existing tools, and Alita fills gaps, enabling agents to scale effectively.
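The four integration steps listed in the response can be sketched as a single resolve-or-create loop. This is a hypothetical illustration, not the API of either project: the `<tool_assistant>` field layout (one `server:` line, one `tool:` line), `ToolRegistry`, `resolve_tool` and `fake_synthesize` are all assumptions, and the exact-key lookup replaces the semantic matching that step 3 describes.

```python
import re
from typing import Callable, Dict, Optional

# Assumed request format (hypothetical): one "server:" line and one "tool:" line.
REQUEST_RE = re.compile(
    r"<tool_assistant>\s*server:\s*(?P<server>[^\n]+)\s*"
    r"tool:\s*(?P<tool>[^\n]+?)\s*</tool_assistant>"
)


def parse_tool_request(model_output: str) -> Optional[Dict[str, str]]:
    """Step 1: extract the model's explicit tool request, if any."""
    match = REQUEST_RE.search(model_output)
    if match is None:
        return None
    return {"server": match.group("server").strip(), "tool": match.group("tool").strip()}


class ToolRegistry:
    """Step 2: shared tool pool. Exact-key lookup stands in for semantic retrieval."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def find(self, key: str) -> Optional[Callable[..., object]]:
        return self._tools.get(key)

    def register(self, key: str, tool: Callable[..., object]) -> None:
        self._tools[key] = tool


def resolve_tool(
    request: Dict[str, str],
    registry: ToolRegistry,
    synthesize: Callable[[Dict[str, str]], Callable[..., object]],
) -> Callable[..., object]:
    key = f"{request['server']}/{request['tool']}"
    tool = registry.find(key)          # Step 3: reuse an existing tool (MCP-Zero side)
    if tool is None:
        tool = synthesize(request)     # no match: build a new tool (Alita side)
        registry.register(key, tool)   # Step 4: share it for future requests
    return tool


built = []


def fake_synthesize(request: Dict[str, str]) -> Callable[[], str]:
    """Hypothetical stand-in for Alita's tool-building workflow."""
    built.append(request["tool"])
    return lambda: f"ran {request['tool']}"


registry = ToolRegistry()
req = parse_tool_request("<tool_assistant>\nserver: filesystem\ntool: read a csv file\n</tool_assistant>")
first = resolve_tool(req, registry, fake_synthesize)
second = resolve_tool(req, registry, fake_synthesize)
print(first(), len(built))  # ran read a csv file 1
```

The second request for the same tool is served from the registry without calling `fake_synthesize` again, which is the "self-evolving loop" the response describes.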