# Building Multi-Agent AI Systems: From Concept to Implementation

## Introduction

AI systems are becoming increasingly complex, tackling problems that require multiple
specialized capabilities working in concert. In this tutorial, we'll build a multi-agent
system where specialized AI agents collaborate to accomplish complex tasks.

We'll explore:
1. Why multi-agent architectures are the future
2. The challenges of building multi-agent systems
3. How to implement a multi-agent system using mahilo
4. Building a complex multi-agent system step-by-step

The example we will showcase is a multi-agent system for filmmaking that won first prize in the Bengaluru track of the [ElevenLabs Worldwide Hackathon](https://hackathon.elevenlabs.io/). Learn more about the project on our [Devpost submission page](https://devpost.com/software/879504/joins/ZK96p6ArfUi97mxjzxwkYQ).

## Why Multi-Agent Systems?



Multi-agent systems offer several key advantages:

1. **Specialization**: Different agents can focus on specific tasks, leading to better
   performance in each domain - just like specialists in a team.

   ![](https://miro.medium.com/v2/resize:fit:640/format:webp/0*zf7GL-Dz8JAOB7tf.gif)

2. **Task Decomposition**: Complex problems become manageable when broken down into
   smaller sub-problems assigned to specialized agents.

3. **Parallel Processing**: Multiple agents can work simultaneously on different aspects
   of a problem, increasing efficiency.

4. **Better Accuracy**: An agent's accuracy tends to dip as its toolset grows, as shown by the ReWOO paper. Splitting tools across specialized agents keeps each agent's toolset small.

   ![](https://miro.medium.com/v2/resize:fit:640/format:webp/1*GvD5uvHY0mOPrmpp9Oh6-A.png)
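Task decomposition and parallel processing can be sketched in a few lines of plain `asyncio`. This is only an illustration: `research_agent`, `fact_check_agent`, and `solve` are hypothetical stand-ins, not part of mahilo, and a real agent would call an LLM instead of sleeping.

```python
import asyncio


# Hypothetical stand-ins for specialized agents (not mahilo APIs).
async def research_agent(topic: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O-bound work such as an API call
    return f"notes on {topic}"


async def fact_check_agent(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"verified claims about {topic}"


async def solve(task: str) -> dict[str, str]:
    # Task decomposition: one sub-problem per specialist.
    # Parallel processing: both specialists run concurrently.
    notes, checks = await asyncio.gather(
        research_agent(task), fact_check_agent(task)
    )
    return {"research": notes, "fact_check": checks}


result = asyncio.run(solve("silent-era cinema"))
print(result)
```

Inside a notebook, where an event loop is already running, `await solve(...)` would replace the `asyncio.run` call.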

## The Challenges of Building Multi-Agent Systems

Despite their benefits, building effective multi-agent systems presents challenges:

### 1. Communication Complexity

Enabling effective communication between agents requires:

- Designing protocols for agents to request information from each other
- Ensuring messages contain appropriate context
- Tracking conversations and maintaining state across interactions
- Preventing infinite loops and circular references
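To make these requirements concrete, here is a minimal sketch of a message envelope that carries a conversation id and a hop budget. The `Message` class, `forward` helper, and `MAX_HOPS` value are assumptions made for illustration; they do not describe mahilo's actual protocol.

```python
import uuid
from dataclasses import dataclass, field

MAX_HOPS = 5  # assumed budget; tune for your own workflows


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    body: str
    # One id ties every message in a conversation together.
    conversation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Agents that have already handled this conversation, in order.
    path: tuple[str, ...] = ()


def forward(msg: Message, next_recipient: str, body: str) -> Message:
    """Build the follow-up message, refusing to exceed the hop budget."""
    path = msg.path + (msg.recipient,)
    if len(path) >= MAX_HOPS:
        raise RuntimeError(
            f"hop limit reached in conversation {msg.conversation_id}"
        )
    return Message(msg.recipient, next_recipient, body, msg.conversation_id, path)


msg = Message("writer", "researcher", "Need sources on noir lighting")
reply = forward(msg, "writer", "Here are three sources")
print(reply.path)
```

Two agents that keep bouncing a message back and forth hit the `RuntimeError` instead of looping forever.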

### 2. Observability Challenges

As your agent system grows, understanding what's happening becomes difficult:

- Tracing requests across multiple agents
- Identifying which agent might be causing issues
- Collecting metrics on agent performance
- Debugging failures in a distributed system
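One common way to trace a request across agents is to attach a single trace id to every step. The sketch below uses only the standard library; `span`, `SPANS`, and `handle_request` are hypothetical names, and a production system would export the records rather than keep them in a list.

```python
import time
import uuid
from contextlib import contextmanager

SPANS: list[dict] = []  # in-memory sink, for illustration only


@contextmanager
def span(trace_id: str, agent: str, operation: str):
    """Record how long one agent step took and whether it failed."""
    start = time.perf_counter()
    record = {"trace_id": trace_id, "agent": agent, "operation": operation, "ok": True}
    try:
        yield record
    except Exception as exc:
        record["ok"] = False
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)


def handle_request(prompt: str) -> str:
    trace_id = uuid.uuid4().hex  # one id follows the request across agents
    with span(trace_id, "ResearchAgent", "gather"):
        notes = f"notes for {prompt}"
    with span(trace_id, "WritingAgent", "draft"):
        return f"draft using {notes}"


draft = handle_request("noir lighting")
for record in SPANS:
    print(record["agent"], record["operation"], record["ok"])
```

Filtering `SPANS` by `trace_id` shows which agent handled each step of a request and which one failed.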

### 3. Human-in-the-Loop Complexity

Integrating human oversight adds another layer of complexity:

- Creating interfaces for humans to monitor agent activities
- Designing effective handoff protocols between AI and humans
- Determining when to escalate decisions to human operators
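The escalation question in particular benefits from an explicit rule. A minimal sketch, assuming each agent reports a confidence score and flags irreversible actions (the `Decision` fields and the 0.8 threshold are illustrative choices, not part of any framework):

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str
    confidence: float   # agent's self-reported confidence in [0, 1]
    irreversible: bool  # e.g. publishing, deleting, or spending money


def needs_human(decision: Decision, min_confidence: float = 0.8) -> bool:
    """Escalate anything irreversible or anything the agent is unsure about."""
    return decision.irreversible or decision.confidence < min_confidence


print(needs_human(Decision("draft outline", 0.95, irreversible=False)))
print(needs_human(Decision("publish film", 0.99, irreversible=True)))
```

Keeping the rule in one small function makes the handoff policy easy to review and to change.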

## mahilo: A Lightweight Control Plane for Multi-Agent Systems

Mahilo is a framework designed to address these challenges while remaining lightweight
and flexible. The following is an example of a multi-agent system in mahilo that showcases how agents within mahilo can communicate automatically while still being supervised by a human.

![mahilo_arch](../../assets/mahilo.png)

To understand how mahilo works, let's start by building a simple multi-agent system.
We'll create a team of two agents:

1. A **Research Agent** that can find information
2. A **Writing Agent** that can create content based on that information

We'll define both agents right after a short setup. They are created the same way and differ only in their configuration.

### Setup
Before we begin, let's install the mahilo package.

In [None]:
!pip install mahilo

Add an OpenAI API key to the environment variables.

In [1]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

Now let's create a basic multi-agent system with these agents, with no tools. Later on, in the Filmmaker project, we'll look at a complete implementation with tools and workflows.

In [7]:
from mahilo import BaseAgent, AgentManager, ServerManager

def setup_simple_system() -> ServerManager:
    # Create the agents
    research_agent = BaseAgent(
        name="ResearchAgent",
        type="research_agent",
        description="You are a research agent",
        short_description="A research agent",
        tools=[],
    )
    
    writing_agent = BaseAgent(
        name="WritingAgent",
        type="writing_agent",
        description="You are a writing agent",
        short_description="A writing agent",
        tools=[],
        can_contact=["research_agent"]  # Specify which agents this agent can contact
    )
    
    # Create an agent manager to manage our team
    team = AgentManager()
    
    # Register the agents with the manager
    team.register_agent(research_agent)
    team.register_agent(writing_agent)
    
    # Activate the agents
    research_agent.activate()
    writing_agent.activate()
    
    # Create a server manager to handle agent communication
    server = ServerManager(team)
    
    return server

In [None]:
import asyncio
import nest_asyncio

server = setup_simple_system()

# run the server
nest_asyncio.apply()
asyncio.run(server.run())

You can now connect to the server using the mahilo CLI in your terminal. This will open a chat where you can talk to the research agent about your task. The agent can then process your request and reach out to the writing agent as needed to complete the task.

```bash
mahilo connect --agent-name ResearchAgent
```

What's important to notice here:

1. **Automatic Communication**: The writing agent can automatically communicate internally with the
   research agent thanks to mahilo's communication layer.

2. **Clear Separation of Concerns**: Each agent has a specific role and set of capabilities,
   making the system modular and maintainable.

3. **Simple Setup**: Despite being a multi-agent system, the setup code is straightforward
   and easy to understand.

4. **Minimal Boilerplate**: Mahilo handles all the communication infrastructure, so we can
   focus on defining the agents' capabilities and behavior.

In a real application, the server would be running, and you'd interact with it through API
calls or a web interface. Each agent would have more sophisticated tools, and the communication
would happen automatically as needed.


### Built for complex applications

In addition to the features above, mahilo boasts other features that make
it a powerful tool for building multi-agent systems.

1. Built-in Observability

    Mahilo provides out-of-the-box metrics, traces (in OpenTelemetry format), and logs of all
inter-agent communications, making it easy to understand, debug, and optimize your system.

    ![metrics](../../assets/metrics.png)
    ![traces](../../assets/traces.png)

2. Inter-Agent Communication Protocol

    All agents communicate using a simple protocol that has features like retries, acknowledgement, response validation and more. In addition, all messages are tracked and available at the server's `/messages` endpoint.

3. Human-in-the-Loop First

    Unlike many frameworks where human oversight is an afterthought, mahilo treats humans as
first-class participants. Any agent can have a human counterpart who can monitor, intervene,
and guide the agent as needed.

4. Framework Agnostic

    Mahilo is designed to work with agents from any framework, whether they're built with
LangChain, PydanticAI, or custom implementations. This makes it easy to 
incorporate existing agents or specialized tools into your system.
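The retries and response validation mentioned for the communication protocol can be pictured with a small generic helper. This is a sketch of the general technique, not mahilo's implementation; `send_with_retries`, `flaky_send`, and the retry count are hypothetical.

```python
import time
from typing import Callable, Optional


def send_with_retries(
    send: Callable[[], str],
    validate: Callable[[str], bool],
    retries: int = 3,
    backoff_s: float = 0.0,
) -> str:
    """Call `send` until it returns a response that passes `validate`."""
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            response = send()
            if validate(response):
                return response
            last_error = ValueError(f"invalid response on attempt {attempt}")
        except ConnectionError as exc:
            last_error = exc
        time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"gave up after {retries} attempts") from last_error


# Simulated peer: times out once, replies with an empty string, then succeeds.
attempts = iter([ConnectionError("timeout"), "", "storyboard ready"])


def flaky_send() -> str:
    item = next(attempts)
    if isinstance(item, Exception):
        raise item
    return item


result = send_with_retries(flaky_send, validate=lambda r: bool(r.strip()))
print(result)
```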

Watch a demo of a mahilo system in action:

[![](../../assets/mahilo_tut_simple.jpg)](https://x.com/wjayesh/status/1872263352254427458)


Now that we understand the basics, let's move on to a more complex real-world example:
a multi-agent system for filmmaking.

## The Hitchcock Project: A Multi-Agent Filmmaker

Let's examine a more complex example: a multi-agent system for filmmaking
that won first prize in the Bengaluru track of the [ElevenLabs Worldwide Hackathon](https://hackathon.elevenlabs.io/). Its agents work together to produce a finished film. Below, we explain the thinking behind the system and how we designed and built it.

### Step 1: Planning the System Architecture

Before writing any code, it's crucial to plan the overall architecture of your multi-agent system.
For our filmmaker project, we started by:

1. **Identifying the Core Processes**: Analyzing the filmmaking process to identify distinct stages
   that could benefit from specialization

2. **Defining Agent Boundaries**: Determining clear responsibilities for each agent to avoid overlap
   and ensure clear ownership of tasks

3. **Mapping Communication Flows**: Deciding which agents need to communicate with each other
   and what information they need to exchange

4. **Designing Workflow Sequences**: Establishing the correct order of operations to produce
   a coherent final product

![hitchcock_arch](../../assets/hitchcock_arch.jpg)

Our analysis led us to a system with four specialized agents:

1. **Script Writer Agent**: Creates the initial movie script based on a prompt
2. **Story Boarder Agent**: Breaks down the script into visual scenes and shots
3. **Director of Photography (DOP) Agent**: Generates images based on storyboard specifications
4. **Audio Agent**: Creates audio content matching the visual scenes

This architecture ensures that each stage of film production has a dedicated expert agent,
with clear workflows between them.
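A quick way to sanity-check a planned workflow is to write the dependencies down and let the standard library derive a valid order. The sketch below encodes only the forward flow between the four agents; the review loop from the DOP agent back to the Story Boarder is left out because a topological sort cannot handle cycles. The `depends_on` mapping is our own illustration, not a mahilo structure.

```python
from graphlib import TopologicalSorter

# Each agent maps to the agents whose output it needs before it can start.
depends_on = {
    "script_writer": set(),
    "story_boarder": {"script_writer"},
    "dop": {"story_boarder"},
    "audio": {"story_boarder"},
}

# static_order() raises CycleError if the plan contains a circular dependency.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

The DOP and Audio agents both depend only on the Story Boarder, so their relative order is not fixed and they can run in parallel.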

### Step 2: Designing Agent Capabilities

For each agent, we needed to:

1. **Define Core Competencies**: What specific tasks can this agent perform?
2. **Design Appropriate Tools**: What functions should the agent have access to?
3. **Craft Effective Prompts**: How should we instruct the agent to use these tools?
4. **Establish Communication Protocols**: When and how should this agent interact with others?

Let's look at how we designed the Script Writer agent:

In [None]:
# First, we defined the prompt that establishes the agent's role and behavior
script_writer_prompt = """
You are a creative story writer agent specialized in creating movie scripts. You have access to powerful research tools:

1. Web Search and Browsing:
   - search_information: Search the internet for information
   - visit: Visit and read web pages
   - page_up/page_down: Navigate through long content
   - find/find_next: Search within pages
   - search_archives: Search historical archives

2. Text Analysis:
   - inspect_file_as_text: Analyze and extract information from text content

When given a writing task:
1. First research the historical and cultural context of your setting
2. Study similar movies and their themes for inspiration
3. Use the research to create authentic, well-researched scripts
4. Pay special attention to:
   - Dialogue matching the era and location
   - Historically accurate scene descriptions
   - Cultural authenticity
   - Period-appropriate details

Once you're done with the script generation, inform the story boarder agent to create a storyboard. Always do that, and you can
use the chat_with_agent tool for it.

Wait for specific requests before taking action. Use your research tools to gather information before writing.
"""

# Then, a concise description that helps other agents understand its role
script_writer_short_description = """
Story writer agent for creating movie scripts with deep research capabilities
"""

Next, we implemented the key tools that the agent needs to fulfill its role.

> Note: This is a simplified version of the actual implementation

In [None]:
def get_script_with_research(script_prompt: str) -> str:
    """
    Generate a movie script based on the given prompt with research capabilities.
    
    Args:
        script_prompt (str): Description of the script to generate
        
    Returns:
        str: Generated script text
    """
    # In the real implementation, this would:
    # 1. Use a research agent to gather relevant information
    # 2. Generate a full script using an LLM
    # 3. Save the script to a shared location for other agents to access
    # 4. Extract character information for the storyboard agent
    
    # For demonstration purposes, we'll just return a placeholder
    return f"Generated script based on: {script_prompt}"

The next step is to package the tool in mahilo's expected format. This is just the OpenAI tool definition spec with an additional function field.

In [None]:
# Package the tool in mahilo's expected format
script_writer_tools = [
    {
        "tool": {
            "type": "function",
            "function": {
                "name": "get_script_with_research",
                "description": "Generate a movie script with research capabilities",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "script_prompt": {
                            "type": "string",
                            "description": "Description of the script to generate (e.g., 'Write a thriller set in 1920s Chicago')"
                        }
                    },
                    "required": ["script_prompt"]
                }
            }
        },
        "function": get_script_with_research,
    }
]
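To see why the extra `function` field is useful, consider how a tool call from the model could be routed to the Python callable. The sketch below is an assumption about how such a dispatcher might look, not mahilo's internal code; `dispatch` is a hypothetical helper, and the `call` dictionary follows the OpenAI tool-call shape (a function name plus JSON-encoded arguments).

```python
import json


def get_script_with_research(script_prompt: str) -> str:
    # Placeholder tool body, mirroring the simplified tutorial version
    return f"Generated script based on: {script_prompt}"


tools = [
    {
        "tool": {
            "type": "function",
            "function": {
                "name": "get_script_with_research",
                "description": "Generate a movie script with research capabilities",
                "parameters": {
                    "type": "object",
                    "properties": {"script_prompt": {"type": "string"}},
                    "required": ["script_prompt"],
                },
            },
        },
        "function": get_script_with_research,
    }
]


def dispatch(tool_call: dict, tools: list[dict]) -> str:
    """Look up the callable by tool name and invoke it with the parsed arguments."""
    registry = {t["tool"]["function"]["name"]: t["function"] for t in tools}
    name = tool_call["function"]["name"]
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(tool_call["function"]["arguments"])
    return registry[name](**kwargs)


call = {
    "type": "function",
    "function": {
        "name": "get_script_with_research",
        "arguments": json.dumps({"script_prompt": "A thriller set in 1920s Chicago"}),
    },
}
output = dispatch(call, tools)
print(output)
```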

### Step 3: Building Data Flow Mechanisms
A critical aspect of multi-agent systems is ensuring data flows correctly between agents.
For our filmmaker project, we implemented several data sharing mechanisms:

1. **File-Based Sharing**: Scripts, storyboards, and generated images are saved to disk
   where other agents can access them

2. **Database Integration**: A shared database stores structured information like character
   details, scene breakdowns, and shot specifications

3. **Direct Communication**: Agents can request specific information from each other
   using mahilo's internal communication protocol.

For example, here's how our Story Boarder agent uses these mechanisms:
1. It reads the script from a file saved by the Script Writer
2. It processes the script and saves structured scene data to a database
3. It notifies the DOP agent when storyboard specifications are ready

This approach ensures that each agent has access to the exact information it needs,
in the most appropriate format for its tasks. With mahilo, making one agent talk to another only requires mentioning it in the prompt.
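The file-based and database-backed handoffs can be sketched with the standard library alone. The file name, table schema, and scene text below are made up for illustration; the actual project's storage layout is not shown in this tutorial.

```python
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())  # stand-in for a shared project directory
script_path = workdir / "script.txt"

# Script Writer side: save the script where other agents can find it.
script_path.write_text(
    "SCENE 1: A foggy pier.\nSCENE 2: A crowded speakeasy.\n", encoding="utf-8"
)

# Story Boarder side: read the script and store structured scene data.
db = sqlite3.connect(workdir / "film.db")
db.execute("CREATE TABLE scenes (id INTEGER PRIMARY KEY, description TEXT NOT NULL)")
lines = [
    line
    for line in script_path.read_text(encoding="utf-8").splitlines()
    if line.strip()
]
with db:  # commits on success, rolls back on error
    db.executemany(
        "INSERT INTO scenes (description) VALUES (?)", [(line,) for line in lines]
    )

# DOP side: query exactly the rows it needs.
scenes = db.execute("SELECT id, description FROM scenes ORDER BY id").fetchall()
db.close()
print(scenes)
```

Files suit large, unstructured artifacts such as scripts and images, while the database suits records that other agents query selectively.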

Let's look at how the Story Boarder agent is programmed to drive the workflow forward:

In [None]:
story_boarder_prompt = """
You are a professional storyboard artist agent specialized in breaking down scripts into visual elements. Your role involves two main phases:

1. Initial Storyboard Creation (when receiving from Script Writer Agent):
   a. Use plan_storyboard_scenes to break down script into scenes and mark their importance
   b. Use analyze_script_scenes to process critical/high importance scenes and plan key shots
   c. Use plan_visual_elements to define lighting, props, atmosphere, and effects
   d. Use create_shot_image_specs to compile detailed specifications for the DOP agent

   After completion, notify the DOP agent that the storyboard is ready.
   Then, notify the Audio agent that the storyboard is ready and it can generate audio for the script.

2. Image Review Phase (when receiving from DOP Agent):
   - Use critique_generated_images to review and provide feedback on generated images
   - Specify any needed adjustments

Focus on:
- Critical story-driving scenes
- Clear, practical shot descriptions
- Technical feasibility
- Visual consistency
- Complete information for DOP

Only use tools when explicitly asked to analyze/design, or when DOP requests image review.
"""

story_boarder_short_description = """
Storyboard artist agent for breaking down scripts into visual elements
"""

The prompt explicitly directs the agent to notify the next agents in the workflow,
ensuring that the process moves forward automatically.

### Step 4: Building the Control Plane

With all our agents and workflows designed, the final step is creating the control plane
that brings everything together. In mahilo, this is remarkably straightforward:

In [None]:
def setup_filmmaker_system():
    # Create the Script Writer agent
    script_writer = BaseAgent(
        name="ScriptWriterAgent",
        type="script_writer",
        description=script_writer_prompt,
        short_description=script_writer_short_description,
        tools=script_writer_tools,
    )

    # Create the Story Boarder agent
    story_boarder = BaseAgent(
        name="StoryBoarderAgent",
        type="story_boarder",
        description=story_boarder_prompt,
        short_description=story_boarder_short_description,
        tools=[], # simplified for the tutorial
    )

    # Create the DOP agent (simplified for the tutorial)
    dop = BaseAgent(
        name="DOPAgent",
        type="dop",
        description="Director of Photography agent specialized in generating images",
        short_description="Creates images based on storyboard specifications",
        tools=[],  # simplified for the tutorial
    )

    # Create the Audio agent (simplified for the tutorial)
    audio = BaseAgent(
        name="AudioAgent",
        type="audio",
        description="Audio agent specialized in creating sound content",
        short_description="Creates audio content matching the visual scenes",
        tools=[],  # simplified for the tutorial
    )

    # Create an agent manager to manage our team
    team = AgentManager()
    
    # Register all agents with the manager
    team.register_agent(script_writer)
    team.register_agent(story_boarder)
    team.register_agent(dop)
    team.register_agent(audio)
    
    # Activate the agent you want to start chat with
    script_writer.activate()
    
    # Create a server manager to handle agent communication
    server = ServerManager(team)
    
    return server

### Step 5: Running the System

You can run the control plane by calling the `run` method on the server object.

In [None]:
import asyncio
import nest_asyncio

server = setup_filmmaker_system()

# run the server
nest_asyncio.apply()
asyncio.run(server.run())

Similar to the simple system, you can connect to the server using the mahilo CLI in your terminal.

```bash
mahilo connect --agent-name ScriptWriterAgent
```



### Hitchcock Demo

The following is a demo of the filmmaker system in action.

[![Hitchcock Demo](https://img.youtube.com/vi/O0bswr-46kg/0.jpg)](https://www.youtube.com/watch?v=O0bswr-46kg)

## Best Practices for Building Multi-Agent Apps

Based on our experience developing the filmmaker system, here are some best practices:

1. Clear Agent Boundaries

    Define precise, non-overlapping responsibilities for each agent.

    *Why it matters*: Clear boundaries prevent confusion, reduce redundant work, and make
the system easier to debug and maintain.

2. Thoughtful Prompt Engineering

    Craft detailed prompts that guide agent behavior and interaction patterns.

    *Why it matters*: Well-designed prompts serve as both documentation and runtime
instructions, ensuring agents behave as expected.

3. Explicit Communication Guidelines

    Establish when and how agents should communicate, including what information
they should share.

    *Why it matters*: Without protocols, agents may overwhelm each other with irrelevant
information or fail to share critical data.

4. Workflow Orchestration

    Implement clear trigger patterns for moving from one stage to the next.

    *Why it matters*: Without orchestration, multi-agent systems can stall or produce
inconsistent results.

5. Testing Agent Interactions

    Test agent pairs in isolation before integrating them into the full system.

    *Why it matters*: Isolating agent interactions makes it easier to identify and fix
communication issues.

6. Human Oversight Design

    Tell the agents in their prompt about specific checkpoints where humans should review and intervene in the process.

    *Why it matters*: Human oversight improves quality and provides a safety net for
handling edge cases and creative decisions.
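Testing an agent pair in isolation, as suggested in practice 5, can start with a stub in place of the receiving agent. Everything in this sketch is hypothetical (`StubAgent`, `hand_off`, and the test itself); it shows the testing pattern rather than any mahilo test utility.

```python
class StubAgent:
    """Stands in for a real agent and records what it receives."""

    def __init__(self, name: str, reply: str) -> None:
        self.name = name
        self.reply = reply
        self.inbox: list[tuple[str, str]] = []

    def receive(self, sender: str, message: str) -> str:
        self.inbox.append((sender, message))
        return self.reply


def hand_off(sender: str, recipient: StubAgent, payload: str) -> str:
    """Hypothetical handoff step: reject empty payloads before delivery."""
    if not payload.strip():
        raise ValueError(f"{sender} tried to hand off an empty payload")
    return recipient.receive(sender, payload)


def test_script_writer_to_story_boarder() -> None:
    story_boarder = StubAgent("StoryBoarderAgent", reply="storyboard started")
    reply = hand_off("ScriptWriterAgent", story_boarder, "SCENE 1: A foggy pier.")
    assert reply == "storyboard started"
    assert story_boarder.inbox == [("ScriptWriterAgent", "SCENE 1: A foggy pier.")]


test_script_writer_to_story_boarder()
print("handoff test passed")
```

Because the stub needs no model calls, such tests run quickly and fail for one reason only: a broken handoff.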

## Conclusion

In this tutorial, we've explored:

1. Why multi-agent architectures are essential for complex AI applications
2. The challenges of building multi-agent systems from scratch
3. How mahilo simplifies multi-agent development
4. A simple example demonstrating mahilo's core capabilities
5. A real-world case study of a prize-winning filmmaker system
6. Best practices for developing effective multi-agent systems

Multi-agent systems represent the future of AI application development, enabling more
sophisticated, reliable, and maintainable solutions than single-agent approaches.
With frameworks like mahilo, these powerful architectures become accessible to developers
without requiring extensive expertise.

By focusing on clear agent design, thoughtful communication patterns, and effective
workflow orchestration, you can build multi-agent systems that tackle complex problems
through the collaboration of specialized agents.

We also have a community where you can share your experiences and learn from others. Feel free to join us:
https://github.com/wjayesh/mahilo/discussions