# 📓 The GenAI Revolution Cookbook

**Title:** Multi-Agent Systems: A Field Guide with CrewAI and LangChain

**Description:** Build fast, controllable multi-agent systems using CrewAI and LangChain with task granularity, Pydantic I/O, parallel execution, and focused agents today.

**📖 Read the full article:** [Multi-Agent Systems: A Field Guide with CrewAI and LangChain](https://blog.thegenairevolution.com/article/multi-agent-systems-a-field-guide-with-crewai-and-langchain)

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



Building multi-agent systems with LLMs isn't like building regular software. It's actually much closer to doing data science. You form a hypothesis, test it out, tweak your setup, and repeat. Over and over again. If you're expecting your agents and tasks to work perfectly right out of the gate, well, you're going to be disappointed.

I've spent the past few months building and refining agent-based systems, and I've learned that using CrewAI and LangChain makes a massive difference. Not just in how quickly I can build, but in how easy it is to experiment and iterate. If you're looking for a [step-by-step guide to building multi-agent AI systems with CrewAI](/article/how-to-build-multi-agent-ai-systems-with-crewai-and-yaml-2), that guide covers reusable patterns, guardrails, and YAML-first workflows.

This post shares the practical lessons that have actually worked for me. It's not a tutorial or a feature tour. Think of it more as a field guide. How to structure tasks, how to keep agents focused, how to speed up execution, and how to keep your system from completely falling apart as it grows. Hopefully, this saves you some time, frustration, and more than a few late nights.

## Start with Tasks, Keep Them Small

### Clearly Define and Structure Your Tasks

When you're building multi-agent systems, start by clearly defining your tasks. Writing detailed, step-by-step instructions clarifies what you're trying to accomplish and what you expect from each agent. What I quickly realized is that tasks tend to become overly complex, especially when you try to do too much at once.

### Avoid Task Overload

Here's a practical challenge I kept running into. When tasks contain more than a handful of steps, even with those massive context windows we have now, your LLM-based agents start dropping essential instructions. It's like they just forget what they're supposed to be doing halfway through. This loss of clarity is a clear signal that your tasks are trying to do too much and need to be simplified.

### Breaking Down Tasks

What works best for me is keeping tasks concise. I try to limit them to around 3 or 4 clear steps each. If I notice a task getting too complex, I split it up into smaller subtasks. For instance, instead of combining user intent classification with immediate task execution in one go, I separate these processes. It makes everything clearer and more accurate. Breaking down large tasks improves agent performance and accuracy, and it makes the whole system easier to manage.

### Example

In [None]:
# What not to do
god_task:
  description: >
    Analyze the user query, classify the intent, extract relevant entities,
    query the knowledge base, generate a markdown response, ask for clarification 
    if needed, log the request for analytics, and refresh the cache if the user is 
    a premium member.
  expected_output: >
    A complete markdown response to the user query, with intent classified,
    relevant entities extracted, cache refreshed (if needed), and analytics updated.
  agent: TBD

In [None]:
# What works
classify_intent:
  description: >
    Analyze the user's input and classify it into a predefined intent category
    (e.g., information request, action request, greeting). If the intent is unclear,
    ask a clarifying question before proceeding.
  expected_output: >
    A JSON object with the intent category and any follow-up question if clarification is needed.
  agent: TBD

extract_entities:
  description: >
    Based on the classified intent, extract any relevant entities from the 
    user's input. These may include product names, locations and dates.
  expected_output: >
    A JSON object containing the extracted entities as key-value pairs.
  agent: TBD

retrieve_and_respond:
  description: >
    Using the classified intent and extracted entities, search the appropriate 
    data sources and generate a markdown-formatted response that directly 
    answers the user query.
  expected_output: >
    A well-formatted markdown answer that is accurate and relevant to the user query.
  agent: TBD

log_and_refresh:
  description: >
    If the user is a premium member, log the query metadata to the analytics system
    and refresh the corresponding cache entries. This task is optional and should 
    run independently.
  expected_output: >
    A status report indicating whether analytics were logged and cache was refreshed.
  agent: TBD

## Use Pydantic Models to Control Inputs and Outputs

### The Role of Structured Data in Multi-Agent Systems

Once you've got your tasks defined and agents focused, the next challenge is maintaining consistent and reliable communication between them. This is where structured input and output becomes absolutely crucial. Without clearly defined data formats, information gets lost, misinterpreted, or becomes too ambiguous for the next agent in the chain to handle properly. I learned this the hard way.

### Why Pydantic Makes a Difference

Using Pydantic models helps enforce what I like to think of as a shared contract between tasks. These models basically act as schemas that define exactly what an agent expects and what it will return. This is especially helpful when you're dealing with multiple agents passing information back and forth, or when integrating with external tools or APIs. For more on [best practices for prompt engineering and reliable LLM outputs](/article/prompt-engineering-with-llm-apis-how-to-get-reliable-outputs-3), including how to structure prompts and enforce output formats, check out our in-depth guide.

### What Has Worked for Me

I've found that defining a Pydantic model for each task's output as early as possible really pays off. It forces clarity for both the developer and the LLM, and it ensures that the flow between tasks stays smooth. If something changes in the structure, you can adjust it centrally. This approach has made debugging so much easier and significantly reduced the friction when chaining tasks together in complex workflows.

### Example

In [None]:
from typing import Optional

from pydantic import BaseModel, Field


class EntityExtractionOutput(BaseModel):
    """Extracted entities from the user's input."""
    product: Optional[str] = Field(None, description="The name of the product")
    location: Optional[str] = Field(None, description="Any location referenced in the input")
    date: Optional[str] = Field(None, description="Relevant date or time information")
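
To make the "shared contract" idea concrete, here is a minimal sketch of checking a raw agent response against the model above before handing it to the next task. It assumes Pydantic v2 (for `model_validate_json`). The `parse_entities` helper and both JSON payloads are hypothetical and exist only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class EntityExtractionOutput(BaseModel):
    """Extracted entities from the user's input (same fields as above)."""
    product: Optional[str] = Field(None, description="The name of the product")
    location: Optional[str] = Field(None, description="Any location referenced in the input")
    date: Optional[str] = Field(None, description="Relevant date or time information")


def parse_entities(raw_json: str) -> Optional[EntityExtractionOutput]:
    """Return the validated model, or None if the payload breaks the contract."""
    try:
        return EntityExtractionOutput.model_validate_json(raw_json)
    except ValidationError:
        return None


# Hypothetical payloads, not real agent output.
good = parse_entities('{"product": "Widget Pro", "location": "Berlin"}')
bad = parse_entities('{"product": ["Widget Pro", "Widget Max"]}')  # a list, not a string
```

With this in place, a malformed response fails at the boundary between tasks instead of surfacing as a confusing error two steps later.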

In [None]:
# Example YAML for assigning the model to a task
# (this would be in your CrewAI task YAML file)
extract_entities:
  description: >
    Based on the classified intent, extract any relevant entities from 
    the user's input. These may include product names, locations and dates.
  expected_output: EntityExtractionOutput
  agent: TBD

In [None]:
# Example defining the task in your crew
from crewai import Task

# tasks_config is the dict loaded from your task YAML file
extract_entities = Task(
    config=tasks_config['extract_entities'],
    output_pydantic=EntityExtractionOutput
)

## Keep Agents Focused

### Avoid the "Do-It-All" Agent

A common trap when you're starting with multi-agent systems is trying to assign too many different types of tasks to a single agent. It might seem efficient at first. But agents quickly lose their effectiveness when they're overloaded with unrelated responsibilities. Just like in real teams, specialization actually matters. For a [step-by-step tutorial on building a specialized LLM agent](/article/how-to-build-an-llm-agent-from-scratch-with-gpt-4-react-5), including reasoning, actions, and automation, see our guide using the GPT-4 ReAct pattern.

### Group Related Tasks

What I've found works best is giving each agent a clear role and limiting them to 3 or 4 closely related tasks. Once you've broken your tasks down into small, focused steps, look at the nature of each task. Group similar ones together and assign them to a single agent. If a task feels out of place or unrelated, it probably belongs to a different agent entirely.
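
If you want to keep an eye on this as the system grows, a small check like the one below can help. It is only a sketch: the 4-task limit is this article's rule of thumb rather than a CrewAI constraint, the helper names are made up, and the task-to-agent mapping is written out by hand to mirror the agent definitions used later in this section.

```python
from collections import defaultdict

MAX_TASKS_PER_AGENT = 4  # rule of thumb from this article, not a CrewAI limit

# Task -> agent assignment, mirroring the example used in this article.
task_agents = {
    "classify_intent": "intent_classifier",
    "extract_entities": "retrieval_specialist",
    "retrieve_and_respond": "retrieval_specialist",
    "log_and_refresh": "retrieval_specialist",
}


def group_tasks_by_agent(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Invert a task -> agent mapping into agent -> [tasks]."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for task, agent in assignments.items():
        grouped[agent].append(task)
    return dict(grouped)


def overloaded_agents(assignments: dict[str, str], limit: int = MAX_TASKS_PER_AGENT) -> list[str]:
    """Return the agents that own more tasks than the limit allows."""
    grouped = group_tasks_by_agent(assignments)
    return sorted(agent for agent, tasks in grouped.items() if len(tasks) > limit)


workload = group_tasks_by_agent(task_agents)
too_busy = overloaded_agents(task_agents)
```

A count like this says nothing about whether the tasks are actually related, so it complements the judgment call described above instead of replacing it.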

### Push Shared Logic Up to the Agent

Sometimes you have certain behaviors or instructions that need to be repeated across multiple tasks. Instead of duplicating that logic in each task, I put it in the agent definition itself. For example, if all of an agent's tasks require a specific tone or reasoning style, I define that expectation once at the agent level. This keeps your task definitions cleaner and reduces inconsistencies during execution.

In [None]:
# Handles: classify_intent
intent_classifier:
  role: >
    User Intent Classification Specialist
  goal: >
    Accurately identify the user's intent to guide downstream task execution
  backstory: >
    You're an expert in natural language understanding with a strong intuition
    for interpreting human queries. Your precision in classifying intent ensures
    that the rest of the system can act with clarity and purpose. You never assume;
    if the user's intent is unclear, you ask the right follow-up question to 
    clarify it.

# Handles: extract_entities, retrieve_and_respond, log_and_refresh
retrieval_specialist:
  role: >
    Intelligent Retrieval and Response Generator
  goal: >
    Deliver precise and well-formatted responses based on user needs
  backstory: >
    You're a results-driven AI agent skilled in using structured inputs like
    classified intent and extracted metadata to retrieve accurate information.
    You're also a master of markdown formatting, ensuring your answers are always
    clean, informative, and ready to be presented to the user. You always maintain a
    helpful, professional tone, and your reasoning is structured and explicit: start
    from known facts, explain your steps clearly, and avoid skipping 
    logical connections.

    Since many of your tasks require consistent formatting and structured thinking,
    you've been designed to always follow a markdown-friendly output style,
    using bullet points, headings, and code blocks where appropriate. You prioritize
    clarity and readability across all responses, avoiding repetition and verbosity.

## Optimize Execution Speed and Flexibility

### The Speed Challenge with Multi-Agent Systems

As you scale up your agents and tasks, one issue that quickly becomes apparent is speed. If every task waits for the previous one to complete, even when they're completely unrelated, you end up with a serious bottleneck. This can slow your system down considerably, especially when tasks could be safely executed in parallel.

### Using Asynchronous and Conditional Tasks

What has worked for me is setting async_execution=True on tasks that can run in parallel. This lets the system take advantage of concurrency without compromising task logic. Tasks that perform independent lookups or data enrichment, for instance, can often run simultaneously. If your multi-agent system leverages retrieval-augmented generation, you might find our guide on [RAG techniques to boost answer accuracy](/article/rag-application-7-retrieval-tricks-to-boost-answer-accuracy-2) especially useful for optimizing both speed and quality.

I also use context-based chaining and conditional tasks to control the flow. Some tasks only need to run if certain outputs are present, and conditional logic makes it easy to skip unnecessary steps. One thing to remember: always end your sequence with a non-async task so everything syncs back together before the final output or transition.
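
As a rough analogy for the shape of this flow, the sketch below uses plain `asyncio` rather than CrewAI, so it is not how CrewAI schedules tasks internally. Two independent lookups run concurrently, then a final synchronous step joins their results, much like async tasks followed by a non-async one. The function names and sleep durations are placeholders.

```python
import asyncio
import time


async def lookup(name: str, seconds: float) -> str:
    """Stand-in for an independent task such as a lookup or data enrichment."""
    await asyncio.sleep(seconds)  # simulates waiting on an LLM or API call
    return f"{name} done"


def summarize(results: list[str]) -> str:
    """Final non-async step: runs only after every parallel task has finished."""
    return " | ".join(results)


async def run_pipeline() -> str:
    # gather() starts both lookups at once and returns results in argument order.
    results = await asyncio.gather(
        lookup("pricing", 0.2),
        lookup("inventory", 0.2),
    )
    return summarize(list(results))


start = time.perf_counter()
summary = asyncio.run(run_pipeline())
elapsed = time.perf_counter() - start  # close to the longest lookup, not the sum
```

Run one after the other, the two lookups would take roughly the sum of their delays. Run concurrently, the total is closer to the slowest one, which is where the speedup comes from.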

### Keeping Things Flexible and Fast

This approach gives you high flexibility and better performance. You're not locked into a rigid step-by-step structure. You can design your flows to adapt based on what's needed and still move fast. In practice, this has allowed me to scale workflows without sacrificing responsiveness or control.

### Example

In [None]:
classify_intent:
  description: >
    Analyze the user's input and classify it into a predefined intent category
    (e.g., information request, action request, greeting). If the intent is unclear,
    ask a clarifying question before proceeding.
  expected_output: IntentClassificationOutput
  agent: intent_classifier

extract_entities:
  description: >
    Based on the classified intent, extract any relevant entities from the user's 
    input. These may include product names, locations and dates.
  expected_output: EntityExtractionOutput
  agent: retrieval_specialist
  context: [classify_intent]

retrieve_and_respond:
  description: >
    Using the classified intent and extracted entities, search the appropriate data
    sources and generate a markdown-formatted response that directly answers 
    the user query.
  expected_output: MarkdownResponseOutput
  agent: retrieval_specialist
  context: [extract_entities]

# Async Task
log_and_refresh:
  description: >
    If the user is a premium member, log the query metadata to the analytics system
    and refresh the corresponding cache entries. This task is optional and should 
    run independently.
  expected_output: CacheLoggingStatus
  agent: retrieval_specialist
  async_execution: true
  context: [classify_intent]

### Conditional Task

In [None]:
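from pydantic import BaseModel
from crewai.tasks.conditional_task import ConditionalTask
from crewai.tasks.task_output import TaskOutput


# Output of the classify_intent task
class IntentClassificationOutput(BaseModel):
    intent: str
    is_premium_user: bool


# Output of the log_and_refresh task (fields assumed from its expected_output)
class CacheLoggingStatus(BaseModel):
    analytics_logged: bool
    cache_refreshed: bool


# Condition function for the conditional task. CrewAI passes in the output of
# the task that runs immediately before the conditional task, so here that
# task needs to return an IntentClassificationOutput.
def is_premium_user(output: TaskOutput) -> bool:
    return output.pydantic.is_premium_user


# log_and_refresh conditional task (tasks_config is the loaded task YAML)
log_and_refresh = ConditionalTask(
    config=tasks_config['log_and_refresh'],
    output_pydantic=CacheLoggingStatus,
    condition=is_premium_user
)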
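
Because the condition is a plain function, you can check how it behaves without running a crew. The sketch below uses `SimpleNamespace` objects as stand-ins for CrewAI's `TaskOutput`, an assumption made to keep the example dependency-free. It also adds a guard for the case where no structured output is attached, which the version above does not handle.

```python
from types import SimpleNamespace


def is_premium_user(output) -> bool:
    """Return True only when structured output exists and flags a premium user."""
    parsed = getattr(output, "pydantic", None)
    return bool(parsed is not None and parsed.is_premium_user)


# Stand-ins for TaskOutput; real objects come from CrewAI at run time.
premium = SimpleNamespace(pydantic=SimpleNamespace(intent="info", is_premium_user=True))
standard = SimpleNamespace(pydantic=SimpleNamespace(intent="info", is_premium_user=False))
unparsed = SimpleNamespace(pydantic=None)  # e.g. the model output was not parsed

results = [is_premium_user(o) for o in (premium, standard, unparsed)]
```

Treating a missing structured output as "skip the task" is a design choice. For an optional step like logging and cache refresh it seems like the safer default, but you may prefer to raise instead.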

## Conclusion

Multi-agent systems are powerful, but only if they're structured right. The more complex your system gets, the more those small mistakes, like overloading agents or writing unclear tasks, start to compound. For additional guidance on [how to structure system and user prompts to avoid conflicts](/article/system-prompt-vs-user-prompt-how-to-keep-models-from-ignoring-your-rules) and ensure clarity, see our analysis of prompt hierarchies.

What has worked best for me is keeping things simple, modular, and easy to reason about. Three to four steps per task, three to four tasks per agent, structured I/O between them, and a framework that lets me adapt quickly when things don't go as planned.

Frameworks like CrewAI and LangChain give you a solid foundation. But the design decisions are where success or failure really happens: how you write your tasks, how you assign your agents, and how you handle execution.

If you're just starting out, expect to iterate. Expect to refactor. But also know that once the pieces fall into place, multi-agent workflows can be incredibly powerful, flexible, fast, and easy to maintain. Hopefully, the patterns I've shared here help you get there a little quicker.