[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/parallelization-exercise.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239934-lesson-1-parallelization)

# Parallelization Exercises

## Learning Objectives

By completing these exercises, you will:

1. **Master Fan-out and Fan-in Patterns**: Learn to create graphs where nodes execute in parallel and then converge
2. **Understand State Reducers**: Practice using different reducer functions to handle concurrent state updates
3. **Control Execution Order**: Implement custom reducers to manage the order of parallel updates
4. **Build Real-world Applications**: Create parallel processing systems using LLMs and external APIs
5. **Handle Concurrency Errors**: Debug and fix common issues with parallel execution

## Prerequisites

Before starting these exercises, ensure you understand:
- Basic LangGraph concepts (nodes, edges, state)
- StateGraph creation and compilation
- Basic Python typing and TypedDict

Let's begin with the setup and our first exercise!


In [None]:
%%capture --no-stderr
%pip install -U langgraph tavily-python wikipedia langchain_openai langchain_community langgraph_sdk

In [None]:
import os, getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

## Exercise 1: Understanding the Parallelization Problem

Let's start by reproducing the core problem that arises when nodes run in parallel. First, you'll create a linear graph; in the next exercise you'll make it parallel and encounter the error.

**Your Task**: Create a simple linear graph with 4 nodes (a, b, c, d) that each write their name to a list in the state.

In [None]:
from IPython.display import Image, display
from typing import Any
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

# TODO: Define your State class
# Hint: Use a simple list field called 'names'
class State(TypedDict):
    # YOUR CODE HERE
    pass

# TODO: Create a node class that writes its name to the state
# Hint: Look at the ReturnNodeValue class from the lesson
class AppendName:
    def __init__(self, name: str):
        # YOUR CODE HERE - store the name as self._name (used by __call__ below)
        pass

    def __call__(self, state: State) -> Any:
        # Return the name as a one-item list under the 'names' key
        print(f"Adding {self._name} to {state['names']}")
        return {"names": [self._name]}

# TODO: Create the graph with linear flow: START -> a -> b -> c -> d -> END
builder = StateGraph(State)

# YOUR CODE HERE - Add nodes and edges

graph = builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Test your linear graph
result = graph.invoke({"names": []})
print("Final result:", result)
# Expected: Each node overwrites the previous, so only the last node's name should remain

**Question**: What happened to the state? Why do we only see the last node's name?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 2: Encountering the Parallel Execution Error

Now let's try to make nodes b and c run in parallel and see what error occurs.

**Your Task**: Modify the graph to have a fan-out from 'a' to both 'b' and 'c', then fan-in to 'd'.

In [None]:
# TODO: Create a new graph with parallel execution
# Flow should be: START -> a -> [b, c] -> d -> END
builder = StateGraph(State)

# Add the same nodes
builder.add_node("a", AppendName("A"))
builder.add_node("b", AppendName("B"))
builder.add_node("c", AppendName("C"))
builder.add_node("d", AppendName("D"))

# TODO: Add edges to create the parallel pattern
# YOUR CODE HERE

graph = builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Try to run the parallel graph and observe the error
from langgraph.errors import InvalidUpdateError

try:
    result = graph.invoke({"names": []})
    print("Result:", result)
except InvalidUpdateError as e:
    print(f"Error encountered: {e}")
    print("\nThis error occurs because both 'b' and 'c' try to write to the same state key in the same step, and the key has no reducer.")

**Question**: Why does this error occur? What is LangGraph trying to protect us from?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 3: Fixing Parallel Execution with Reducers

Now you'll fix the parallel execution issue by using a reducer function.

**Your Task**: Modify the State class to use `operator.add` as a reducer for the names field.
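
To see what a reducer changes, here is a minimal plain-Python sketch of merging the updates produced in one step. It is a toy model, not LangGraph's implementation: `merge_step` and its `reducers` argument are hypothetical names chosen for illustration. Without a reducer, a second write to the same key is rejected; with `operator.add`, the lists are concatenated.

```python
import operator

def merge_step(state, updates, reducers=None):
    """Toy model: apply all updates from one step to a copy of the state.

    `reducers` maps a key to a two-argument function (current, update) -> merged.
    Keys without a reducer may receive at most one update per step.
    """
    reducers = reducers or {}
    new_state = dict(state)
    written = set()
    for update in updates:
        for key, value in update.items():
            if key in reducers:
                # Reducer present: combine the current value with the new one
                new_state[key] = reducers[key](new_state[key], value)
            elif key in written:
                # No reducer: a second write in the same step is ambiguous
                raise ValueError(f"Multiple updates to '{key}' in one step")
            else:
                new_state[key] = value
                written.add(key)
    return new_state

updates_from_b_and_c = [{"names": ["B"]}, {"names": ["C"]}]

try:
    merge_step({"names": ["A"]}, updates_from_b_and_c)
    conflict = False
except ValueError:
    conflict = True

merged = merge_step({"names": ["A"]}, updates_from_b_and_c, {"names": operator.add})
print(conflict)  # True
print(merged)    # {'names': ['A', 'B', 'C']}
```

In the exercise itself, the reducer is attached to the field through the type annotation rather than passed in a dictionary.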

In [None]:
import operator
from typing import Annotated

# TODO: Create a new State class with a reducer
# Hint: Use Annotated[list, operator.add] for the names field
class StateWithReducer(TypedDict):
    # YOUR CODE HERE
    pass

# TODO: Update the AppendName class to work with the new state
class AppendNameWithReducer:
    def __init__(self, name: str):
        self._name = name

    def __call__(self, state: StateWithReducer) -> Any:
        print(f"Adding {self._name} to {state['names']}")
        # Return a one-item list so that operator.add can concatenate it
        return {"names": [self._name]}

# TODO: Create the parallel graph with the new state
builder = StateGraph(StateWithReducer)

# YOUR CODE HERE - Add nodes and edges for parallel execution

graph = builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Test the fixed parallel graph
result = graph.invoke({"names": []})
print("Final result:", result)
# Expected: All names should be present, showing successful parallel execution

**Question**: How does the `operator.add` reducer solve the parallel execution problem?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 4: Complex Parallel Patterns

Let's create a more complex parallel pattern where one branch has multiple steps.

**Your Task**: Create a graph where:
- Node 'a' fans out to 'b' and 'c'
- Node 'b' goes to 'b2', then 'b3'
- Nodes 'b3' and 'c' both go to 'd'

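This will test your understanding of how LangGraph waits for all parallel branches to complete.

Hint: to make 'd' wait for both branches, pass both source nodes as a list in a single `add_edge` call. With two separate edges into 'd', the node can be triggered once per incoming branch.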
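
The waiting behaviour can be pictured with a small plain-Python scheduler. This is a toy model under a stated assumption, namely that a node runs only after every one of its predecessors has finished. `run_in_steps` is a hypothetical helper, not a LangGraph function, and LangGraph's actual scheduling may differ in detail.

```python
def run_in_steps(edges, start):
    """Toy scheduler: a node runs once all of its predecessors have run."""
    preds = {}
    nodes = {start}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
        nodes.update((src, dst))
    done, steps = set(), []
    while len(done) < len(nodes):
        # Every node whose predecessors are all finished runs in this step
        ready = sorted(n for n in nodes - done if preds.get(n, set()) <= done)
        if not ready:
            raise ValueError("cycle or unreachable node")
        steps.append(ready)
        done.update(ready)
    return steps

edges = [("a", "b"), ("a", "c"), ("b", "b2"), ("b2", "b3"), ("b3", "d"), ("c", "d")]
steps = run_in_steps(edges, "a")
print(steps)  # [['a'], ['b', 'c'], ['b2'], ['b3'], ['d']]
```

Note that 'c' finishes in the second step, yet 'd' appears only in the last one, after the longer branch has reached 'b3'.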

In [None]:
# TODO: Create a graph with uneven parallel branches
builder = StateGraph(StateWithReducer)

# TODO: Add all necessary nodes
# YOUR CODE HERE

# TODO: Add edges to create the pattern described above
# YOUR CODE HERE

graph = builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Test the complex parallel graph
result = graph.invoke({"names": []})
print("Final result:", result)
print("\nOrder of execution:")
for i, name in enumerate(result['names'], 1):
    print(f"{i}. {name}")

**Question**: What do you notice about the execution order? When does node 'd' execute?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 5: Custom Reducers for Ordering

Within a step, LangGraph decides the order in which parallel updates are applied; we do not control it. Let's create a custom reducer to control the order.

**Your Task**: Create a custom reducer that sorts the names alphabetically as they're added to the state.
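
A reducer is a function of two arguments, the current value and the incoming update, that returns the merged value. To show that shape without giving away the solution, here is a sketch of a different custom reducer that drops duplicates instead of sorting. The names `unique_reducer` and `_as_list` are illustrative and not part of LangGraph.

```python
def _as_list(value):
    """Wrap a single value in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def unique_reducer(left, right):
    """Combine two values into one list, keeping the first occurrence of each item."""
    merged = []
    for item in _as_list(left) + _as_list(right):
        if item not in merged:
            merged.append(item)
    return merged

print(unique_reducer(["A", "B"], ["B", "C"]))  # ['A', 'B', 'C']
print(unique_reducer(None, "Z"))               # ['Z']
```

The same normalization step (turning single values and `None` into lists) is useful in the sorting reducer you are asked to write.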

In [None]:
# TODO: Create a custom reducer function that sorts names
def alphabetical_reducer(left, right):
    """ Combines and sorts the values alphabetically """
    # YOUR CODE HERE
    # Hint: Handle both single values and lists
    # Convert to lists if needed, combine, and sort
    pass

# TODO: Create a new State class with your custom reducer
class StateWithSorting(TypedDict):
    # YOUR CODE HERE
    pass

# Update the node class for the new state
class AppendNameWithSorting:
    def __init__(self, name: str):
        self._name = name

    def __call__(self, state: StateWithSorting) -> Any:
        print(f"Adding {self._name} to {state['names']}")
        return {"names": [self._name]}

In [None]:
# TODO: Test your custom reducer with a simple parallel graph
# Create nodes that will be added out of alphabetical order
builder = StateGraph(StateWithSorting)

# Add nodes with names that will test sorting: Z, A, M, B
# YOUR CODE HERE

# Create a fan-out pattern so we can see the sorting in action
# YOUR CODE HERE

graph = builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Test the sorting reducer
result = graph.invoke({"names": []})
print("Final result:", result)
print("\nNames should be in alphabetical order despite execution order")

**Question**: How does your custom reducer ensure consistent ordering regardless of execution timing?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 6: Building a Multi-Source Research Assistant

Now let's build a practical application! You'll create a research assistant that gathers information from multiple sources in parallel.

**Your Task**: Create a graph that:
1. Takes a research question
2. Searches Wikipedia and web sources in parallel
3. Combines the results and generates a comprehensive answer

First, let's set up the required API keys:

In [None]:
# Set up API keys for external services
_set_env("TAVILY_API_KEY")
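
Before wiring up real services, it can help to see the fan-out and fan-in shape in isolation. The sketch below uses only the standard library and two placeholder functions (`fake_wikipedia` and `fake_web` are hypothetical stand-ins that return labelled strings, not real search results). It is not the graph you will build, only an illustration of running two sources concurrently and concatenating their lists.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_wikipedia(question):
    # Placeholder: a real node would call a Wikipedia loader here
    return [f"[wikipedia] notes on: {question}"]

def fake_web(question):
    # Placeholder: a real node would call a web search tool here
    return [f"[web] notes on: {question}"]

def gather_context(question, sources):
    """Fan out to every source, then fan in by concatenating their lists."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(source, question) for source in sources]
        context = []
        for future in futures:  # iterate in submission order for a stable result
            context += future.result()
    return context

context = gather_context("quantum computing", [fake_wikipedia, fake_web])
print("\n\n---\n\n".join(context))
```

Each source returns a list rather than a bare string, which is the same convention a list field with a concatenating reducer expects.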

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_community.document_loaders import WikipediaLoader
from langchain_community.tools import TavilySearchResults

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# TODO: Define your research state
class ResearchState(TypedDict):
    # YOUR CODE HERE
    # Hint: You'll need fields for question, context, and answer
    # Use a reducer for context since multiple sources will add to it
    pass

In [None]:
# TODO: Implement the Wikipedia search function
def search_wikipedia(state: ResearchState):
    """ Search Wikipedia for information """
    print(f"Searching Wikipedia for: {state['question']}")
    
    # YOUR CODE HERE
    # 1. Use WikipediaLoader to search for the question
    # 2. Load up to 2 documents
    # 3. Format the results as a string with document metadata
    # 4. Return in the correct format for the state
    
    pass

# TODO: Implement the web search function  
def search_web(state: ResearchState):
    """ Search the web for information """
    print(f"Searching web for: {state['question']}")
    
    # YOUR CODE HERE
    # 1. Use TavilySearchResults to search for the question
    # 2. Get up to 3 results
    # 3. Format the results with URLs and content
    # 4. Return in the correct format for the state
    
    pass

# TODO: Implement the answer generation function
def generate_answer(state: ResearchState):
    """ Generate a comprehensive answer using all context """
    print("Generating comprehensive answer...")
    
    # YOUR CODE HERE
    # 1. Get the question and all context from state
    # 2. Create a prompt that asks the LLM to synthesize information
    # 3. Call the LLM with appropriate system and human messages
    # 4. Return the answer in the correct format
    
    pass

In [None]:
# TODO: Build the research assistant graph
builder = StateGraph(ResearchState)

# YOUR CODE HERE
# 1. Add nodes for Wikipedia search, web search, and answer generation
# 2. Create edges so both searches run in parallel from START
# 3. Both searches should feed into answer generation
# 4. Answer generation should go to END

research_graph = builder.compile()
display(Image(research_graph.get_graph().draw_mermaid_png()))

In [None]:
# Test your research assistant
research_question = "What are the latest developments in quantum computing in 2024?"

result = research_graph.invoke({
    "question": research_question,
    "context": [],
    "answer": ""
})

print("Research Question:", research_question)
print("\nAnswer:")
print(result['answer'].content if hasattr(result['answer'], 'content') else result['answer'])

**Question**: What are the advantages of running the searches in parallel rather than sequentially?

**Your Answer**: 
<!-- Write your explanation here -->


## Exercise 7: Advanced Parallel Processing Challenge

For this advanced challenge, you'll create a more sophisticated system that processes data through multiple parallel analysis pipelines.

**Your Task**: Create a data analysis graph that:
1. Takes a dataset description
2. Performs three types of analysis in parallel:
   - Statistical analysis
   - Trend analysis
   - Predictive analysis
3. Synthesizes all analyses into a comprehensive report
4. Uses custom reducers to organize the results properly
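
One possible way to organize results, sketched here as an assumption rather than the intended solution, is to have each analysis return a one-key dictionary and merge those dictionaries with a custom reducer. The synthesis step can then look results up by name, regardless of the order in which the branches finished. `merge_analyses` and the placeholder strings are hypothetical.

```python
def merge_analyses(left, right):
    """Merge two dicts of analysis results; keys in `right` win on conflict."""
    merged = dict(left or {})
    merged.update(right or {})
    return merged

# Simulate three branches reporting their results one after another
analyses = {}
for update in (
    {"statistical": "<statistical summary>"},
    {"trend": "<trend summary>"},
    {"predictive": "<forecast summary>"},
):
    analyses = merge_analyses(analyses, update)

# The report order is fixed by the synthesis step, not by arrival order
report_order = ["statistical", "trend", "predictive"]
sections = [f"{name.title()}: {analyses[name]}" for name in report_order]
print("\n".join(sections))
```

Such a reducer would be attached to a dictionary field in the same way `operator.add` is attached to a list field in Exercise 3.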

In [None]:
# TODO: Create a sophisticated analysis state with multiple fields
class AnalysisState(TypedDict):
    # YOUR CODE HERE
    # Consider what fields you need for:
    # - Input data description
    # - Different types of analysis results
    # - Final report
    # Think about which fields need reducers
    pass

# TODO: Implement analysis functions
def statistical_analysis(state: AnalysisState):
    """ Perform statistical analysis """
    print("Performing statistical analysis...")
    
    # YOUR CODE HERE
    # Create a prompt for statistical analysis and call the LLM
    # Return results in appropriate format
    pass

def trend_analysis(state: AnalysisState):
    """ Perform trend analysis """
    print("Performing trend analysis...")
    
    # YOUR CODE HERE
    pass

def predictive_analysis(state: AnalysisState):
    """ Perform predictive analysis """
    print("Performing predictive analysis...")
    
    # YOUR CODE HERE
    pass

def synthesize_report(state: AnalysisState):
    """ Synthesize all analyses into a comprehensive report """
    print("Synthesizing comprehensive report...")
    
    # YOUR CODE HERE
    # Combine all analysis results into a final report
    pass

In [None]:
# TODO: Build the analysis graph
builder = StateGraph(AnalysisState)

# YOUR CODE HERE
# Create a graph where all three analyses run in parallel
# then feed into the synthesis step

analysis_graph = builder.compile()
display(Image(analysis_graph.get_graph().draw_mermaid_png()))

In [None]:
# Test your analysis system
data_description = "Monthly sales data for an e-commerce company over the past 3 years, including revenue, number of orders, and customer acquisition costs"

result = analysis_graph.invoke({
    "data_description": data_description,
    # Add other initial state fields as needed
})

print("Data Description:", data_description)
print("\nComprehensive Analysis Report:")
print(result['report'] if 'report' in result else "Report not found - check your implementation")

## Exercise 8: Debugging Parallel Execution

Let's practice debugging common issues with parallel execution.

**Your Task**: The following code has several issues that prevent proper parallel execution. Find and fix them.

In [None]:
# BUGGY CODE - Find and fix the issues
class BuggyState(TypedDict):
    results: list  # Issue 1: Missing reducer annotation
    count: int

def process_a(state: BuggyState):
    return {"results": "Result A", "count": 1}  # Issue 2: Wrong data type

def process_b(state: BuggyState):
    return {"results": ["Result B"], "count": 1}  # Issue 3: Concurrent count updates

def process_c(state: BuggyState):
    return {"results": ["Result C"], "count": 1}  # Issue 3: Concurrent count updates

# TODO: Fix the state definition and functions above
# Then create a working graph

# YOUR FIXED CODE HERE

**Question**: What were the main issues in the buggy code and how did you fix them?

**Your Answer**: 
<!-- List the issues and your solutions -->


## Exercise 9: Performance Comparison

Let's measure the performance benefit of parallel execution.

**Your Task**: Create two versions of a slow processing graph - one sequential and one parallel - and compare their execution times.
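
As a point of reference outside LangGraph, the same comparison can be sketched with the standard library alone. The example assumes two independent tasks that each sleep for 0.2 seconds (`slow_task` is a hypothetical stand-in for a slow API call). Threads help here because sleeping, like waiting on I/O, does not occupy the CPU; CPU-bound work would not speed up in the same way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(name, delay):
    time.sleep(delay)  # stands in for a slow API call
    return f"{name} done"

tasks = [("b", 0.2), ("c", 0.2)]

# Sequential: total time is roughly the sum of the delays
start = time.perf_counter()
sequential_results = [slow_task(name, delay) for name, delay in tasks]
sequential_seconds = time.perf_counter() - start

# Parallel: total time is roughly the longest single delay
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    parallel_results = list(pool.map(lambda t: slow_task(*t), tasks))
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Expect the same pattern in your graphs: the sequential time approaches the sum of the delays, while the parallel time approaches the longest delay in the fan-out.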

In [None]:
import time

class TimingState(TypedDict):
    results: Annotated[list, operator.add]
    start_time: float

def slow_process(name: str, delay: float):
    """ Create a slow processing function """
    def process(state: TimingState):
        print(f"Starting {name} (will take {delay}s)")
        time.sleep(delay)  # Simulate slow processing
        print(f"Finished {name}")
        return {"results": [f"{name} completed in {delay}s"]}
    return process

# TODO: Create a sequential graph
sequential_builder = StateGraph(TimingState)
# YOUR CODE HERE - make it sequential: a -> b -> c

sequential_graph = sequential_builder.compile()

# TODO: Create a parallel graph  
parallel_builder = StateGraph(TimingState)
# YOUR CODE HERE - make b and c run in parallel after a

parallel_graph = parallel_builder.compile()

In [None]:
# Test sequential execution
print("Testing Sequential Execution:")
start_time = time.time()
sequential_result = sequential_graph.invoke({"results": [], "start_time": start_time})
sequential_duration = time.time() - start_time
print(f"Sequential execution took: {sequential_duration:.2f} seconds")
print("Results:", sequential_result['results'])
print()

In [None]:
# Test parallel execution
print("Testing Parallel Execution:")
start_time = time.time()
parallel_result = parallel_graph.invoke({"results": [], "start_time": start_time})
parallel_duration = time.time() - start_time
print(f"Parallel execution took: {parallel_duration:.2f} seconds")
print("Results:", parallel_result['results'])
print()

print(f"Performance improvement: {sequential_duration/parallel_duration:.2f}x faster")

**Question**: What performance improvement did you observe? When would parallel execution be most beneficial?

**Your Answer**: 
<!-- Discuss the performance results and when to use parallelization -->


## Exercise 10: Real-World Application Design

For this final exercise, design your own real-world application that benefits from parallelization.

**Your Task**: Choose one of these scenarios (or create your own) and implement a solution:

1. **Content Moderation System**: Analyze content through multiple safety checks in parallel
2. **Financial Analysis Pipeline**: Process market data through different analytical models
3. **Multi-language Translation Service**: Translate text into multiple languages simultaneously
4. **Social Media Monitoring**: Track mentions across different platforms in parallel

Your implementation should demonstrate:
- Proper use of fan-out and fan-in patterns
- Appropriate state reducers
- Error handling for parallel operations
- Performance benefits of parallelization
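
For the error-handling requirement, one option is to wrap each parallel branch so that a failure is recorded as data instead of aborting the whole run. The sketch below is plain Python under that assumption. The `safe` wrapper, the record layout, and the two check functions are hypothetical, and the failure is simulated.

```python
from concurrent.futures import ThreadPoolExecutor

def safe(name, func):
    """Wrap a branch so a failure becomes a result record instead of an exception."""
    def wrapped(data):
        try:
            return {"source": name, "ok": True, "value": func(data)}
        except Exception as exc:  # broad on purpose: report, do not crash the join
            return {"source": name, "ok": False, "error": str(exc)}
    return wrapped

def length_check(text):
    return len(text)

def failing_check(text):
    # Simulated failure of an external service
    raise RuntimeError("service unavailable")

branches = [safe("length", length_check), safe("flaky", failing_check)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda branch: branch("hello"), branches))

succeeded = [r for r in results if r["ok"]]
failed = [r for r in results if not r["ok"]]
print(succeeded)
print(failed)
```

The aggregation step can then decide what to do with partial results, for example producing an answer from the successful branches and listing the failed sources.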

In [None]:
# TODO: Choose your scenario and implement it here
# Include:
# 1. Clear state definition with appropriate reducers
# 2. Multiple processing functions that can run in parallel
# 3. A synthesis/aggregation step
# 4. Proper error handling

# YOUR IMPLEMENTATION HERE

# Example structure for guidance:
# class YourState(TypedDict):
#     input_data: str
#     parallel_results: Annotated[list, your_reducer]
#     final_output: str

# def parallel_function_1(state):
#     # Your parallel processing logic
#     pass

# def parallel_function_2(state):
#     # Your parallel processing logic  
#     pass

# def synthesis_function(state):
#     # Combine parallel results
#     pass

In [None]:
# TODO: Build and test your graph
# YOUR CODE HERE

In [None]:
# TODO: Test your implementation with sample data
# YOUR CODE HERE

**Reflection Questions**:

1. **What scenario did you choose and why?**
   <!-- Your answer here -->

2. **What specific benefits does parallelization provide in your use case?**
   <!-- Your answer here -->

3. **What challenges did you encounter and how did you solve them?**
   <!-- Your answer here -->

4. **How would you extend this solution for production use?**
   <!-- Your answer here -->


## Summary and Key Takeaways

Congratulations! You've completed the parallelization exercises. Here are the key concepts you've mastered:

### Core Concepts:
1. **Fan-out and Fan-in Patterns**: Creating graphs where execution branches and converges
2. **State Reducers**: Using functions like `operator.add` to handle concurrent state updates
3. **Custom Reducers**: Implementing specialized logic for combining parallel results
4. **Execution Synchronization**: Understanding how LangGraph waits for all parallel branches

### Best Practices:
- Always use reducers when multiple nodes write to the same state field in parallel
- Design your state schema carefully to avoid conflicts
- Consider the order of operations when designing parallel workflows
- Use parallelization for I/O-bound operations and independent computations

### Common Pitfalls to Avoid:
- Forgetting to use reducers for shared state fields
- Assuming a specific execution order within parallel steps
- Not handling errors properly in parallel branches
- Over-parallelizing CPU-bound operations

### When to Use Parallelization:
- External API calls that can be made simultaneously
- Independent data processing tasks
- Multiple analysis or validation steps
- Gathering information from multiple sources

These parallelization patterns form the foundation for building more complex multi-agent systems, which you'll explore in the next modules!
