[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/map-reduce-exercise.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239947-lesson-3-map-reduce)

# Map-Reduce Exercises

## Learning Objectives

By completing these exercises, you will be able to:

1. **Understand Map-Reduce fundamentals**: Recognize when and why to use map-reduce patterns in LangGraph
2. **Implement parallel processing**: Use the `Send` API to distribute work across multiple nodes
3. **Design state management**: Create appropriate state schemas with reducers for accumulating results
4. **Build complex workflows**: Combine map and reduce phases into cohesive applications
5. **Handle dynamic parallelization**: Create systems that adapt to varying input sizes

## Overview

Map-reduce splits a task into independent subtasks that run in parallel (the map phase) and then combines their outputs into a single result (the reduce phase). In this exercise notebook, you'll build map-reduce systems of increasing complexity, starting with the joke generation example and advancing to document summarization, research paper analysis, and content moderation.

**Key Concepts to Master:**
- **Map Phase**: Distribute work across parallel nodes using `Send`
- **Reduce Phase**: Aggregate and synthesize results from parallel operations
- **State Management**: Use reducers like `operator.add` to accumulate results
- **Dynamic Routing**: Send varying numbers of tasks based on input data
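
A minimal, framework-free sketch of the two phases, assuming Python's standard `concurrent.futures` in place of LangGraph. The names `map_step`, `reduce_step`, and `run_map_reduce` are hypothetical, and the string results stand in for LLM output.

```python
from concurrent.futures import ThreadPoolExecutor


def map_step(subject: str) -> list[str]:
    # Stand-in for an LLM call. It returns a one-item list so that
    # results from many workers can be concatenated.
    return [f"result for {subject}"]


def reduce_step(results: list[str]) -> str:
    # Stand-in for the reduce phase: pick the longest result.
    return max(results, key=len)


def run_map_reduce(subjects: list[str]) -> str:
    collected: list[str] = []
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order, even if workers
        # finish out of order.
        for partial in pool.map(map_step, subjects):
            collected += partial  # list concatenation, like operator.add
    return reduce_step(collected)


best = run_map_reduce(["cats", "dogs", "parrots"])
print(best)
```

The number of worker calls follows the length of `subjects`, which is the same idea as sending a varying number of tasks based on input data.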

---

## Setup and Dependencies

First, let's install the required packages and set up our environment.

In [None]:
%%capture --no-stderr
%pip install -U langchain_openai langgraph

In [None]:
import os, getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [None]:
# Set up LangSmith tracing
_set_env("LANGSMITH_API_KEY")
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "langchain-academy-exercises"

In [None]:
# Import required libraries
import operator
from typing import Annotated, List, Dict
from typing_extensions import TypedDict
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph, START
from langgraph.types import Send
from IPython.display import Image

# Initialize the LLM
model = ChatOpenAI(model="gpt-4o", temperature=0)

---

# Exercise 1: Understanding the Joke Generation System

Let's start by implementing the complete joke generation system step by step. This will help you understand the core map-reduce concepts.

## Step 1.1: Define the State Schema

**Task**: Complete the state definition below. Pay attention to the `jokes` field - why does it use `Annotated[list, operator.add]`?

In [None]:
class Subjects(BaseModel):
    subjects: list[str]

class BestJoke(BaseModel):
    id: int
    
class OverallState(TypedDict):
    topic: str
    subjects: list
    # TODO: Complete this line - what should the jokes field look like?
    # Hint: It needs to accumulate jokes from multiple parallel operations
    jokes: # YOUR CODE HERE
    best_selected_joke: str

**Reflection Question**: Why do we use `operator.add` as the reducer for the `jokes` field? What would happen if `jokes` were annotated as a plain `list` with no reducer?
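
To explore the question, here is a toy model of how a reducer merges updates into state. It is a simplified sketch, not LangGraph's actual implementation: `apply_update` is a hypothetical helper, and a real graph may raise an error on concurrent writes to a key without a reducer instead of silently overwriting.

```python
import operator


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge one node's update into state (toy model, not the library's code)."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is not None and key in new_state:
            # With a reducer, the old and new values are combined.
            new_state[key] = reducer(new_state[key], value)
        else:
            # Without one, the new value replaces the old value.
            new_state[key] = value
    return new_state


# Three parallel workers each return a one-item list.
updates = [{"jokes": ["joke A"]}, {"jokes": ["joke B"]}, {"jokes": ["joke C"]}]

with_reducer = {"jokes": []}
without_reducer = {"jokes": []}
for update in updates:
    with_reducer = apply_update(with_reducer, update, {"jokes": operator.add})
    without_reducer = apply_update(without_reducer, update, {})

print(with_reducer["jokes"])     # all three jokes are kept
print(without_reducer["jokes"])  # only the last write survives in this toy model
```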

## Step 1.2: Implement the Topic Generation Node

**Task**: Complete the function that generates joke subjects from a main topic.

In [None]:
subjects_prompt = """Generate a list of 3 sub-topics that are all related to this overall topic: {topic}."""

def generate_topics(state: OverallState):
    """
    Generate 3 sub-topics related to the main topic.
    
    Args:
        state: OverallState containing the main topic
        
    Returns:
        dict: Updated state with subjects list
    """
    # TODO: Complete this function
    # 1. Format the prompt with the topic from state
    # 2. Use the model with structured output to get subjects
    # 3. Return the subjects in the correct format
    
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 1.3: Implement the Send Logic (Map Phase)

**Task**: Complete the function that uses `Send` to distribute joke generation tasks. This is the core of the map phase!

In [None]:
def continue_to_jokes(state: OverallState):
    """
    Create Send objects to distribute joke generation across subjects.
    
    Args:
        state: OverallState containing subjects list
        
    Returns:
        list[Send]: List of Send objects for parallel processing
    """
    # TODO: Complete this function
    # Create a Send object for each subject that:
    # - Targets the "generate_joke" node
    # - Passes the subject as state
    
    return # YOUR CODE HERE

## Step 1.4: Implement the Joke Generation Node

**Task**: Complete the joke generation logic that will run in parallel for each subject.

In [None]:
joke_prompt = """Generate a joke about {subject}"""

class JokeState(TypedDict):
    subject: str

class Joke(BaseModel):
    joke: str

def generate_joke(state: JokeState):
    """
    Generate a single joke for the given subject.
    
    Args:
        state: JokeState containing the subject
        
    Returns:
        dict: Updated state with the generated joke
    """
    # TODO: Complete this function
    # 1. Format the prompt with the subject
    # 2. Generate the joke using structured output
    # 3. Return the joke in a list (why a list?)
    
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 1.5: Implement the Best Joke Selection (Reduce Phase)

**Task**: Complete the reduce phase that selects the best joke from all generated jokes.
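
Because the model returns an index rather than the joke text, it can help to guard the lookup. The sketch below assumes a hypothetical helper, `pick_by_id`, and a fallback policy (use the first item) that is an illustration rather than part of the exercise.

```python
def pick_by_id(items: list[str], idx: int) -> str:
    """Return items[idx], falling back to the first item for a bad index."""
    if not items:
        raise ValueError("no items to choose from")
    if not 0 <= idx < len(items):
        # Assumption: an out-of-range ID falls back to the first item.
        idx = 0
    return items[idx]


jokes = ["joke about cats", "joke about dogs", "joke about parrots"]
print(pick_by_id(jokes, 1))
print(pick_by_id(jokes, 7))
```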

In [None]:
best_joke_prompt = """Below are a bunch of jokes about {topic}. Select the best one! Return the ID of the best one, starting with 0 as the ID for the first joke. Jokes: \n\n  {jokes}"""

def best_joke(state: OverallState):
    """
    Select the best joke from all generated jokes.
    
    Args:
        state: OverallState containing all jokes
        
    Returns:
        dict: Updated state with the best selected joke
    """
    # TODO: Complete this function
    # 1. Join all jokes with double newlines
    # 2. Format the prompt with topic and jokes
    # 3. Get the best joke ID using structured output
    # 4. Return the actual joke text using the ID
    
    jokes = # YOUR CODE HERE
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 1.6: Build and Test the Graph

**Task**: Complete the graph construction and test it with different topics.

In [None]:
# TODO: Complete the graph construction
graph = StateGraph(OverallState)

# Add nodes
graph.add_node(# YOUR CODE HERE)
graph.add_node(# YOUR CODE HERE)
graph.add_node(# YOUR CODE HERE)

# Add edges
graph.add_edge(# YOUR CODE HERE)
graph.add_conditional_edges(# YOUR CODE HERE)
graph.add_edge(# YOUR CODE HERE)
graph.add_edge(# YOUR CODE HERE)

# Compile the graph
app = graph.compile()

# Visualize the graph
Image(app.get_graph().draw_mermaid_png())

In [None]:
# Test the graph with different topics
test_topics = ["technology", "cooking", "space exploration"]

for topic in test_topics:
    print(f"\n=== Testing with topic: {topic} ===")
    for s in app.stream({"topic": topic}):
        print(s)
    print("\n" + "-"*50)

**Reflection Questions**:
1. How does the parallelization work in this system?
2. What happens if one of the joke generation tasks fails?
3. How would you modify this to generate a different number of sub-topics?

---

# Exercise 2: Document Summarization System

Now let's build a more complex map-reduce system for document summarization. This system will:
1. **Map**: Split a document into chunks and summarize each chunk in parallel
2. **Reduce**: Combine all chunk summaries into a final comprehensive summary

## Step 2.1: Define the Document Processing State

**Task**: Design a state schema for document processing.

In [None]:
class DocumentChunks(BaseModel):
    chunks: list[str]

class ChunkSummary(BaseModel):
    summary: str

class FinalSummary(BaseModel):
    summary: str

class DocumentState(TypedDict):
    # TODO: Complete the state definition
    # Think about what fields you need:
    # - Original document text
    # - Document chunks
    # - Individual chunk summaries (with reducer)
    # - Final combined summary
    
    document: str
    chunks: # YOUR CODE HERE
    chunk_summaries: # YOUR CODE HERE
    final_summary: # YOUR CODE HERE

## Step 2.2: Implement Document Chunking

**Task**: Create a function that splits documents into manageable chunks.

In [None]:
def chunk_document(state: DocumentState):
    """
    Split the document into chunks for parallel processing.
    
    Args:
        state: DocumentState containing the original document
        
    Returns:
        dict: Updated state with document chunks
    """
    # TODO: Implement document chunking
    # For simplicity, split by paragraphs (double newlines)
    # Filter out empty chunks
    
    document = state["document"]
    
    # Split by double newlines and filter empty chunks
    chunks = # YOUR CODE HERE
    
    return {"chunks": chunks}

## Step 2.3: Implement the Send Logic for Chunk Processing

**Task**: Create the map phase that sends each chunk for summarization.

In [None]:
def continue_to_summaries(state: DocumentState):
    """
    Send each chunk to be summarized in parallel.
    
    Args:
        state: DocumentState containing chunks
        
    Returns:
        list[Send]: Send objects for parallel chunk summarization
    """
    # TODO: Create Send objects for each chunk
    # Each Send should target "summarize_chunk" and pass the chunk text
    
    return # YOUR CODE HERE

## Step 2.4: Implement Chunk Summarization

**Task**: Create the node that summarizes individual chunks.

In [None]:
chunk_summary_prompt = """Summarize the following text chunk in 2-3 sentences, capturing the key points:

{chunk}"""

class ChunkState(TypedDict):
    chunk: str

def summarize_chunk(state: ChunkState):
    """
    Summarize a single chunk of text.
    
    Args:
        state: ChunkState containing the chunk text
        
    Returns:
        dict: Updated state with chunk summary
    """
    # TODO: Implement chunk summarization
    # 1. Format the prompt with the chunk
    # 2. Generate summary using structured output
    # 3. Return as a list for the reducer
    
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 2.5: Implement Final Summary Generation (Reduce Phase)

**Task**: Create the reduce phase that combines all chunk summaries.

In [None]:
final_summary_prompt = """Below are summaries of different sections of a document. 
Create a comprehensive final summary that captures the main themes and key points from all sections:

{summaries}"""

def create_final_summary(state: DocumentState):
    """
    Combine all chunk summaries into a final comprehensive summary.
    
    Args:
        state: DocumentState containing all chunk summaries
        
    Returns:
        dict: Updated state with final summary
    """
    # TODO: Implement final summary generation
    # 1. Combine all chunk summaries
    # 2. Generate comprehensive final summary
    
    summaries = # YOUR CODE HERE
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 2.6: Build and Test the Document Summarization System

**Task**: Construct the graph and test it with a sample document.

In [None]:
# TODO: Build the document summarization graph
doc_graph = StateGraph(DocumentState)

# Add nodes
# YOUR CODE HERE

# Add edges
# YOUR CODE HERE

# Compile
doc_app = doc_graph.compile()

# Visualize
Image(doc_app.get_graph().draw_mermaid_png())

In [None]:
# Test with a sample document
sample_document = """
Artificial Intelligence has revolutionized numerous industries in recent years. From healthcare to finance, AI systems are transforming how we work and live.

In healthcare, AI is being used for medical diagnosis, drug discovery, and personalized treatment plans. Machine learning algorithms can analyze medical images with incredible accuracy, often surpassing human doctors in detecting certain conditions.

The finance industry has embraced AI for fraud detection, algorithmic trading, and risk assessment. Banks use AI to analyze spending patterns and detect suspicious activities in real-time.

However, the rise of AI also brings challenges. Concerns about job displacement, privacy, and algorithmic bias are growing. Society must address these issues as AI continues to evolve.

Looking forward, the future of AI holds both promise and uncertainty. Continued research and responsible development will be crucial for harnessing AI's benefits while minimizing potential risks.
"""

# Run the summarization
result = doc_app.invoke({"document": sample_document})
print("Original Document Length:", len(sample_document))
print("Number of Chunks:", len(result["chunks"]))
print("\nFinal Summary:")
print(result["final_summary"])

**Challenge**: Modify the system to handle different chunk sizes or use more sophisticated chunking strategies (e.g., by sentence count, word count, or semantic similarity).
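
As a starting point for the word-count variant, here is a small sketch of a chunker with optional overlap. The function name `chunk_by_words` and its parameters are hypothetical; whitespace splitting is a simplifying assumption and ignores sentence boundaries.

```python
def chunk_by_words(text: str, max_words: int = 50, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a chunk reaches the end, so no trailing
        # chunk repeats only overlap words.
        if start + max_words >= len(words):
            break
    return chunks


print(chunk_by_words("a b c d e f g", max_words=3, overlap=1))
```

Overlap trades extra tokens for context that would otherwise be cut at chunk boundaries.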

---

# Exercise 3: Research Paper Analysis System

Let's build a sophisticated map-reduce system that analyzes research papers by:
1. **Map**: Analyzing different aspects (methodology, results, conclusions) in parallel
2. **Reduce**: Synthesizing insights into a comprehensive research report

## Step 3.1: Design the Research Analysis State

**Task**: Create a state schema for multi-aspect research paper analysis.

In [None]:
class ResearchAspects(BaseModel):
    aspects: list[str]  # e.g., ["methodology", "results", "conclusions", "novelty"]

class AspectAnalysis(BaseModel):
    aspect: str
    analysis: str
    key_points: list[str]
    strengths: list[str]
    weaknesses: list[str]

class ResearchReport(BaseModel):
    overall_summary: str
    key_contributions: list[str]
    strengths: list[str]
    limitations: list[str]
    significance_score: int  # 1-10

class ResearchState(TypedDict):
    # TODO: Define the complete state schema
    # Consider what information you need to track:
    
    paper_text: str
    analysis_aspects: # YOUR CODE HERE
    aspect_analyses: # YOUR CODE HERE (hint: needs reducer)
    final_report: # YOUR CODE HERE

## Step 3.2: Implement Aspect Generation

**Task**: Create a function that determines what aspects of the paper to analyze.

In [None]:
aspects_prompt = """Given this research paper excerpt, determine the most important aspects to analyze. 
Choose 4-5 aspects from: methodology, results, conclusions, novelty, experimental_design, 
literature_review, implications, limitations, reproducibility.

Paper excerpt:
{paper_text}"""

def generate_analysis_aspects(state: ResearchState):
    """
    Determine which aspects of the research paper to analyze.
    
    Args:
        state: ResearchState containing the paper text
        
    Returns:
        dict: Updated state with analysis aspects
    """
    # TODO: Implement aspect generation
    # Use the first 1000 characters of the paper as the excerpt
    
    paper_excerpt = # YOUR CODE HERE
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 3.3: Implement Parallel Aspect Analysis Distribution

**Task**: Create the Send logic for distributing aspect analysis tasks.

In [None]:
def continue_to_aspect_analysis(state: ResearchState):
    """
    Send each aspect for parallel analysis.
    
    Args:
        state: ResearchState containing aspects to analyze
        
    Returns:
        list[Send]: Send objects for parallel aspect analysis
    """
    # TODO: Create Send objects for each aspect
    # Each Send should include both the aspect and the full paper text
    
    return # YOUR CODE HERE

## Step 3.4: Implement Individual Aspect Analysis

**Task**: Create the node that analyzes a specific aspect of the research paper.

In [None]:
aspect_analysis_prompt = """Analyze the {aspect} aspect of this research paper. Provide:
1. A detailed analysis of this aspect
2. Key points related to this aspect
3. Strengths in this aspect
4. Weaknesses or limitations in this aspect

Research Paper:
{paper_text}"""

class AspectState(TypedDict):
    aspect: str
    paper_text: str

def analyze_aspect(state: AspectState):
    """
    Analyze a specific aspect of the research paper.
    
    Args:
        state: AspectState containing aspect and paper text
        
    Returns:
        dict: Updated state with aspect analysis
    """
    # TODO: Implement aspect analysis
    
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    
    # Create the analysis object with the aspect name
    analysis_with_aspect = AspectAnalysis(
        aspect=state["aspect"],
        analysis=response.analysis,
        key_points=response.key_points,
        strengths=response.strengths,
        weaknesses=response.weaknesses
    )
    
    return # YOUR CODE HERE

## Step 3.5: Implement Research Report Generation (Reduce Phase)

**Task**: Create the reduce phase that synthesizes all aspect analyses into a comprehensive report.

In [None]:
report_generation_prompt = """Based on the following aspect analyses of a research paper, 
create a comprehensive research report with:
1. Overall summary
2. Key contributions
3. Overall strengths
4. Overall limitations
5. Significance score (1-10)

Aspect Analyses:
{analyses}"""

def generate_research_report(state: ResearchState):
    """
    Generate final research report from all aspect analyses.
    
    Args:
        state: ResearchState containing all aspect analyses
        
    Returns:
        dict: Updated state with final research report
    """
    # TODO: Implement report generation
    # 1. Combine all aspect analyses into a readable format
    # 2. Generate comprehensive report
    
    # Format analyses for the prompt
    analyses_text = ""
    for analysis in state["aspect_analyses"]:
        # YOUR CODE HERE - format each analysis
        pass
    
    prompt = # YOUR CODE HERE
    response = # YOUR CODE HERE
    return # YOUR CODE HERE

## Step 3.6: Build and Test the Research Analysis System

**Task**: Construct and test the complete research analysis system.

In [None]:
# TODO: Build the research analysis graph
research_graph = StateGraph(ResearchState)

# Add all nodes
# YOUR CODE HERE

# Add all edges
# YOUR CODE HERE

# Compile and visualize
research_app = research_graph.compile()
Image(research_app.get_graph().draw_mermaid_png())

In [None]:
# Test with a sample research paper abstract and introduction
sample_paper = """
Title: Attention Is All You Need

Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

Introduction:
Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures.

The Transformer model architecture eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. This allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
"""

# Run the research analysis
research_result = research_app.invoke({"paper_text": sample_paper})

print("Analysis Aspects:", research_result["analysis_aspects"])
print("\nNumber of Aspect Analyses:", len(research_result["aspect_analyses"]))
print("\nFinal Research Report:")
print(research_result["final_report"])

**Advanced Challenge**: Extend this system to:
1. Handle different types of papers (experimental, theoretical, survey)
2. Include citation analysis
3. Compare against related work
4. Generate peer review comments

---

# Exercise 4: Custom Map-Reduce System

Now it's your turn to design a complete map-reduce system from scratch!

## Challenge: Content Moderation System

**Scenario**: You need to build a content moderation system for a social media platform. The system should:

1. **Map Phase**: Analyze different aspects of content (toxicity, spam, misinformation, etc.) in parallel
2. **Reduce Phase**: Combine all analyses to make a final moderation decision

### Requirements:

1. **Content Analysis Aspects**:
   - Toxicity detection
   - Spam detection  
   - Misinformation potential
   - Hate speech detection
   - Adult content detection

2. **Final Decision**: 
   - Action (approve, flag, remove)
   - Confidence score
   - Reasoning
   - Specific violations found
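
One possible shape for the reduce step is a deterministic rule over per-aspect scores. This is a sketch under stated assumptions: `AspectScore`, `decide`, the 0 to 1 severity scale, and both thresholds are hypothetical, and an LLM-based decision is an equally valid design.

```python
from dataclasses import dataclass


@dataclass
class AspectScore:
    aspect: str
    score: float  # hypothetical severity in [0, 1]


def decide(scores: list[AspectScore], flag_at: float = 0.5, remove_at: float = 0.8) -> dict:
    """Combine per-aspect scores into one moderation decision."""
    violations = [s.aspect for s in scores if s.score >= flag_at]
    worst = max((s.score for s in scores), default=0.0)
    if worst >= remove_at:
        action = "remove"
    elif worst >= flag_at:
        action = "flag"
    else:
        action = "approve"
    return {"action": action, "violations": violations, "max_score": worst}


decision = decide([AspectScore("spam", 0.9), AspectScore("toxicity", 0.1)])
print(decision)
```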

### Your Task:

Complete the implementation below, filling in all the TODOs.

In [None]:
# Step 4.1: Define your data models
class ModerationAspects(BaseModel):
    # TODO: Define the structure for moderation aspects
    pass

class AspectResult(BaseModel):
    # TODO: Define the structure for individual aspect analysis results
    pass

class ModerationDecision(BaseModel):
    # TODO: Define the structure for final moderation decision
    pass

class ModerationState(TypedDict):
    # TODO: Define the complete state schema
    pass

In [None]:
# Step 4.2: Implement aspect determination
def determine_moderation_aspects(state: ModerationState):
    """
    Determine which aspects of content to analyze based on content type/characteristics.
    """
    # TODO: Implement logic to determine which aspects to analyze
    # Consider content length, type, etc.
    pass

In [None]:
# Step 4.3: Implement the Send logic
def distribute_moderation_tasks(state: ModerationState):
    """
    Distribute moderation analysis tasks across aspects.
    """
    # TODO: Create Send objects for parallel processing
    pass

In [None]:
# Step 4.4: Implement individual aspect analysis
class AspectModerationState(TypedDict):
    # TODO: Define state for individual aspect analysis
    pass

def analyze_content_aspect(state: AspectModerationState):
    """
    Analyze a specific aspect of the content for moderation.
    """
    # TODO: Implement aspect-specific analysis
    # Consider different prompts for different aspects
    pass

In [None]:
# Step 4.5: Implement final moderation decision
def make_moderation_decision(state: ModerationState):
    """
    Make final moderation decision based on all aspect analyses.
    """
    # TODO: Implement decision logic
    # Consider severity levels, confidence scores, etc.
    pass

In [None]:
# Step 4.6: Build and test your system
# TODO: Construct the graph
moderation_graph = StateGraph(ModerationState)

# Add nodes and edges
# YOUR CODE HERE

# Compile and visualize
moderation_app = moderation_graph.compile()
Image(moderation_app.get_graph().draw_mermaid_png())

In [None]:
# Test your moderation system
test_contents = [
    "This is a normal post about cooking recipes.",
    "This content might be questionable and potentially harmful.",
    "Check out this amazing product! Click here for 90% off! Limited time!"
]

for content in test_contents:
    print(f"\n=== Moderating: {content[:50]}... ===")
    # TODO: Test your system
    pass

---

# Reflection and Advanced Concepts

## Key Takeaways

After completing these exercises, reflect on the following:

### 1. When to Use Map-Reduce
- **Parallelizable tasks**: When you can break work into independent subtasks
- **Large-scale processing**: When dealing with data that benefits from parallel processing
- **Multi-aspect analysis**: When you need to analyze different dimensions of the same data

### 2. Design Patterns You've Learned
- **Dynamic parallelization**: Using `Send` to create varying numbers of parallel tasks
- **State accumulation**: Using reducers like `operator.add` to collect results
- **Hierarchical processing**: Breaking complex tasks into manageable subtasks

### 3. Performance Considerations
- **Balancing granularity**: Too many small tasks vs. too few large tasks
- **Error handling**: What happens when parallel tasks fail?
- **Resource management**: LLM API rate limits and costs
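
To make the resource-management point concrete, the sketch below caps how many calls run at once. It uses the standard library rather than LangGraph; `fake_llm_call` and `run_bounded` are hypothetical names, and the short sleep stands in for network latency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_active = 0
peak = 0  # highest number of calls observed running at the same time


def fake_llm_call(item: str) -> str:
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)  # stand-in for API latency
    with _lock:
        _active -= 1
    return item.upper()


def run_bounded(items: list[str], max_concurrency: int = 2) -> list[str]:
    # max_workers caps the number of simultaneous calls.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fake_llm_call, items))


results = run_bounded(["a", "b", "c", "d", "e"], max_concurrency=2)
print(results, "peak concurrency:", peak)
```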

## Advanced Challenges

### Challenge 1: Error Handling
Modify one of your systems to handle cases where individual map tasks fail. How would you:
- Retry failed tasks?
- Proceed with partial results?
- Provide graceful degradation?
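
A framework-free sketch of retrying and then proceeding with partial results is shown below. The helpers `run_with_retries` and `map_with_partial_results` and the `flaky` task are hypothetical; in a graph, a map node could return a similar success or failure record into a reducer field.

```python
def run_with_retries(task, arg, max_attempts: int = 3) -> dict:
    """Call task(arg) up to max_attempts times and record the outcome."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return {"ok": True, "value": task(arg)}
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    return {"ok": False, "input": arg, "error": str(last_error)}


def map_with_partial_results(task, args, max_attempts: int = 3):
    outcomes = [run_with_retries(task, a, max_attempts) for a in args]
    successes = [o["value"] for o in outcomes if o["ok"]]
    failures = [o for o in outcomes if not o["ok"]]
    return successes, failures


calls: dict[str, int] = {}


def flaky(x: str) -> str:
    calls[x] = calls.get(x, 0) + 1
    if x == "c":
        raise RuntimeError("always fails")
    if x == "b" and calls[x] == 1:
        raise RuntimeError("transient failure")
    return x.upper()


successes, failures = map_with_partial_results(flaky, ["a", "b", "c"])
print(successes, failures)
```

The reduce step can then decide whether the surviving results are enough to continue.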

### Challenge 2: Dynamic Task Creation
Create a system where the number and type of map tasks depend on the analysis of initial results. For example:
- First, analyze content type
- Then, based on content type, determine specific analysis tasks
- Finally, combine results appropriately
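
The three steps above can be sketched with a lookup table that maps a content type to its tasks. Everything here is an assumption for illustration: the `classify` heuristic, the task names, and the payload shape. In LangGraph, each payload would become one `Send`.

```python
# Hypothetical mapping from content type to the analyses it needs.
TASKS_BY_TYPE = {
    "code": ["lint", "security_review"],
    "prose": ["grammar", "tone", "fact_check"],
}


def classify(content: str) -> str:
    # Crude stand-in for a first-pass LLM classification.
    return "code" if "def " in content or "{" in content else "prose"


def plan_tasks(content: str) -> list[dict]:
    """Return one payload per map task, chosen by content type."""
    content_type = classify(content)
    return [{"task": name, "content": content} for name in TASKS_BY_TYPE[content_type]]


print([t["task"] for t in plan_tasks("def f(): pass")])
print([t["task"] for t in plan_tasks("Hello world.")])
```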

### Challenge 3: Nested Map-Reduce
Design a system with multiple levels of map-reduce:
- Map 1: Break document into sections
- Map 2: For each section, analyze different aspects
- Reduce 2: Combine aspect analyses per section
- Reduce 1: Combine all section analyses
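
The four levels can be traced in plain Python before wiring them into a graph. This sketch runs sequentially and uses hypothetical stand-ins (`analyze`, `reduce_section`, `reduce_document`) for the LLM-backed nodes; a graph version would fan out at each map level instead of looping.

```python
def analyze(section: str, aspect: str) -> str:
    # Map 2: stand-in analysis that reports the section's word count.
    return f"{aspect}:{len(section.split())}"


def reduce_section(analyses: list[str]) -> str:
    # Reduce 2: combine aspect analyses for one section.
    return ", ".join(analyses)


def reduce_document(section_reports: list[str]) -> str:
    # Reduce 1: combine all section reports.
    return " | ".join(section_reports)


def nested_map_reduce(document: str, aspects: list[str]) -> str:
    # Map 1: break the document into sections on blank lines.
    sections = [s.strip() for s in document.split("\n\n") if s.strip()]
    section_reports = []
    for section in sections:
        analyses = [analyze(section, aspect) for aspect in aspects]
        section_reports.append(reduce_section(analyses))
    return reduce_document(section_reports)


report = nested_map_reduce("one two\n\nthree four five", ["length", "tone"])
print(report)
```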

### Challenge 4: Conditional Reduce
Create a system where the reduce phase only triggers when certain conditions are met:
- Minimum number of results
- Quality threshold reached
- Timeout exceeded

## Questions for Further Exploration

1. **How would you implement load balancing** if different map tasks take significantly different amounts of time?

2. **What strategies would you use for caching** repeated analyses or intermediate results?

3. **How would you handle streaming data** where new items arrive while processing is ongoing?

4. **What metrics would you track** to monitor the performance and quality of your map-reduce systems?

5. **How would you extend these patterns** to handle real-time collaborative filtering or recommendation systems?
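
For the caching question, an in-memory cache keyed on the inputs is a simple first step. The sketch uses the standard library's `functools.lru_cache`; `analyze_cached` is a hypothetical stand-in for an LLM-backed analysis, and this approach only fits cases where reusing an earlier output is acceptable.

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying work actually runs


@lru_cache(maxsize=256)
def analyze_cached(text: str, aspect: str) -> str:
    global call_count
    call_count += 1
    # Stand-in for an expensive LLM call.
    return f"{aspect} analysis of {len(text)} chars"


first = analyze_cached("same input", "toxicity")
second = analyze_cached("same input", "toxicity")  # served from the cache
print(first, "| underlying calls:", call_count)
```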

---

## Conclusion

Congratulations! You've now worked through the fundamental concepts of map-reduce in LangGraph: fanning work out with `Send`, accumulating results with reducers, and synthesizing them in a reduce node. These patterns help when building AI systems that must handle complex, multi-faceted problems.

**Key skills you've developed:**
- Designing parallel processing workflows
- Managing complex state with reducers
- Using the `Send` API for dynamic task distribution
- Building systems that scale with input complexity
- Combining multiple AI analyses into cohesive results

These patterns will serve as building blocks for more complex AI systems, including multi-agent architectures and large-scale content processing pipelines.