# LangGraph Subgraph Tutorial

## Purpose

Learn how to build modular and reusable workflow components using LangGraph subgraphs for AI application development.

### What You'll Learn

- **Subgraph Fundamentals**: Composition patterns for complex AI workflows
- **Shared State Management**: Coordinating data flow between graph components
- **Independent State Systems**: Isolation and transformation patterns
- **Production Examples**: Real-world AI agent orchestration

### Key Principles

- **Modularity**: Break complex AI workflows into manageable components
- **Reusability**: Build once, use across multiple AI applications
- **Practical Focus**: Simulated examples modeled on patterns used in production AI systems

## Table of Contents

1. [Environment Setup](#environment-setup)
2. [Part 1: Subgraph Fundamentals](#part-1-subgraph-fundamentals)
3. [Part 2: Shared State Management](#part-2-shared-state-management)
4. [Part 3: Independent State Systems](#part-3-independent-state-systems)
5. [Part 4: Production Implementation](#part-4-production-implementation)
6. [Summary](#summary)

## Environment Setup

Configure the minimal environment for hands-on practice. API keys are loaded from a `.env` file into environment variables, and LangSmith tracing of each run is optional.

In [1]:
# Configuration for managing API keys as environment variables
from dotenv import load_dotenv

# Load the API key information
load_dotenv(override=True)

True

In [2]:
# Set up LangSmith tracing. https://smith.langchain.com
from langchain_teddynote import logging

# Enter the project name.
logging.langsmith("LangGraph-Tutorial")

LangSmith tracing started.
[Project name]
LangGraph-Tutorial


---

# Part 1: Subgraph Fundamentals

## Understanding Subgraphs

A subgraph is a compiled graph that is used as a node inside a larger graph. This pattern lets you split an AI application into modular parts, each with its own nodes and edges.

### Core Concepts
- **Component Isolation**: Each subgraph handles a specific processing task
- **Reusability**: Components can be shared across different AI workflows
- **Testability**: Independent testing of individual processing units
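
To make the testability point concrete, here is a minimal sketch. It assumes the node functions are defined at module level (in the example below they are nested inside a factory function), so they can be called as plain functions on hand-built state dictionaries without compiling any graph.

```python
import re
from typing import Any, Dict

State = Dict[str, Any]


def tokenize(state: State) -> State:
    """Lowercase the text and split it on whitespace."""
    return {"tokens": state["text"].lower().split()}


def normalize(state: State) -> State:
    """Strip punctuation from each token and drop tokens left empty."""
    cleaned = [re.sub(r"[^\w\s]", "", token) for token in state["tokens"]]
    return {"tokens": [token for token in cleaned if token.strip()]}


# Each node is exercised on its own with a hand-built state dict.
tokenized = tokenize({"text": "Hello, World!", "tokens": []})
normalized = normalize({"text": "", "tokens": ["hello,", "world!", "--"]})

print(tokenized)
print(normalized)
```

Because each node only reads a dictionary and returns a partial update, checks like these stay fast and do not depend on the graph wiring.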

### When to Use Subgraphs
- Complex multi-stage AI pipelines
- Reusable processing components
- Team-based development with clear boundaries
- Production systems requiring modular architecture

### Implementation Example: Text Processing Pipeline
We'll build a text preprocessing system commonly used in NLP applications.

In [None]:
from typing_extensions import TypedDict
from typing import List
from langgraph.graph import StateGraph, START, END
import re

# Define state for text processing pipeline
class ProcessingState(TypedDict):
    text: str
    tokens: List[str]

def create_text_preprocessor():
    """Text preprocessing subgraph for NLP applications"""
    
    subgraph = StateGraph(ProcessingState)
    
    def tokenize(state: ProcessingState):
        """Basic tokenization"""
        tokens = state["text"].lower().split()
        return {"tokens": tokens}
    
    def normalize(state: ProcessingState):
        """Remove special characters and normalize"""
        clean_tokens = [re.sub(r'[^\w\s]', '', token) for token in state["tokens"]]
        clean_tokens = [token for token in clean_tokens if token.strip()]
        return {"tokens": clean_tokens}
    
    subgraph.add_node("tokenize", tokenize)
    subgraph.add_node("normalize", normalize)
    
    subgraph.add_edge(START, "tokenize")
    subgraph.add_edge("tokenize", "normalize")
    subgraph.add_edge("normalize", END)
    
    return subgraph.compile()

# Main processing pipeline
main_pipeline = StateGraph(ProcessingState)

def load_text(state: ProcessingState):
    """Input text loading"""
    return {"text": f"Input: {state['text']}"}

def finalize_output(state: ProcessingState):
    """Generate final processed output"""
    processed = " ".join(state["tokens"])
    return {"text": f"Processed: {processed}"}

# Add nodes to main pipeline
main_pipeline.add_node("load", load_text)
main_pipeline.add_node("preprocessor", create_text_preprocessor())
main_pipeline.add_node("finalize", finalize_output)

# Define execution flow
main_pipeline.add_edge(START, "load")
main_pipeline.add_edge("load", "preprocessor")
main_pipeline.add_edge("preprocessor", "finalize")
main_pipeline.add_edge("finalize", END)

# Compile and execute
pipeline = main_pipeline.compile()

# Test with sample data
result = pipeline.invoke({
    "text": "Hello, World! This is a sample text for processing.",
    "tokens": []
})

print(f"Result: {result['text']}")
print(f"Tokens: {result['tokens']}")

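Visualizing the compiled pipeline with `xray=True` expands the subgraph node, so the `tokenize` and `normalize` steps appear inside the `preprocessor` component.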

In [None]:
from langchain_teddynote.graphs import visualize_graph

# Visualize the text processing pipeline structure
print("Text Processing Pipeline with Subgraph:")
visualize_graph(pipeline, xray=True)

---

# Part 2: Shared State Management

## Shared State Pattern

In the shared state pattern, the parent graph and its subgraphs declare the same state keys, so a compiled subgraph can be added to the parent as a node and can read or write those keys directly. Data flows between processing stages without any conversion code.

### Key Characteristics
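- **Unified State**: All components read and write the same state keys
- **Update Propagation**: Values a subgraph writes to shared keys are applied to the parent state when the subgraph node finishes
- **Simple Integration**: The compiled subgraph is passed directly to `add_node`; no transformation layer is required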
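
The characteristics above can be pictured with a plain-Python model. This is only a sketch of the idea, not LangGraph's implementation: `apply_update` and `run_nodes` are hypothetical helpers, and the model assumes the default behavior in which a returned key overwrites the previous value.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def apply_update(state: State, update: State) -> State:
    """Merge a node's partial update into the state; returned keys overwrite old values."""
    return {**state, **update}


def run_nodes(state: State, nodes: List[Callable[[State], State]]) -> State:
    """Run nodes in order, merging each partial update before the next node runs."""
    for node in nodes:
        state = apply_update(state, node(state))
    return state


def retrieve(state: State) -> State:
    # Writes only "documents"; "query" and "answer" are left untouched.
    return {"documents": [f"doc about {state['query']}"]}


def generate(state: State) -> State:
    # Reads the "documents" key written by the previous node.
    return {"answer": f"{len(state['documents'])} document(s) used"}


final_state = run_nodes(
    {"query": "subgraphs", "documents": [], "answer": ""},
    [retrieve, generate],
)
print(final_state)
```

Each node returns only the keys it changes, which is why the retrieval and generation components in the example below can cooperate without knowing about each other.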

### Use Cases
- Multi-stage AI pipelines sharing intermediate results
- Collaborative processing between different AI models
- Progress tracking across complex workflows

### Implementation Example: RAG (Retrieval-Augmented Generation) System
We'll build a RAG system where retrieval and generation components share state.

In [None]:
# Define shared state for RAG system
class RAGState(TypedDict):
    query: str
    documents: List[str]
    answer: str

def create_retrieval_system():
    """Document retrieval subgraph"""
    
    builder = StateGraph(RAGState)
    
    def search_documents(state: RAGState):
        """Simulate document retrieval"""
        # Extract keywords: strip punctuation and skip very short words such as "is"
        keywords = [w for w in re.findall(r"\w+", state["query"].lower()) if len(w) > 2]

        # Simple keyword-based retrieval simulation
        mock_db = [
            "LangGraph is a framework for building stateful AI applications",
            "Subgraphs enable modular architecture in complex workflows",
            "RAG combines retrieval and generation for better AI responses",
            "Python is widely used for AI application development"
        ]

        relevant_docs = [doc for doc in mock_db if any(word in doc.lower() for word in keywords)]

        # Fall back to the first two documents when nothing matches
        return {"documents": relevant_docs if relevant_docs else mock_db[:2]}
    
    builder.add_node("search", search_documents)
    builder.add_edge(START, "search")
    builder.add_edge("search", END)
    
    return builder.compile()

def create_generation_system():
    """Answer generation subgraph"""
    
    builder = StateGraph(RAGState)
    
    def generate_answer(state: RAGState):
        """Generate answer based on retrieved documents"""
        docs = state["documents"]
        query = state["query"]
        
        # Simple answer generation (in production, use actual LLM)
        context = " ".join(docs)
        answer = f"Based on the retrieved information: {context[:100]}... Answer to '{query}'"
        
        return {"answer": answer}
    
    builder.add_node("generate", generate_answer)
    builder.add_edge(START, "generate")
    builder.add_edge("generate", END)
    
    return builder.compile()

# Main RAG pipeline
rag_pipeline = StateGraph(RAGState)

def process_query(state: RAGState):
    """Initialize query processing"""
    return {"query": f"Query: {state['query']}"}

def finalize_response(state: RAGState):
    """Finalize the response"""
    return {"answer": f"Final: {state['answer']}"}

# Add components to pipeline
rag_pipeline.add_node("process", process_query)
rag_pipeline.add_node("retriever", create_retrieval_system())
rag_pipeline.add_node("generator", create_generation_system())
rag_pipeline.add_node("finalize", finalize_response)

# Define execution flow
rag_pipeline.add_edge(START, "process")
rag_pipeline.add_edge("process", "retriever")
rag_pipeline.add_edge("retriever", "generator")
rag_pipeline.add_edge("generator", "finalize")
rag_pipeline.add_edge("finalize", END)

# Compile and test
rag_system = rag_pipeline.compile()

# Execute with test query
result = rag_system.invoke({
    "query": "What is LangGraph?",
    "documents": [],
    "answer": ""
})

print(f"Query: {result['query']}")
print(f"Retrieved docs: {len(result['documents'])} documents")
print(f"Answer: {result['answer']}")

In [None]:
# Visualize RAG system architecture
print("RAG System with Shared State:")
visualize_graph(rag_system, xray=True)

---

# Part 3: Independent State Systems

## Independent State Pattern

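In the independent state pattern, the parent graph and its subgraphs use different state schemas with no keys in common. Because the subgraph cannot read the parent state directly, a wrapper node function converts the state on the way in and on the way out.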

### Key Benefits
- **Complete Encapsulation**: Internal implementation hidden from external components
- **Independent Evolution**: Components can change without affecting others
- **Domain Optimization**: Each component uses optimal data structures
- **Isolation**: Internal keys such as processing metadata never appear in the parent state

### Transformation Flow
1. Parent state transforms to subgraph input format
2. Subgraph processes data independently
3. Subgraph output transforms back to parent state
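
The three steps above can be sketched as a small adapter factory. All names here (`make_adapter_node`, `fake_llm_subgraph`) are hypothetical, and the stand-in function takes the place of a compiled subgraph's `invoke` so the sketch runs without any graph.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def make_adapter_node(
    invoke_subgraph: Callable[[State], State],
    to_subgraph: Callable[[State], State],
    to_parent: Callable[[State], State],
) -> Callable[[State], State]:
    """Build a parent-graph node that wraps a subgraph with its own schema."""

    def node(parent_state: State) -> State:
        sub_input = to_subgraph(parent_state)    # 1. parent state -> subgraph input
        sub_output = invoke_subgraph(sub_input)  # 2. subgraph runs independently
        return to_parent(sub_output)             # 3. subgraph output -> parent update

    return node


def fake_llm_subgraph(state: State) -> State:
    # Stand-in for a compiled subgraph: fills in "response" and "metadata".
    prompt = state["prompt"]
    return {**state, "response": prompt.upper(), "metadata": {"length": len(prompt)}}


llm_node = make_adapter_node(
    invoke_subgraph=fake_llm_subgraph,
    to_subgraph=lambda s: {"prompt": s["input"], "response": "", "metadata": {}},
    to_parent=lambda r: {"output": r["response"]},
)

update = llm_node({"input": "hello", "output": ""})
print(update)
```

Only the keys returned by `to_parent` reach the parent state, so internal fields such as `metadata` stay inside the subgraph.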

### Implementation Example: LLM Processing Pipeline
We'll build a multi-stage LLM system where each stage has specialized state management.

In [None]:
# Parent system state (user-facing)
class UserRequest(TypedDict):
    input: str
    output: str

# LLM processing state (internal)
class LLMState(TypedDict):
    prompt: str
    response: str
    metadata: dict

def create_llm_processor():
    """LLM processing subgraph with independent state"""
    
    builder = StateGraph(LLMState)
    
    def analyze_request(state: LLMState):
        """Analyze input and prepare processing metadata"""
        metadata = {
            "length": len(state["prompt"]),
            "complexity": "high" if len(state["prompt"]) > 50 else "low"
        }
        return {"metadata": metadata}
    
    def process_with_llm(state: LLMState):
        """Simulate LLM processing"""
        prompt = state["prompt"]
        complexity = state["metadata"]["complexity"]
        
        # Simulate different processing based on complexity
        if complexity == "high":
            response = f"Complex analysis: {prompt[:30]}... [detailed processing]"
        else:
            response = f"Simple response: {prompt}"
            
        return {"response": response}
    
    def format_output(state: LLMState):
        """Format the final output"""
        formatted = f"Processed: {state['response']} (Length: {state['metadata']['length']})"
        return {"response": formatted}
    
    builder.add_node("analyze", analyze_request)
    builder.add_node("process", process_with_llm)
    builder.add_node("format", format_output)
    
    builder.add_edge(START, "analyze")
    builder.add_edge("analyze", "process")
    builder.add_edge("process", "format")
    builder.add_edge("format", END)
    
    return builder.compile()

# Main orchestration system
main_system = StateGraph(UserRequest)

def receive_input(state: UserRequest):
    """Process incoming user request"""
    return {"input": f"Received: {state['input']}"}

# Compile the LLM subgraph once so it is not rebuilt on every request
llm_system = create_llm_processor()

def process_with_llm_system(state: UserRequest):
    """Process using LLM subgraph with state transformation"""

    # Transform user state to LLM state
    llm_input = {
        "prompt": state["input"],
        "response": "",
        "metadata": {}
    }

    # Execute LLM subgraph
    llm_result = llm_system.invoke(llm_input)

    # Transform LLM result back to user state (metadata stays internal)
    return {"output": llm_result["response"]}

def finalize_response(state: UserRequest):
    """Finalize user response"""
    return {"output": f"Complete: {state['output']}"}

# Build main system
main_system.add_node("receive", receive_input)
main_system.add_node("llm_process", process_with_llm_system)
main_system.add_node("finalize", finalize_response)

main_system.add_edge(START, "receive")
main_system.add_edge("receive", "llm_process")
main_system.add_edge("llm_process", "finalize")
main_system.add_edge("finalize", END)

# Compile and test
llm_pipeline = main_system.compile()

# Test with different complexity inputs
test_cases = [
    "Simple query",
    "This is a more complex query that requires detailed analysis and processing"
]

for test_input in test_cases:
    result = llm_pipeline.invoke({"input": test_input, "output": ""})
    print(f"Input: {test_input}")
    print(f"Output: {result['output']}")
    print("-" * 50)

In [None]:
# Visualize LLM pipeline with independent states
print("LLM Processing Pipeline with Independent States:")
visualize_graph(llm_pipeline, xray=True)

---

# Part 4: Production Implementation

## AI Agent Orchestration

Production AI systems often combine several specialized agents. Each agent covers one area of responsibility, and a central orchestrator decides which agent handles a given task and in what order.

### System Architecture
- **Research Agent**: Information gathering and analysis
- **Processing Agent**: Data transformation and computation
- **Reporting Agent**: Result compilation and formatting
- **Orchestrator**: Task routing and workflow coordination

### Production Benefits
- **Scalability**: Independent scaling of specialized components
- **Maintainability**: Clear separation of concerns
- **Extensibility**: Easy addition of new agent types
- **Reliability**: Isolated failure domains
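
As a rough illustration of the routing and extensibility points, the sketch below uses a plain-Python registry. The names `AGENTS`, `register_agent`, `select_agent`, and `dispatch` are hypothetical and the agents are stubs. In LangGraph, a function like `select_agent` could instead be passed to `add_conditional_edges` so that the choice appears as branches in the graph.

```python
from typing import Callable, Dict

AgentState = Dict[str, str]
AgentFn = Callable[[AgentState], AgentState]

# Registry mapping an agent name to the callable that handles the task.
AGENTS: Dict[str, AgentFn] = {}


def register_agent(name: str) -> Callable[[AgentFn], AgentFn]:
    """Decorator that adds an agent to the registry under the given name."""
    def decorator(fn: AgentFn) -> AgentFn:
        AGENTS[name] = fn
        return fn
    return decorator


@register_agent("research")
def research(state: AgentState) -> AgentState:
    return {"result": f"Research: {state['task']}", "agent_type": "research"}


@register_agent("processing")
def processing(state: AgentState) -> AgentState:
    return {"result": f"Processed: {state['task']}", "agent_type": "processing"}


def select_agent(state: AgentState) -> str:
    """Return the name of the agent for this task; research is the default."""
    task = state["task"].lower()
    if "process" in task or "compute" in task:
        return "processing"
    return "research"


def dispatch(state: AgentState) -> AgentState:
    """Route the task to the selected agent and merge its update into the state."""
    return {**state, **AGENTS[select_agent(state)](state)}


routed = dispatch({"task": "Process customer feedback data", "result": "", "agent_type": ""})
print(routed["agent_type"], "->", routed["result"])
```

Adding a new agent type then means registering one more function and extending `select_agent`; the dispatch logic does not change.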

### Implementation Example: AI Research Pipeline
We'll build a multi-agent system for automated research and analysis tasks.

In [None]:
# Shared state for agent coordination
class AgentState(TypedDict):
    task: str
    result: str
    agent_type: str

def create_research_agent():
    """Research agent subgraph"""
    
    builder = StateGraph(AgentState)
    
    def gather_information(state: AgentState):
        """Simulate information gathering"""
        task = state["task"]
        
        # Simulate research based on task content
        if "analysis" in task.lower():
            result = f"Research findings: {task} - Found 3 relevant studies, 5 key metrics"
        elif "comparison" in task.lower():
            result = f"Comparative analysis: {task} - Identified 4 alternatives, pros/cons matrix"
        else:
            result = f"General research: {task} - Collected background information, key concepts"
        
        return {"result": result, "agent_type": "research"}
    
    builder.add_node("research", gather_information)
    builder.add_edge(START, "research")
    builder.add_edge("research", END)
    
    return builder.compile()

def create_processing_agent():
    """Data processing agent subgraph"""
    
    builder = StateGraph(AgentState)
    
    def process_data(state: AgentState):
        """Process and analyze data"""
        task = state["task"]
        
        # Simulate data processing
        result = f"Processing complete: {task} - Data normalized, patterns identified, metrics calculated"
        
        return {"result": result, "agent_type": "processing"}
    
    builder.add_node("process", process_data)
    builder.add_edge(START, "process")
    builder.add_edge("process", END)
    
    return builder.compile()

def create_reporting_agent():
    """Report generation agent subgraph"""
    
    builder = StateGraph(AgentState)
    
    def generate_report(state: AgentState):
        """Generate final report"""
        previous_result = state.get("result", "")
        
        report = f"Executive Summary: {previous_result[:50]}... [Full report with charts and recommendations]"
        
        return {"result": report, "agent_type": "reporting"}
    
    builder.add_node("report", generate_report)
    builder.add_edge(START, "report")
    builder.add_edge("report", END)
    
    return builder.compile()

# Main orchestration system
orchestrator = StateGraph(AgentState)

def classify_task(state: AgentState):
    """Classify incoming task"""
    task = state["task"].lower()
    
    if "research" in task or "analyze" in task:
        agent_type = "research"
    elif "process" in task or "compute" in task:
        agent_type = "processing"
    else:
        agent_type = "research"  # Default to research
    
    return {"agent_type": agent_type}

# Compile each agent subgraph once instead of on every invocation
research_agent = create_research_agent()
processing_agent = create_processing_agent()
reporting_agent = create_reporting_agent()

def route_to_agent(state: AgentState):
    """Route task to appropriate agent"""
    # Use the agent chosen by classify_task; any other value goes to processing
    agent = research_agent if state["agent_type"] == "research" else processing_agent
    result = agent.invoke(state)
    return {"result": result["result"], "agent_type": result["agent_type"]}

def finalize_with_report(state: AgentState):
    """Generate final report"""
    # Return only the report so agent_type still names the agent that did the work
    result = reporting_agent.invoke(state)
    return {"result": result["result"]}

# Build orchestration pipeline
orchestrator.add_node("classify", classify_task)
orchestrator.add_node("route", route_to_agent)
orchestrator.add_node("report", finalize_with_report)

orchestrator.add_edge(START, "classify")
orchestrator.add_edge("classify", "route")
orchestrator.add_edge("route", "report")
orchestrator.add_edge("report", END)

# Compile and test
agent_system = orchestrator.compile()

# Test with different task types
test_tasks = [
    "Analyze market trends for AI startups",
    "Process customer feedback data",
    "Research competitive landscape"
]

print("AI Agent Orchestration Results:")
print("=" * 50)

for task in test_tasks:
    result = agent_system.invoke({
        "task": task,
        "result": "",
        "agent_type": ""
    })
    
    print(f"Task: {task}")
    print(f"Agent: {result['agent_type']}")
    print(f"Result: {result['result'][:80]}...")
    print("-" * 50)

In [None]:
# Visualize agent orchestration system
print("AI Agent Orchestration System:")
visualize_graph(agent_system, xray=True)

---

# Summary

## Concepts Covered

This tutorial covered the core concepts of LangGraph subgraphs for AI application development.

### Key Learning Outcomes

1. **Subgraph Fundamentals**
   - Modular architecture for AI workflows
   - Component isolation and reusability
   - Production-ready design patterns

2. **Shared State Management**
   - Coordinated data flow between components
   - Subgraph updates applied to shared state keys
   - Efficient inter-component communication

3. **Independent State Systems**
   - Encapsulated component design
   - Transformation-based data exchange
   - Internal state kept out of the parent graph

4. **Production Implementation**
   - Multi-agent orchestration patterns
   - Scalable AI system architecture
   - Enterprise-ready workflow design

## Next Steps

### For Intermediate Developers
- Implement error handling and recovery mechanisms
- Add monitoring and observability layers
- Optimize for high-throughput scenarios

### For Advanced Practitioners
- Design distributed agent systems
- Implement advanced routing strategies
- Build production deployment pipelines

### For System Architects
- Plan enterprise AI architectures
- Design multi-tenant agent systems
- Implement governance and compliance frameworks

## Best Practices

1. **Design Principles**: Single responsibility, loose coupling, high cohesion
2. **Testing Strategy**: Unit test individual subgraphs, integration test workflows
3. **Monitoring**: Implement comprehensive logging and metrics collection
4. **Documentation**: Maintain clear interface specifications and architectural diagrams

## Production Readiness

These patterns provide a foundation for building AI applications with LangGraph subgraphs. The examples here use simulated retrieval and LLM calls, so replace them with real components and add error handling, testing, and monitoring before deploying.

For advanced features and production deployment guidance, refer to the official LangGraph documentation.