# Output Parsers in LangChain

## What are Output Parsers?

**Output Parsers** are components that transform raw LLM outputs into structured, usable formats. LLMs inherently return plain text strings, but applications often need structured data (JSON, objects, lists, etc.). Output parsers bridge this gap.

### Why Use Output Parsers?

1. **Structure from Text**: Convert unstructured LLM output into structured data
2. **Type Safety**: Ensure data matches expected types and schemas
3. **Validation**: Enforce constraints and rules on generated data
4. **Integration**: Make output compatible with downstream systems
5. **Error Prevention**: Catch formatting issues before they cause problems
6. **Composability**: Chain with prompts and models seamlessly

### The Two-Part Job of Output Parsers

1. **Format Instructions**: Inject instructions into prompts telling LLMs how to format responses
   - Example: "Return a valid JSON object with keys: name, age, email"
2. **Parsing**: Transform the final text response into a Python object
   - Example: Convert JSON string → Python dictionary
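
The two jobs can be sketched with a toy parser built on the standard library. `KeyedJsonParser` is a hypothetical class written for this illustration, not part of LangChain, and the `reply` string stands in for a model response so no LLM is called.

```python
import json


class KeyedJsonParser:
    """Toy parser (hypothetical, not a LangChain class) showing both jobs."""

    def __init__(self, keys):
        self.keys = list(keys)

    def get_format_instructions(self) -> str:
        # Job 1: text that gets appended to the prompt.
        return "Return a valid JSON object with keys: " + ", ".join(self.keys)

    def parse(self, text: str) -> dict:
        # Job 2: turn the model's text reply into a Python object.
        data = json.loads(text)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"Missing keys: {missing}")
        return data


parser = KeyedJsonParser(["name", "age", "email"])
prompt = "Describe a fictional user.\n" + parser.get_format_instructions()

# Stand-in for a model reply; no LLM is called in this sketch.
reply = '{"name": "Alice", "age": 30, "email": "alice@example.com"}'
user = parser.parse(reply)

print(prompt)
print(user["name"], user["age"])
```

LangChain's built-in parsers follow the same shape: a method that produces prompt text and a method that parses the reply.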

### Output Parser Workflow

```
LLM Output (text string)
        ↓
[Validation] → Valid? → Parsed Output (Python object)
        ↓ Invalid
    [Retry/Error]
```

### Key Principle

**Better structured output = More reliable applications**. Output parsers ensure consistent, predictable data formats for downstream processing.


## StrOutputParser

### Definition

**StrOutputParser** is the simplest output parser in LangChain. It extracts just the **string content** from LLM message objects, stripping away metadata and converting them into plain Python strings.

### Key Features

- **Simplest Parser**: No schema or validation required
- **Text Extraction**: Pulls content from AIMessage objects
- **Passthrough Compatible**: Perfect for chaining with multiple prompts
- **No Validation**: Accepts any text output as-is
- **Lightweight**: Minimal overhead

### How It Works

```
AIMessage(content="Hello, I am Claude", response_metadata={...})
            ↓ [StrOutputParser]
         "Hello, I am Claude"
```

### Use Cases

- Simple text generation and completion
- Multi-step chains (output of one → input of next)
- Summarization tasks
- Content generation
- When you just need the text, not structure

### When to Use

✅ **Use StrOutputParser when:**
- Output is simple text/prose
- Chaining multiple prompt-model steps
- Data doesn't need strict structure
- Speed is important (no validation)

❌ **Don't use when:**
- Output needs to be JSON or structured
- Type safety is critical
- Validation rules required
- Complex data extraction needed

### Example: Two-Step Chain (Report → Summary)


```python
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

load_dotenv()

llm = HuggingFaceEndpoint(
    repo_id="Qwen/Qwen2.5-7B-Instruct",
    task="text-generation",
    temperature=0.3,
)
model = ChatHuggingFace(llm=llm)

# Prompt 1 → detailed report
template1 = PromptTemplate(
    template="""
You are a helpful AI tutor.
Write a beginner friendly detailed report on {topic}.
""",
    input_variables=["topic"]
)

# Prompt 2 → summary
template2 = PromptTemplate(
    template="""
Write a short point-wise summary of the following text:
{text}
""",
    input_variables=["text"]
)

parser = StrOutputParser()

# The first parser turns the report into a plain string, which becomes {text} in template2.
chain = template1 | model | parser | template2 | model | parser

result = chain.invoke({"topic": "LLM"})
print(result)
```

Output (truncated):

```
### Point-wise Summary of the Beginner's Guide to Large Language Models (LLMs)

1. **Introduction**:
   - Large Language Models (LLMs) are AI models designed to process and generate human-like text.
   - They are trained on vast amounts of text data, enabling them to understand and generate text across multiple languages and contexts.
   - Applications include customer service, content creation, and language translation.

2. **Definition of LLMs**:
   - LLMs predict the next word in a sequence of text based on patterns learned from large datasets.
   - The goal is to generate coherent and contextually relevant text.

3. **Key Features of LLMs**:
   - **Massive Size**: Contain billions or trillions of parameters.
   - **Contextual Understanding**: Can understand the context of sentences and conversations.
   - **Multilingual Support**: Capable of processing and generating text in multiple languages.
   - **Fine-Tuning**: Can be adapted for specific tasks through fine-tuning.

4. **How L
```

## JsonOutputParser

### Definition

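**JsonOutputParser** parses LLM output as **JSON** and converts it into a Python dictionary for easy access. It cannot force the model to emit JSON; if the output is not valid JSON, parsing raises an error instead of returning a malformed result.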

### Key Features

- **Format Instructions**: Provides JSON formatting rules via `get_format_instructions()` for you to inject into the prompt
- **Validation**: Checks that the output is valid JSON while parsing
- **Dictionary Conversion**: Returns Python `dict`, not string
- **Flexible Structure**: No schema required - works with any JSON
- **Partial Variables**: Supports injecting instructions into prompts

### How It Works

```
LLM Output (JSON string):
{"name": "Alice", "age": 30}
        ↓ [JsonOutputParser]
Python Dictionary:
{"name": "Alice", "age": 30}
```

### Use Cases

- API responses
- Configuration generation
- Multi-field data extraction
- Key-value pair generation
- Flexible structured data

### Advantages Over StrOutputParser

| Aspect | StrOutputParser | JsonOutputParser |
|--------|-----------------|------------------|
| **Output Type** | String | Dictionary |
| **Structure** | Unstructured | Structured (JSON) |
| **Validation** | None | JSON validation |
| **Type Access** | Manual parsing | Direct `dict` access |
| **Flexibility** | High | Medium |

### Injecting Format Instructions

- `parser.get_format_instructions()`: Returns JSON formatting instructions
- `partial_variables`: Pre-fills template variables (no manual injection needed)
- Once injected, the formatting rules are part of the prompt text the model receives

### When to Use

✅ **Use JsonOutputParser when:**
- Output needs flexible structure
- Don't have a strict schema
- JSON is acceptable format
- Need dictionary access

❌ **Don't use when:**
- Need strict type validation
- JSON structure must follow a schema
- Type safety is critical


```python
# JsonOutputParser
from dotenv import load_dotenv
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

load_dotenv()

# Model
llm = HuggingFaceEndpoint(
    repo_id="MiniMaxAI/MiniMax-M2.1",
    task="text-generation",
)
model = ChatHuggingFace(llm=llm)

# Output parser
parser = JsonOutputParser()

# Template: the format instructions are pre-filled via partial_variables
template = PromptTemplate(
    template="give me a name, age, address of a fictional character.\n {format_instruction}",
    input_variables=[],
    partial_variables={"format_instruction": parser.get_format_instructions()}
)

chain = template | model | parser
result = chain.invoke({})
print(result)
```

Output:

```
{'name': 'Sarah Mitchell', 'age': 32, 'address': '742 Evergreen Terrace, Springfield, IL 62701'}
```


## PydanticOutputParser

### Definition

**PydanticOutputParser** is the strictest of the three parsers covered here. It uses **Pydantic models** to define schemas with type validation and field constraints, and it raises an error when the output does not conform.

### Key Features

- **Type Safety**: Enforce specific types (str, int, list, etc.)
- **Validation**: Complex validation rules (gt, lt, regex patterns)
- **Schema Definition**: Exact field structure enforced
- **Error Messages**: Clear feedback if validation fails
- **Type Hints**: Full IDE support and autocomplete
- **Composable**: Works seamlessly with chains

### How It Works

```
1. Define Pydantic Schema (Blueprint)
   class Person(BaseModel):
       name: str
       age: int
       email: str

2. LLM generates output (JSON string)
   {"name": "Alice", "age": 30, "email": "alice@example.com"}

3. Parser validates against schema
   ✓ Passes validation

4. Returns typed object
   person.name → "Alice"
   person.age → 30
```

### Use Cases

- Strict data validation
- API request/response validation
- Database model generation
- Form data extraction
- Production applications requiring reliability

### Validation Rules

```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(gt=0, lt=150, description="Age in years")
    email: str = Field(pattern=r"[\w\.-]+@[\w\.-]+\.\w+")  # Pydantic v2 name; v1 used regex=
    role: str = Field(default="user", description="User role")
```
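
These constraints can be checked with Pydantic alone, before any parser or model is involved. The sketch below assumes Pydantic v2, where the regular-expression constraint is spelled `pattern`; the sample values are made up.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(gt=0, lt=150, description="Age in years")
    email: str = Field(pattern=r"[\w\.-]+@[\w\.-]+\.\w+")
    role: str = Field(default="user", description="User role")


# Valid data: the default for `role` is filled in.
ok = Person(name="Alice", age=30, email="alice@example.com")
print(ok.role)

# Invalid data: age breaks lt=150 and email does not match the pattern.
try:
    Person(name="Bob", age=200, email="not-an-email")
    error_fields = []
except ValidationError as exc:
    error_fields = sorted(err["loc"][0] for err in exc.errors())
print(error_fields)
```

Pydantic reports every failing field in one error, which is useful when feeding the message back into a retry prompt.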

### Comparison: All Three Parsers

| Feature | StrOutputParser | JsonOutputParser | PydanticOutputParser |
|---------|-----------------|------------------|----------------------|
| **Output Type** | String | Dictionary | Typed Object |
| **Schema** | None | Optional | Required |
| **Validation** | None | JSON only | Full type checking |
| **Constraints** | None | None | Yes (gt, lt, pattern) |
| **Type Safety** | None | Basic | Full |
| **Complexity** | Low | Medium | High |
| **Speed** | Fastest | Fast | Slightly slower |
| **Best For** | Simple text | Flexible JSON | Strict requirements |

### When to Use

✅ **Use PydanticOutputParser when:**
- Type safety is critical
- Validation rules required
- Production applications
- Need IDE autocomplete
- Complex data structures

❌ **Don't use when:**
- Simple text output needed
- Structure varies frequently
- Minimal validation required


```python
# PydanticOutputParser
from dotenv import load_dotenv
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from pydantic import BaseModel, Field

load_dotenv()

# Model
llm = HuggingFaceEndpoint(
    repo_id="MiniMaxAI/MiniMax-M2.1",
    task="text-generation",
)
model = ChatHuggingFace(llm=llm)

# Schema the model's reply must satisfy
class Person(BaseModel):
    name: str = Field(description="Person's name")
    age: int = Field(gt=18, description="Age of the person")
    address: str = Field(description="Place where the person lives")


parser = PydanticOutputParser(pydantic_object=Person)

template = PromptTemplate(
    template="get me name, age, address of a fictional {type} Person.\n {format_instruction}",
    input_variables=["type"],
    partial_variables={"format_instruction": parser.get_format_instructions()}
)

chain = template | model | parser
result = chain.invoke({"type": "Indian"})
print(result)
```

Output:

```
name='Rajiv Menon' age=28 address='24, Gandhi Nagar, Chennai, Tamil Nadu, India'
```


## Best Practices for Output Parsers

### 1. Choose the Right Parser

**Decision Tree:**

```
Does output need structure?
├─ NO → Use StrOutputParser
└─ YES → Need strict schema?
    ├─ NO → Use JsonOutputParser
    └─ YES → Use PydanticOutputParser
```

### 2. Design Your Schema First

```python
from pydantic import BaseModel, Field
from typing import List, Optional

# ✅ GOOD: Clear, well-documented schema
class ProductReview(BaseModel):
    """A product review with validation"""
    product_name: str = Field(min_length=1, description="Name of product")
    rating: int = Field(ge=1, le=5, description="Rating from 1-5")
    pros: List[str] = Field(min_length=1, description="Positive aspects")  # Pydantic v2; v1 used min_items
    cons: List[str] = Field(description="Negative aspects")
    recommendation: bool = Field(description="Would recommend?")
    reviewer_name: Optional[str] = Field(default=None, description="Reviewer name")

# ❌ BAD: No validation, unclear structure
class Review(BaseModel):
    text: str
    data: dict
```

### 3. Add Clear Format Instructions

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Review(BaseModel):
    rating: int = Field(description="1-5 star rating")
    summary: str = Field(description="One sentence summary")

parser = PydanticOutputParser(pydantic_object=Review)

# ✅ GOOD: Explicit format instruction in prompt
good_prompt = PromptTemplate(
    template="""Analyze this product review and provide structured output:

Review: {review_text}

{format_instruction}""",
    input_variables=["review_text"],
    partial_variables={"format_instruction": parser.get_format_instructions()}
)

# ❌ BAD: No format instructions, so the model is free to answer in prose
bad_prompt = PromptTemplate(
    template="Analyze this review: {review_text}",
    input_variables=["review_text"]
)
```

### 4. Error Handling and Retry

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.exceptions import OutputParserException

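parser = JsonOutputParser()

# Example of a malformed reply; in practice this is the model's raw text.
model_output = "Sorry, I cannot answer in JSON."

try:
    result = parser.parse(model_output)
except OutputParserException as e:
    print(f"Parse error: {e}")
    # Implement retry logic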
```

### 5. Testing Your Parser

```python
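import json

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(gt=0, lt=150)

parser = PydanticOutputParser(pydantic_object=Person)

# ✅ Test with expected output
test_output = '{"name": "Alice", "age": 30}'
result = parser.parse(test_output)
assert result.name == "Alice"

# ✅ Test validation rules
invalid_output = '{"name": "Bob", "age": -5}'  # Should fail (age constraint)
try:
    result = parser.parse(invalid_output)
except OutputParserException:
    print("Validation correctly rejected invalid data")

# ✅ Test edge cases (each one should be rejected by the schema above)
edge_cases = [
    '{"name": "", "age": 0}',                     # Empty/zero values
    json.dumps({"name": "X" * 1000, "age": 30}),  # Very long string
    '{}',                                         # Missing fields
]
for case in edge_cases:
    try:
        parser.parse(case)
    except OutputParserException:
        print(f"Rejected: {case[:30]}")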
```

### 6. Performance Optimization

```python
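from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

# Minimal illustrative schemas
class Review(BaseModel):
    rating: int
    summary: str

class Product(BaseModel):
    name: str
    price: float

# ✅ GOOD: Create each parser once and reuse it
parsers = {
    "review": PydanticOutputParser(pydantic_object=Review),
    "product": PydanticOutputParser(pydantic_object=Product),
}

# ✅ GOOD: Compute format instructions once per parser, outside any request loop
instructions = {name: p.get_format_instructions() for name, p in parsers.items()}

# ❌ BAD: Creating a new parser for each request
for item in ["review 1", "review 2"]:
    parser = PydanticOutputParser(pydantic_object=Review)  # Wasteful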
```

### 7. Common Parsing Errors and Solutions

| Error | Cause | Solution |
|-------|-------|----------|
| **OutputParserException** | Invalid JSON/format | Add format instructions, retry with better prompt |
| **ValidationError** | Data doesn't match schema (`PydanticOutputParser` re-raises it as `OutputParserException`) | Relax constraints or improve prompt |
| **KeyError** | Missing dictionary key | Add all expected fields to schema |
| **TypeError** | Wrong data type | Check type hints in schema |

## Summary: Output Parsers

### Parser Comparison Quick Reference

| Parser | Output | Use When | Effort |
|--------|--------|----------|--------|
| **StrOutputParser** | String | Simple text output | Low |
| **JsonOutputParser** | Dict | Flexible structure | Medium |
| **PydanticOutputParser** | Typed Object | Strict validation | High |

### Key Takeaways

1. **Right Parser**: Choose based on structure needs
2. **Format Instructions**: For structured parsers, inject them into the prompt (for example via `partial_variables`)
3. **Validation Rules**: Use Pydantic constraints (gt, lt, pattern)
4. **Error Handling**: Implement try-except and retry logic
5. **Testing**: Test edge cases and invalid inputs
6. **Performance**: Cache parsers and instructions
7. **Type Safety**: Pydantic for production, Json for prototyping

### Common Patterns

```python
# Pattern 1: Simple text
parser = StrOutputParser()
chain = prompt | model | parser

# Pattern 2: JSON structure
parser = JsonOutputParser()
chain = prompt | model | parser

# Pattern 3: Strict schema
class Data(BaseModel):
    field: type = Field(...)

parser = PydanticOutputParser(pydantic_object=Data)
prompt = PromptTemplate(
    template="...\n{format_instruction}",
    partial_variables={"format_instruction": parser.get_format_instructions()}
)
chain = prompt | model | parser

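# Pattern 4: Chaining steps (str_parser = StrOutputParser(), json_parser = JsonOutputParser())
chain = (
    prompt1 | model | str_parser |  # Step 1: plain text out, fed into prompt2
    prompt2 | model | json_parser   # Step 2: parse the final reply as JSON
)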
```

### Next Steps

After mastering output parsers:
1. **Chains**: Combine parsers with prompts and models
2. **Validation**: Implement error handling and recovery
3. **Custom Parsers**: Create specialized parsers for domain-specific formats
4. **Agents**: Use parsers for agent action/input parsing
5. **RAG**: Parse retrieved documents

### Additional Resources

- [Pydantic Documentation](https://docs.pydantic.dev/)
- [LangChain Parsers Docs](https://python.langchain.com/docs/modules/model_io/output_parsers/)
- [Output Parser API Reference](https://api.python.langchain.com/en/latest/output_parsers/langchain_core.output_parsers.base.OutputParser.html)
