# **LangChain Structured Outputs from Chat Models**

## Introduction

As large language models (LLMs) like ChatGPT and Claude become increasingly integrated into diverse applications, the need for precise and reliable data exchange grows. Structured outputs offer a solution by enabling chat models to produce responses in specific, predictable formats. This article explores the concept of structured outputs, delves into LangChain's `.with_structured_output()` method, and provides practical examples to illustrate its implementation and benefits.

### What Are Structured Outputs?

Structured outputs refer to responses generated by chat models that adhere to predefined data formats, such as JSON, Pydantic models, or TypedDicts. Unlike unstructured text, structured outputs ensure consistency, facilitating seamless integration with other systems, APIs, or workflows that require specific data formats.

### Why Structured Outputs Matter

- **Predictability**: Downstream systems can reliably parse and utilize the data without ambiguity.
- **Efficiency**: Automates data formatting and validation, reducing the need for manual intervention.
- **Robustness**: Minimizes errors by enforcing data integrity through validation mechanisms.
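
As a minimal sketch of the predictability point, assume a model has already returned the JSON string below. The payload and the `parse_person` helper are hypothetical and not part of LangChain; they only illustrate how downstream code can check fields directly instead of scraping free text.

```python
import json

# Hypothetical raw model response that follows an agreed JSON structure
raw_response = '{"name": "Ada Lovelace", "age": 36}'


def parse_person(payload: str) -> dict:
    """Parse a JSON payload and check the fields a downstream system relies on."""
    data = json.loads(payload)
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("age"), int):
        raise ValueError("'age' must be an integer")
    return data


person = parse_person(raw_response)
print(person["name"], person["age"])
```

A structured-output method moves this kind of checking out of hand-written helpers and into the schema definition.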

---

## The `.with_structured_output()` Method

LangChain, a powerful framework for building applications with LLMs, offers the `.with_structured_output()` method to enhance interactions by enabling structured responses from chat models. This method is pivotal for developers aiming to integrate LLMs into systems that demand specific data formats.

### Key Features

1. **Declarative Data Models**:
    - Allows defining desired output structures using Pydantic models or JSON schemas.
    - **Example**:
      ```python
      from pydantic import BaseModel, Field
      
      class Person(BaseModel):
          name: str = Field(description="The person's full name")
          age: int = Field(description="The person's age in years")
      ```

2. **Custom Output Parsing**:
    - Transforms raw LLM outputs to match the defined schema.
    - Capable of handling complex data transformations, such as converting comma-separated text into lists.

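3. **Error Handling and Validation**:
    - Validates outputs against the specified schema when a Pydantic model is used, and raises an error if validation fails.
    - Does not retry on its own; error-correction loops and fallbacks can be layered on top, for example with the runnable's `.with_retry()` or `.with_fallbacks()` methods.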

4. **Integration with OpenAI Function Calling**:
    - Utilizes OpenAI's function-calling APIs to ensure outputs are in structured formats like JSON.
    - **Example**:
      ```python
      from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
      chain = (
          prompt
          | model.bind(function_call={"name": "joke"}, functions=functions)
          | JsonOutputFunctionsParser()
      )
      ```

5. **Parallel and Modular Processing**:
    - Integrates seamlessly with other chains or components for multi-step processing.
    - **Example**:
      ```python
      retrieval_chain = (
          {"context": retriever, "question": RunnablePassthrough()}
          | prompt
          | model
          | StrOutputParser()
      )
      ```
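
The validate-and-retry pattern mentioned under error handling can be sketched in plain Python. This is an illustration of the pattern only, not LangChain's internal implementation: `call_with_retries`, `validate`, and the canned replies are hypothetical stand-ins for a real chat model.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"name", "age"}


def validate(payload: str) -> dict:
    """Parse JSON and confirm the required keys are present."""
    data = json.loads(payload)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def call_with_retries(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate the reply, and retry with the error appended to the prompt."""
    last_error = None
    for _ in range(max_attempts):
        attempt_prompt = prompt if last_error is None else f"{prompt}\nPrevious error: {last_error}"
        try:
            return validate(call_model(attempt_prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")


# Hypothetical stand-in for a chat model: fails once, then returns valid JSON
replies = iter(["not json", '{"name": "Ada", "age": 36}'])
result = call_with_retries(lambda _prompt: next(replies), "Describe Ada")
print(result)
```

Feeding the validation error back into the prompt gives the model a chance to correct itself, at the cost of extra calls.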

---

## Comparison of Methods

| **Feature**                         | **Pydantic**                                                   | **TypedDict**                                                 | **JSON Schema**                                              |
|-------------------------------------|----------------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------------|
| **Type Enforcement**                | Strong type enforcement with validation                        | Limited type enforcement                                      | Schema-based validation                                     |
| **Ease of Use**                     | Requires defining classes; more verbose                        | Simpler for dictionary-like structures                        | Requires understanding of JSON Schema syntax                |
| **Flexibility**                     | Highly flexible with complex data models                       | Suitable for simpler data structures                           | Excellent for interoperability across different systems     |
| **Integration**                     | Seamless with Python applications and frameworks                | Easy to integrate with Python codebases                        | Language-agnostic, ideal for API integrations                |
| **Validation Capabilities**         | Extensive validation options                                   | Basic type annotations                                         | Comprehensive validation rules                               |
| **Use Cases**                       | Enterprise applications, data pipelines, APIs                   | Lightweight applications, quick prototypes                      | API specifications, cross-language data exchange             |
| **Example Tools/Frameworks**        | FastAPI, Django, LangChain                                     | LangChain                                                     | OpenAPI, LangChain                                          |

### Choosing the Right Method

- **Use Pydantic** when you need robust data validation and are working within Python-centric ecosystems.
- **Use TypedDict** for simpler scenarios where lightweight type annotations suffice.
- **Use JSON Schema** when interoperability and cross-language support are paramount.
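
The three options are closely related. As a small sketch, assuming Pydantic v2 is installed, a Pydantic model can be exported as a JSON Schema dictionary, which shows that the same structure can be described either way. The `Joke` class mirrors the one used later in this article.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")


# Pydantic v2 can export the model as a JSON Schema dictionary
schema = Joke.model_json_schema()
print(schema["title"])
print(schema["required"])
print(sorted(schema["properties"]))
```

Fields with a default value, such as `rating`, are left out of the schema's `required` list.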

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain-anthropic
!pip install -qU langchain_community
!pip install -qU langchain_experimental

In [None]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from kaggle_secrets import UserSecretsClient
from langchain_core.output_parsers import StrOutputParser

# Retrieve LLM API Key
user_secrets = UserSecretsClient()

# Initialize the language model
#model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))
#model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

---

## 1. Pydantic

### 1.1 Using Standard Fields with Pydantic
Pydantic is a data validation and settings management library that uses Python type annotations. It is widely used for defining data models with type enforcement.

In [None]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Define a Pydantic class for the joke schema
class Joke(BaseModel):
    """Joke to tell user."""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to return structured output using the Pydantic class
structured_llm = model.with_structured_output(Joke)

# Generate a joke about cats
result = structured_llm.invoke("Tell me a joke about cats")

# Access fields using dot notation
print("Setup:", result.setup)
print("Punchline:", result.punchline)
print("Rating:", result.rating)
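
The `rating` description above says "from 1 to 10", but a description alone does not enforce the range. The sketch below, assuming Pydantic v2, adds `ge` and `le` constraints to a hypothetical `RatedJoke` class and validates a hand-written payload locally. Whether a provider also honors such constraints during generation depends on the model and method, so treat this as client-side validation.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class RatedJoke(BaseModel):
    """Joke whose rating is constrained to the 1-10 range."""

    setup: str
    punchline: str
    rating: Optional[int] = Field(default=None, ge=1, le=10)


# A payload shaped like a model response, but with an out-of-range rating
payload = {
    "setup": "Why did the cat sit on the laptop?",
    "punchline": "To keep an eye on the mouse.",
    "rating": 42,
}

try:
    RatedJoke.model_validate(payload)
    is_valid = True
except ValidationError as exc:
    is_valid = False
    print(exc.errors()[0]["type"])

print("Valid:", is_valid)
```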

### 1.2 Using Literal Field with Pydantic Models
A `Literal` annotation restricts a field to a fixed set of allowed values. In the model below, `unit` must be either `"Celsius"` or `"Fahrenheit"`, which steers the model away from free-form unit strings.

In [None]:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from typing import Literal

# Define a Pydantic model for structured output
class WeatherResponse(BaseModel):
    location: str
    temperature: float
    unit: Literal["Celsius", "Fahrenheit"]

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to output structured data using the Pydantic model
structured_model = model.with_structured_output(WeatherResponse)

# Generate a structured response by invoking the RunnableSequence
response = structured_model.invoke("What's the weather in Paris?")
print(f"Location   : {response.location}")
print(f"Temperature: {response.temperature}")
print(f"Unit       : {response.unit}")

### 1.3 Using Pydantic with Union
You can configure a model to handle multiple types of structured outputs by using a Union type in the schema. This allows the model to choose between different output formats based on the input or context.

In [None]:
from typing import Union, Optional
from pydantic import BaseModel, Field
from langchain_anthropic import ChatAnthropic

# Define Pydantic classes for different response types
class Joke(BaseModel):
    """Joke to tell user."""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")

class Fact(BaseModel):
    """Fact to tell user."""
    topic: str = Field(description="The topic of the fact")
    fact: str = Field(description="The fact itself")
    source: Optional[str] = Field(default=None, description="The source of the fact")

class FinalResponse(BaseModel):
    """Final response that can be either a joke or a fact."""
    response: Union[Joke, Fact]

# Initialize the ChatAnthropic model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

# Configure the model to return structured output using the Union schema
structured_llm = model.with_structured_output(FinalResponse)

# Generate a joke
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Access fields of the nested response
if isinstance(result.response, Joke):
    print("Setup:", result.response.setup)
    print("Punchline:", result.response.punchline)
    print("Rating:", result.response.rating)

# Generate a fact
result = structured_llm.invoke("Tell me a fact about the moon")
print(result)

if isinstance(result.response, Fact):
    print("Topic:", result.response.topic)
    print("Fact:", result.response.fact)
    print("Source:", result.response.source)
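
The `isinstance` checks above work because Pydantic resolves the `Union` to a concrete class during validation. A minimal offline sketch, assuming Pydantic v2 and a hand-written payload in place of a real model response:

```python
from typing import Optional, Union

from pydantic import BaseModel


class Joke(BaseModel):
    setup: str
    punchline: str
    rating: Optional[int] = None


class Fact(BaseModel):
    topic: str
    fact: str
    source: Optional[str] = None


class FinalResponse(BaseModel):
    response: Union[Joke, Fact]


# Hypothetical payload of the kind a model might return for a fact request
payload = {"response": {"topic": "moon", "fact": "The Moon orbits the Earth."}}

# The payload lacks Joke's required fields, so validation selects Fact
result = FinalResponse.model_validate(payload)
print(type(result.response).__name__)
```

Keeping the required fields of each member distinct makes this resolution unambiguous.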

---

## 2. TypedDict

### 2.1 Using Standard Fields with TypedDict
TypedDict allows for type annotations of dictionaries, making it suitable for defining structured outputs without the overhead of full-fledged models.

In [None]:
from typing import Optional
from typing_extensions import Annotated, TypedDict
from langchain_openai import ChatOpenAI

# Define a TypedDict for the joke schema
class Joke(TypedDict):
    """Joke to tell user."""
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to return structured output using the TypedDict
structured_llm = model.with_structured_output(Joke)

# Generate a joke about cats
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Access fields using key-based access; "rating" is optional, so .get() avoids a KeyError if it is omitted
print("Setup:", result["setup"])
print("Punchline:", result["punchline"])
print("Rating:", result.get("rating"))
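
Unlike a Pydantic model, a TypedDict is only a type hint: Python does not check the dictionary at runtime. The sketch below uses the standard-library `typing.TypedDict` (no schema conversion is involved here) and a hypothetical `missing_keys` helper to show what a manual check might look like.

```python
from typing import Optional, TypedDict


class Joke(TypedDict):
    setup: str
    punchline: str
    rating: Optional[int]


# TypedDict performs no runtime checks: this "Joke" is missing keys and has a wrong type
not_really_a_joke: Joke = {"setup": 123}  # type checkers flag this; Python runs it
print(type(not_really_a_joke).__name__)


def missing_keys(data: dict) -> set:
    """Return the required keys of Joke that are absent from data."""
    return set(Joke.__required_keys__) - data.keys()


print(sorted(missing_keys(not_really_a_joke)))
```

If downstream code depends on every key being present, a check like this (or a Pydantic model) is worth adding.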

### 2.2 Using Literal Field with TypedDict
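Literal fields work the same way in a TypedDict: the annotation lists the allowed values (here the three actions `create`, `update`, and `delete`) and becomes part of the schema sent to the model. Unlike Pydantic, a TypedDict is not validated at runtime, so the returned dictionary is not checked against those values.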

In [None]:
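from typing import Literal
# typing_extensions.TypedDict is used because Pydantic rejects typing.TypedDict on Python < 3.12
from typing_extensions import TypedDict
from langchain_anthropic import ChatAnthropic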

# Define a TypedDict for structured output
class ActionResponse(TypedDict):
    action: Literal["create", "update", "delete"]
    target: str
    details: str

# Initialize the ChatAnthropic model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

# Configure the model to output structured data using the TypedDict
structured_model = model.with_structured_output(ActionResponse)

# Generate a structured response by invoking the RunnableSequence
response = structured_model.invoke("Create a new user with the name John Doe.")
print(f"Action : {response['action']}")
print(f"Target : {response['target']}")
print(f"Details: {response['details']}")
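
Because the returned dictionary is not validated, it can be useful to check the `action` value before acting on it. The following sketch reads the allowed values back from the `Literal` annotation using only the standard library; `is_allowed_action` is a hypothetical helper.

```python
from typing import Literal, TypedDict, get_args, get_type_hints


class ActionResponse(TypedDict):
    action: Literal["create", "update", "delete"]
    target: str
    details: str


# Read the allowed values back from the Literal annotation
ALLOWED_ACTIONS = get_args(get_type_hints(ActionResponse)["action"])


def is_allowed_action(response: dict) -> bool:
    """Check the 'action' value against the Literal options at runtime."""
    return response.get("action") in ALLOWED_ACTIONS


print(ALLOWED_ACTIONS)
print(is_allowed_action({"action": "create", "target": "user", "details": "John Doe"}))
print(is_allowed_action({"action": "archive", "target": "user", "details": "John Doe"}))
```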

### 2.3 Using TypedDict with Union
Similarly, you can use TypedDict with Union to handle multiple response types without the need for full-fledged models.

In [None]:
from typing import Union, Optional
from typing_extensions import Annotated, TypedDict
from langchain_openai import ChatOpenAI

# Define TypedDict classes for different response types
class Joke(TypedDict):
    """Joke to tell user."""
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]

class Fact(TypedDict):
    """Fact to tell user."""
    topic: Annotated[str, ..., "The topic of the fact"]
    fact: Annotated[str, ..., "The fact itself"]
    source: Annotated[Optional[str], None, "The source of the fact"]

class FinalResponse(TypedDict):
    """Final response that can be either a joke or a fact."""
    response: Union[Joke, Fact]

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to return structured output using the Union schema
structured_llm = model.with_structured_output(FinalResponse)

# Generate a joke
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Access fields of the nested response
if isinstance(result["response"], dict):  # Check if response is a dictionary (TypedDict)
    if "setup" in result["response"]:  # Check if it's a Joke
        print("Setup:", result["response"]["setup"])
        print("Punchline:", result["response"]["punchline"])
        print("Rating:", result["response"].get("rating"))  # optional key, so use .get()

# Generate a fact
result = structured_llm.invoke("Tell me a fact about the moon")
print(result)

# Access fields of the nested response
if isinstance(result["response"], dict):  # Check if response is a dictionary (TypedDict)
    if "topic" in result["response"]:  # Check if it's a Fact
        print("Topic:", result["response"]["topic"])
        print("Fact:", result["response"]["fact"])
        print("Source:", result["response"].get("source"))  # optional key, so use .get()

---

## 3. JSON Schema

### 3.1 Using Standard Fields with JSON Schema
JSON Schema provides a way to describe the structure and validation constraints of JSON data. It's especially useful for interoperability across different systems.

In [None]:
from langchain_anthropic import ChatAnthropic

# Define a JSON Schema for the joke
json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline to the joke"},
        "rating": {"type": "integer", "description": "How funny the joke is, from 1 to 10", "default": None},
    },
    "required": ["setup", "punchline"],
}

# Initialize the ChatAnthropic model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

# Configure the model to return structured output using the JSON Schema
structured_llm = model.with_structured_output(json_schema)

# Generate a joke about cats
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Access fields using key-based access; "rating" is not in "required", so .get() avoids a KeyError
print("Setup:", result["setup"])
print("Punchline:", result["punchline"])
print("Rating:", result.get("rating"))

### 3.2 Using JSON Schema with Union
JSON Schema also supports Union through the `oneOf` keyword, enabling models to return one of several predefined schemas.

In [None]:
from langchain_anthropic import ChatAnthropic

# Define JSON Schemas for different response types
joke_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline to the joke"},
        "rating": {"type": "integer", "description": "How funny the joke is, from 1 to 10", "default": None},
    },
    "required": ["setup", "punchline"],
}

fact_schema = {
    "title": "fact",
    "description": "Fact to tell user.",
    "type": "object",
    "properties": {
        "topic": {"type": "string", "description": "The topic of the fact"},
        "fact": {"type": "string", "description": "The fact itself"},
        "source": {"type": "string", "description": "The source of the fact", "default": None},
    },
    "required": ["topic", "fact"],
}

final_schema = {
    "title": "final_response",
    "description": "Final response that can be either a joke or a fact.",
    "type": "object",
    "properties": {
        "response": {
            "oneOf": [joke_schema, fact_schema],
        },
    },
    "required": ["response"],
}

# Initialize the ChatAnthropic model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

# Configure the model to return structured output using the JSON Schema
structured_llm = model.with_structured_output(final_schema)

# Generate a joke
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Generate a fact
result = structured_llm.invoke("Tell me a fact about the moon")
print(result)
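
Since `oneOf` means exactly one subschema should match, the caller still has to work out which branch came back. A minimal sketch, assuming the required keys of each branch are enough to tell them apart; `matching_branches` and the sample `result` are hypothetical.

```python
def matching_branches(data: dict, branches: dict) -> list:
    """Return the names of branches whose required keys are all present in data."""
    return [name for name, required in branches.items() if all(key in data for key in required)]


# Required keys copied from joke_schema and fact_schema
BRANCHES = {"joke": ["setup", "punchline"], "fact": ["topic", "fact"]}

# Hypothetical result shaped like the output of the schema above
result = {"response": {"topic": "moon", "fact": "The Moon orbits the Earth."}}

matches = matching_branches(result["response"], BRANCHES)
# oneOf means exactly one subschema should match
if len(matches) == 1:
    print("Matched branch:", matches[0])
else:
    print("Ambiguous or invalid response:", matches)
```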

---

## 4. Advanced Techniques

### 4.1 Choosing Between Multiple Schemas
In scenarios where different types of responses are expected, defining multiple schemas and allowing the model to choose between them ensures flexibility and adaptability.

In [None]:
from typing import Optional, Union
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Define Pydantic classes for different response types
class Joke(BaseModel):
    """Joke to tell user."""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")

class ConversationalResponse(BaseModel):
    """Respond in a conversational manner. Be kind and helpful."""
    response: str = Field(description="A conversational response to the user's query")

class FinalResponse(BaseModel):
    final_output: Union[Joke, ConversationalResponse]

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to return structured output using the Union schema
structured_llm = model.with_structured_output(FinalResponse)

# Generate a joke about cats
result = structured_llm.invoke("Tell me a joke about cats")
print(result)

# Generate a conversational response
result = structured_llm.invoke("How are you today?")
print(result)

### 4.2 Streaming Structured Output
Streaming allows for the gradual delivery of structured data as it is generated, which is beneficial for large responses or real-time applications.

In [None]:
from typing import Optional
from typing_extensions import Annotated, TypedDict
from langchain_anthropic import ChatAnthropic

# Define a TypedDict for the joke schema
class Joke(TypedDict):
    """Joke to tell user."""
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]

# Initialize the ChatAnthropic model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

# Configure the model to return structured output using the TypedDict
structured_llm = model.with_structured_output(Joke)

# Stream the output for a joke about cats
for chunk in structured_llm.stream("Tell me a joke about cats"):
    print(chunk)
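
To show how streamed chunks might be consumed, the sketch below replaces the real model with a hypothetical `fake_stream` generator. It assumes that, for dictionary-style schemas, each chunk is a cumulative snapshot of the output so far rather than a delta, so the last chunk is the complete result.

```python
from typing import Iterator


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for structured_llm.stream(...): yields progressively fuller dicts."""
    yield {}
    yield {"setup": "Why was the cat"}
    yield {"setup": "Why was the cat sitting on the computer?"}
    yield {
        "setup": "Why was the cat sitting on the computer?",
        "punchline": "To keep an eye on the mouse!",
    }


final = {}
for chunk in fake_stream():
    # Each chunk replaces the previous one because it already contains everything received so far
    final = chunk

print(final.get("setup"))
print(final.get("punchline"))
```

Intermediate chunks may be missing keys, which is why the sketch reads them with `.get()`.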

### 4.3 Few-Shot Prompting with Structured Output
Few-shot prompting involves providing the model with examples to guide its responses. When combined with structured outputs, it enhances the model's ability to generate consistent and accurate data structures.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Define a system message with few-shot examples
system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") and the final punchline (the response to "<setup> who?").

Here are some examples of jokes:

example_user: Tell me a joke about planes
example_assistant: {{"setup": "Why don't planes ever get tired?", "punchline": "Because they have rest wings!", "rating": 2}}

example_user: Tell me another joke about planes
example_assistant: {{"setup": "Cargo", "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!", "rating": 10}}

example_user: Now about caterpillars
example_assistant: {{"setup": "Caterpillar", "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!", "rating": 5}}"""

# Create a ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{input}")])

# Initialize the ChatOpenAI model; the API key comes from Kaggle secrets instead of being hardcoded
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))

# Configure the model to return structured output
structured_llm = model.with_structured_output(Joke)

# Combine the prompt and structured LLM
few_shot_structured_llm = prompt | structured_llm

# Generate a joke about woodpeckers
result = few_shot_structured_llm.invoke("what's something funny about woodpeckers")
print(result)
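
The doubled braces in the system message matter: the prompt template treats `{input}` as a variable, so literal JSON braces in the examples have to be escaped as `{{` and `}}`. The sketch below demonstrates the same escaping rule with Python's built-in `str.format`, on the assumption that the template's default format follows that convention.

```python
# Doubled braces are literal; single braces mark a template variable
template = 'example_assistant: {{"setup": "Cargo", "rating": 10}}\nuser: {input}'

rendered = template.format(input="Tell me a joke about woodpeckers")
print(rendered)
```

Leaving the braces single would make the template look for variables named after the JSON keys and fail at format time.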

## Conclusion

Structured outputs significantly enhance the reliability and efficiency of interactions with large language models. By enforcing predefined data formats, developers can ensure seamless integration with various systems, reduce ambiguity, and automate data validation processes. LangChain's `.with_structured_output()` method provides a versatile and powerful toolset for implementing structured outputs using Pydantic, TypedDict, or JSON Schema, catering to a wide range of applications and use cases.

Whether you're building enterprise-grade applications, APIs, or interactive systems, leveraging structured outputs can lead to more predictable and maintainable solutions. The practical examples showcased in this article demonstrate the ease and flexibility with which structured outputs can be integrated into your workflows, paving the way for more robust and scalable AI-driven applications.