# Tool Schema Definitions in LangChain: `Pydantic` vs. `TypedDict`

## Introduction

LangChain, a framework for building applications with language models, lets developers attach a structured schema to each tool so that the model knows which arguments the tool expects. Two common ways to define these schemas are **Pydantic classes** and **TypedDict**. Pydantic validates inputs at runtime, while TypedDict only describes their shape through type hints; the two also differ in syntax, flexibility, and typical use cases.

This article delves into both methods, providing detailed examples to illustrate how each can be implemented within LangChain. By the end of this guide, you'll have a clear understanding of when to use Pydantic classes versus TypedDict, enabling you to build more reliable and maintainable tools within your LangChain applications.

### Comparison: `Pydantic Classes` vs. `TypedDict`

| Feature                      | **Pydantic Classes**                                                                 | **TypedDict**                                                     |
|------------------------------|--------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| **Validation**               | Provides robust data validation and error handling out of the box.                   | Limited to type annotations without built-in validation.         |
| **Default Values**           | Supports default values and complex field configurations.                            | Does not support default values inherently; requires manual handling. |
| **Type Enforcement**         | Enforces types strictly, raising errors for mismatches.                              | Uses type hints but does not enforce them at runtime.             |
| **Complex Data Structures**  | Easily handles nested and complex data structures.                                   | Can define nested structures but lacks advanced validation.       |
| **Performance**              | Slight overhead due to validation processes.                                        | Lightweight with minimal overhead since it relies on type hints.  |
| **Ease of Use**              | More verbose but offers comprehensive features for data management.                  | Simpler and more straightforward for defining basic schemas.      |
| **Extensibility**            | Highly extensible with support for custom validators and methods.                    | Limited extensibility; primarily used for static type definitions.|
| **Integration with Tools**   | Seamlessly integrates with tools that support Pydantic models for validation.        | Best suited for scenarios where simple type definitions are sufficient. |


---

## Preparation

### Installing Required Libraries
This section installs the Python libraries used in this guide:
- `langchain-openai`: Provides integration with OpenAI's chat and embedding models and APIs.
- `langchain-anthropic`: Enables integration with Anthropic's models and APIs.
- `langchain_community`: Contains community-contributed modules and tools for LangChain.
- `langchain_experimental`: Includes experimental features and utilities for LangChain.
- `pydantic[email]`: Installs Pydantic together with the `email-validator` extra, which the `EmailStr` type in Example 4 requires.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain-anthropic
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU "pydantic[email]"

### Initializing OpenAI and Anthropic Chat Models
This section demonstrates how to securely fetch API keys for OpenAI and Anthropic using Kaggle's `UserSecretsClient` and initialize their respective chat models. The `ChatOpenAI` and `ChatAnthropic` classes are used to create instances of these models, which can be used for natural language processing tasks such as text generation and conversational AI.

**Key steps:**
1. **Fetch API Keys**: The OpenAI and Anthropic API keys are securely retrieved using Kaggle's `UserSecretsClient`.
2. **Initialize Chat Models**:
   - The `ChatOpenAI` class is initialized with the `gpt-4o-mini` model and the fetched OpenAI API key.
   - The `ChatAnthropic` class can be initialized with the `claude-3-5-sonnet-latest` model and the fetched Anthropic API key (this line is commented out by default).

In [None]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from kaggle_secrets import UserSecretsClient

# Fetch API keys securely from Kaggle secrets
user_secrets = UserSecretsClient()

# Initialize the LLM (swap the comment markers below to use Anthropic instead)
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))
#model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

In [None]:
import json

def pretty_print(aimessage):
    """
    Pretty-prints an AIMessage object by converting it to JSON and formatting it with indentation.
    
    Args:
        aimessage: The AIMessage object to pretty-print.
    """
    # Convert the AIMessage object to a dictionary
    aimessage_dict = {
        "content": aimessage.content,
        "additional_kwargs": aimessage.additional_kwargs,
        "response_metadata": aimessage.response_metadata,
        "id": aimessage.id,
        "tool_calls": aimessage.tool_calls,
        "usage_metadata": aimessage.usage_metadata,
    }
    
    # Convert the dictionary to a JSON-formatted string with indentation;
    # default=str keeps the call from failing on values that are not JSON-serializable
    pretty_json = json.dumps(aimessage_dict, indent=4, ensure_ascii=False, default=str)
    
    # Print the pretty JSON
    print(pretty_json)

---

## Tool Schemas with Pydantic Classes

Pydantic is a data validation library built on Python type annotations. Defining tool schemas with Pydantic classes in LangChain gives each tool a structured input whose values are validated against the schema before the tool function runs. The examples below define tool schemas with Pydantic classes:

### Example 1: Simple Search Tool
This example demonstrates how to create a search tool using Pydantic classes to define the input schema. The tool accepts a search query and a limit for the number of results. By leveraging Pydantic's validation, the inputs are ensured to be of the correct type and format before the search function is executed.

In [None]:
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field

# Define the input schema using Pydantic
class SearchInput(BaseModel):
    query: str = Field(description="The search query")
    limit: int = Field(description="The maximum number of results to return", default=10)

# Define the function that will be wrapped as a tool
# (the default for limit mirrors the default declared in SearchInput)
def search(query: str, limit: int = 10) -> str:
    """Searches for information based on the query and limit."""
    return f"Searching for: {query} with limit: {limit}"

# Create the StructuredTool
search_tool = StructuredTool.from_function(
    func=search,
    name="search",
    description="Searches for information based on the query and limit.",
    args_schema=SearchInput
)

# Invoke the tool without an LLM
search_result = search_tool.invoke({"query": "LangChain", "limit": 5})
print(search_result)  # Output: Searching for: LangChain with limit: 5

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([search_tool])
ai_msg = model_with_tools.invoke("Search for LangChain with a limit of 5 results.")

pretty_print(ai_msg)
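
To see what the schema itself contributes, the validation can be tried in isolation, without a tool or a model. The following is a minimal sketch that assumes Pydantic v2 (for `model_json_schema`); it re-declares `SearchInput` so that it runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError

class SearchInput(BaseModel):
    query: str = Field(description="The search query")
    limit: int = Field(description="The maximum number of results to return", default=10)

# Omitted fields fall back to their declared defaults.
print(SearchInput(query="LangChain").limit)

# Values of the wrong type raise ValidationError before any tool code runs.
try:
    SearchInput(query="LangChain", limit="many")
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} error(s)")

# Field descriptions and defaults are carried into the JSON schema (Pydantic v2 API).
schema = SearchInput.model_json_schema()
print(schema["properties"]["limit"])
```

The JSON schema printed at the end is the kind of structure a model provider receives when the tool is bound, which is why the field descriptions are worth writing carefully.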

### Example 2: Calculator Tool
In this example, a calculator tool is defined using Pydantic to structure the inputs. The tool takes two numerical values and an operation to perform (add, subtract, multiply, divide). Pydantic ensures that the inputs are valid numbers and that the operation is one of the specified literals, preventing invalid operations.

In [None]:
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
from typing import Literal

# Define the input schema using Pydantic
class CalculatorInput(BaseModel):
    num1: float = Field(description="The first number")
    num2: float = Field(description="The second number")
    operation: Literal["add", "subtract", "multiply", "divide"] = Field(description="The operation to perform")

# Define the function that will be wrapped as a tool
def calculate(num1: float, num2: float, operation: str) -> float:
    """Performs a calculation based on the provided numbers and operation."""
    if operation == "add":
        return num1 + num2
    elif operation == "subtract":
        return num1 - num2
    elif operation == "multiply":
        return num1 * num2
    elif operation == "divide":
        return num1 / num2
    else:
        raise ValueError("Invalid operation")

# Create the StructuredTool
calculator_tool = StructuredTool.from_function(
    func=calculate,
    name="calculator",
    description="Performs basic arithmetic operations.",
    args_schema=CalculatorInput
)

# Invoke the tool without an LLM
calc_result = calculator_tool.invoke({"num1": 10, "num2": 5, "operation": "add"})
print(calc_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([calculator_tool])
ai_msg = model_with_tools.invoke("Add 10 and 5.")
pretty_print(ai_msg)
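
The schema checks the argument types and the operation name, but not the values themselves, so a `num2` of zero would still reach the division. One possible way to cover that case, sketched here with only the standard library, is a dispatch table with an explicit zero check; the `OPERATIONS` name is an assumption for illustration and is not part of the original example.

```python
import operator
from typing import Callable, Dict

# Map each allowed operation name to the function that implements it.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def calculate(num1: float, num2: float, operation: str) -> float:
    """Performs a calculation based on the provided numbers and operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"Invalid operation: {operation!r}")
    if operation == "divide" and num2 == 0:
        raise ValueError("Cannot divide by zero")
    return OPERATIONS[operation](num1, num2)

print(calculate(10, 5, "add"))
print(calculate(10, 4, "divide"))
```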

### Example 3: Weather Forecast Tool
This example illustrates how to build a weather forecast tool with Pydantic. The tool requires a location and a specific date to provide the weather forecast. Pydantic validates the location as a string and the date as a proper date object, ensuring accurate and reliable input data for the forecasting function.

In [None]:
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
from datetime import date

# Define the input schema using Pydantic
class WeatherInput(BaseModel):
    location: str = Field(description="The location for which to get the weather forecast")
    forecast_date: date = Field(description="The date for the weather forecast")

# Define the function that will be wrapped as a tool
def get_weather(location: str, forecast_date: date) -> str:
    """Gets the weather forecast for a specific location and date."""
    # In a real-world scenario, this function would call an external API
    return f"Weather forecast for {location} on {forecast_date}: Sunny"

# Create the StructuredTool
weather_tool = StructuredTool.from_function(
    func=get_weather,
    name="get_weather",
    description="Gets the weather forecast for a specific location and date.",
    args_schema=WeatherInput
)

# Invoke the tool without an LLM
weather_result = weather_tool.invoke({"location": "New York", "forecast_date": date(2023, 10, 1)})
print(weather_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([weather_tool])
ai_msg = model_with_tools.invoke("Get the weather forecast for New York on October 1, 2023.")
pretty_print(ai_msg)
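
Tool-call arguments produced by a model arrive as JSON, so the date will typically be a string rather than a `date` object. The sketch below re-declares `WeatherInput` and shows how Pydantic handles that case: an ISO 8601 string is parsed into a `date`, while a free-form date string is rejected.

```python
from datetime import date
from pydantic import BaseModel, Field, ValidationError

class WeatherInput(BaseModel):
    location: str = Field(description="The location for which to get the weather forecast")
    forecast_date: date = Field(description="The date for the weather forecast")

# An ISO 8601 string is converted to a datetime.date during validation.
parsed = WeatherInput(location="New York", forecast_date="2023-10-01")
print(repr(parsed.forecast_date))

# A free-form date string is not accepted.
try:
    WeatherInput(location="New York", forecast_date="October 1, 2023")
except ValidationError:
    print("rejected non-ISO date string")
```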

### Example 4: Email Sending Tool
Here, an email sending tool is defined using Pydantic to ensure that all necessary fields are provided and correctly formatted. The tool requires a recipient's email address, a subject, and the body of the email. Pydantic's `EmailStr` type, which depends on the `pydantic[email]` extra installed earlier, validates the recipient's address before the send function is called.

In [None]:
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field, EmailStr

# Define the input schema using Pydantic
class EmailInput(BaseModel):
    recipient: EmailStr = Field(description="The email address of the recipient")
    subject: str = Field(description="The subject of the email")
    body: str = Field(description="The body of the email")

# Define the function that will be wrapped as a tool
def send_email(recipient: str, subject: str, body: str) -> str:
    """Sends an email to the specified recipient."""
    # In a real-world scenario, this function would call an email sending service
    return f"Email sent to {recipient} with subject: {subject}"

# Create the StructuredTool
email_tool = StructuredTool.from_function(
    func=send_email,
    name="send_email",
    description="Sends an email to the specified recipient.",
    args_schema=EmailInput
)

# Invoke the tool without an LLM
email_result = email_tool.invoke({"recipient": "user@example.com", "subject": "Hello", "body": "This is a test email."})
print(email_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([email_tool])
ai_msg = model_with_tools.invoke("Send an email to user@example.com with the subject 'Hello' and body 'This is a test email.'")
pretty_print(ai_msg)

### Example 5: Database Query Tool
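This example showcases a database query tool where Pydantic defines the schema for executing SQL queries. The tool accepts a query string, a row limit, a column to order by, and a sort direction. Pydantic checks that the query and column name are strings, that the limit is an integer, and that the direction is either `ASC` or `DESC`. Type validation alone does not prevent SQL injection: the function below interpolates `query` and `order_by` directly into the SQL text, so a real implementation should use parameterized queries and restrict column names to a known set.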

In [None]:
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
from typing import Literal

# Define the input schema using Pydantic
class QueryInput(BaseModel):
    query: str = Field(description="The SQL query to execute (without LIMIT or ORDER BY clauses)")
    limit: int = Field(description="The maximum number of rows to return", default=100)
    order_by: str = Field(description="The column to order the results by", default="id")
    order_direction: Literal["ASC", "DESC"] = Field(description="The direction to order the results (ASC or DESC)", default="ASC")

# Define the function that will be wrapped as a tool
# (the defaults mirror the defaults declared in QueryInput)
def execute_query(query: str, limit: int = 100, order_by: str = "id", order_direction: str = "ASC") -> str:
    """Executes a SQL query with a limit and order clause, and returns the results."""
    # Construct the full SQL query with LIMIT and ORDER BY
    full_query = f"{query} ORDER BY {order_by} {order_direction} LIMIT {limit}"
    
    # In a real-world scenario, this function would connect to a database and execute the query
    return f"Executing query: {full_query}"

# Create the StructuredTool
query_tool = StructuredTool.from_function(
    func=execute_query,
    name="execute_query",
    description="Executes a SQL query with a limit and order clause, and returns the results.",
    args_schema=QueryInput
)

# Invoke the tool without an LLM
query_result = query_tool.invoke({
    "query": "SELECT * FROM users",
    "limit": 10,
    "order_by": "name",
    "order_direction": "ASC"
})
print(query_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([query_tool])
ai_msg = model_with_tools.invoke("Execute the query 'SELECT * FROM users' with a limit of 10 rows, ordered by name in ascending order.")
pretty_print(ai_msg)
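
Because the example builds its SQL with an f-string, it is worth sketching what a safer version might look like. The following uses the standard-library `sqlite3` module with an in-memory database; the `users` table, the `list_users` helper, and the `ALLOWED_ORDER_COLUMNS` whitelist are all assumptions made for illustration. The row limit is passed as a bound parameter, and the column name and direction, which cannot be bound, are checked against fixed sets.

```python
import sqlite3

# Hypothetical whitelist of sortable columns for the illustrative users table.
ALLOWED_ORDER_COLUMNS = {"id", "name"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}

def list_users(conn, limit=100, order_by="id", order_direction="ASC"):
    """Return rows from users, rejecting unknown sort columns or directions."""
    if order_by not in ALLOWED_ORDER_COLUMNS:
        raise ValueError(f"Unsupported order_by column: {order_by!r}")
    if order_direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"Unsupported order direction: {order_direction!r}")
    # Identifiers were checked against the whitelist; the limit is a bound parameter.
    sql = f"SELECT id, name FROM users ORDER BY {order_by} {order_direction} LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Carol",), ("Alice",), ("Bob",)])

rows = list_users(conn, limit=2, order_by="name")
print(rows)

try:
    list_users(conn, order_by="name; DROP TABLE users")
except ValueError as exc:
    print("rejected:", exc)
```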

---

## Tool Schemas with TypedDict

Using `TypedDict` to define tool schemas in LangChain is another way to create structured inputs for your tools. `TypedDict` is a Python feature that allows you to define dictionaries with specific key-value pairs and types. Below are some examples of how you can define tool schemas using `TypedDict`:
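
Before the examples, it helps to see what a `TypedDict` does and does not do at runtime. The sketch below uses only the standard library (Python 3.9 or later is assumed for `typing.Annotated`): the class produces a plain `dict`, a wrongly typed value is accepted without complaint, and the `Annotated` descriptions remain readable through `get_type_hints`.

```python
from typing import Annotated, TypedDict, get_type_hints

class SearchInput(TypedDict):
    """Search for information based on the query and limit."""
    query: Annotated[str, "The search query"]
    limit: Annotated[int, "The maximum number of results to return"]

# A TypedDict is a plain dict at runtime: the wrong type for limit is NOT rejected.
bad = SearchInput(query="LangChain", limit="five")
print(type(bad).__name__, bad)

# The Annotated metadata is still available to tooling that chooses to read it.
hints = get_type_hints(SearchInput, include_extras=True)
descriptions = {name: hint.__metadata__[0] for name, hint in hints.items()}
print(descriptions)
```

Static type checkers such as mypy would flag the `limit="five"` call, but nothing stops it when the code runs.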

### Example 1: Simple Search Tool
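This example illustrates a search tool alongside a `TypedDict` that describes its input. The tool accepts a search query and a limit for the number of results. The `TypedDict` documents the expected keys and their types, with the descriptions carried in `Annotated` metadata, but it performs no runtime checks. Note that `SearchInput` is not passed to `StructuredTool.from_function` here: without an `args_schema`, LangChain infers the tool's schema from the signature of `search`. To have the model use the `TypedDict` itself as the schema, recent versions of `langchain-core` accept the class directly in `bind_tools`.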

In [None]:
from langchain.tools import StructuredTool
from typing import TypedDict
from typing_extensions import Annotated

# Define the input schema using TypedDict with Annotated
class SearchInput(TypedDict):
    """Search for information based on the query and limit."""
    query: Annotated[str, "The search query"]
    limit: Annotated[int, "The maximum number of results to return"]

# Define the function that will be wrapped as a tool
def search(query: str, limit: int) -> str:
    """
    Searches for information based on the query and limit.
    
    Args:
        query: The search query
        limit: The maximum number of results to return
        
    Returns:
        str: Search results message
    """
    return f"Searching for: {query} with limit: {limit}"

# Create the StructuredTool (no args_schema is given, so the schema is inferred from search's signature)
search_tool = StructuredTool.from_function(
    func=search,
    name="search",
    description="Searches for information based on the query and limit.",
)

# Case 1: Using invoke() with input parameter
result1 = search_tool.invoke(
    input={"query": "LangChain", "limit": 5}
)
print("Case 1:", result1)

# Case 2: Alternative way using run() method
result2 = search_tool.run({"query": "Python", "limit": 3})
print("Case 2:", result2)

# Case 3: Using invoke() with input as first argument
result3 = search_tool.invoke({"query": "AI", "limit": 10})
print("Case 3:", result3)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([search_tool])
ai_msg = model_with_tools.invoke("Search for LangChain with a limit of 5 results.")
pretty_print(ai_msg)

### Example 2: Calculator Tool
In this example, a calculator tool is paired with a `TypedDict` that describes its inputs: two numerical values and an operation to perform (add, subtract, multiply, divide). The `TypedDict` documents the required keys and their types for readers and static type checkers; it does not check values at runtime. As in the previous example, no `args_schema` is passed, so the tool's schema is inferred from the signature of `calculate`, where `operation` is typed as a plain `str`.

In [None]:
from langchain.tools import StructuredTool
from typing import TypedDict, Literal
from typing_extensions import Annotated

# Define the input schema using TypedDict
class CalculatorInput(TypedDict):
    num1: Annotated[float, "First number for calculation"]
    num2: Annotated[float, "Second number for calculation"]
    operation: Annotated[Literal["add", "subtract", "multiply", "divide"], "Operation to perform"]

# Define the function that will be wrapped as a tool
def calculate(num1: float, num2: float, operation: str) -> float:
    """Performs a calculation based on the provided numbers and operation."""
    if operation == "add":
        return num1 + num2
    elif operation == "subtract":
        return num1 - num2
    elif operation == "multiply":
        return num1 * num2
    elif operation == "divide":
        return num1 / num2
    else:
        raise ValueError("Invalid operation")

# Create the StructuredTool
calculator_tool = StructuredTool.from_function(
    func=calculate,
    name="calculator",
    description="Performs basic arithmetic operations.",
)

# Invoke the tool without an LLM
calc_result = calculator_tool.invoke(input={"num1": 10, "num2": 5, "operation": "add"})
print(calc_result)  # Output: 15.0

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([calculator_tool])
ai_msg = model_with_tools.invoke("Add 10 and 5.")
pretty_print(ai_msg)
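
Since a `TypedDict` is not checked at runtime, any validation has to be written by hand. The following is a minimal sketch of such a check for the flat `CalculatorInput` schema, using only the standard library; `validate_calculator_input` is a hypothetical helper, not a LangChain or `typing` API.

```python
from typing import Annotated, Literal, TypedDict, get_args, get_origin, get_type_hints

class CalculatorInput(TypedDict):
    num1: Annotated[float, "First number for calculation"]
    num2: Annotated[float, "Second number for calculation"]
    operation: Annotated[Literal["add", "subtract", "multiply", "divide"], "Operation to perform"]

def validate_calculator_input(data: dict) -> None:
    """Raise ValueError or TypeError if data does not match CalculatorInput."""
    hints = get_type_hints(CalculatorInput)  # Annotated wrappers are stripped by default
    missing = sorted(hints.keys() - data.keys())
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    for key, expected in hints.items():
        value = data[key]
        if get_origin(expected) is Literal:
            if value not in get_args(expected):
                raise ValueError(f"{key} must be one of {get_args(expected)}, got {value!r}")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{key} must be a number, got {type(value).__name__}")

# A well-formed input passes silently.
validate_calculator_input({"num1": 10, "num2": 5, "operation": "add"})

# Malformed inputs are reported.
for bad in ({"num1": 10, "num2": 5, "operation": "modulo"}, {"num1": 10, "operation": "add"}):
    try:
        validate_calculator_input(bad)
    except (TypeError, ValueError) as exc:
        print("rejected:", exc)
```

This is roughly the work that Pydantic does automatically, which is the main trade-off between the two approaches.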

### Example 3: Weather Forecast Tool
This example demonstrates a weather forecast tool with a `TypedDict` describing its input. The tool requires a location and a specific date to provide the weather forecast. The `TypedDict` documents that both keys are expected and what types they should have, but it does not verify them at runtime; the tool's schema is again inferred from the signature of `get_weather`.

In [None]:
from datetime import date

# Define the input schema using TypedDict
class WeatherInput(TypedDict):
    location: Annotated[str, "Location to get weather for"]
    forecast_date: Annotated[date, "Date to get forecast for"]

# Define the function that will be wrapped as a tool
def get_weather(location: str, forecast_date: date) -> str:
    """Gets the weather forecast for a specific location and date."""
    # In a real-world scenario, this function would call an external API
    return f"Weather forecast for {location} on {forecast_date}: Sunny"

# Create the StructuredTool
weather_tool = StructuredTool.from_function(
    func=get_weather,
    name="get_weather",
    description="Gets the weather forecast for a specific location and date.",
)

# Invoke the tool without an LLM
weather_result = weather_tool.invoke(input={"location": "New York", "forecast_date": date(2023, 10, 1)})
print(weather_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([weather_tool])
ai_msg = model_with_tools.invoke("Get the weather forecast for New York on October 1, 2023.")
pretty_print(ai_msg)

### Example 4: Email Sending Tool
Here, an email sending tool is paired with a `TypedDict` that describes its inputs: the recipient's email address, the subject of the email, and the body content. The `TypedDict` documents the expected keys and types but does not enforce them, and unlike Pydantic's `EmailStr`, the plain `str` annotation does not check that the recipient is a well-formed address.

In [None]:
# Define the input schema using TypedDict
class EmailInput(TypedDict):
    recipient: Annotated[str, "Email address to send to"]
    subject: Annotated[str, "Email subject"]
    body: Annotated[str, "Email body content"]

# Define the function that will be wrapped as a tool
def send_email(recipient: str, subject: str, body: str) -> str:
    """Sends an email to the specified recipient."""
    # In a real-world scenario, this function would call an email sending service
    return f"Email sent to {recipient} with subject: {subject}"

# Create the StructuredTool
email_tool = StructuredTool.from_function(
    func=send_email,
    name="send_email",
    description="Sends an email to the specified recipient.",
)

# Invoke the tool without an LLM
email_result = email_tool.invoke(input={"recipient": "user@example.com", "subject": "Hello", "body": "This is a test email."})
print(email_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([email_tool])
ai_msg = model_with_tools.invoke("Send an email to user@example.com with the subject 'Hello' and body 'This is a test email.'")
pretty_print(ai_msg)
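
With `recipient` typed as a plain `str`, any address check has to be added manually. The sketch below shows one deliberately loose sanity check using the standard-library `re` module; `looks_like_email` and its pattern are assumptions for illustration and are far less thorough than a dedicated validator such as the one behind `EmailStr`.

```python
import re

# Loose pattern: something@something.something, with no whitespace or extra "@".
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(value: str) -> bool:
    """Return True if value has the rough shape of an email address."""
    return bool(_EMAIL_RE.match(value))

print(looks_like_email("user@example.com"))
print(looks_like_email("not-an-email"))
```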

### Example 5: Database Query Tool
This example showcases a database query tool with a `TypedDict` describing the input for executing SQL queries. The tool accepts a query string, a row limit, a column to order by, and a sort direction. The `TypedDict` documents these keys and their types but does not validate them, and it cannot declare default values, so all four arguments must be supplied. As with the Pydantic version, the function interpolates its arguments directly into the SQL text, so it offers no protection against SQL injection.

In [None]:
class QueryInput(TypedDict):
    query: Annotated[str, "SQL query to execute"]
    limit: Annotated[int, "Maximum number of rows"]
    order_by: Annotated[str, "Column to order by"]
    order_direction: Annotated[Literal["ASC", "DESC"], "Sort direction"]

# Define the function that will be wrapped as a tool
def execute_query(query: str, limit: int, order_by: str, order_direction: str) -> str:
    """Executes a SQL query with a limit and order clause, and returns the results."""
    # Construct the full SQL query with LIMIT and ORDER BY
    full_query = f"{query} ORDER BY {order_by} {order_direction} LIMIT {limit}"
    
    # In a real-world scenario, this function would connect to a database and execute the query
    return f"Executing query: {full_query}"

# Create the StructuredTool
query_tool = StructuredTool.from_function(
    func=execute_query,
    name="execute_query",
    description="Executes a SQL query with a limit and order clause, and returns the results.",
)

# Invoke the tool without an LLM
query_result = query_tool.invoke(input={
    "query": "SELECT * FROM users",
    "limit": 10,
    "order_by": "name",
    "order_direction": "ASC"
})
print(query_result)

In [None]:
# Bind the tool to the LLM for use in chains
model_with_tools = model.bind_tools([query_tool])
ai_msg = model_with_tools.invoke("Execute the query 'SELECT * FROM users' with a limit of 10 rows, ordered by name in ascending order.")
pretty_print(ai_msg)
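
A `TypedDict` cannot carry default values, so optional arguments need manual handling. One common pattern, sketched here with only the standard library, is to mark the optional keys with `total=False` and merge the caller's input over a dictionary of defaults; `QueryOptions`, `DEFAULTS`, and `with_defaults` are hypothetical names, and the default values are borrowed from the Pydantic version of this example.

```python
from typing import Literal, TypedDict

class QueryOptions(TypedDict, total=False):
    # total=False makes every key optional for static type checkers.
    limit: int
    order_by: str
    order_direction: Literal["ASC", "DESC"]

# Defaults taken from the Pydantic QueryInput schema.
DEFAULTS: QueryOptions = {"limit": 100, "order_by": "id", "order_direction": "ASC"}

def with_defaults(options: QueryOptions) -> QueryOptions:
    """Return a new dict in which missing keys are filled from DEFAULTS."""
    return {**DEFAULTS, **options}

print(with_defaults({"limit": 10}))
```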

## Conclusion

Defining tool schemas is a foundational step in building reliable and maintainable applications with LangChain. Both **Pydantic classes** and **TypedDict** offer unique advantages for structuring tool inputs, catering to different needs and preferences.

- **Pydantic Classes** are ideal when robust data validation, default values, and complex data structures are required. Their ability to enforce strict type checks and provide detailed error messages makes them suitable for applications where data integrity is critical.

- **TypedDict**, on the other hand, offers a lightweight and straightforward approach for describing input schemas. It performs no runtime validation, so it is best suited for simpler use cases where type hints alone are sufficient and validation overhead should be avoided.

By carefully evaluating the requirements of your tools and understanding the strengths of each method, you can choose the most appropriate approach to define your tool schemas. This decision will not only enhance the reliability of your tools but also streamline their integration into larger workflows and applications within the LangChain ecosystem.