# 🤝 trustcall: More Reliable Structured Output

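LLMs struggle when asked to generate or modify large JSON blobs. [trustcall](https://github.com/hinthornw/trustcall) addresses this by asking the LLM to generate [JSON Patch](https://www.genspark.ai/spark/understanding-json-patch/d07b30ec-813e-47da-8d1c-9a36923501f1) operations instead. This enables:

- ⚡ Faster and cheaper generation of structured output
- 🐶 Resilient retrying of validation errors, even for complex, nested schemas (defined as pydantic models, schema dictionaries, or regular Python functions)
- 🧩 Accurate updates to existing schemas, avoiding unwanted deletions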

The core idea behind trustcall is illustrated in the figure below:
![](https://github.com/hinthornw/trustcall/raw/main/_static/cover.png)
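To make the idea of JSON Patch concrete, here is a minimal sketch of how a handful of patch operations can update a document without rewriting it. It supports only a subset of RFC 6902 (`add`, `replace`, `remove`) and ignores pointer escaping. The helper `apply_patch` and the sample `profile` are assumptions made up for illustration, not trustcall's implementation.

```python
import copy


def apply_patch(doc, patch):
    """Apply a subset of JSON Patch (add / replace / remove) to a copy of doc."""
    result = copy.deepcopy(doc)
    for op in patch:
        # Split "/pets/0/age" into parent keys and the final key.
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        kind = op["op"]
        if kind == "add":
            if isinstance(target, list):
                # "-" means "append to the end of the array".
                target.insert(len(target) if last == "-" else int(last), op["value"])
            else:
                target[last] = op["value"]
        elif kind == "replace":
            key = int(last) if isinstance(target, list) else last
            if isinstance(target, dict) and key not in target:
                raise KeyError(f"can't replace a non-existent object {key!r}")
            target[key] = op["value"]
        elif kind == "remove":
            key = int(last) if isinstance(target, list) else last
            del target[key]
        else:
            raise ValueError(f"unsupported op: {kind}")
    return result


profile = {
    "name": "Alex",
    "occupation": "Software Engineer",
    "hobbies": ["reading", "hiking", "sailing"],
    "pets": [{"kind": "cat", "name": "Luna", "age": 3}],
}

# Four small operations instead of a full regeneration of the document.
patch = [
    {"op": "replace", "path": "/occupation", "value": "Data Scientist"},
    {"op": "replace", "path": "/pets/0/age", "value": 4},
    {"op": "add", "path": "/hobbies/-", "value": "machine learning"},
    {"op": "remove", "path": "/hobbies/2"},
]

updated = apply_patch(profile, patch)
print(updated)
```

Fields the patch does not mention (here, `name`) are carried over untouched, which is why patch-based edits are less prone to accidental deletions than regenerating the whole object.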

In [8]:
!pip install trustcall

Collecting trustcall
  Downloading trustcall-0.0.22-py3-none-any.whl.metadata (29 kB)
Collecting dydantic<0.0.8,>=0.0.7 (from trustcall)
  Downloading dydantic-0.0.7-py3-none-any.whl.metadata (3.6 kB)
Downloading trustcall-0.0.22-py3-none-any.whl (22 kB)
Downloading dydantic-0.0.7-py3-none-any.whl (8.6 kB)
Installing collected packages: dydantic, trustcall
Successfully installed dydantic-0.0.7 trustcall-0.0.22

## Handling Complex Data Structures

First, we define a fairly complex, nested data structure and try to produce structured output for it the conventional way.

In [2]:
from typing import List, Optional

from pydantic import BaseModel


class OutputFormat(BaseModel):
    preference: str
    sentence_preference_revealed: str


class TelegramPreferences(BaseModel):
    preferred_encoding: Optional[List[OutputFormat]] = None
    favorite_telegram_operators: Optional[List[OutputFormat]] = None
    preferred_telegram_paper: Optional[List[OutputFormat]] = None


class MorseCode(BaseModel):
    preferred_key_type: Optional[List[OutputFormat]] = None
    favorite_morse_abbreviations: Optional[List[OutputFormat]] = None


class Semaphore(BaseModel):
    preferred_flag_color: Optional[List[OutputFormat]] = None
    semaphore_skill_level: Optional[List[OutputFormat]] = None


class TrustFallPreferences(BaseModel):
    preferred_fall_height: Optional[List[OutputFormat]] = None
    trust_level: Optional[List[OutputFormat]] = None
    preferred_catching_technique: Optional[List[OutputFormat]] = None


class CommunicationPreferences(BaseModel):
    telegram: TelegramPreferences
    morse_code: MorseCode
    semaphore: Semaphore


class UserPreferences(BaseModel):
    communication_preferences: CommunicationPreferences
    trust_fall_preferences: TrustFallPreferences


class TelegramAndTrustFallPreferences(BaseModel):
    pertinent_user_preferences: UserPreferences

In [5]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
bound = llm.with_structured_output(TelegramAndTrustFallPreferences)

conversation = """Operator: How may I assist with your telegram, sir?
Customer: I need to send a message about our trust fall exercise.
Operator: Certainly. Morse code or standard encoding?
Customer: Morse, please. I love using a straight key.
Operator: Excellent. What's your message?
Customer: Tell him I'm ready for a higher fall, and I prefer the diamond formation for catching.
Operator: Done. Shall I use our "Daredevil" paper for this daring message?
Customer: Perfect! Send it by your fastest carrier pigeon.
Operator: It'll be there within the hour, sir."""

bound.invoke(f"""Extract the preferences from the following conversation:
<convo>
{conversation}
</convo>""")

TelegramAndTrustFallPreferences(pertinent_user_preferences=UserPreferences(communication_preferences=CommunicationPreferences(telegram=TelegramPreferences(preferred_encoding=None, favorite_telegram_operators=None, preferred_telegram_paper=[OutputFormat(preference='Daredevil', sentence_preference_revealed='Shall I use our "Daredevil" paper for this daring message?')]), morse_code=MorseCode(preferred_key_type=[OutputFormat(preference='straight key', sentence_preference_revealed='I love using a straight key.')], favorite_morse_abbreviations=None), semaphore=Semaphore(preferred_flag_color=None, semaphore_skill_level=None)), trust_fall_preferences=TrustFallPreferences(preferred_fall_height=[OutputFormat(preference='higher', sentence_preference_revealed="Tell him I'm ready for a higher fall.")], trust_level=None, preferred_catching_technique=[OutputFormat(preference='diamond formation', sentence_preference_revealed='I prefer the diamond formation for catching.')])))

👆 LangSmith Trace: https://smith.langchain.com/public/62752b98-91a4-48f3-a86f-043f284d324a/r

In [10]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
bound = llm.with_structured_output(TelegramAndTrustFallPreferences)

conversation = """Operator: How may I assist with your telegram, sir?
Customer: I need to send a message about our trust fall exercise.
Operator: Certainly. Morse code or standard encoding?
Customer: Morse, please. I love using a straight key.
Operator: Excellent. What's your message?
Customer: Tell him I'm ready for a higher fall, and I prefer the diamond formation for catching.
Operator: Done. Shall I use our "Daredevil" paper for this daring message?
Customer: Perfect! Send it by your fastest carrier pigeon.
Operator: It'll be there within the hour, sir."""

bound.invoke(f"""Extract the preferences from the following conversation:
<convo>
{conversation}
</convo>""")

TelegramAndTrustFallPreferences(pertinent_user_preferences=UserPreferences(communication_preferences=CommunicationPreferences(telegram=TelegramPreferences(preferred_encoding=[OutputFormat(preference='Morse', sentence_preference_revealed='I need to send a message about our trust fall exercise.')], favorite_telegram_operators=[OutputFormat(preference='Daredevil', sentence_preference_revealed='Shall I use our "Daredevil" paper for this daring message?')], preferred_telegram_paper=[OutputFormat(preference='Daredevil', sentence_preference_revealed='Shall I use our "Daredevil" paper for this daring message?')]), morse_code=MorseCode(preferred_key_type=[OutputFormat(preference='Straight Key', sentence_preference_revealed='I love using a straight key.')], favorite_morse_abbreviations=None), semaphore=Semaphore(preferred_flag_color=None, semaphore_skill_level=None)), trust_fall_preferences=TrustFallPreferences(preferred_fall_height=[OutputFormat(preference='Higher Fall', sentence_preference_rev

In [11]:
from trustcall import create_extractor

bound = create_extractor(
    llm,
    tools=[TelegramAndTrustFallPreferences],
    tool_choice="TelegramAndTrustFallPreferences",
)

result = bound.invoke(
    f"""Extract the preferences from the following conversation:
<convo>
{conversation}
</convo>"""
)
result["responses"][0]

TelegramAndTrustFallPreferences(pertinent_user_preferences=UserPreferences(communication_preferences=CommunicationPreferences(telegram=TelegramPreferences(preferred_encoding=[OutputFormat(preference='morse', sentence_preference_revealed='I prefer Morse code.')], favorite_telegram_operators=None, preferred_telegram_paper=[OutputFormat(preference='Daredevil', sentence_preference_revealed='I prefer the Daredevil paper.')]), morse_code=MorseCode(preferred_key_type=[OutputFormat(preference='straight key', sentence_preference_revealed='I love using a straight key.')], favorite_morse_abbreviations=None), semaphore=Semaphore(preferred_flag_color=None, semaphore_skill_level=None)), trust_fall_preferences=TrustFallPreferences(preferred_fall_height=[OutputFormat(preference='higher', sentence_preference_revealed='I am ready for a higher fall.')], trust_level=None, preferred_catching_technique=[OutputFormat(preference='diamond formation', sentence_preference_revealed='I prefer the diamond formation

👆 LangSmith Trace: https://smith.langchain.com/public/a80f3a38-1e40-4b79-b04a-a9aa69947157/r

## Handling Updates to Data Structures

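Many tasks expect an LLM to correct or modify an existing object based on new information.

Take memory management as an example. Suppose you structure memories as JSON objects. When new information arrives, the LLM must reconcile it with the existing document. Let's first try simply regenerating the whole document. We model the memory as a single user profile: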

In [1]:
from typing import Dict, List, Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str


class Pet(BaseModel):
    kind: str
    name: Optional[str]
    age: Optional[int]


class Hobby(BaseModel):
    name: str
    skill_level: str
    frequency: str


class FavoriteMedia(BaseModel):
    shows: List[str]
    movies: List[str]
    books: List[str]


class User(BaseModel):
    preferred_name: str
    favorite_media: FavoriteMedia
    favorite_foods: List[str]
    hobbies: List[Hobby]
    age: int
    occupation: str
    address: Address
    favorite_color: Optional[str] = None
    pets: Optional[List[Pet]] = None
    languages: Dict[str, str] = {}

In [2]:
initial_user = User(
    preferred_name="Alex",
    favorite_media=FavoriteMedia(
        shows=[
            "Friends",
            "Game of Thrones",
            "Breaking Bad",
            "The Office",
            "Stranger Things",
        ],
        movies=["The Shawshank Redemption", "Inception", "The Dark Knight"],
        books=["1984", "To Kill a Mockingbird", "The Great Gatsby"],
    ),
    favorite_foods=["sushi", "pizza", "tacos", "ice cream", "pasta", "curry"],
    hobbies=[
        Hobby(name="reading", skill_level="expert", frequency="daily"),
        Hobby(name="hiking", skill_level="intermediate", frequency="weekly"),
        Hobby(name="photography", skill_level="beginner", frequency="monthly"),
        Hobby(name="biking", skill_level="intermediate", frequency="weekly"),
        Hobby(name="swimming", skill_level="expert", frequency="weekly"),
        Hobby(name="canoeing", skill_level="beginner", frequency="monthly"),
        Hobby(name="sailing", skill_level="intermediate", frequency="monthly"),
        Hobby(name="weaving", skill_level="beginner", frequency="weekly"),
        Hobby(name="painting", skill_level="intermediate", frequency="weekly"),
        Hobby(name="cooking", skill_level="expert", frequency="daily"),
    ],
    age=28,
    occupation="Software Engineer",
    address=Address(
        street="123 Tech Lane", city="San Francisco", country="USA", postal_code="94105"
    ),
    favorite_color="blue",
    pets=[Pet(kind="cat", name="Luna", age=3)],
    languages={"English": "native", "Spanish": "intermediate", "Python": "expert"},
)

In [14]:
conversation = """Friend: Hey Alex, how's the new job going? I heard you switched careers recently.
Alex: It's going great! I'm loving my new role as a Data Scientist. The work is challenging but exciting. I've moved to a new apartment in New York to be closer to the office.
Friend: That's a big change! Are you still finding time for your hobbies?
Alex: Well, I've had to cut back on some. I'm not doing much sailing or canoeing these days. But I've gotten really into machine learning projects in my free time. I'd say I'm getting pretty good at it - probably an intermediate level now.
Friend: Sounds like you're keeping busy! How's Luna doing?
Alex: Oh, Luna's great. She just turned 4 last week. She's actually made friends with my new pet, Max the dog. He's a playful 2-year-old golden retriever.
Friend: Two pets now! That's exciting. Hey, want to catch the new season of Stranger Things this weekend?
Alex: Actually, I've kind of lost interest in that show. But I'm really into this new series called "The Mandalorian". We could watch that instead! Oh, and I recently watched "Parasite" - it's become one of my favorite movies.
Friend: Sure, that sounds fun. Should I bring some food? I remember you love sushi.
Alex: Sushi would be perfect! Or maybe some Thai food - I've been really into that lately. By the way, I've been practicing my French. I'd say I'm at a beginner level now.
Friend: That's great! You're always learning something new. How's the cooking going?
Alex: It's going well! I've been cooking almost every day now. I'd say I've become quite proficient at it."""


# Naive approach
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
bound = llm.with_structured_output(User)
naive_result = bound.invoke(
    f"""Update the memory (JSON doc) to incorporate new information from the following conversation:
<user_info>
{initial_user.model_dump()}
</user_info>
<convo>
{conversation}
</convo>"""
)
print("Naive approach result:")
naive_output = naive_result.model_dump()
print(naive_output)

Naive approach result:
{'preferred_name': 'Alex', 'favorite_media': {'shows': ['Friends', 'Game of Thrones', 'Breaking Bad', 'The Office', 'The Mandalorian'], 'movies': ['The Shawshank Redemption', 'Inception', 'The Dark Knight', 'Parasite'], 'books': ['1984', 'To Kill a Mockingbird', 'The Great Gatsby']}, 'favorite_foods': ['sushi', 'pizza', 'tacos', 'ice cream', 'pasta', 'curry', 'Thai food'], 'hobbies': [{'name': 'reading', 'skill_level': 'expert', 'frequency': 'daily'}, {'name': 'hiking', 'skill_level': 'intermediate', 'frequency': 'weekly'}, {'name': 'photography', 'skill_level': 'beginner', 'frequency': 'monthly'}, {'name': 'biking', 'skill_level': 'intermediate', 'frequency': 'weekly'}, {'name': 'swimming', 'skill_level': 'expert', 'frequency': 'weekly'}, {'name': 'sailing', 'skill_level': 'intermediate', 'frequency': 'monthly'}, {'name': 'weaving', 'skill_level': 'beginner', 'frequency': 'weekly'}, {'name': 'painting', 'skill_level': 'intermediate', 'frequency': 'weekly'}, {'na
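When a document is regenerated wholesale, it helps to check what disappeared along the way. The sketch below compares a "before" and "after" dictionary and reports dict keys and list entries that are present in the first but missing from the second. The helper name `dropped_items` and the abbreviated sample data are assumptions for illustration. They are not taken from the run above.

```python
def dropped_items(before, after, path=""):
    """Return paths of dict keys and list entries in `before` that `after` lacks."""
    missing = []
    if isinstance(before, dict) and isinstance(after, dict):
        for key, value in before.items():
            child = f"{path}/{key}"
            if key not in after:
                missing.append(child)
            else:
                missing.extend(dropped_items(value, after[key], child))
    elif isinstance(before, list) and isinstance(after, list):
        for item in before:
            if item not in after:
                missing.append(f"{path}: {item!r}")
    return missing


# Abbreviated, made-up stand-ins for the original and regenerated profiles.
before = {
    "favorite_foods": ["sushi", "pizza", "tacos"],
    "hobbies": [{"name": "reading"}, {"name": "canoeing"}],
    "favorite_color": "blue",
}
after = {
    "favorite_foods": ["sushi", "pizza", "tacos", "Thai food"],
    "hobbies": [{"name": "reading"}],
}

for entry in dropped_items(before, after):
    print("dropped:", entry)
```

A reported drop is not necessarily a mistake, since the conversation may justify removing an item. The point is to surface removals so they can be reviewed rather than pass unnoticed.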

In [15]:
# Trustcall approach
from trustcall import create_extractor

bound = create_extractor(llm, tools=[User])

trustcall_result = bound.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": f"""Update the memory (JSON doc) to incorporate new information from the following conversation:
<convo>
{conversation}
</convo>""",
            }
        ],
        "existing": {"User": initial_user.model_dump()},
    }
)
print("\nTrustcall approach result:")
trustcall_output = trustcall_result["responses"][0].model_dump()
print(trustcall_output)

Could not apply patch: can't replace a non-existent object 'French'

Trustcall approach result:


IndexError: list index out of range
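The failure above is consistent with JSON Patch semantics (RFC 6902): `replace` requires the target location to exist, whereas `add` on an object member creates it. The initial profile's `languages` dict has no `French` key, so a `replace` at that path cannot be applied. The `IndexError` appears to follow from that, since `responses` would be empty if no patch succeeded. Below is a minimal pure-Python sketch of the `add` versus `replace` distinction. The helper `apply_member_op` is hypothetical and does not reproduce trustcall's internals.

```python
def apply_member_op(obj, op, key, value):
    """Apply an RFC 6902-style `add` or `replace` to a single member of a dict."""
    if op not in ("add", "replace"):
        raise ValueError(f"unsupported op: {op}")
    if op == "replace" and key not in obj:
        # `replace` is only valid when the target already exists.
        raise KeyError(f"can't replace a non-existent object {key!r}")
    obj[key] = value
    return obj


languages = {"English": "native", "Spanish": "intermediate", "Python": "expert"}

error_message = None
try:
    apply_member_op(languages, "replace", "French", "beginner")
except KeyError as err:
    error_message = err.args[0]
print("replace failed:", error_message)

# `add` creates the missing member instead.
apply_member_op(languages, "add", "French", "beginner")
print(languages)
```

In calling code, it is also worth checking that `trustcall_result["responses"]` is non-empty before indexing into it, so that a failed patch surfaces as a clear message rather than an `IndexError`.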

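## Handling Simultaneous Inserts and Updates

Both problems above (the difficulty of generating type-safe output for complex schemas, and the difficulty of generating correct edits to existing schemas) are compounded when you must prompt the LLM to handle updates and inserts at the same time, which is often the case when extracting multiple memory "events" from a conversation.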

In [16]:
import uuid
from typing import List, Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Someone the user knows or interacts with."""

    name: str
    relationship: str = Field(description="How they relate to the user.")

    notes: List[str] = Field(
        description="Memories and other observations about the person"
    )


# Initial data
initial_people = [
    Person(
        name="Emma Thompson",
        relationship="College friend",
        notes=["Loves hiking", "Works in marketing", "Has a dog named Max"],
    ),
    Person(
        name="Michael Chen",
        relationship="Coworker",
        notes=["Great at problem-solving", "Vegetarian", "Plays guitar"],
    ),
    Person(
        name="Sarah Johnson",
        relationship="Neighbor",
        notes=["Has two kids", "Loves gardening", "Makes amazing cookies"],
    ),
]

# Convert to the format expected by the extractor
existing_data = [
    (str(i), "Person", person.model_dump()) for i, person in enumerate(initial_people)
]

In [19]:
conversation = """
Me: I ran into Emma Thompson at the park yesterday. She was walking her new puppy, a golden retriever named Sunny. She mentioned she got promoted to Senior Marketing Manager last month.
Friend: That's great news for Emma! How's she enjoying the new role?
Me: She seems to be thriving. Oh, and did you know she's taken up rock climbing? She invited me to join her at the climbing gym sometime.
Friend: Wow, rock climbing? That's quite a change from hiking. Speaking of friends, have you heard from Michael Chen recently?
Me: Actually, yes. We had a video call last week. He's switched jobs and is now working as a Data Scientist at a startup. He's also mentioned he's thinking of going vegan.
Friend: That's a big change for Michael! Both career and diet-wise. How about your neighbor, Sarah? Is she still teaching?
Me: Sarah's doing well. Her kids are growing up fast - her oldest just started middle school. She's still teaching, but now she's focusing on special education. She's really passionate about it.
Friend: That's wonderful. Oh, before I forget, I wanted to introduce you to my cousin who just moved to town. Her name is Olivia Davis, she's a 27-year-old graphic designer. She's looking to meet new people and expand her social circle. I thought you two might get along well.
Me: That sounds great! I'd love to meet her. Maybe we could all get together for coffee next week?
Friend: Perfect! I'll set it up. Olivia loves art and is always sketching in her free time. She also volunteers at the local animal shelter on weekends.
"""

from langchain_openai import ChatOpenAI

# Now, let's use the extractor to update existing entries and create new ones
from trustcall import create_extractor

llm = ChatOpenAI(model="gpt-4o-mini")

extractor = create_extractor(
    llm,
    tools=[Person],
    tool_choice="any",
    enable_inserts=True,
)

result = extractor.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": f"Update existing person records and create new ones based on the following conversation:\n\n{conversation}",
            }
        ],
        "existing": existing_data,
    }
)

# Print the results
print("Updated and new person records:")
for r, rmeta in zip(result["responses"], result["response_metadata"]):
    print(f"ID: {rmeta.get('json_doc_id', 'New')}")
    print(r.model_dump_json(indent=2))
    print()

Updated and new person records:
ID: 0
{
  "name": "Emma Thompson",
  "relationship": "College friend",
  "notes": [
    "Loves hiking",
    "Works in marketing",
    "Has a dog named Sunny",
    "Got promoted to Senior Marketing Manager",
    "Took up rock climbing"
  ]
}

ID: 1
{
  "name": "Michael Chen",
  "relationship": "Coworker",
  "notes": [
    "Great at problem-solving",
    "Thinking of going vegan",
    "Switched jobs to Data Scientist at a startup"
  ]
}

ID: 2
{
  "name": "Sarah Johnson",
  "relationship": "Neighbor",
  "notes": [
    "Her oldest just started middle school",
    "Loves gardening",
    "Makes amazing cookies",
    "Focusing on special education"
  ]
}

ID: New
{
  "name": "Olivia Davis",
  "relationship": "Cousin of a friend",
  "notes": [
    "27-year-old graphic designer",
    "Loves art and sketching",
    "Volunteers at the local animal shelter on weekends"
  ]
}



👆 LangSmith Trace: https://smith.langchain.com/public/50b8479f-4753-4d4a-b8da-f21e0f658914/r
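Once the extractor returns, the updated and new records usually need to be written back to wherever the memories live. The sketch below shows one way to do that with a plain dictionary as the store: records whose metadata carries a `json_doc_id` overwrite the existing entry, and records without one get a fresh UUID. The function `merge_records`, the dict-based store, and the simplified sample data are assumptions for illustration. With real extractor output you would call `model_dump()` on each response first.

```python
import uuid


def merge_records(store, responses, response_metadata):
    """Write updated and new records into `store`, keyed by document ID."""
    for record, meta in zip(responses, response_metadata):
        # Existing documents keep their ID; inserts get a newly generated one.
        doc_id = meta.get("json_doc_id") or str(uuid.uuid4())
        store[doc_id] = record
    return store


# Made-up stand-ins for an existing store and an extractor result.
store = {"0": {"name": "Emma Thompson", "notes": ["Loves hiking"]}}
responses = [
    {"name": "Emma Thompson", "notes": ["Loves hiking", "Took up rock climbing"]},
    {"name": "Olivia Davis", "notes": ["27-year-old graphic designer"]},
]
response_metadata = [{"json_doc_id": "0"}, {}]

merge_records(store, responses, response_metadata)
for doc_id, record in store.items():
    print(doc_id, record["name"])
```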

## Usage in LangGraph

In [51]:
import operator
from datetime import datetime
from typing import List

import pytz
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from pydantic.v1 import BaseModel, Field, validator
from trustcall import create_extractor
from typing_extensions import Annotated, TypedDict


class Preferences(BaseModel):
    foods: List[str] = Field(description="Favorite foods")

    @validator("foods")
    def at_least_three_foods(cls, v):
        if len(v) < 3:
            raise ValueError("Must have at least three favorite foods")
        return v


llm = ChatOpenAI(model="gpt-4o-mini")


def save_user_information(preferences: Preferences):
    """Save user information to a database."""
    return "User information saved"


def lookup_time(tz: str) -> str:
    """Lookup the current time in a given timezone."""
    try:
        # Convert the timezone string to a timezone object
        timezone = pytz.timezone(tz)
        # Get the current time in the given timezone
        tm = datetime.now(timezone)
        return f"The current time in {tz} is {tm.strftime('%H:%M:%S')}"
    except pytz.UnknownTimeZoneError:
        return f"Unknown timezone: {tz}"


agent = create_extractor(llm, tools=[save_user_information, lookup_time])


class State(TypedDict):
    messages: Annotated[list, operator.add]


builder = StateGraph(State)
builder.add_node("agent", agent)
builder.add_node("tools", ToolNode([save_user_information, lookup_time]))
builder.add_edge("tools", "agent")
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", tools_condition)

graph = builder.compile(checkpointer=MemorySaver())


In [40]:
config = {"configurable": {"thread_id": "1234"}}
res = graph.invoke({"messages": [("user", "Hi there!")]}, config)
res["messages"][-1].pretty_print()


Hello! How can I assist you today?


In [41]:
res = graph.invoke(
    {"messages": [("user", "Curious; what's the time in denver right now?")]}, config
)
res["messages"][-1].pretty_print()


The current time in Denver is 3:29 AM.


In [52]:
res = graph.invoke(
    {
        "messages": [
            ("user", "Did you know my favorite foods are spinach and potatoes?")
        ]
    },
    config,
)
res["messages"][-1].pretty_print()

ValidationError: 1 validation error for save_user_information
preferences -> foods
  Must have at least three favorite foods (type=value_error)

👆 LangSmith Trace: https://smith.langchain.com/public/41407f66-7ed4-4870-8fa6-7beca3c5e90f/r

🌰 Fireworks Firefunction v2: https://smith.langchain.com/public/b83d6db1-ffb9-4817-a166-bbc5004bbc25/r/5a05f73b-1d7e-47d4-9e40-0e8aaa3faa28 
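The validator in this example rejects any `foods` list with fewer than three entries. A common way to recover from such a failure is to feed the validation error back to the generator and try again. The sketch below illustrates that loop with a stubbed generator in place of an LLM. `validate_foods`, `fake_generate`, and `extract_with_retries` are hypothetical names, and this is not a description of how trustcall implements its retries.

```python
def validate_foods(foods):
    """Mirror the `at_least_three_foods` rule from the example above."""
    if len(foods) < 3:
        raise ValueError("Must have at least three favorite foods")
    return foods


def fake_generate(prompt, feedback):
    """Stand-in for an LLM call: returns a longer list once it has seen an error."""
    if feedback is None:
        return ["spinach", "potatoes"]
    return ["spinach", "potatoes", "rice"]


def extract_with_retries(prompt, max_attempts=3):
    """Generate, validate, and retry with the error message as feedback."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = fake_generate(prompt, feedback)
        try:
            return validate_foods(candidate), attempt
        except ValueError as err:
            feedback = str(err)
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")


foods, attempts = extract_with_retries("What are the user's favorite foods?")
print(foods, attempts)
```

The stub simply invents a third food to satisfy the validator. In a real agent the better outcome is often to ask the user for the missing information instead of guessing.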
