# How to return structured data from a model
```{=mdx}
<span data-heading-keywords="with_structured_output"></span>
```

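It is often useful to have a model return output that matches a specific schema. One common use case is extracting data from arbitrary text so it can be inserted into a traditional database or passed to another downstream system. This guide walks through several strategies for getting structured output from a model.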

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Chat models](/docs/concepts/chat_models)

:::

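## The `.withStructuredOutput()` method

Models can use several strategies under the hood. For some of the most popular model providers, including [Anthropic](/docs/integrations/platforms/anthropic/), [Google VertexAI](/docs/integrations/platforms/google/), [Mistral](/docs/integrations/chat/mistral/), and [OpenAI](/docs/integrations/platforms/openai/), LangChain implements a common interface that abstracts away these strategies, called `.withStructuredOutput`.

When you call this method (passing in a [JSON schema](https://json-schema.org/) or a [Zod schema](https://zod.dev/)), the model adds whatever model parameters and output parsers are needed to return structured output matching the requested schema. If the model supports more than one way of doing this (for example, function calling vs. JSON mode), you can choose which one to use by passing the `method` option.

Let's look at some examples of this in action! We'll use Zod to create a simple response schema.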

```{=mdx}
import ChatModelTabs from "@theme/ChatModelTabs";

<ChatModelTabs onlyWso={true} />
```

import { z } from "zod";

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke);

await structuredLlm.invoke("Tell me a joke about cats")

{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs.",
  rating: 7
}

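One key point is that although we assign the Zod schema to a variable named `joke`, Zod cannot access that variable name, so it is never passed to the model. It is not required, but we can pass a name for the schema to give the model more context about what the schema represents, which can improve performance: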

const structuredLlm = model.withStructuredOutput(joke, { name: "joke" });

await structuredLlm.invoke("Tell me a joke about cats")

{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs!",
  rating: 7
}

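The result is a JSON object.

If you would rather not use Zod, you can also pass in an OpenAI-style JSON schema dictionary. This object should contain three properties:

- `name`: The name of the schema to output.
- `description`: A high-level description of the schema to output.
- `parameters`: The nested details of the schema you want to extract, formatted as a [JSON schema](https://json-schema.org/) dictionary.

In this case, the response is also a dictionary: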

const structuredLlm = model.withStructuredOutput(
  {
    "name": "joke",
    "description": "Joke to tell user.",
    "parameters": {
      "title": "Joke",
      "type": "object",
      "properties": {
        "setup": {"type": "string", "description": "The setup for the joke"},
        "punchline": {"type": "string", "description": "The joke's punchline"},
      },
      "required": ["setup", "punchline"],
    },
  }
)

await structuredLlm.invoke("Tell me a joke about cats")

{
  setup: "Why was the cat sitting on the computer?",
  punchline: "Because it wanted to keep an eye on the mouse!"
}

If you are using JSON Schema, you can take advantage of other, more complex schema descriptions to achieve a similar effect.
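
As a rough illustration of a more detailed schema description, the sketch below extends the joke schema with standard JSON Schema keywords (`minimum`, `maximum`, `enum`). The `rating` bounds and the `category` field are hypothetical additions for illustration, and whether a given provider actually enforces these keywords is an assumption you should verify for your model.

```typescript
// Hypothetical extension of the joke schema. The `rating` bounds and the
// `category` field are illustrative assumptions, not fields from this guide.
const detailedJokeSchema = {
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
      rating: {
        type: "integer",
        minimum: 1,
        maximum: 10,
        description: "How funny the joke is, from 1 to 10",
      },
      category: {
        type: "string",
        enum: ["pun", "observational", "absurd"],
        description: "Rough style of the joke",
      },
    },
    // Optional fields are simply left out of `required`.
    required: ["setup", "punchline"],
  },
};

// With the `model` used throughout this guide, the schema would be passed
// the same way as before:
// const structuredLlm = model.withStructuredOutput(detailedJokeSchema);
console.log(Object.keys(detailedJokeSchema.parameters.properties));
```

Keeping optional fields out of `required` mirrors the `.optional()` call on `rating` in the Zod version of the schema.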

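If the chosen model supports it, you can also use tool calling directly to let the model choose between options. This involves a bit more parsing and setup. See [this how-to guide](/docs/how_to/tool_calling/) for details.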
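
To give a feel for the extra parsing involved, here is a minimal sketch of dispatching on the tool the model selected. The `ToolCall` type is a simplified stand-in for the entries of an `AIMessage`'s `tool_calls` array, and the tool names `joke` and `conversational_response` are hypothetical.

```typescript
// Simplified stand-in for one entry of an AIMessage's `tool_calls` array.
type ToolCall = { name: string; args: Record<string, unknown>; id?: string };

// Hypothetical tool names: "joke" and "conversational_response".
function describeToolCall(call: ToolCall): string {
  switch (call.name) {
    case "joke":
      return `Joke: ${String(call.args["setup"])} ${String(call.args["punchline"])}`;
    case "conversational_response":
      return `Reply: ${String(call.args["response"])}`;
    default:
      // The model picked a tool we did not offer or do not handle.
      throw new Error(`Unexpected tool: ${call.name}`);
  }
}

const sampleCall: ToolCall = {
  name: "joke",
  args: { setup: "Why don't cats play poker in the wild?", punchline: "Too many cheetahs." },
};

console.log(describeToolCall(sampleCall));
```

In a real chain you would read the calls from the model's response message rather than hard-coding one as above.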

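### Specifying the output method (Advanced)

For models that support more than one way of outputting data, you can specify the preferred one like this: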

const structuredLlm = model.withStructuredOutput(joke, {
  method: "json_mode",
  name: "joke",
})

await structuredLlm.invoke(
  "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
)

{
  setup: "Why don't cats play poker in the jungle?",
  punchline: "Too many cheetahs!"
}

In the example above, we use OpenAI's alternate JSON mode capability together with a more specific prompt.
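
Because the prompt in this mode names the expected keys explicitly, it can help to build that instruction from a single list of keys. The helper below is a hypothetical convenience, not part of LangChain; it reproduces the prompt string used in the example.

```typescript
// Hypothetical helper: appends an explicit JSON instruction that names each key.
function withJsonInstruction(task: string, keys: string[]): string {
  const keyList = keys.map((key) => "`" + key + "`").join(" and ");
  return `${task}, respond in JSON with ${keyList} keys`;
}

const jsonModePrompt = withJsonInstruction("Tell me a joke about cats", ["setup", "punchline"]);

console.log(jsonModePrompt);
// Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys
```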

For specifics about the model you choose, see its entry in the [API reference pages](https://api.js.langchain.com/).

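### (Advanced) Raw outputs

LLMs are not perfect at generating structured output, especially as schemas become complex. You can avoid raised exceptions and handle the raw output yourself by passing `includeRaw: true`. This changes the output format to contain the raw message output and the `parsed` value (if parsing succeeded):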

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke, { includeRaw: true, name: "joke" });

await structuredLlm.invoke("Tell me a joke about cats");

{
  raw: AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "",
      tool_calls: [
        {
          name: "joke",
          args: [Object],
          id: "call_0pEdltlfSXjq20RaBFKSQOeF"
        }
      ],
      invalid_tool_calls: [],
      additional_kwargs: { function_call: undefined, tool_calls: [ [Object] ] },
      response_metadata: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      function_call: undefined,
      tool_calls: [
        {
          id: "call_0pEdltlfSXjq20RaBFKSQOeF",
          type: "function",
          function: [Object]
        }
      ]
    },
    response_metadata: {
      tokenUsage: { completionTokens: 33, promptTokens: 88, totalTokens: 121 },
      finish_reason: "stop"
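
A minimal sketch of consuming such a result might look like the following. The `RawResult` type is a simplified stand-in (the real `raw` value is an `AIMessage`), and the sketch assumes that `parsed` is `null` when parsing fails; check the behaviour of your model integration before relying on that.

```typescript
type Joke = { setup: string; punchline: string; rating?: number };

// Simplified stand-in for the `{ raw, parsed }` result. The real `raw` value
// is an AIMessage; `parsed` being null on failure is an assumption.
type RawResult<T> = { raw: { content: string }; parsed: T | null };

// Returns the parsed joke, or a placeholder built from the raw text when
// parsing did not succeed.
function jokeOrFallback(result: RawResult<Joke>): Joke {
  if (result.parsed !== null) {
    return result.parsed;
  }
  return { setup: "(unparsed model output)", punchline: result.raw.content };
}

const parsedOk = jokeOrFallback({
  raw: { content: "" },
  parsed: { setup: "Why was the cat sitting on the computer?", punchline: "To watch the mouse." },
});
const parsedFailed = jokeOrFallback({ raw: { content: "not json" }, parsed: null });

console.log(parsedOk.punchline, "|", parsedFailed.punchline);
```

Handling the failure case explicitly lets you log or retry with the raw message instead of catching an exception.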

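## Prompting techniques

You can also prompt models to output information in a given format. This approach relies on designing good prompts and then parsing the model's output. It is the only option for models that do not support `.withStructuredOutput()` or other built-in approaches.

### Using `JsonOutputParser`

The following example uses the built-in [`JsonOutputParser`](https://api.js.langchain.com/classes/langchain_core.output_parsers.JsonOutputParser.html) to parse the output of a chat model that is prompted to match a given JSON schema. Note that we add `format_instructions` directly to the prompt, filling it in with the prompt's `.partial()` method: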

import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
    name: string;
    height_in_meters: number;
};

type People = {
    people: Person[];
};

const formatInstructions = `Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.
`

// Set up a parser
const parser = new JsonOutputParser<People>();

// Prompt
const prompt = await ChatPromptTemplate.fromMessages(
    [
        [
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ],
        [
            "human",
            "{query}",
        ]
    ]
).partial({
    format_instructions: formatInstructions,
})

Let's take a look at what information is sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall"

console.log((await prompt.format({ query })).toString())

System: Answer the user query. Wrap the output in `json` tags
Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.

Human: Anna is 23 years old and she is 6 feet tall


Now let's invoke it:

const chain = prompt.pipe(model).pipe(parser);

await chain.invoke({ query })

{ people: [ { name: "Anna", height_in_meters: 1.83 } ] }

For a deeper dive into using output parsers with prompting techniques for structured output, see [this guide](/docs/how_to/output_parser_structured).
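
One caveat worth illustrating: the `People` type parameter is a compile-time annotation, so a runtime check can be useful before trusting the parsed value. The type guards below are a hand-written sketch under the assumption that the parser itself does not validate the object's shape; `isPerson` and `isPeople` are hypothetical helpers, not LangChain APIs.

```typescript
type Person = { name: string; height_in_meters: number };
type People = { people: Person[] };

// Checks that a value has a string `name` and a numeric `height_in_meters`.
function isPerson(value: unknown): value is Person {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["name"] === "string" && typeof record["height_in_meters"] === "number";
}

// Checks that a value has a `people` array whose entries all pass `isPerson`.
function isPeople(value: unknown): value is People {
  if (typeof value !== "object" || value === null) return false;
  const people = (value as Record<string, unknown>)["people"];
  return Array.isArray(people) && people.every(isPerson);
}

console.log(isPeople({ people: [{ name: "Anna", height_in_meters: 1.83 }] })); // true
console.log(isPeople({ people: [{ name: "Anna", height_in_meters: "1.83" }] })); // false
```

You could run a guard like this on the value returned by `chain.invoke({ query })` and retry or raise an error when it fails.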

### Custom parsing

You can also create a custom prompt and parser with [LangChain Expression Language (LCEL)](/docs/concepts/lcel), using a plain function to parse the model's output:

import { AIMessage } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
    name: string;
    height_in_meters: number;
};

type People = {
    people: Person[];
};

const schema = `{{ people: [{{ name: "string", height_in_meters: "number" }}] }}`

// Prompt
const prompt = await ChatPromptTemplate.fromMessages(
    [
        [
            "system",
            `Answer the user query. Output your answer as JSON that
matches the given schema: \`\`\`json\n{schema}\n\`\`\`.
Make sure to wrap the answer in \`\`\`json and \`\`\` tags`
        ],
        [
            "human",
            "{query}",
        ]
    ]
).partial({
    schema
});

/**
 * Custom extractor
 * 
 * Extracts JSON content from a string where
 * JSON is embedded between ```json and ``` tags.
 */
const extractJson = (output: AIMessage): Array<People> => {
    const text = output.content as string;
    // Define the regular expression pattern to match JSON blocks
    const pattern = /```json(.*?)```/gs;

    // Find all non-overlapping matches of the pattern in the string
    const matches = text.match(pattern);

    // Process each match, attempting to parse it as JSON
    try {
        return matches?.map(match => {
            // Remove the markdown code block syntax to isolate the JSON string
            const jsonStr = match.replace(/```json|```/g, '').trim();
            return JSON.parse(jsonStr);
        }) ?? [];
    } catch (error) {
        // Include the raw text; interpolating the message object would print "[object Object]"
        throw new Error(`Failed to parse: ${text}`);
    }
}

Here is the prompt sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall"

console.log((await prompt.format({ query })).toString())

System: Answer the user query. Output your answer as JSON that
matches the given schema: ```json
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
```.
Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall


And here is what it looks like when we invoke it:

import { RunnableLambda } from "@langchain/core/runnables";

const chain = prompt.pipe(model).pipe(new RunnableLambda({ func: extractJson }));

await chain.invoke({ query })

[
  { people: [ { name: "Anna", height_in_meters: 1.83 } ] }
]
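
To sanity-check extraction logic like this without calling a model, you can run it against a hard-coded reply. The sketch below is a standalone variant that works on plain strings; `extractJsonBlocks`, `FENCE`, and the sample reply are all hypothetical, and the fence is written with Unicode escapes only so the snippet can sit inside a fenced code block.

```typescript
// Three backticks, written as escapes so this snippet can live inside a fence.
const FENCE = "\u0060\u0060\u0060";

// Returns every JSON value found between a json-tagged opening fence and a
// closing fence. Throws if any block is not valid JSON.
function extractJsonBlocks(text: string): unknown[] {
  const pattern = new RegExp(FENCE + "json([\\s\\S]*?)" + FENCE, "g");
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    results.push(JSON.parse((match[1] ?? "").trim()));
  }
  return results;
}

// Hypothetical model reply, hard-coded for the check.
const sampleReply =
  "Here you go:\n" +
  FENCE +
  'json\n{"people":[{"name":"Anna","height_in_meters":1.83}]}\n' +
  FENCE;

console.log(JSON.stringify(extractJsonBlocks(sampleReply)));
// [{"people":[{"name":"Anna","height_in_meters":1.83}]}]
```

A reply with no fenced block yields an empty array, which matches the `?? []` fallback in `extractJson`.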

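## Next steps

You have now learned a few methods for making a model output structured data.

To learn more, check out the other how-to guides in this section or the conceptual guide on tool calling.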