# How to parse XML output

:::info Prerequisites

This guide assumes familiarity with the following concepts:
- [Chat models](/docs/concepts/chat_models)
- [Output parsers](/docs/concepts/output_parsers)
- [Prompt templates](/docs/concepts/prompt_templates)
- [Structured output](/docs/how_to/structured_output)
- [Chaining runnables together](/docs/how_to/sequence/)

:::

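LLMs from different providers often have different strengths depending on the specific data they were trained on. This also means that some models may be "better" and more reliable at generating output in formats other than JSON.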

This guide shows you how to use the [`XMLOutputParser`](https://api.js.langchain.com/classes/langchain_core.output_parsers.XMLOutputParser.html) to prompt models for XML output, then parse that output into a usable format.

:::{.callout-note}
Keep in mind that large language models are leaky abstractions! You will need an LLM with sufficient capacity to generate well-formed XML.
:::

In the following examples, we use Anthropic's Claude (https://docs.anthropic.com/claude/docs), a model that is optimized for XML tags.

```{=mdx}
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
import Npm2Yarn from "@theme/Npm2Yarn";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

<Npm2Yarn>
  @langchain/anthropic @langchain/core
</Npm2Yarn>
```

Let's start with a simple request to the model.

```typescript
import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
  maxTokens: 512,
  temperature: 0.1,
});

const query = `Generate the shortened filmography for Tom Hanks.`;

// Ask for plain "movie" tags without any parser involved yet.
const result = await model.invoke(query + ` Please enclose the movies in "movie" tags.`);

console.log(result.content);
```

```text
Here is the shortened filmography for Tom Hanks, with movies enclosed in "movie" tags:

<movie>Forrest Gump</movie>
<movie>Saving Private Ryan</movie>
<movie>Cast Away</movie>
<movie>Apollo 13</movie>
<movie>Catch Me If You Can</movie>
<movie>The Green Mile</movie>
<movie>Toy Story</movie>
<movie>Toy Story 2</movie>
<movie>Toy Story 3</movie>
<movie>Toy Story 4</movie>
<movie>Philadelphia</movie>
<movie>Big</movie>
<movie>Sleepless in Seattle</movie>
<movie>You've Got Mail</movie>
<movie>The Terminal</movie>
```
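
To get a feel for what hand-parsing such a reply involves, here is a minimal sketch that pulls the text out of flat tags with a regular expression. `extractTagContents` is a hypothetical helper, not a LangChain API, and it assumes the tags are not nested, carry no attributes, and that the tag name contains no regex metacharacters.

```typescript
// Hypothetical helper (not a LangChain API): collect the text inside every
// <tag>...</tag> pair. Assumes flat, non-nested tags without attributes and
// a tag name without regex metacharacters.
function extractTagContents(text: string, tag: string): string[] {
  const pattern = new RegExp("<" + tag + ">([\\s\\S]*?)</" + tag + ">", "g");
  const contents: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    contents.push(match[1].trim());
  }
  return contents;
}

// A shortened copy of a model reply in the style shown above.
const reply = `Here is the shortened filmography for Tom Hanks, with movies enclosed in "movie" tags:

<movie>Forrest Gump</movie>
<movie>Saving Private Ryan</movie>
<movie>Cast Away</movie>`;

const titles = extractTagContents(reply, "movie");
console.log(titles);
```

A pattern like this stops working once the model nests tags or adds attributes, which is where a real XML parser becomes the safer choice.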


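The model followed the tagging instruction well. However, it would be nicer to parse that XML into a more easily usable format. We can use the `XMLOutputParser` both to add default format instructions to the prompt and to parse the XML output into an object: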

```typescript
import { XMLOutputParser } from "@langchain/core/output_parsers";

// We will add these instructions to the prompt below
const parser = new XMLOutputParser();

parser.getFormatInstructions();
```

```text
"The output should be formatted as a XML file.\n" +
  "1. Output should conform to the tags below. \n" +
  "2. If tag"... 434 more characters
```

```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";

const prompt = ChatPromptTemplate.fromTemplate(`{query}\n{format_instructions}`);
// Pre-fill the format instructions so only `query` is needed at invoke time.
const partialedPrompt = await prompt.partial({
  format_instructions: parser.getFormatInstructions(),
});

const chain = partialedPrompt.pipe(model).pipe(parser);

const output = await chain.invoke({
  query: "Generate the shortened filmography for Tom Hanks.",
});

console.log(JSON.stringify(output, null, 2));
```

```text
{
  "filmography": [
    {
      "actor": [
        {
          "name": "Tom Hanks"
        },
        {
          "films": [
            {
              "film": [
                {
                  "title": "Forrest Gump"
                },
                {
                  "year": "1994"
                },
                {
                  "role": "Forrest Gump"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "Saving Private Ryan"
                },
                {
                  "year": "1998"
                },
                {
                  "role": "Captain Miller"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "Cast Away"
                },
                {
                  "year": "2000"
                },
                {
                  "role": "Chuck Noland"
                }
```

You'll notice that the output above is no longer just wrapped in `movie` tags. We can also pass a list of tags to tailor the output to our needs:

```typescript
const parserWithTags = new XMLOutputParser({ tags: ["movies", "actor", "film", "name", "genre"] });

// We will add these instructions to the prompt below
parserWithTags.getFormatInstructions();
```

```text
"The output should be formatted as a XML file.\n" +
  "1. Output should conform to the tags below. \n" +
  "2. If tag"... 460 more characters
```

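You can and should experiment with adding your own formatting hints in other parts of your prompt, either to augment or to replace the default instructions.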
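
As one possible way to do that, the sketch below builds an extra hint string from a tag list and a small sample document. `buildXmlHint` is a hypothetical helper, not a LangChain API, and its wording is only an assumption about what might help a given model; treat it as a starting point to experiment with.

```typescript
// Hypothetical helper (not a LangChain API): build an additional formatting
// hint that names the allowed tags and shows one small sample document.
function buildXmlHint(tags: string[], sample: string): string {
  const tagList = tags.map((tag) => "<" + tag + ">").join(", ");
  return [
    "Use only these XML tags: " + tagList + ".",
    "Do not add commentary before or after the XML.",
    "Example of the expected shape:",
    sample.trim(),
  ].join("\n");
}

// The sample values are placeholders, not real data.
const hint = buildXmlHint(
  ["movies", "actor", "film", "name", "genre"],
  `
<movies>
  <actor>
    <film>
      <name>Example Title</name>
      <genre>Example Genre</genre>
    </film>
  </actor>
</movies>`
);

console.log(hint);
```

A string like this could be supplied as another partial variable, provided the prompt template declares a matching placeholder for it.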

Here is the result when we invoke a chain that uses `parserWithTags`:

```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";

const promptWithTags = ChatPromptTemplate.fromTemplate(`{query}\n{format_instructions}`);
// Same prompt as before, but with the tag-specific format instructions.
const partialedPromptWithTags = await promptWithTags.partial({
  format_instructions: parserWithTags.getFormatInstructions(),
});

const chainWithTags = partialedPromptWithTags.pipe(model).pipe(parserWithTags);

const outputWithTags = await chainWithTags.invoke({
  query: "Generate the shortened filmography for Tom Hanks.",
});

console.log(JSON.stringify(outputWithTags, null, 2));
```

```text
{
  "movies": [
    {
      "actor": [
        {
          "film": [
            {
              "name": "Forrest Gump"
            },
            {
              "genre": "Drama"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Saving Private Ryan"
            },
            {
              "genre": "War"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Cast Away"
            },
            {
              "genre": "Drama"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Catch Me If You Can"
            },
            {
              "genre": "Biography"
            }
          ]
        },
        {
          "film": [
            {
              "name": "The Terminal"
            },
            {
              "genre": "Comedy-drama"
            }
          ]
        }
      ]
    }
  ]
}
```
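
In the result above, each element appears as a single-key object whose value is either text or a list of child elements, so the data ends up nested fairly deeply. If a flat list is easier to work with downstream, one option is a small helper like the following. It is a sketch that assumes exactly the `movies > actor > film > name/genre` shape shown above; `XmlNode`, `Film`, `childrenOf`, and `flattenFilms` are hypothetical names, not part of LangChain.

```typescript
// Shape of one parsed element: a single tag name mapped to text or children.
type XmlNode = { [tag: string]: string | XmlNode[] };

interface Film {
  name: string;
  genre: string;
}

// Return the children stored under `tag`, or [] if the node holds text
// or does not have that tag.
function childrenOf(node: XmlNode, tag: string): XmlNode[] {
  const value = node[tag];
  return Array.isArray(value) ? value : [];
}

// Hypothetical helper: flatten movies > actor > film into plain objects.
// Missing fields fall back to an empty string.
function flattenFilms(root: XmlNode): Film[] {
  const films: Film[] = [];
  for (const actorNode of childrenOf(root, "movies")) {
    for (const filmNode of childrenOf(actorNode, "actor")) {
      const fields: Record<string, string> = {};
      for (const field of childrenOf(filmNode, "film")) {
        for (const key of Object.keys(field)) {
          const value = field[key];
          if (typeof value === "string") fields[key] = value;
        }
      }
      films.push({ name: fields["name"] || "", genre: fields["genre"] || "" });
    }
  }
  return films;
}

// A shortened copy of the parsed result shown above.
const parsed: XmlNode = {
  movies: [
    {
      actor: [
        { film: [{ name: "Forrest Gump" }, { genre: "Drama" }] },
        { film: [{ name: "Saving Private Ryan" }, { genre: "War" }] },
      ],
    },
  ],
};

const films = flattenFilms(parsed);
console.log(films);
```

Because model output can vary between runs, a helper like this should tolerate missing tags rather than assume every field is present, which is why it falls back to empty values instead of throwing.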


## Next steps

You've now learned how to prompt a model to return XML. Next, check out the [broader guide on obtaining structured output](/docs/how_to/structured_output) for other related techniques.