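# How to use prompting alone (no tool calling) to do extraction

Tool calling features are **not** required to generate structured output from an LLM. An LLM that follows prompt instructions well can be asked to output information in a given format.

This approach relies on two things: designing a good prompt, and then parsing the LLM's output so that the extracted information ends up in a usable form.

To extract data without tool calling features: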

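1. Instruct the LLM to generate text following an expected format (e.g., JSON with a certain schema);
2. Use [output parsers](/docs/concepts/output_parsers) to structure the model response into the desired Python object.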
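The two steps above can be sketched without any LLM at all. The following is a minimal, standard-library-only illustration: the names `FORMAT_INSTRUCTIONS`, `build_prompt`, `parse_people`, and the hard-coded `fake_response` are assumptions made for this sketch, and a plain dataclass stands in for the Pydantic models used later in this guide.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    height_in_meters: float


# Step 1: instruct the model to answer in a fixed JSON format.
FORMAT_INSTRUCTIONS = (
    "Respond only with JSON of the form "
    '{"people": [{"name": <string>, "height_in_meters": <number>}]}'
)


def build_prompt(query: str) -> str:
    """Combine the format instructions with the text to extract from."""
    return f"{FORMAT_INSTRUCTIONS}\n\nText: {query}"


# Step 2: parse the raw model text into Python objects.
def parse_people(response_text: str) -> list[Person]:
    data = json.loads(response_text)
    return [Person(**item) for item in data["people"]]


# Hypothetical stand-in for a model response; no LLM is called here.
fake_response = '{"people": [{"name": "Anna", "height_in_meters": 1.83}]}'

prompt_text = build_prompt("Anna is 23 years old and she is 6 feet tall")
people = parse_people(fake_response)
print(people)
```

In a real chain the string passed to `parse_people` would come from the model, so the parser should expect malformed output and fail with a clear error.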

First we select an LLM:

import ChatModelTabs from "@theme/ChatModelTabs";

<ChatModelTabs customVarName="model" />

In [1]:
# | output: false
# | echo: false

from langchain_anthropic.chat_models import ChatAnthropic

model = ChatAnthropic(model_name="claude-3-sonnet-20240229", temperature=0)

:::tip
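This tutorial is meant to be simple, but in practice you should generally include reference examples in the prompt to get the best extraction performance!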
:::

## Using PydanticOutputParser

The following example uses the built-in `PydanticOutputParser` to parse the output of a chat model.

In [2]:
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Set up a parser
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ),
        ("human", "{query}"),
    ]
).partial(format_instructions=parser.get_format_instructions())

Let's take a look at what information is sent to the model:

In [3]:
query = "Anna is 23 years old and she is 6 feet tall"

In [4]:
print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Wrap the output in `json` tags
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"Person": {"description": "Information about a person.", "properties": {"name": {"description": "The name of the person", "title": "Name", "type": "string"}, "height_in_meters": {"description": "The height of the person expressed in meters.", "title": "Height In Meters", "type": "number"}}, "required": ["name", "height_in_meters"], "title": "Person", "type": "object"}}, "description": "Identifying information about all people in a text.", "properties": {"people": {"items"
```

(Output truncated.)

With the prompt defined, we simply chain together the prompt, the model, and the output parser:

In [5]:
chain = prompt | model | parser
chain.invoke({"query": query})

People(people=[Person(name='Anna', height_in_meters=1.83)])
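To build intuition for what the `|` operator does in the chain above, here is a toy sketch of left-to-right composition. It is not LangChain's implementation: the `Step` class and the `toy_prompt`, `toy_model`, and `toy_parser` stand-ins are hypothetical names invented for this illustration, and the "model" simply returns a hard-coded string.

```python
import json
from typing import Any, Callable


class Step:
    """Toy stand-in for a chain component; NOT LangChain's Runnable."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Hypothetical stand-ins for the prompt, the model, and the parser.
toy_prompt = Step(lambda inputs: f"Extract people as JSON.\nText: {inputs['query']}")
toy_model = Step(
    lambda _prompt: '{"people": [{"name": "Anna", "height_in_meters": 1.83}]}'
)
toy_parser = Step(json.loads)

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({"query": "Anna is 23 years old and she is 6 feet tall"})
print(result)
```

Each stage only sees the output of the stage before it, which is why the parser at the end of the chain receives the model's raw response rather than the original query.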

Check out the associated [LangSmith trace](https://smith.langchain.com/public/92ed52a3-92b9-45af-a663-0a9c00e5e396/r).

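Note that the schema shows up in two places:

1.  In the prompt, via `parser.get_format_instructions()`;
2.  In the chain, where the parser receives the formatted output and structures it into a Python object (in this case, the Pydantic object `People`).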
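The dual role of the schema can be illustrated with a small hand-rolled parser. This is only a sketch under stated assumptions: `PEOPLE_SCHEMA` is a hand-written schema, `SimpleSchemaParser` is a hypothetical class that is not part of LangChain, and its check is far shallower than real Pydantic validation.

```python
import json

# Hypothetical, hand-written schema: the single source of truth.
PEOPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "height_in_meters": {"type": "number"},
                },
                "required": ["name", "height_in_meters"],
            },
        }
    },
    "required": ["people"],
}


class SimpleSchemaParser:
    """Toy parser; not the PydanticOutputParser implementation."""

    def __init__(self, schema: dict):
        self.schema = schema

    def get_format_instructions(self) -> str:
        # Use 1: the schema is rendered into the prompt text.
        return "Return JSON matching this schema:\n" + json.dumps(self.schema)

    def parse(self, text: str) -> dict:
        # Use 2: the same schema drives a (very shallow) check of the output.
        data = json.loads(text)
        person_schema = self.schema["properties"]["people"]["items"]
        for person in data["people"]:
            missing = [k for k in person_schema["required"] if k not in person]
            if missing:
                raise ValueError(f"Missing fields: {missing}")
        return data


schema_parser = SimpleSchemaParser(PEOPLE_SCHEMA)
instructions = schema_parser.get_format_instructions()
parsed = schema_parser.parse(
    '{"people": [{"name": "Anna", "height_in_meters": 1.83}]}'
)
print(parsed)
```

Keeping one schema object behind both methods means the instructions the model sees and the checks applied to its answer cannot drift apart.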

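## Custom parsing

If desired, it is easy to create a custom prompt and parser with `LangChain` and `LCEL`.

To create a custom parser, define a function that parses the output from the model (typically an [AIMessage](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessage.html)) into an object of your choice.

Below is a simple implementation of a JSON parser.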

In [6]:
import json
import re
from typing import List

from langchain_anthropic.chat_models import ChatAnthropic
from langchain_core.messages import AIMessage
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Output your answer as JSON that  "
            "matches the given schema: ```json\n{schema}\n```. "
            "Make sure to wrap the answer in ```json and ``` tags",
        ),
        ("human", "{query}"),
    ]
).partial(schema=People.model_json_schema())


# Custom parser
def extract_json(message: AIMessage) -> List[dict]:
    """Extracts JSON content from a message where JSON is embedded between ```json and ``` tags.

    Parameters:
        message (AIMessage): The model response whose content contains the JSON.

    Returns:
        list: A list of parsed JSON objects, one per matched block.

    Raises:
        ValueError: If a matched block cannot be parsed as JSON.
    """
    text = message.content
    # Define the regular expression pattern to match JSON blocks
    pattern = r"```json(.*?)```"

    # Find all non-overlapping matches of the pattern in the string
    matches = re.findall(pattern, text, re.DOTALL)

    # Parse each matched block as JSON, stripping any leading or trailing whitespace
    try:
        return [json.loads(match.strip()) for match in matches]
    except Exception as e:
        raise ValueError(f"Failed to parse: {message}") from e

In [7]:
query = "Anna is 23 years old and she is 6 feet tall"
print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Output your answer as JSON that  matches the given schema: ```json
{'$defs': {'Person': {'description': 'Information about a person.', 'properties': {'name': {'description': 'The name of the person', 'title': 'Name', 'type': 'string'}, 'height_in_meters': {'description': 'The height of the person expressed in meters.', 'title': 'Height In Meters', 'type': 'number'}}, 'required': ['name', 'height_in_meters'], 'title': 'Person', 'type': 'object'}}, 'description': 'Identifying information about all people in a text.', 'properties': {'people': {'items': {'$ref': '#/$defs/Person'}, 'title': 'People', 'type': 'array'}}, 'required': ['people'], 'title': 'People', 'type': 'object'}
```. Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall


In [8]:
chain = prompt | model | extract_json
chain.invoke({"query": query})



[{'people': [{'name': 'Anna', 'height_in_meters': 1.83}]}]
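The regular-expression step inside `extract_json` can be tried in isolation. The sketch below uses only the standard library and a hand-written `sample_reply` string (a hypothetical stand-in for a model response). The fence marker is built with `"`" * 3` purely so that the example does not contain a literal triple backtick.

```python
import json
import re

FENCE = "`" * 3  # three backticks, spelled this way to avoid a literal fence

# Hypothetical model reply containing two fenced JSON blocks.
sample_reply = (
    "Here is the result:\n"
    f"{FENCE}json\n"
    '{"people": [{"name": "Anna", "height_in_meters": 1.83}]}\n'
    f"{FENCE}\n"
    "And a second block:\n"
    f"{FENCE}json\n"
    '{"people": []}\n'
    f"{FENCE}\n"
)

# Same idea as the pattern in extract_json: a non-greedy match between fences.
pattern = FENCE + r"json(.*?)" + FENCE

# re.DOTALL lets "." match newlines, so multi-line JSON blocks are captured.
blocks = [
    json.loads(match.strip())
    for match in re.findall(pattern, sample_reply, re.DOTALL)
]
print(blocks)

# A reply without any fenced block produces no matches at all.
no_blocks = re.findall(pattern, "no json here", re.DOTALL)
print(no_blocks)
```

Because a reply with no fenced block yields an empty list rather than an error, callers of a parser built this way may want to treat an empty result as a failed extraction.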

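## Other libraries

If you are looking at extraction via a parsing approach, check out the [Kor](https://eyurtsev.github.io/kor/) library. It is written by one of the `LangChain` maintainers. It helps you craft a prompt that takes examples into account, lets you control the output format (e.g., JSON or CSV), and expresses the schema in TypeScript. It seems to work pretty well!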