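A large language model normally returns a plain string. Sometimes, however, we want the generated text to follow a structure, and an output parser helps turn the model's output into structured data. LangChain ships dozens of output parsers, spread across `langchain_core.output_parsers`, `langchain.output_parsers`, and `langchain_community.output_parsers`. There is not room to cover them all, so only `PydanticOutputParser` and `OutputFixingParser` are introduced here.
# Type hints and type constraints
Python is dynamically typed and does not enforce type constraints on its own. For example, a variable `var` can hold the integer `1` or the string `'1'`, and it can be rebound to a value of a different type at any time. As another example, in the function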
```python
def fun(a, b):
    return a + b
```
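`a` and `b` can be of any type, because `fun` places no constraint on its arguments. The function happens to work when `a` and `b` are both integers or both strings, but what did the author intend it to do? The caller cannot tell. Now suppose the function is rewritten as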
```python
def fun(a: int, b: int) -> int:
    return a + b
```
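Now the caller knows: ah, the author only wants this function to accept integers and return an integer. The annotation `a: int` is called a type hint. A hint has no binding force, though: it is only a hint, and the caller can still pass in strings. This is where type constraints come in, and the `pydantic` package provides them: if the data does not match the declared types and cannot be converted to them, `pydantic` raises an exception. Install the package before use: `pip install pydantic`.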
```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

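user1 = User(name="Alice", age=25)  # OK
user2 = User(name="Bob", age="30")  # also OK: in the default (lax) mode "30" is coerced to the int 30
user3 = User(name="Carol", age="thirty")  # raises ValidationError: not a valid integer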
```
This is only an introductory use of `pydantic`; readers who want to learn more should consult the official documentation.
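To make the difference between a hint and a constraint concrete, here is a minimal standard-library sketch. It shows that annotations are stored as metadata but are not checked at call time, and then adds a hand-written check. The helper `checked_fun` is hypothetical and only illustrates the idea; `pydantic` performs this kind of validation (and much more) for you.

```python
def fun(a: int, b: int) -> int:
    return a + b

# The hints are stored as metadata on the function...
print(fun.__annotations__)

# ...but nothing checks them at call time: the strings are concatenated.
print(fun("1", "2"))  # 12

def checked_fun(a: int, b: int) -> int:
    """Hypothetical variant that enforces its hints by hand."""
    for name, value in (("a", a), ("b", b)):
        if not isinstance(value, int):
            raise TypeError(f"{name} must be int, got {type(value).__name__}")
    return a + b

try:
    checked_fun("1", "2")
except TypeError as exc:
    print(exc)  # a must be int, got str
```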
# `PydanticOutputParser`
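`PydanticOutputParser` is an output parser that uses the `pydantic` library to enforce type constraints.
Step 1: define the data type the parser should produce, and initialize the parser.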

```python
from langchain_ollama.llms import OllamaLLM
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# Local model served by Ollama
llm = OllamaLLM(model="qwen2.5:0.5b")

# Schema that the model's answer must conform to
class Flower(BaseModel):
    flower_type: str = Field(..., description='花名')  # flower name
    price: int = Field(..., description='价格')  # price
    description: str = Field(..., description='花语')  # meaning of the flower
    reason: str = Field(..., description='推荐理由')  # reason for recommending it

output_parser = PydanticOutputParser(pydantic_object=Flower)

# Instructions (including the JSON schema) to embed in the prompt
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
```

````text
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"flower_type": {"description": "花名", "title": "Flower Type", "type": "string"}, "price": {"description": "价格", "title": "Price", "type": "integer"}, "description": {"description": "花语", "title": "Description", "type": "string"}, "reason": {"description": "推荐理由", "title": "Reason", "type": "string"}}, "required": ["flower_type", "price", "description", "reason"]}
```
````


Step 2: define the prompt template and pass in the format instructions defined above.

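```python
from langchain_core.prompts import PromptTemplate

# Prompt (in Chinese): "You are a senior editor. For a {flower} priced at {price} yuan,
# can you provide an attractive, short description in Chinese?"
template = """您是一位资深编辑。
对于售价为 {price} 元的 {flower} ，您能提供一个吸引人的简短中文描述吗？
{format_instructions}"""

# Pre-fill format_instructions; flower and price remain input variables
prompt_template = PromptTemplate.from_template(
    template=template,
    partial_variables={'format_instructions': format_instructions},
)
print(prompt_template)
```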

```text
input_variables=['flower', 'price'] input_types={} partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"flower_type": {"description": "花名", "title": "Flower Type", "type": "string"}, "price": {"description": "价格", "title": "Price", "type": "integer"}, "description": {"description": "花语", "title": "Description", "type": "string"}, "reason": {"description": "推荐理由", "title": "Reason", "type": "string"}}, "required": ["flower_type", "price", "description", "reason"]}\n```'} template='您是一位资深编辑。\n对于售价为 {price} 元的 {flower} ，您能提供一个吸引人的简短
```
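The effect of `partial_variables` can be approximated in plain Python: one placeholder is filled in ahead of time, and the remaining ones are supplied later. The sketch below uses `functools.partial` and `str.format` rather than LangChain itself; the English template text and the shortened instruction string are stand-ins chosen for illustration.

```python
from functools import partial

template = (
    "You are a senior editor.\n"
    "Describe a {flower} priced at {price} yuan.\n"
    "{format_instructions}"
)

# Pre-fill one placeholder now, similar in spirit to partial_variables.
fill_prompt = partial(template.format, format_instructions="Answer in JSON.")

# Supply the remaining placeholders at call time.
prompt = fill_prompt(flower="red rose", price=30)
print(prompt)
```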

Step 3: have the large language model generate the output, then let the parser parse it.

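```python
prompt = prompt_template.format(flower='红玫瑰', price=30)  # 红玫瑰 = red rose
print(prompt, '\n------')

output = llm.invoke(prompt)  # the LLM generates the raw text output
print(output, '\n------')

parsed_output = output_parser.parse(output)  # validate and convert to a Flower instance
# model_dump() is the pydantic v2 replacement for the deprecated dict()
print(parsed_output.model_dump(), '\n------')
```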

````text
您是一位资深编辑。
对于售价为 30 元的 红玫瑰 ，您能提供一个吸引人的简短中文描述吗？
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"flower_type": {"description": "花名", "title": "Flower Type", "type": "string"}, "price": {"description": "价格", "title": "Price", "type": "integer"}, "description": {"description": "花语", "title": "Description", "type": "string"}, "reason": {"description": "推荐理由", "title": "Reason", "type": "string"}}, "required": ["flower_type", "price", "description", "reason"]}
```
------
```json
{
  "flower_type": "红玫瑰",
  "price": 30,
  "description": "寓意着爱情、美丽与希望，让人感受到甜蜜和幸福。",
  "reason": "它不仅是
````

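Conceptually, the parsing step does three things: pull the JSON out of the model's reply (which is often wrapped in a Markdown code fence, as in the output above), decode it, and check it against the schema. The following standard-library sketch mimics that flow. It is not LangChain's actual implementation, and the helper `parse_flower`, the `EXPECTED_TYPES` table, and the sample `reply` are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built indirectly

# Hypothetical reply in the style of the model output shown earlier
reply = (
    f"{FENCE}json\n"
    '{"flower_type": "red rose", "price": 30, '
    '"description": "love", "reason": "a classic gift"}\n'
    f"{FENCE}"
)

EXPECTED_TYPES = {"flower_type": str, "price": int, "description": str, "reason": str}

def parse_flower(text: str) -> dict:
    # 1. Strip an optional Markdown fence around the JSON payload.
    match = re.search(rf"{FENCE}(?:json)?\s*(.*?)\s*{FENCE}", text, re.DOTALL)
    payload = match.group(1) if match else text
    # 2. Decode the JSON text into Python objects.
    data = json.loads(payload)
    # 3. Check that every required field is present with the expected type.
    for field, expected in EXPECTED_TYPES.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} must be {expected.__name__}")
    return data

print(parse_flower(reply))
```

Unlike this strict check, `pydantic` in its default mode also tries to convert compatible values (for example the string `"30"` to the integer `30`) before rejecting them.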
# `OutputFixingParser`
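The constraints imposed by `pydantic` can sometimes be too strict, and because a large language model's output is random to some degree, it may fail to satisfy them. A repair mechanism is then needed, and `OutputFixingParser` provides one.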

```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

class nums(BaseModel):
    n: int = Field(description="count")
    numbers: List[int] = Field(description="list of numbers")

parser = PydanticOutputParser(pydantic_object=nums)

# A malformed output: single-quoted strings are not valid JSON
misformatted = "{'n': 5, 'numbers': ['1', '2', '3', '4', '5']}"
parsed_output = parser.parse(misformatted)  # raises OutputParserException
```

```text
OutputParserException: Invalid json output: {'n': 5, 'numbers': ['1', '2', '3', '4', '5']}
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE
```
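The failure happens at the JSON level, before any `pydantic` validation takes place: JSON requires double-quoted strings, while `misformatted` is written like a Python literal. A small standard-library check illustrates this. Using `ast.literal_eval` here is only one way to show that the same text is valid Python; it is not something the parser does.

```python
import ast
import json

misformatted = "{'n': 5, 'numbers': ['1', '2', '3', '4', '5']}"

# Single-quoted strings are rejected by the JSON decoder.
try:
    json.loads(misformatted)
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)

# The same text is a valid Python dict literal, though.
as_python = ast.literal_eval(misformatted)

# Re-serialising it yields text that a JSON-based parser could read.
as_json = json.dumps(as_python)
print(as_json)
```

Note that the numbers are still strings after this step; turning `"1"` into `1` is left to the validation stage, where `pydantic` in its default mode coerces numeric strings to `int`.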

`OutputFixingParser` needs a large language model to help correct output whose format does not match.

```python
from langchain.output_parsers import OutputFixingParser

# Wrap the original parser; the LLM is asked to repair output that fails to parse
new_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

result = new_parser.parse(misformatted)
print(result)
```

```text
n=5 numbers=[1, 2, 3, 4, 5]
```
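The idea behind `OutputFixingParser` can be sketched as a try-parse-then-repair loop: attempt to parse, and on failure hand the bad text and the error message to a model, then parse its corrected reply. The version below is a simplified illustration, not LangChain's implementation; `fixing_parse`, `strict_parse`, and the rule-based `fake_llm` (standing in for a real model call) are all hypothetical.

```python
import json
from typing import Callable

def strict_parse(text: str) -> dict:
    """Decode JSON and coerce the fields to the expected types."""
    data = json.loads(text)
    return {"n": int(data["n"]), "numbers": [int(x) for x in data["numbers"]]}

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: 'repairs' the text by fixing the quotes."""
    bad_text = prompt.split("\n", 1)[1]
    return bad_text.replace("'", '"')

def fixing_parse(text: str, llm: Callable[[str], str], max_retries: int = 1) -> dict:
    for attempt in range(max_retries + 1):
        try:
            return strict_parse(text)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            if attempt == max_retries:
                raise
            # Ask the "model" for a corrected version and try again.
            text = llm(f"Fix this output ({exc}):\n{text}")

misformatted = "{'n': 5, 'numbers': ['1', '2', '3', '4', '5']}"
print(fixing_parse(misformatted, fake_llm))  # {'n': 5, 'numbers': [1, 2, 3, 4, 5]}
```

Capping the number of retries keeps the loop from calling the model indefinitely when the output cannot be repaired; in that case the last parsing error is re-raised.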
