## Output Parser

**Why output parsers exist: a large language model returns a plain `str`, but we want structured output.**

Output parsers are classes that help structure language model responses. An output parser must implement two main methods, plus one optional method:

* **Get format instructions**: A method which returns a string containing instructions for how the output of a language model should be formatted.
* **Parse**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
* **Parse with prompt** (**Optional**): A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated such a response) and parses it into some structure.
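To make the two required methods concrete, here is a minimal, hypothetical parser written in plain Python. It is not LangChain's code (LangChain ships its own `CommaSeparatedListOutputParser` for this job); the class name and the instruction text are illustrative assumptions.

```python
from typing import List


class CommaListParser:
    """Hypothetical toy parser implementing only the two required methods."""

    def get_format_instructions(self) -> str:
        # Text meant to be inserted into the prompt so the model knows the expected shape.
        return "Return your answer as comma separated values, e.g. `foo, bar, baz`."

    def parse(self, text: str) -> List[str]:
        # Turn the raw model string into a Python list, trimming whitespace and dropping empty items.
        return [item.strip() for item in text.split(",") if item.strip()]


parser_demo = CommaListParser()
items = parser_demo.parse("red, green , blue")  # ['red', 'green', 'blue']
```

The split of responsibilities is the point: `get_format_instructions` shapes what goes *into* the model, `parse` shapes what comes *out*.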

```python
from typing import List

from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field, validator
```

### Component 1: OutputParser

```python
# Define your desired data structure
class Joke(BaseModel):
    setup: str = Field(description='question to set up a joke')
    punchline: str = Field(description='answer to resolve the joke')

    # You can add custom validation logic easily with Pydantic
    @validator('setup')
    def question_ends_with_question_mark(cls, value):
        # endswith() also handles an empty string, where value[-1] would raise IndexError
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value


# Set up a parser + inject instructions into the prompt template
parser = PydanticOutputParser(pydantic_object=Joke)
```

### Component 2: Prompt

> You can also initialize the prompt with some variables already filled in via `partial_variables`, i.e. partially format the prompt at instantiation time.

[Partial Prompt](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/partial)

```python
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
```
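Partial formatting just means some placeholders are bound when the template is created and the rest at call time. The toy class below is a hypothetical stdlib sketch of that idea, not LangChain's `PromptTemplate` (which also offers a `partial()` method, covered on the linked page).

```python
from typing import Dict


class ToyPartialTemplate:
    """Hypothetical stand-in for a prompt template with pre-filled (partial) variables."""

    def __init__(self, template: str, partial_variables: Dict[str, str]):
        self.template = template
        self.partial_variables = dict(partial_variables)

    def format(self, **kwargs: str) -> str:
        # In this toy, pre-filled values are merged with call-time values (call-time wins on conflict).
        return self.template.format(**{**self.partial_variables, **kwargs})


toy_prompt = ToyPartialTemplate(
    "Answer the user query.\n{format_instructions}\n{query}\n",
    {"format_instructions": "Reply in JSON."},  # hypothetical instruction text
)
rendered = toy_prompt.format(query="Tell me a joke.")
# rendered == "Answer the user query.\nReply in JSON.\nTell me a joke.\n"
```

This is why only `query` appears in `input_variables` above: the format instructions are already bound.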

### Component 3: Model

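```python
# The notebook originally used 'text-davinci-003', which OpenAI has since retired;
# 'gpt-3.5-turbo-instruct' is a completion-style model usable with the OpenAI LLM class.
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)
```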

### Running the composed chain

```python
chain = prompt | model | parser
chain.invoke({"query": "Tell me a joke."})
```

```
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
```
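In LangChain, `prompt | model | parser` builds a `RunnableSequence`: each step's output becomes the next step's input. The toy `Step` class below is a hypothetical plain-Python sketch of that composition idea (not LangChain's implementation), with fake prompt, model, and parser stand-ins.

```python
import json
from typing import Any, Callable


class Step:
    """Hypothetical minimal 'runnable': wraps a function and supports `a | b` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into the next step's input.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins: the "model" ignores its input and returns a fixed JSON string.
fake_prompt = Step(lambda d: f"Answer the user query.\n{d['query']}\n")
fake_model = Step(lambda text: '{"setup": "Why?", "punchline": "Because."}')
fake_parser = Step(json.loads)

fake_chain = fake_prompt | fake_model | fake_parser
result = fake_chain.invoke({"query": "Tell me a joke."})
# result == {"setup": "Why?", "punchline": "Because."}
```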

### Running step by step

**"Get format instructions": A method which returns a string containing instructions for how the output of a language model should be formatted.**

```python
print(parser.get_format_instructions())
```

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}
```
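The printed instructions are ordinary text with a serialized JSON schema appended. As a hedged illustration, the hypothetical helper below builds a similar string from a hand-written schema dict; LangChain instead derives the schema from the Pydantic `Joke` model, and its full wording differs.

```python
import json


def toy_format_instructions(schema: dict) -> str:
    """Hypothetical helper: embed a JSON schema in plain-text instructions for the model."""
    return (
        "The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\n"
        "Here is the output schema:\n" + json.dumps(schema)
    )


# Hand-written copy of the Joke schema shown in the printed output
joke_schema = {
    "properties": {
        "setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"},
        "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"},
    },
    "required": ["setup", "punchline"],
}
instructions = toy_format_instructions(joke_schema)
```

With default separators, `json.dumps(joke_schema)` produces the same one-line schema string that appears in the printed instructions.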


```python
# Compose only the prompt and the model, so the raw string output can be parsed separately.
prompt_and_model = prompt | model
```

```python
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)
```

```
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
```

**"Parse": A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.**

```python
parser.parse(output)
```

```
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
```
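Conceptually, parsing here means three steps: pull the JSON out of the model's text, check the required keys, and build a validated object. The sketch below is a hypothetical stdlib version (a `dataclass` standing in for the Pydantic model, and a simple "outermost braces" heuristic); it is not how `PydanticOutputParser` is implemented.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyJoke:
    """Hypothetical stdlib stand-in for the Pydantic `Joke` model."""
    setup: str
    punchline: str

    def __post_init__(self) -> None:
        # Mirrors the `setup` validator: the question must end with "?"
        if not self.setup.endswith("?"):
            raise ValueError("Badly formed question!")


def toy_parse(text: str) -> ToyJoke:
    # Models sometimes wrap JSON in extra prose; keep the outermost {...} span (toy heuristic).
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object found in: {text!r}")
    data = json.loads(text[start : end + 1])  # JSONDecodeError is a subclass of ValueError
    missing = {"setup", "punchline"} - set(data)
    if missing:
        raise ValueError(f"Missing required keys: {sorted(missing)}")
    return ToyJoke(setup=data["setup"], punchline=data["punchline"])


joke = toy_parse(
    'Sure! {"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
)
```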

## Streaming

```python
from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template("Return a JSON object with an `answer` key that answers the following question: {question}")
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
```

```python
list(json_chain.stream({"question": "Who invented the microscope?"}))
```

```
[{},
 {'answer': ''},
 {'answer': 'Ant'},
 {'answer': 'Anton'},
 {'answer': 'Antonie'},
 {'answer': 'Antonie van'},
 {'answer': 'Antonie van Lee'},
 {'answer': 'Antonie van Leeu'},
 {'answer': 'Antonie van Leeuwen'},
 {'answer': 'Antonie van Leeuwenho'},
 {'answer': 'Antonie van Leeuwenhoek'}]
```
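The stream above shows the JSON parser emitting a progressively more complete object as tokens arrive. A rough way to get that behavior is to re-parse the accumulated text after every chunk and emit the result whenever it changes. The sketch below does this by brute force, trying a few closing suffixes (a toy assumption that only suits flat objects with string values); LangChain's own partial-JSON handling is more general.

```python
import json
from typing import Iterable, Iterator, Optional

# Closing suffixes tried in order; enough for flat objects with string values (toy assumption).
_CLOSERS = ("", "}", '"}', "]}", '"]}')


def try_parse_partial(prefix: str) -> Optional[dict]:
    # Return the first dict obtained by closing the prefix, or None if nothing parses.
    for closer in _CLOSERS:
        try:
            value = json.loads(prefix + closer)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def stream_partial_json(chunks: Iterable[str]) -> Iterator[dict]:
    buffer = ""
    last: Optional[dict] = None
    for chunk in chunks:
        buffer += chunk
        parsed = try_parse_partial(buffer)
        # Skip prefixes that cannot be completed yet, and suppress repeats.
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


# Hypothetical token chunks
partials = list(stream_partial_json(['{', '"answer', '": "', 'Ant', 'onie', '"}']))
# partials == [{}, {'answer': ''}, {'answer': 'Ant'}, {'answer': 'Antonie'}]
```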

```python
list(chain.stream({"query": "Tell me a joke."}))
```

```
[Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')]
```
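By contrast, the `Joke` chain yields a single item. This appears consistent with a parser that has no streaming logic of its own: LangChain's default `Runnable.transform` buffers its input and processes it once. The hypothetical helper below sketches that buffer-then-parse behavior; it is an illustration, not LangChain code.

```python
import json
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def stream_whole(chunks: Iterable[str], parse: Callable[[str], T]) -> Iterator[T]:
    # Accumulate every chunk and parse only once the stream is exhausted,
    # so the consumer receives exactly one fully parsed item.
    yield parse("".join(chunks))


# Hypothetical chunks of a model response
whole = list(stream_whole(['{"setup": "Why?"', ', "punchline": "Because."}'], json.loads))
# whole == [{"setup": "Why?", "punchline": "Because."}]
```

Waiting for the full text matters for the `Joke` model: a partial object could fail validation (for example, a `setup` that does not yet end with "?").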