# Parsing Output

Output parsers have two key elements:
- `format_instructions`: An extra string, supplied by the parser, that is appended to the prompt to tell the model how to format its reply.
- `parse()` method: A method that converts the model's string reply into a Python object.
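
To make the two elements concrete, here is a minimal stand-alone sketch of a list parser. `ToyListParser` is a hypothetical class written for illustration, not LangChain's implementation; it only mirrors the two-method shape described above.

```python
class ToyListParser:
    """Hypothetical stand-in that mirrors the two-method shape of an output parser."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt so the model knows the target format.
        return "Your response should be a list of comma separated values, eg: `foo, bar, baz`"

    def parse(self, text: str) -> list[str]:
        # Plain string splitting is enough for this simple format.
        return [item.strip() for item in text.split(",")]


toy_parser = ToyListParser()
toy_parsed = toy_parser.parse("red, blue, green")
print(toy_parsed)
```

The real parsers used in the rest of this notebook expose the same two methods, so the sketch can be read as a mental model rather than a replacement.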

In [1]:
import os
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    AIMessagePromptTemplate,
)

In [2]:
model = ChatOpenAI(openai_api_key=os.getenv("openai_api_key"))

## Steps

### Step 1: Import Parser

In [3]:
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()

### Step 2: Format Instructions
These instructions tell the model which output format the parser expects.


In [4]:
output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

### Step 3: Parse

In [5]:
reply = "red, blue, green"  # simulate reply
print(output_parser.parse(reply))

['red', 'blue', 'green']


## Wrap it up

In [6]:
# Good practice: put the format instructions on a new line
human_template = "{request}\n{format_instructions}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [7]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [8]:
model_request = chat_prompt.format_prompt(
    request="Give me 5 characteristics of dogs",
    format_instructions=output_parser.get_format_instructions(),
).to_messages()
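
It can help to see the text the model actually receives. The sketch below renders the same template with plain `str.format`, used here only as a stand-in for the template rendering step; the names `preview_template`, `list_instructions`, and `preview_text` are hypothetical, and the instruction string is copied from the output in Step 2.

```python
preview_template = "{request}\n{format_instructions}"

# Instruction string as printed by the list parser in Step 2.
list_instructions = (
    "Your response should be a list of comma separated values, eg: `foo, bar, baz`"
)

# Fill both placeholders; the instructions end up on their own line after the request.
preview_text = preview_template.format(
    request="Give me 5 characteristics of dogs",
    format_instructions=list_instructions,
)
print(preview_text)
```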

In [9]:
result = model(model_request)

In [11]:
print(result.content)
print(type(result.content))

Loyal, playful, friendly, protective, and trainable.
<class 'str'>


In [12]:
parsed_content = output_parser.parse(result.content)
print(parsed_content)
print(type(parsed_content))

['Loyal', 'playful', 'friendly', 'protective', 'and trainable.']
<class 'list'>
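
The last item, `'and trainable.'`, appears because the reply was only loosely comma-separated. The sketch below assumes the parser does little more than split on `", "`, which reproduces the output above, and adds a small cleanup step. `tidy_items` is a hypothetical helper, not part of LangChain, and its rules (drop a leading "and", drop a trailing period) are assumptions that fit this particular reply.

```python
raw_reply = "Loyal, playful, friendly, protective, and trainable."

# Assumed behaviour: a plain split on ", " keeps "and" and the period in the last item.
split_items = raw_reply.split(", ")


def tidy_items(items: list[str]) -> list[str]:
    """Hypothetical cleanup: strip whitespace, trailing periods, and a leading 'and '."""
    cleaned = []
    for item in items:
        item = item.strip().rstrip(".")
        if item.lower().startswith("and "):
            item = item[4:].strip()
        if item:
            cleaned.append(item)
    return cleaned


tidied = tidy_items(split_items)
print(split_items)
print(tidied)
```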


Basic parsing is not always enough to get output in the format we want.

When this happens, we have two fixes:
- A **strong system prompt** with clear instructions.
- An **OutputFixingParser**, which uses a chain to send the misformatted reply back to an LLM and ask it to fix the formatting.

In [13]:
from langchain.output_parsers import DatetimeOutputParser

In [14]:
output_parser = DatetimeOutputParser()

In [15]:
format_instructions = output_parser.get_format_instructions()

In [16]:
template_text = "{request}\n{format_instructions}"

human_prompt = HumanMessagePromptTemplate.from_template(template_text)

In [17]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [18]:
model_request = chat_prompt.format_prompt(
    request="What date was the 13th Amendment ratified in the US?",
    format_instructions=output_parser.get_format_instructions(),
).to_messages()

In [19]:
result = model(model_request, temperature=0)

In [20]:
print(result.content)

The 13th Amendment was ratified in the US on December 6, 1865.

The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"


In this case, the reply was:

```
The 13th Amendment was ratified in the US on December 6, 1865.

The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"
```
This reply does not match the pattern the parser expects, so trying to parse it raises an error.
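
The failure can be reproduced without a model call. This sketch uses `datetime.strptime` from the standard library and assumes the pattern `%Y-%m-%dT%H:%M:%S.%fZ`, chosen because it matches the shape of the timestamp in the reply. The real parser reports its own exception type, whereas `strptime` raises `ValueError`.

```python
from datetime import datetime

# Assumed pattern: it matches the shape of the timestamp in the reply above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

verbose_reply = (
    "The 13th Amendment was ratified in the US on December 6, 1865.\n\n"
    'The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'
)
bare_reply = "1865-12-06T00:00:00.000000Z"

try:
    datetime.strptime(verbose_reply, PATTERN)
    verbose_ok = True
except ValueError:
    # The prose around the timestamp makes the whole string fail to match.
    verbose_ok = False

bare_parsed = datetime.strptime(bare_reply, PATTERN)
print(verbose_ok)
print(bare_parsed)
```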

### Strong system prompt

In [21]:
human_template = "{request}\n{format_instructions}"
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

system_template = "You always reply to questions with only the datetime pattern."
system_prompt = SystemMessagePromptTemplate.from_template(system_template)

chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

model_request = chat_prompt.format_prompt(
    request="What date was the 13th Amendment ratified in the US?",
    format_instructions=output_parser.get_format_instructions(),
).to_messages()

In [22]:
result = model(model_request, temperature=0)
print(result.content)

1865-12-06T00:00:00.000000Z


In [23]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)

### OutputFixingParser

In [24]:
from langchain.output_parsers import OutputFixingParser

In [25]:
# Single-quoted triple delimiters keep the closing double quote inside the string
misformatted = '''The 13th Amendment was ratified in the US on December 6, 1865.

The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'''

In [26]:
new_parser = OutputFixingParser.from_llm(llm=model, parser=output_parser)
new_parser.parse(misformatted)  # asks the LLM to repair the reply, then parses the result

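This may solve the problem, or it may raise another error, depending on how the LLM rewrites the reply.

The best approach is to use both: a strong system prompt so the first reply usually parses, with an OutputFixingParser as a fallback.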
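
The fallback idea can be sketched without any model call: try a strict parse first, and only on failure hand the text to a fixer and retry once. Everything here is hypothetical illustration rather than LangChain code. `parse_with_fix` and `strict_parse` are made-up names, the pattern is assumed, and `fake_fixer` is a regex stand-in for the LLM that would normally rewrite the reply.

```python
import re
from datetime import datetime
from typing import Callable

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed datetime pattern


def strict_parse(text: str) -> datetime:
    # Raises ValueError when the text is not exactly a timestamp in PATTERN.
    return datetime.strptime(text.strip(), PATTERN)


def parse_with_fix(text: str, fixer: Callable[[str], str]) -> datetime:
    """Try the strict parse; on failure, ask `fixer` for a corrected string and retry once."""
    try:
        return strict_parse(text)
    except ValueError:
        # The retry can fail too, in which case the ValueError propagates.
        return strict_parse(fixer(text))


def fake_fixer(text: str) -> str:
    # Stand-in for the LLM call: pull out the first timestamp-shaped substring.
    match = re.search(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", text)
    return match.group(0) if match else text


bad_reply = 'The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'
fixed = parse_with_fix(bad_reply, fake_fixer)
print(fixed)
```

As with the real fixing parser, the second attempt is not guaranteed to succeed: if the fixer returns something that still does not match, the error surfaces to the caller.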

## Pydantic for custom classes

In [27]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [28]:
class Scientist(BaseModel):
    name: str = Field(description="Name of a Scientist")
    discoveries: list[str] = Field(description="Python list of discoveries")

In [29]:
parser = PydanticOutputParser(pydantic_object=Scientist)

In [30]:
format_instructions = parser.get_format_instructions()
print(format_instructions)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Scientist", "title": "Name", "type": "string"}, "discoveries": {"description": "Python list of discoveries", "items": {"type": "string"}, "title": "Discoveries", "type": "array"}}, "required": ["name", "discoveries"]}
```


In [31]:
human_prompt = HumanMessagePromptTemplate.from_template(
    "{request}\n{format_instructions}"  # newline separates the request from the instructions
)

chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [32]:
model_request = chat_prompt.format_prompt(
    request="Tell me about a scientist and their most famous inventions",
    format_instructions=format_instructions,
).to_messages()

In [33]:
response = model(model_request, temperature=0)

In [34]:
print(response.content)

{
  "name": "Albert Einstein",
  "discoveries": [
    "Theory of Relativity",
    "Photoelectric Effect",
    "Brownian Motion"
  ]
}


In [35]:
parser.parse(response.content)

Scientist(name='Albert Einstein', discoveries=['Theory of Relativity', 'Photoelectric Effect', 'Brownian Motion'])
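
Conceptually, this last step decodes the JSON reply and builds an object from the resulting fields. The sketch below shows that idea with the standard library only; `ScientistRecord` is a hypothetical dataclass standing in for the pydantic model, and unlike pydantic it performs no type validation.

```python
import json
from dataclasses import dataclass


@dataclass
class ScientistRecord:
    """Stdlib stand-in for the pydantic model; it does no type validation."""

    name: str
    discoveries: list[str]


json_reply = """
{
  "name": "Albert Einstein",
  "discoveries": ["Theory of Relativity", "Photoelectric Effect", "Brownian Motion"]
}
"""

data = json.loads(json_reply)     # raises json.JSONDecodeError on malformed JSON
record = ScientistRecord(**data)  # raises TypeError on missing or unexpected keys
print(record)
```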