# Output Parsers

Output Parsers convert the model’s raw text or message into a clean, structured Python value (string, list, dict/JSON, Pydantic object, etc.). They make your chains reliable, testable, and easy to compose.

I think of a parser as a translator: the model answers in a language as verbose and unstructured as English, and the parser turns that answer into a standardized, structured output.

What we'll cover:
- StrOutputParser — plain text
- CommaSeparatedListOutputParser — lists from simple text
- JsonOutputParser — validated JSON objects
- PydanticOutputParser — strongly typed objects
- Fixing bad outputs with an “auto-repair” parser

## Bootstrap

⚓ Before proceeding further, it is very important that you do the following: 👾

Select the 🗝 (key) icon in the left pane and add your OpenAI API key as a secret, with the name "OPENAI_API_KEY" and the key as its value. Then grant it notebook access so that this notebook can run.

Run the two cells below in order before running any further cells. Wait until a number appears in place of '*' or '[ ]'. Below the second cell you should see "✅ Ready: Output Parsers tutorial".

In [None]:
!pip install -q langchain langchain-openai langchain-community pydantic pypdf faiss-cpu

In [None]:
# Environment & imports
import json
from google.colab import userdata

key = userdata.get('OPENAI_API_KEY')  # reads the Colab secret; may raise if the secret is missing or access is not granted
if not key:
    raise RuntimeError("OPENAI_API_KEY is empty. Add it under the 🗝 (key) icon and grant notebook access.")

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser, CommaSeparatedListOutputParser
from langchain_core.runnables import RunnableLambda

# (For Pydantic-based parsing)
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser  # defined in langchain_core, alongside the other parsers above

# Model (small & fast). We’ll enable streaming only when useful.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2, api_key=key)
print("✅ Ready: Output Parsers tutorial")

## StrOutputParser - The simplest one

A string parser is often enough when you just need clean text without role metadata.
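
To make "clean text without role metadata" concrete, here is a minimal plain-Python sketch of the idea. `FakeMessage` and `to_text` are hypothetical names invented for this illustration; they are not LangChain classes, and this is not how `StrOutputParser` is implemented internally.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat model reply (not a LangChain class)."""
    content: str
    role: str = "assistant"
    metadata: dict = field(default_factory=dict)


def to_text(output) -> str:
    """Return plain text whether we were given a string or a message object."""
    if isinstance(output, str):
        return output
    return output.content


reply = FakeMessage(content="LangChain composes LLM apps.", metadata={"tokens": 7})
print(to_text(reply))  # only the text survives; role and metadata are dropped
```

The chain in the next cell does the same job with the real parser: the model returns a message object, and `StrOutputParser()` hands you just the string.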

In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely in one sentence."),
    ("user", "Summarize: {text}")
])

chain = prompt | llm | StrOutputParser()

text = "LangChain lets you compose LLM apps using prompts, models, tools, and chains."
result = chain.invoke({"text": text})

print("\n--- Final Output (str) ---")
print(result)

## CommaSeparatedListOutputParser

Parses a comma-separated reply into a Python list of strings. This is helpful when you want lightweight structure that is closer to a CSV row than to JSON. When using any parser other than StrOutputParser, it is important to state the expected output type or structure in the prompt, because the parser can only parse what the model actually returns in that format.

Try removing the phrase "comma-separated only" from the prompt and see how the output changes.
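
The effect of the prompt wording can be previewed without calling a model. The sketch below is a simplified plain-Python splitter, assumed here only for illustration (`split_commas` is a hypothetical helper, and the real parser may handle quoting and edge cases differently). It shows why a reply that is not comma-separated collapses into a single list item.

```python
def split_commas(text: str) -> list[str]:
    """Split on commas, strip whitespace, and drop empty pieces."""
    return [part.strip() for part in text.split(",") if part.strip()]


# A reply that follows the "comma-separated only" instruction.
good = "LangChain, LCEL, piping, tools, retrievers"

# A reply the model might give without that instruction (made-up example).
bad = "1. LangChain\n2. LCEL\n3. piping"

print(len(split_commas(good)), split_commas(good))
print(len(split_commas(bad)), split_commas(bad))
```

With no commas to split on, the numbered reply comes back as one long string inside a list, which is usually not what downstream code expects.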

In [None]:
list_parser = CommaSeparatedListOutputParser()

kw_prompt = PromptTemplate.from_template(
    "From the following text, extract exactly 5 short keywords, comma-separated only.\n\nText: {text}\n"
)

kw_chain = kw_prompt | llm | list_parser

keywords = kw_chain.invoke({
    "text": "LangChain enables composition of LLM apps; LCEL provides piping; tools and retrievers add capabilities."
})

print("\n--- Final Output (list) ---")
print(keywords)      # a Python list of 5 strings
print("count:", len(keywords))

## JsonOutputParser

Use this when you need the output as JSON. The parser returns it as a Python dictionary of key-value pairs.
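
As a rough picture of what "text in, dictionary out" involves, the following plain-Python sketch parses a reply that may be wrapped in a Markdown code fence, which chat models often add. `extract_json` is a hypothetical helper written for this tutorial, not the library's implementation, and the product values are made up.

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example stays readable


def extract_json(text: str) -> dict:
    """Parse JSON from a reply, tolerating an optional Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (three backticks plus an optional language tag).
        cleaned = cleaned[cleaned.find("\n") + 1:]
        # Drop the closing fence if present.
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[:-len(FENCE)]
    return json.loads(cleaned)


raw_reply = FENCE + 'json\n{"name": "AeroLite", "price": 89.5}\n' + FENCE
card = extract_json(raw_reply)
print(card["name"], type(card["price"]))
```

Note that this only checks that the text is well-formed JSON. It does not check that the expected keys or types are present, which is where a schema-based parser comes in.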

In [None]:
json_parser = JsonOutputParser()

# 1) Define the schema and ask the model to return the exact JSON structure
schema_instructions = """Return a strict JSON object with keys:
- "name": string
- "category": string
- "price": number
- "features": array of strings (3 items)
No extra commentary or fields. Only the JSON.
"""

product_prompt = ChatPromptTemplate.from_messages([
    ("system", "You produce strictly formatted JSON responses."), # Instruct llm to return JSON response.
    ("user", "{schema}\n\nMake a synthetic product card for: {topic}")
])

product_chain = (
    {"schema": RunnableLambda(lambda x: schema_instructions), "topic": RunnableLambda(lambda x: x["topic"])}
    | product_prompt
    | llm
    | json_parser
)

product = product_chain.invoke({"topic": "a lightweight running shoe"})
print("\n--- Final Output (dict) ---")
print(json.dumps(product, indent=2))
print("type:", type(product))

## Pydantic Objects and PydanticOutputParser

Define a Pydantic model and parse the model output directly into a validated Python object. Pydantic models are similar to POJOs (Plain Old Java Objects) or to typed TypeScript objects: plain data classes with declared field types, which Pydantic additionally checks at runtime.

We create a `PydanticOutputParser` and pass the model class as `pydantic_object`; the parser validates the LLM's output against that class.

`get_format_instructions()` on the parser returns prompt text that asks for JSON output and includes the JSON schema derived from the Pydantic model.

On the result, we can call `.model_dump()` to convert the Pydantic object into a plain dict, or `.model_dump_json()` to get a JSON string.
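
The "parse, then validate into a typed object" idea can be sketched with the standard library alone. The example below uses a dataclass as a stand-in and is explicitly not Pydantic: `TicketSketch`, `ALLOWED_URGENCY`, and `parse_ticket` are hypothetical names, and the hand-written checks only approximate what a real validation library does for you.

```python
import json
from dataclasses import dataclass

ALLOWED_URGENCY = {"low", "medium", "high"}


@dataclass
class TicketSketch:
    """Dataclass stand-in for a typed ticket (assumption: not a Pydantic model)."""
    title: str
    urgency: str
    tags: list

    def __post_init__(self):
        # Hand-written checks that run right after construction.
        if not isinstance(self.title, str):
            raise ValueError("title must be a string")
        if self.urgency not in ALLOWED_URGENCY:
            raise ValueError(f"urgency must be one of {sorted(ALLOWED_URGENCY)}")
        if not all(isinstance(tag, str) for tag in self.tags):
            raise ValueError("tags must be a list of strings")


def parse_ticket(raw: str) -> TicketSketch:
    """Parse a JSON string and validate it into a TicketSketch."""
    return TicketSketch(**json.loads(raw))


ok = parse_ticket('{"title": "CSV export times out", "urgency": "high", "tags": ["export", "timeout"]}')
print(ok.urgency)

try:
    parse_ticket('{"title": "CSV export times out", "urgency": "ASAP", "tags": ["export"]}')
except ValueError as err:
    print("rejected:", err)
```

One design point worth noticing: in the cell below, `urgency` is declared as a plain `str`, so "One of: low, medium, high" is only a hint to the model and any string would pass validation. If you need the restriction enforced, a constrained type such as `typing.Literal["low", "medium", "high"]` is one way to express it in Pydantic.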

In [None]:
# 1) Define your target schema
class Ticket(BaseModel):
    title: str = Field(..., description="Short title of the user's issue")
    urgency: str = Field(..., description="One of: low, medium, high")
    tags: list[str] = Field(..., description="Relevant tags (2-5)")

ticket_parser = PydanticOutputParser(pydantic_object=Ticket)

# 2) Provide parser instructions to the model (helps it format correctly)
format_instructions = ticket_parser.get_format_instructions()

ticket_prompt = PromptTemplate.from_template(
    "Read the user's request and produce a ticket.\n"
    "{format_instructions}\n\n"
    "User request: {request}"
)

# 3) Prompt → Model → Pydantic object
ticket_chain = (
    {"format_instructions": RunnableLambda(lambda _: format_instructions),
     "request": RunnableLambda(lambda x: x["request"])}
    | ticket_prompt
    | llm
    | ticket_parser
)

ticket = ticket_chain.invoke({"request": "The dashboard times out when I export to CSV. Need a fix ASAP."})
print("\n--- Final Output (Ticket model) ---")
print(ticket)
print("type:", type(ticket))
print("ticket.urgency:", ticket.urgency)

Here, call `ticket.model_dump()` to convert the Pydantic object to a dict, or `ticket.model_dump_json()` to get a JSON string.

## Repairing invalid outputs (auto-fix pass)

Models sometimes return malformed JSON or miss fields. A common pattern is to repair the output by prompting the model again with the parsing error and asking it to fix the format.

In [None]:
from langchain.output_parsers import OutputFixingParser  # helper that uses the model to repair

# Base parser we want to enforce
base_parser = JsonOutputParser()

# Wrapper that will try to fix invalid outputs using the same llm
fixing_parser = OutputFixingParser.from_llm(parser=base_parser, llm=llm)

bad_json_prompt = PromptTemplate.from_template(
    "Return a JSON object with keys 'a' (int) and 'b' (int) only, no comments.\n"
    "If you include any text other than JSON, you'll be asked to fix it.\n"
    "Task: {task}"
)

strict_json_chain = bad_json_prompt | llm | fixing_parser

# Even if the model returns extra words, the fixing parser tries to repair it.
data = strict_json_chain.invoke({"task": "produce small integers for a and b"})
print("\n--- Final Output (repaired dict) ---")
print(data, type(data))

## How it works

1. OutputFixingParser wraps a base parser (like JsonOutputParser).
2. If parsing fails, it sends the error + original text back to the model to produce a corrected version.
3. You get a best-effort repaired result (still validate downstream if critical).
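
The three steps above can be sketched as a small retry loop in plain Python. This is an illustration of the pattern under stated assumptions, not the library's code: `parse_with_repair` is a hypothetical helper, and `fake_fixer` stands in for the LLM call by simply keeping the outermost `{...}` span.

```python
import json


def parse_with_repair(text, fix_with_model, max_retries=1):
    """Try json.loads; on failure, ask the fixer for a corrected text and retry."""
    for attempt in range(max_retries + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the parsing error
            # Send the error and the original text back for correction.
            text = fix_with_model(bad_text=text, error=str(err))


def fake_fixer(bad_text, error):
    """Hypothetical stand-in for an LLM call: keep only the outermost {...}."""
    start, end = bad_text.find("{"), bad_text.rfind("}")
    return bad_text[start:end + 1]


reply = 'Sure! Here you go: {"a": 2, "b": 5} Hope that helps.'
print(parse_with_repair(reply, fake_fixer))
```

Capping the retries matters in practice: each repair attempt with a real model is another API call, and a reply that cannot be fixed should fail loudly rather than loop.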

## Here's a complete chain with structured output

In [None]:
meetups_schema = """
Return a JSON object:
{
  "city": string,
  "events": [
    {"title": string, "date": string (ISO 8601), "topic": string}
  ]
}
Only the JSON. No commentary.
"""

meetups_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a structured information extractor."),
    ("user", "{schema}\n\nExtract from:\n\n{text}")
])

meetups_chain = (
    {"schema": RunnableLambda(lambda x: meetups_schema), "text": RunnableLambda(lambda x: x["text"])}
    | meetups_prompt
    | llm
    | JsonOutputParser()
)

sample = """
Bangalore Python Guild meets on 2025-09-12 to discuss LangGraph patterns.
DataTalks meetup on 2025-10-03: topic is vector databases for production RAG.
"""
parsed = meetups_chain.invoke({"text": sample})
print("\n--- Final Output (dict) ---")
print(json.dumps(parsed, indent=2))
print("events count:", len(parsed["events"]))