# Structured Output Parser

- Author: [Yoolim Han](https://github.com/hohosznta)
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

## Overview

The `StructuredOutputParser` formats Large Language Model (LLM) responses into dictionaries, so a single response can return multiple fields as key/value pairs.
While the Pydantic and JSON parsers offer robust capabilities, the `StructuredOutputParser` is particularly effective for less capable models, such as local models with fewer parameters than advanced models like GPT or Claude.
With the `StructuredOutputParser`, developers can keep output consistent across LLM applications, even when working with smaller models.

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Implementing Structured Output Parser](#implementing-structured-output-parser)

### References

- [LangChain ChatOpenAI API reference](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html)
- [LangChain Structured output parser](https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/structured/)
---

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- `langchain-opentutorial` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- You can check out [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

In [146]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [147]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
    ],
    verbose=False,
    upgrade=False,
)


In [148]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "03-StructuredOutputParser",
    }
)


Environment variables have been set successfully.


Alternatively, you can set `OPENAI_API_KEY` in a `.env` file and load it.

**[Note]** This is not necessary if you've already set `OPENAI_API_KEY` in the previous steps.

In [149]:
from dotenv import load_dotenv

load_dotenv(override=True)


False
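
The `False` printed above suggests that `load_dotenv` did not load any variables from a `.env` file, so it can help to confirm that the key is actually available before calling the model. The following is a minimal sketch using only the standard library; the helper name `require_env` is hypothetical and is not part of `langchain-opentutorial` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    # Treat a missing variable and an empty/whitespace-only value the same way.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your environment or .env file."
        )
    return value


# Report the status without exposing the key itself.
try:
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY is set.")
except RuntimeError as err:
    print(err)
```

Failing early with a readable message is usually easier to debug than an authentication error raised later by the API client.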

## Implementing Structured Output Parser

### Using ResponseSchema with StructuredOutputParser
- Define a response schema with the `ResponseSchema` class. Here it has two fields: the answer to the user's question and the source (a website URL) used to answer it.
- Initialize `StructuredOutputParser` with `response_schemas` so that the output is structured according to the defined schema.

**[Note]**
Pydantic parsers often fail with local models. In such cases, `StructuredOutputParser` can be a good alternative.

In [150]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Define the fields expected in the response
response_schemas = [
    ResponseSchema(name="answer", description="Answer to the user's question"),
    ResponseSchema(
        name="source",
        description="The `source` used to answer the user's question, which should be a `website URL`.",
    ),
]
# Initialize the structured output parser based on the response schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
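
To make the idea concrete, here is a rough standard-library sketch of what a structured parser conceptually does: it pulls a JSON object out of a fenced `json` block in the model's reply and checks that the expected keys are present. This is an illustration under stated assumptions, not LangChain's actual implementation; the function `parse_structured` and the sample reply are hypothetical.

```python
import json
import re

# Built programmatically so the fence characters do not break this code block.
FENCE = "`" * 3


def parse_structured(text: str, expected_keys: list[str]) -> dict:
    """Extract a JSON object from a fenced json block and check its keys."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the raw text when the model omitted the fence.
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys in model output: {missing}")
    return data


# Hypothetical model reply, written by hand for illustration.
reply = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"answer": "Seoul", "source": "https://en.wikipedia.org/wiki/Seoul"}\n'
    + FENCE
)

parsed = parse_structured(reply, ["answer", "source"])
print(parsed["answer"])
```

The key check mirrors the role of the response schemas: each schema `name` becomes a key that the parsed dictionary is expected to contain.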


### Embedding Response Schemas into Prompts 

Create a `PromptTemplate` that formats the user's question and embeds the parser's format instructions, so the model knows which structure to return.

In [151]:
from langchain_core.prompts import PromptTemplate

# Get the format instructions generated by the output parser.
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    # Set up the template to answer the user's question as best as possible.
    template="Answer the user's question as best as possible.\n{format_instructions}\n{question}",
    # Use 'question' as the input variable.
    input_variables=["question"],
    # Use 'format_instructions' as a partial variable.
    partial_variables={"format_instructions": format_instructions},
)
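
The role of a partial variable can be sketched without LangChain: one value is fixed when the prompt is created, and the remaining variables are supplied on each call. The snippet below is a simplified stand-in, not how `PromptTemplate` is implemented; `make_prompt` is a hypothetical helper, and the `format_instructions` string is a placeholder for the text returned by `output_parser.get_format_instructions()`.

```python
TEMPLATE = (
    "Answer the user's question as best as possible.\n"
    "{format_instructions}\n"
    "{question}"
)

# Placeholder standing in for output_parser.get_format_instructions().
format_instructions = 'Return a JSON object with the keys "answer" and "source".'


def make_prompt(template: str, **partial_values: str):
    """Return a function that fills the remaining variables at call time."""

    def render(**values: str) -> str:
        # Merge the pre-filled values with the ones supplied per call.
        return template.format(**partial_values, **values)

    return render


# format_instructions is fixed once; only the question changes per call.
render_prompt = make_prompt(TEMPLATE, format_instructions=format_instructions)
print(render_prompt(question="What is the capital of South Korea?"))
```

Fixing the format instructions up front means callers only pass `question`, which matches the `input_variables=["question"]` setting in the cell above.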

### Integrating with ChatOpenAI and Running the Chain

Combine the `PromptTemplate`, `ChatOpenAI` model, and `StructuredOutputParser` into a chain. Finally, run the chain with a specific `question` to produce results.

In [152]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)  # Initialize the ChatOpenAI model

chain = prompt | model | output_parser  # Connect the prompt, model, and output parser

# Ask the question, "What is the capital of South Korea?"
chain.invoke({"question": "What is the capital of South Korea?"})

**[Note]** Running this cell requires a valid `OPENAI_API_KEY`. With an invalid key, the request fails with an `AuthenticationError` (error code 401, `invalid_api_key`).

### Using Streamed Outputs

Use the `chain.stream` method to receive a streaming response to the question, "What are the achievements of King Sejong?"

In [None]:
for s in chain.stream({"question": "What are the achievements of King Sejong?"}):
    # Stream the output
    print(s)

{'answer': 'King Sejong of the Joseon Dynasty is best known for his creation of the Korean alphabet, Hangul, to promote literacy among the common people. He also made advancements in science, technology, and culture during his reign.', 'source': 'https://en.wikipedia.org/wiki/Sejong_the_Great'}
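
Because the parsed result is a plain dictionary, downstream code can read its fields by key. The sketch below reuses the output shown above as a literal and adds a light sanity check on the `source` field; the helper `is_http_url` is hypothetical and only uses the standard library.

```python
from urllib.parse import urlparse

# Output copied from the streamed result above.
result = {
    "answer": (
        "King Sejong of the Joseon Dynasty is best known for his creation of "
        "the Korean alphabet, Hangul, to promote literacy among the common "
        "people. He also made advancements in science, technology, and "
        "culture during his reign."
    ),
    "source": "https://en.wikipedia.org/wiki/Sejong_the_Great",
}


def is_http_url(value: str) -> bool:
    """Check that a string looks like an http(s) URL with a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(result["answer"])
print("Source looks like a URL:", is_http_url(result["source"]))
```

A check like this only confirms the shape of the value. It does not verify that the page exists or that it supports the answer.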
