
# **Output Parsers**

Output parsers are components that turn the raw text produced by a language model into structured data, such as a Python dictionary or list. A model's reply is always a string, even when it looks like JSON, so it has to be parsed before it can be analyzed, manipulated, or passed to other parts of an application or workflow. In LangChain, a parser typically does two jobs: it supplies format instructions that are added to the prompt, and it parses the model's reply into a Python object.
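To make the idea concrete, here is a toy parser written in plain Python. `CommaListParser` is a hypothetical class, not part of LangChain: it has a `get_format_instructions` method that tells the model how to answer and a `parse` method that turns the raw reply into a Python list, the same two methods exposed by the `StructuredOutputParser` used later in this notebook.

```python
class CommaListParser:
    """Toy output parser (hypothetical, not a LangChain class)."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt so the model answers in a parseable shape
        return "Answer with a comma-separated list of values, for example: `a, b, c`"

    def parse(self, text: str) -> list:
        # Split the raw reply on commas and drop surrounding whitespace and empty items
        return [item.strip() for item in text.strip().split(",") if item.strip()]


parser = CommaListParser()
print(parser.get_format_instructions())
print(parser.parse("remote work, paid vacation,  annual reviews "))
# ['remote work', 'paid vacation', 'annual reviews']
```

The parser never calls a model itself; it only shapes the prompt and post-processes the reply, which is why it can be tested with a hard-coded string.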


In [None]:
%%capture
# update or install the necessary libraries
!pip install --upgrade langchain langchain_community langchain-openai
!pip install --upgrade python-dotenv

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file into os.environ
load_dotenv()

# AzureChatOpenAI reads these variables from the environment; fail early if one is missing
for var in ("OPENAI_API_VERSION", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"):
    if not os.getenv(var):
        raise EnvironmentError(f"Missing environment variable: {var}")

In [5]:
from langchain_openai import AzureChatOpenAI

# Chat model backed by the Azure OpenAI deployment named "gpt-4o"
llm = AzureChatOpenAI(
    deployment_name="gpt-4o",
    temperature=0.5,
)

Sample customer review text with information to be extracted

Template string defining the prompt structure and variables to extract



In [6]:
customer_review = """\
My experience at XYZ Corp has been largely positive. \
Our company offers a flexible work arrangement, including options for remote work. \
Employees are entitled to a minimum of three weeks of paid vacation per year. \
Employee performance evaluations are conducted annually to assess job performance and set goals for improvement.\
"""

review_template = """\
For the following text, extract the following information:

Work_Life_Balance: Are flexible work arrangements, such as remote work, available to employees? \
Answer True if yes, False if not or unknown.

Paid_Leaves(Weeks): How many weeks of paid vacation are employees entitled to annually? \
If this information is not found, output -1.

performance_evaluation: How often are employee performance evaluations conducted? \
Output the answer as a comma-separated Python list.

Format the output as JSON with the following keys:
Work_Life_Balance
Paid_Leaves(Weeks)
performance_evaluation

text: {text}
"""
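A prompt template is filled by substituting values into its `{...}` placeholders. The sketch below uses only the standard library (`string.Formatter` and `str.format`) on a cut-down, hypothetical template to show two things that matter when working with `ChatPromptTemplate.from_template`: which placeholder names a template declares, and that `str.format` silently ignores keyword arguments with no matching placeholder. This is an analogy in plain Python, not LangChain's internal code.

```python
from string import Formatter

# A cut-down template in the same style as review_template (hypothetical example)
template = "Extract the vacation policy.\n\ntext: {text}\n"

# Placeholder names declared in the template, in order of appearance
placeholders = [field for _, field, _, _ in Formatter().parse(template) if field]
print(placeholders)  # ['text']

# str.format fills declared placeholders and silently ignores extra keyword arguments,
# so a value passed under a name the template never declares never reaches the text
filled = template.format(text="Employees get three weeks of paid vacation.", unused="ignored")
print(filled)
```

If a value such as format instructions must appear in the prompt, the template itself needs a matching placeholder.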

In [8]:
# Build a chat prompt template from the review template string
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)

In [9]:
# passing text as part of the prompt
messages = prompt_template.format_messages(text=customer_review)

In [10]:
response = llm.invoke(messages)


In [11]:
response.content

'```json\n{\n  "Work_Life_Balance": true,\n  "Paid_Leaves(Weeks)": 3,\n  "performance_evaluation": ["annually"]\n}\n```'

In [15]:
# Running this line raises an AttributeError: response.content is a plain
# string (JSON text wrapped in a markdown code fence), not a dictionary,
# so it has no .get() method
response.content.get('Paid_Leaves(Weeks)')

AttributeError: 'str' object has no attribute 'get'
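One way to see what a parser has to do here is to parse the string by hand. The sketch below uses only the standard library: the hypothetical helper `parse_fenced_json` strips an optional ```` ```json ```` fence and decodes the JSON inside. `raw` is a copy of the `response.content` value shown above; in the notebook you would pass `response.content` directly.

```python
import json
import re

# Copy of the model reply shown above (a string, not a dict)
raw = '```json\n{\n  "Work_Life_Balance": true,\n  "Paid_Leaves(Weeks)": 3,\n  "performance_evaluation": ["annually"]\n}\n```'


def parse_fenced_json(text: str) -> dict:
    """Hypothetical helper: strip an optional ```json ... ``` fence and decode the JSON inside."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text  # fall back to the whole string if unfenced
    return json.loads(payload)


data = parse_fenced_json(raw)
print(type(data))                        # <class 'dict'>
print(data.get("Paid_Leaves(Weeks)"))    # 3
```

This hand-written version does no validation of which keys are present; the structured output parser introduced next handles the fence for you and works from a declared set of fields.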

Define response schemas for structured output parsing


In [16]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

In [17]:
# Define one ResponseSchema per field to extract. Adjacent string literals are joined
# without the stray indentation that a backslash inside a string would add.
Work_Life_Balance_schema = ResponseSchema(
    name="Work_Life_Balance",
    description="Are flexible work arrangements, such as remote work, available to employees? "
                "Answer True if yes, False if not or unknown.")
Paid_Leaves_schema = ResponseSchema(
    name="Paid_Leaves(Weeks)",
    description="How many weeks of paid vacation are employees entitled to annually? "
                "If this information is not found, output -1.")
performance_evaluation_schema = ResponseSchema(
    name="performance_evaluation",
    description="How often are employee performance evaluations conducted? "
                "Output the answer as a comma-separated Python list.")

response_schemas = [Work_Life_Balance_schema, Paid_Leaves_schema, performance_evaluation_schema]

In [18]:
# Configure the structured output parser from the response schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [19]:
format_instructions = output_parser.get_format_instructions()

In [20]:
format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"Work_Life_Balance": string  // Are flexible work arrangements, such as remote work, available to employees? Answer True if yes, False if not or unknown.\n\t"Paid_Leaves(Weeks)": string  // How many weeks of paid vacation are employees entitled to annually? If this information is not found, output -1.\n\t"performance_evaluation": string  // How often are employee performance evaluations conducted? Output the answer as a comma-separated Python list.\n}\n```'
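The format instructions are plain text built from the response schemas: one line per field in the form `"name": type  // description`, wrapped in a ```` ```json ```` fence. The stdlib sketch below rebuilds text in that same shape from a list of hypothetical `Field` tuples, to make visible what the parser actually adds to the prompt. It is modeled on the output shown above, not copied from LangChain's source.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a ResponseSchema: name, description and declared type."""
    name: str
    description: str
    type: str = "string"


def build_format_instructions(fields: list) -> str:
    # One schema line per field: "name": type  // description
    lines = "\n".join(f'\t"{f.name}": {f.type}  // {f.description}' for f in fields)
    return (
        "The output should be a markdown code snippet formatted in the following schema, "
        'including the leading and trailing "```json" and "```":\n\n'
        "```json\n{\n" + lines + "\n}\n```"
    )


fields = [
    Field("Work_Life_Balance", "Are flexible work arrangements available? Answer True or False."),
    Field("Paid_Leaves(Weeks)", "Weeks of paid vacation per year, or -1 if not found."),
]
print(build_format_instructions(fields))
```

Because every field defaults to `string` here, as in the output above, the model is told to return strings unless a different type is declared.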

In [21]:
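# Configure the output format. review_template has no {format_instructions}
# placeholder, so append one; otherwise the format_instructions keyword below
# is silently ignored and the instructions never reach the model.
review_template_2 = review_template + "\n{format_instructions}\n"
prompt = ChatPromptTemplate.from_template(template=review_template_2)
messages = prompt.format_messages(text=customer_review, format_instructions=format_instructions)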

In [22]:
response = llm.invoke(messages)

In [23]:
print(response.content)

```json
{
  "Work_Life_Balance": true,
  "Paid_Leaves(Weeks)": 3,
  "performance_evaluation": ["annually"]
}
```


In [24]:
output_dict = output_parser.parse(response.content)

In [25]:
type(output_dict)

dict

In [26]:
print(output_dict.get('Paid_Leaves(Weeks)'))

3
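The format instructions declare every field as `string`, so depending on the model the parsed values may arrive as strings such as `"True"`, `"3"` or `"annually"` rather than a bool, an int and a list. A small post-processing step can normalize them. `coerce_review_fields` below is a hypothetical helper written for this notebook's three keys, not part of LangChain; values that already have the right type are left unchanged.

```python
import ast
import re


def coerce_review_fields(parsed: dict) -> dict:
    """Hypothetical helper: normalize the three extracted fields to bool, int and list."""
    out = dict(parsed)  # work on a copy

    balance = out.get("Work_Life_Balance")
    if isinstance(balance, str):
        out["Work_Life_Balance"] = balance.strip().lower() == "true"

    leaves = out.get("Paid_Leaves(Weeks)")
    if isinstance(leaves, str):
        # Take the first integer in the string ("3", "3 weeks", "-1"); fall back to -1
        match = re.search(r"-?\d+", leaves)
        out["Paid_Leaves(Weeks)"] = int(match.group()) if match else -1

    evaluations = out.get("performance_evaluation")
    if isinstance(evaluations, str):
        try:
            # Handles a Python-style list literal such as "['annually']"
            value = ast.literal_eval(evaluations)
        except (ValueError, SyntaxError, TypeError):
            value = None
        if isinstance(value, list):
            out["performance_evaluation"] = value
        else:
            # Otherwise treat the string as comma-separated values
            out["performance_evaluation"] = [p.strip() for p in evaluations.split(",") if p.strip()]
    return out


print(coerce_review_fields({
    "Work_Life_Balance": "True",
    "Paid_Leaves(Weeks)": "3",
    "performance_evaluation": "annually",
}))
```

In the notebook this would be applied as `coerce_review_fields(output_dict)`. An alternative is to declare a `type` on each `ResponseSchema`, but the model can still return strings, so a check like this remains useful.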


# **Let's Do an Activity**

## **Objective**

Practice extracting specific information from text with a language model and a structured output parser.

## **Scenario**

You are working on a project where you need to analyze customer feedback to extract key details such as sentiment, product mentions, and issues reported. You'll utilize a language model to process the feedback and a structured output parser to extract structured information.

## **Steps**

* Define a Prompt Template
* Prepare Sample Feedback
* Create Response Schema
* Format Instructions and Prompt Template
* Interact with the Model
* Parse and Display Results
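The six steps can be sketched end to end without any framework, to show how the pieces fit together. This is a minimal sketch under stated assumptions: the sample feedback and the field names `sentiment`, `products` and `issues` are hypothetical, and `fake_llm` is a stand-in that returns a canned reply instead of calling a model. In the activity itself you would use `ResponseSchema`, `StructuredOutputParser`, `ChatPromptTemplate` and `llm.invoke` as shown earlier.

```python
import json
import re

# Step 1: prompt template with placeholders for the feedback and the format instructions
feedback_template = """\
Extract the following information from the customer feedback below.

feedback: {feedback}

{format_instructions}
"""

# Step 2: sample feedback (hypothetical)
feedback = (
    "I love the battery life of the X200 headphones, "
    "but the Bluetooth connection drops every few minutes."
)

# Step 3: response schema as (name, description) pairs (hypothetical fields)
schema = [
    ("sentiment", "Overall sentiment of the feedback: positive, negative or mixed."),
    ("products", "Products mentioned, as a JSON list of strings."),
    ("issues", "Problems reported, as a JSON list of strings."),
]

# Step 4: format instructions and the filled-in prompt
feedback_format_instructions = (
    "Return a markdown ```json code block containing an object with these keys:\n"
    + "\n".join(f'\t"{name}": {desc}' for name, desc in schema)
)
feedback_prompt = feedback_template.format(
    feedback=feedback, format_instructions=feedback_format_instructions
)


# Step 5: hypothetical stand-in for llm.invoke(...).content, returning a canned reply
def fake_llm(prompt_text: str) -> str:
    return (
        '```json\n{"sentiment": "mixed", "products": ["X200 headphones"], '
        '"issues": ["Bluetooth connection drops"]}\n```'
    )


# Step 6: parse the reply into a dict and check that every schema key is present
def parse_reply(text: str, keys: list) -> dict:
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    data = json.loads(match.group(1) if match else text)
    missing = [key for key in keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys in model output: {missing}")
    return data


result = parse_reply(fake_llm(feedback_prompt), [name for name, _ in schema])
for key, value in result.items():
    print(f"{key}: {value}")
```

Keeping the model call behind a single function makes it easy to swap the stub for a real model while the template, schema and parsing steps stay the same.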