# 3. Sentiment Analysis

Structured outputs, i.e. mapping the output of an LLM to a Python class.

## Setup

In [21]:
import os

try:
    # load environment variables from .env file (requires `python-dotenv`)
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass

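# Fail early with a clear message if a required variable is missing or empty.
# `os.environ.get` returns None for a missing key, whereas `os.environ[...]`
# would raise KeyError before the assert could run.
for var in ("LANGSMITH_TRACING", "LANGSMITH_API_KEY", "LANGSMITH_PROJECT", "OPENAI_API_KEY"):
    assert os.environ.get(var), f"Missing environment variable: {var}"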

In [22]:
from langchain.chat_models import init_chat_model
model = init_chat_model("gpt-4o-mini", model_provider="openai")

## 3.1 Tool-Calling

- Tool calling in LangChain enables an LLM to interact with external systems, such as making API calls, querying databases, or executing code.
- When invoking a tool, the LLM must generate inputs that conform to the tool's expected input schema, for example a structured JSON payload or a SQL query.
- Tool responses also follow a defined schema, allowing the LLM or orchestrator to interpret the results reliably.
- **Important**: Not all models support tool calling, but the popular ones (Gemini, ChatGPT and Claude) do.

![](./docs/llms_use_tools_to_interact_with_systems.png)
The LLM uses natural language to interact with humans and tools to interact with systems.
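
The points above can be made concrete with a small, library-free sketch. Everything in it is hypothetical: the `get_weather` tool, its input schema, the canned return value, and the hand-written `tool_call` payload merely stand in for what a real model and orchestrator would exchange. The actual wire format depends on the provider.

```python
import json


# Hypothetical tool: the name, arguments, and canned return value are illustrative only.
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Pretend to look up the weather and return a fixed response."""
    return {"city": city, "temperature": 21, "unit": unit}


TOOLS = {
    "get_weather": {
        "function": get_weather,
        # Expected input schema: argument name -> (type, required?)
        "input_schema": {"city": (str, True), "unit": (str, False)},
    }
}


def run_tool_call(raw_call: str) -> dict:
    """Validate a JSON tool call against the tool's input schema, then run the tool."""
    call = json.loads(raw_call)
    tool = TOOLS[call["name"]]
    args = call["args"]
    for arg_name, (arg_type, required) in tool["input_schema"].items():
        if arg_name not in args:
            if required:
                raise ValueError(f"missing required argument: {arg_name}")
            continue
        if not isinstance(args[arg_name], arg_type):
            raise TypeError(f"{arg_name} must be of type {arg_type.__name__}")
    unknown = set(args) - set(tool["input_schema"])
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    return tool["function"](**args)


# A payload of the kind a model might emit when it decides to call the tool.
tool_call = '{"name": "get_weather", "args": {"city": "Dubai"}}'
result = run_tool_call(tool_call)
print(result)
```

The orchestrator only runs the tool once the payload matches the schema, which is why the LLM has to produce well-formed arguments rather than free text.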

## 3.2 Structured Output

- In this tutorial, we use the tool-calling features of chat models to extract structured information from unstructured text.

In [23]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

class Person(BaseModel):
    """Information about a person."""

    # ^ Doc-string for the entity Person.
    # This doc-string is sent to the LLM as the description of the schema Person,
    # and it can help to improve extraction results.

    # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # Having a good description can help improve extraction results.
    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


# Define a custom prompt to provide instructions and any additional context.
# 1) You can add examples into the prompt template to improve extraction quality
# 2) Introduce additional parameters to take context into account (e.g., include metadata
#    about the document from which the text was extracted.)
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "If you do not know the value of an attribute asked to extract, "
            "return null for the attribute's value.",
        ),
        # Please see the how-to about improving performance with
        # reference examples.
        # MessagesPlaceholder('examples'),
        ("human", "{text}"),
    ]
)

structured_llm = model.with_structured_output(schema=Person)
text = "Alan Smith is 6 feet tall and has blond hair."
prompt = prompt_template.invoke({"text": text})
structured_llm.invoke(prompt)

Person(name='Alan Smith', hair_color='blond', height_in_meters='1.83')

**IMPORTANT PRACTICES**:
1. Document the attributes **AND** the schema itself: This information is sent to the LLM and is used to improve the quality of information extraction.
2. Do not force the LLM to make up information! Above, every attribute is declared as `Optional`, which allows the LLM to output `None` when it doesn't know the answer.
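
To see why both practices matter, it can help to inspect the JSON schema that Pydantic generates for the class, since a schema of roughly this shape is what gets passed to the model. The sketch below uses a shortened copy of `Person` and Pydantic v2's `model_json_schema()`. How LangChain converts the schema for a given provider is not shown here, so this is only an approximation of what the LLM sees.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )


schema = Person.model_json_schema()

# The class docstring becomes the schema-level description.
print(schema["description"])

# Each field's `description` is attached to the matching property.
for field_name, spec in schema["properties"].items():
    print(field_name, "->", spec["description"])

# Every field is Optional with a default, so an "empty" person is valid
# and the model is never forced to invent a value.
empty_person = Person()
print(empty_person)
```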



### 3.2.1 Multiple Entities

- In most cases, you should extract a list of entities rather than a single entity.
- This is easy to achieve with Pydantic by nesting models inside one another.

In [24]:
from typing import List, Optional
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    # ^ Doc-string for the entity Person.
    # This doc-string is sent to the LLM as the description of the schema Person,
    # and it can help to improve extraction results.

    # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # Having a good description can help improve extraction results.
    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


class Data(BaseModel):
    """Extracted data about people."""

    # Creates a model so that we can extract multiple entities.
    people: List[Person]

structured_llm = model.with_structured_output(schema=Data)
text = "My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me."
prompt = prompt_template.invoke({"text": text})
structured_llm.invoke(prompt)

Data(people=[Person(name='Jeff', hair_color='black', height_in_meters='1.83'), Person(name='Anna', hair_color='black', height_in_meters=None)])
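
Because the result is an ordinary Pydantic object, downstream code can iterate over `people` directly. The sketch below rebuilds the output shown above by hand (as a stand-in for `structured_llm.invoke(prompt)`, so no API call is needed) and converts the extracted height strings to numbers. `height_as_float` is a hypothetical helper, not part of LangChain or Pydantic.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


class Data(BaseModel):
    """Extracted data about people."""

    people: List[Person]


# Stand-in for `structured_llm.invoke(prompt)`, copied from the output above.
result = Data(
    people=[
        Person(name="Jeff", hair_color="black", height_in_meters="1.83"),
        Person(name="Anna", hair_color="black", height_in_meters=None),
    ]
)


def height_as_float(person: Person) -> Optional[float]:
    """Convert the extracted height string to a float, or None if unavailable."""
    if person.height_in_meters is None:
        return None
    try:
        return float(person.height_in_meters)
    except ValueError:
        return None  # the model returned something non-numeric


heights = {person.name: height_as_float(person) for person in result.people}
print(heights)
```

Declaring `height_in_meters` as a string keeps the extraction permissive. The conversion step is where malformed values can be handled explicitly.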

## 3.3 Tagging

Tagging means labeling a document with classes such as:

- Sentiment
- Language
- Style (formal, informal etc.)
- Covered topics
- Political tendency

Tagging has a few components:

- `function`: Like extraction, tagging uses functions to specify how the model should tag a document
- `schema`: defines how we want to tag the document


In [25]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# Structured LLM
structured_llm = model.with_structured_output(Classification)

In [26]:
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response

Classification(sentiment='positive', aggressiveness=1, language='Spanish')

### 3.3.1 Finer Control over the Output

Careful schema definition gives us more control over the model's output.

Specifically, we can define:
- The possible values for each property
- A description of each property, so that the model understands what it means
- Which properties are required in the output


In [27]:
class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

llm = ChatOpenAI(temperature=0, model="gpt-4o-mini").with_structured_output(
    Classification
)

inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
response = llm.invoke(prompt)

response.model_dump()

{'sentiment': 'happy', 'aggressiveness': 1, 'language': 'spanish'}
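
One caveat, stated with some caution: as far as we can tell, Pydantic v2 treats extra keyword arguments to `Field` such as `enum=` as JSON-schema metadata only (and deprecates them in favour of `json_schema_extra`), so the allowed values guide the LLM but are not enforced when the object is built. A possible alternative is `typing.Literal`, which both advertises the allowed values in the schema and validates them. The class name `StrictClassification` is hypothetical, and this variant was not run against the model here.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    sentiment: Literal["happy", "neutral", "sad"]
    aggressiveness: Literal[1, 2, 3, 4, 5] = Field(
        description="describes how aggressive the statement is, the higher the number the more aggressive"
    )
    language: Literal["spanish", "english", "french", "german", "italian"]


# A value inside the allowed sets is accepted.
ok = StrictClassification(sentiment="happy", aggressiveness=1, language="spanish")
print(ok.model_dump())

# Values outside the allowed sets are rejected at construction time.
rejected_fields = []
try:
    StrictClassification(sentiment="ecstatic", aggressiveness=9, language="spanish")
except ValidationError as err:
    rejected_fields = sorted(error["loc"][0] for error in err.errors())
print("rejected:", rejected_fields)

# The allowed values still appear in the JSON schema under `enum`.
sentiment_schema = StrictClassification.model_json_schema()["properties"]["sentiment"]
print(sentiment_schema["enum"])
```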

In [28]:
results = []
for inp in [
    "muy amigo esta muerto",
    "donde esta la bibliotecha",
    "merde!",
    "ich habe kein mehr Gelt. Oh well, i can always rob a bank",
    "Will you marry me?",
    "I'd marry you over my dead body you ugly 4-eyed piece of human excrement",
    "All the other kids with the pumped up kicks better run, better run, outrun my gun",
    "Jojo mogo roko wowo mukaka bamo",
    "23423 324342 2342290 00",
]:
    prompt = tagging_prompt.invoke({"input": inp})
    response = llm.invoke(prompt)
    results.append(response.model_dump())

results

[{'sentiment': 'sad', 'aggressiveness': 1, 'language': 'spanish'},
 {'sentiment': 'neutral', 'aggressiveness': 1, 'language': 'spanish'},
 {'sentiment': 'sad', 'aggressiveness': 5, 'language': 'french'},
 {'sentiment': 'sad', 'aggressiveness': 3, 'language': 'german'},
 {'sentiment': 'happy', 'aggressiveness': 1, 'language': 'english'},
 {'sentiment': 'sad', 'aggressiveness': 5, 'language': 'english'},
 {'sentiment': 'neutral', 'aggressiveness': 3, 'language': 'english'},
 {'sentiment': 'neutral', 'aggressiveness': 1, 'language': 'english'},
 {'sentiment': 'neutral', 'aggressiveness': 1, 'language': 'english'}]
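
Once every document carries the same tags, the results can be summarised with plain Python. The sketch below copies the `results` list from the output above so that it runs on its own, then counts tags with `collections.Counter`.

```python
from collections import Counter

# The `results` list from the cell above, reproduced so the snippet is self-contained.
results = [
    {"sentiment": "sad", "aggressiveness": 1, "language": "spanish"},
    {"sentiment": "neutral", "aggressiveness": 1, "language": "spanish"},
    {"sentiment": "sad", "aggressiveness": 5, "language": "french"},
    {"sentiment": "sad", "aggressiveness": 3, "language": "german"},
    {"sentiment": "happy", "aggressiveness": 1, "language": "english"},
    {"sentiment": "sad", "aggressiveness": 5, "language": "english"},
    {"sentiment": "neutral", "aggressiveness": 3, "language": "english"},
    {"sentiment": "neutral", "aggressiveness": 1, "language": "english"},
    {"sentiment": "neutral", "aggressiveness": 1, "language": "english"},
]

language_counts = Counter(r["language"] for r in results)
sentiment_counts = Counter(r["sentiment"] for r in results)
mean_aggressiveness = sum(r["aggressiveness"] for r in results) / len(results)

print(language_counts)
print(sentiment_counts)
print(round(mean_aggressiveness, 2))
```

Note that the last two inputs (gibberish and a string of digits) are both tagged `english`: the schema offers only five languages, so the model has to pick one of them. Adding an explicit "other" value to the allowed set might avoid this, though that was not tested here.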

## 3.4 Playground

In [31]:
from typing import Optional
from pydantic import BaseModel, Field

class BudgetEntry(BaseModel):
    amount: Optional[float] = Field(description="The income or expense amount", default=0.0)
    currency: Optional[str] = Field(description="The currency of the amount", default="AED")
    # "D" (debit) is the default, so the description says so as well.
    creditOrDebit: Optional[str] = Field(
        description="Credit or Debit. 'D' if the amount was debited/spent, 'C' if the amount was received. Defaults to 'D'.",
        enum=["C", "D"],
        default="D",
    )
    memo: Optional[str] = Field(description="Short description of the credit/debit event e.g. Shopping")
    category: str = Field(description="The category of the credit/debit event e.g. Bills", enum=["Salary", "Bills", "Rent", "Shopping", "Car", "Home"])

structured_llm = model.with_structured_output(BudgetEntry)
system_template = """
Extract the properties of the 'BudgetEntry' function from the following input. 
"""
user_input = "20 aed on a haircut"

prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{user_input}")])
prompt = prompt_template.invoke({"user_input": user_input})
response = structured_llm.invoke(prompt)

In [32]:
response.model_dump()

{'amount': 20.0,
 'currency': 'AED',
 'creditOrDebit': 'D',
 'memo': 'Haircut',
 'category': 'Shopping'}

In [None]:
class BudgetEntries(BaseModel):
    entries: list[BudgetEntry]

structured_llm = model.with_structured_output(BudgetEntries)
user_input = "20 aed on a haircut"
prompt = prompt_template.invoke({"user_input": user_input})
response = structured_llm.invoke(prompt)
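
The last cell was not run in the source, so its output is unknown. As a sketch of what could be done with it, the snippet below assumes that `response.model_dump()` yields a dict of the form `{"entries": [...]}` (which is how Pydantic dumps nested models) and reuses the single `BudgetEntry` output shown earlier as the only entry. `net_balance` is a hypothetical helper.

```python
# Stand-in for `response.model_dump()`: the entry is copied from the
# BudgetEntry output shown earlier and wrapped in the `entries` list.
dumped = {
    "entries": [
        {
            "amount": 20.0,
            "currency": "AED",
            "creditOrDebit": "D",
            "memo": "Haircut",
            "category": "Shopping",
        }
    ]
}


def net_balance(entries: list[dict]) -> float:
    """Credits ("C") add to the balance, debits ("D") subtract from it."""
    total = 0.0
    for entry in entries:
        sign = 1 if entry["creditOrDebit"] == "C" else -1
        # `amount` is Optional in the schema, so treat None as zero.
        total += sign * (entry["amount"] or 0.0)
    return total


balance = net_balance(dumped["entries"])
print(balance)
```

This ignores currencies entirely. A real budget would need to group by `currency` or convert amounts before summing.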