# Classify Text into Labels

https://python.langchain.com/docs/tutorials/classification/

Tagging means labeling a document with classes such as:

- Sentiment
- Language
- Style (formal, informal, etc.)
- Covered topics
- Political tendency

## Overview

Tagging has two components:

- `function`: like extraction, tagging uses functions (tool calling) to specify how the model should tag a document
- `schema`: defines how we want to tag the document

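To make the `schema` component concrete, here is a minimal sketch that prints the JSON schema Pydantic derives from a small model. The two-field `Classification` class is a cut-down illustration, and the exact payload LangChain sends to the model may differ from this raw schema; the point is only that property names, types, descriptions, and required fields all come from the class definition.

```python
import json

from pydantic import BaseModel, Field


class Classification(BaseModel):
    """Cut-down tagging schema, used only for this illustration."""

    sentiment: str = Field(description="The sentiment of the text")
    language: str = Field(description="The language the text is written in")


# The JSON schema is the machine-readable form of the class: property
# names, types, descriptions, and which properties are required.
schema = Classification.model_json_schema()
print(json.dumps(schema, indent=2))
```

Both fields appear under `required` because neither has a default value.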

## Quickstart

Let's look at a simple example of how to use OpenAI tool calling for tagging in LangChain.
We'll use the `with_structured_output` method supported by OpenAI models.



In [None]:
import getpass
import os

# Prompt for the OpenAI key only if it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

# Enable LangSmith tracing and prompt for its key if it is missing.
os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter API key for LangSmith: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Let's specify a Pydantic model with a few properties and their expected types in our schema.

In [11]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html
tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")  # emotional tone
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"  # level of aggression
    )
    language: str = Field(description="The language the text is written in")  # language of the text


# LLM
# https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini").with_structured_output(
    Classification
)

# English: "I'm very happy to meet you! I think we'll become very good friends!"
inp = "我非常高兴认识你！我想我们会成为非常好的朋友！"
prompt = tagging_prompt.invoke({"input": inp})
response = llm.invoke(prompt)

response
# Classification(sentiment='positive', aggressiveness=1, language='Chinese')


In [None]:
response.model_dump()
# {'sentiment': 'positive', 'aggressiveness': 1, 'language': 'Chinese'}


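Since the structured response is an ordinary Pydantic object, it can be serialized and restored without another model call. The sketch below builds the object by hand as a stand-in for the model's response (an assumption made so that no API key is needed) and shows `model_dump()`, `model_dump_json()`, and `model_validate_json()` from Pydantic v2.

```python
from pydantic import BaseModel, Field


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# Stand-in for the object the structured-output model returns
# (built by hand here so that no API call is needed).
response = Classification(sentiment="positive", aggressiveness=1, language="Chinese")

as_dict = response.model_dump()       # plain Python dict
as_json = response.model_dump_json()  # JSON string, e.g. for logging or storage

# Round-trip: rebuild and re-validate the object from the JSON string.
restored = Classification.model_validate_json(as_json)
print(as_dict)
print(restored == response)
```

The round-trip is handy when tags are stored and reloaded later, because the reloaded data is validated against the same schema.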
As the example shows, the model interprets the schema as intended.

The output is not constrained, however: the same field can come back in different forms, for example a sentiment in English ('positive') for one input and in Spanish ('enojado') for another.

We will see how to control these results in the next section.

## Finer control

Careful schema definition gives us more control over the model's output.

Specifically, we can define:

- The possible values for each property
- A description of each property, so that the model understands what it means
- Which properties are required in the output

Let's redeclare our Pydantic model to control for each of the previously mentioned aspects using enums:

In [8]:
# ... marks the field as required (it has no default value)
# enum=[...] lists the values the field may take; it is added to the schema the model sees
class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])  # emotional tone
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",  # level of aggression
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian", "chinese"]  # language of the text
    )

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

llm = ChatOpenAI(temperature=0, model="gpt-4o-mini").with_structured_output(
    Classification
)

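As far as we can tell, extra keyword arguments such as `enum=` on `Field` are forwarded to the JSON schema as a hint for the model, but Pydantic itself does not check returned values against them. If client-side validation is also wanted, one possible variant is to declare the allowed values with `typing.Literal`. The sketch below is an assumption-laden alternative, not the tutorial's method: `StrictClassification` is a hypothetical name, and the objects are built by hand so that no API call is needed.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    """Hypothetical variant of Classification with validated value sets."""

    sentiment: Literal["happy", "neutral", "sad"]
    aggressiveness: Literal[1, 2, 3, 4, 5] = Field(
        description=(
            "describes how aggressive the statement is, "
            "the higher the number the more aggressive"
        )
    )
    language: Literal["spanish", "english", "french", "german", "italian", "chinese"]


# Values inside the allowed sets validate normally.
ok = StrictClassification(sentiment="happy", aggressiveness=1, language="chinese")

# Values outside the allowed sets are rejected by Pydantic itself.
try:
    StrictClassification(sentiment="enojado", aggressiveness=9, language="chinese")
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"{exc.error_count()} validation errors")
```

With `Literal`, the allowed values still show up as an `enum` in the generated JSON schema, so the model sees the same hint while out-of-range values raise a `ValidationError` on our side.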

In [None]:
# English: "I'm very happy to meet you! I think we'll become very good friends!"
inp = "我非常高兴认识你！我想我们会成为非常好的朋友！"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

# Classification(sentiment='happy', aggressiveness=1, language='chinese')


In [None]:
# English: "I'm very angry with you! I'll give you what you deserve!"
inp = "我对你很生气！我会给你你应得的！"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)
# Classification(sentiment='sad', aggressiveness=5, language='chinese')


In [None]:
inp = "Weather is ok here, I can go outside without much more than a coat"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)
# Classification(sentiment='happy', aggressiveness=1, language='english')
