# Structured Output

## Why is it useful for our LLM apps and agents to return structured output?
* Instead of natural language, we sometimes want our LLM apps to return their responses as structured output, such as JSON objects, Pydantic models, or dataclasses. Structured output lets downstream code read specific fields (a name, an email, a title) directly instead of parsing free-form text.

## How can we make it happen in LangChain 1.0?
* LangChain’s `create_agent` handles structured output automatically:
    * You set the desired output schema with the `response_format` parameter.
    * The schema can be any of the following:
        * Pydantic models: `BaseModel` subclasses with field validation
        * Dataclasses: Python dataclasses with type annotations
        * `TypedDict`: typed dictionary classes
        * JSON Schema: a dictionary containing a JSON Schema specification
    * When the model generates the structured data, LangChain captures it, validates it against the schema, and returns it under the `structured_response` key of the agent’s state.
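For reference, here is a minimal sketch of one schema, a contact record, written in the three non-Pydantic forms listed above (the Pydantic version of the same record follows in the next example). It uses only the standard library. The names `ContactInfoDC`, `ContactInfoTD`, and `contact_info_schema` are illustrative, and passing them to `response_format` is shown only as a comment, because running an agent needs a model and an API key.

```python
from dataclasses import dataclass, fields
from typing import TypedDict


# 1) Dataclass: field names and types come from the annotations.
@dataclass
class ContactInfoDC:
    """Contact information for a person."""
    name: str
    email: str
    phone: str


# 2) TypedDict: the result is a plain dict with these typed keys.
class ContactInfoTD(TypedDict):
    """Contact information for a person."""
    name: str
    email: str
    phone: str


# 3) JSON Schema: a plain dictionary following the JSON Schema specification.
contact_info_schema = {
    "title": "ContactInfo",
    "description": "Contact information for a person.",
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "The name of the person"},
        "email": {"type": "string", "description": "The email address of the person"},
        "phone": {"type": "string", "description": "The phone number of the person"},
    },
    "required": ["name", "email", "phone"],
}

# Any of these would be passed the same way as a Pydantic model, e.g.:
# agent = create_agent(model="gpt-4o-mini", response_format=ContactInfoDC)
```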


```python
from pydantic import BaseModel, Field
from langchain.agents import create_agent


class ContactInfo(BaseModel):
    """Contact information for a person."""
    name: str = Field(description="The name of the person")
    email: str = Field(description="The email address of the person")
    phone: str = Field(description="The phone number of the person")

agent = create_agent(
    model="gpt-4o-mini",
    response_format=ContactInfo
)

result = agent.invoke({
    "messages": [{
        "role": "user", 
        "content": "Extract contact info from: John Doe, john@example.com, (555) 123-4567"}]
})

print(result["structured_response"])
# ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')
```

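* When you pass a schema directly to `response_format` and the model supports native structured output (the most common case), LangChain automatically uses `ProviderStrategy`. For models without native support, it falls back to `ToolStrategy`, which obtains the structured data through tool calling. See the [documentation](https://docs.langchain.com/oss/python/langchain/structured-output) for these and other, less common options.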

## Basic Example

```python
from dotenv import load_dotenv

# Load API keys (e.g. OPENAI_API_KEY) from a local .env file into the environment
load_dotenv()
```
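`load_dotenv()` copies the `KEY=VALUE` lines of a local `.env` file into the process environment. Because `gpt-4o-mini` is served by OpenAI, an `OPENAI_API_KEY` variable needs to end up there. The helper below, `require_env`, is a hypothetical convenience (not part of LangChain or python-dotenv) that stops early with a readable message instead of failing later with a less obvious error.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper: call it after load_dotenv() and before create_agent().
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and call load_dotenv() first."
        )
    return value


# Usage (after load_dotenv()):
# require_env("OPENAI_API_KEY")
```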

```python
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel

class ArticleFormat(BaseModel):
    title: str
    subtitle: str
    body: str

agent = create_agent(
    model='gpt-4o-mini',
    system_prompt="You are an investigative journalist.",
    response_format=ArticleFormat
)

question = HumanMessage(content="Write a short article explaining briefly the top conspiracy theories about who killed JFK?")

response = agent.invoke(
    {"messages": [question]}
)

article = response["structured_response"]
article_title = article.title
article_subtitle = article.subtitle
article_body = article.body

print(f"The journalist wrote an article called {article_title}")
print(f"The article was about {article_subtitle}")
print(f"This is the body of the article:\n {article_body}")
```

```text
The journalist wrote an article called Untangling Dallas: A Short Guide to the JFK Assassination Theories
The article was about Lone gunman, conspiracies, and a decades-long puzzle
This is the body of the article:
 On November 22, 1963, President John F. Kennedy was assassinated in Dallas, Texas. The event sparked a torrent of theories that have persisted for generations. Here are the top ideas that recur in textbooks, documentaries, and conversations, with a note on what the evidence says today.

1) The official story: Lee Harvey Oswald acted alone. The Warren Commission (1964) concluded that Oswald fired three shots from the sixth floor of the Texas School Book Depository, striking Kennedy and Governor Connally. The Commission found no provable conspiracy and handed a simple explanation to a shaken nation. Critics have challenged some ballistic and investigative details, but this remains the governing account in most historical surveys.

2) A conspiracy involving multiple shooters
```

## Let's explain the previous code in simple terms

Below is a **simple, beginner-friendly, line-by-line explanation** of what this LangChain 1.0 code does. We will explain every part in plain language.

---

#### 1. Importing the tools we need

```python
from langchain.agents import create_agent
```

* This imports a helper function called `create_agent`.
* An **agent** wraps a language model (and, optionally, tools) so it can receive instructions and generate responses.
* Think of it as “creating an AI worker”.

---

```python
from langchain.messages import HumanMessage
```

* This imports `HumanMessage`, which represents **something a human says to the AI**.
* LangChain uses message objects instead of raw strings to keep conversations structured.

---

```python
from pydantic import BaseModel
```

* This imports `BaseModel` from **Pydantic**.
* Pydantic is used to define **structured data** (data with fixed fields).
* We’ll use it to tell the AI **exactly what format its answer must have**.

---

#### 2. Defining the response structure

```python
class ArticleFormat(BaseModel):
```

* This defines a **data model** called `ArticleFormat`.
* It describes what a valid AI response should look like.

---

```python
    title: str
    subtitle: str
    body: str
```

* These are the required fields in the response:

  * `title`: a string
  * `subtitle`: a string
  * `body`: a string
* The AI **must** return all three, or the response will fail validation.

👉 In simple terms:

> “The AI must answer in the form of an article with a title, subtitle, and body.”
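To see what “fail validation” means without calling a model, here is a standard-library sketch that checks a raw JSON reply against the same three fields. `ArticleFormatDC` and `parse_article` are hypothetical names, and this is only an illustration of the idea, not LangChain’s internal code: in the real example, Pydantic (driven by LangChain) does this checking for you, with richer error messages.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ArticleFormatDC:
    """Dataclass stand-in for the Pydantic ArticleFormat model."""
    title: str
    subtitle: str
    body: str


def parse_article(raw_json: str) -> ArticleFormatDC:
    """Parse a JSON reply, rejecting missing fields and non-string values."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for f in fields(ArticleFormatDC):
        if f.name not in data:
            raise ValueError(f"missing required field: {f.name}")
        if not isinstance(data[f.name], str):
            raise ValueError(f"field {f.name} must be a string")
    # Keep only the declared fields; any extra keys in the reply are dropped.
    return ArticleFormatDC(**{f.name: data[f.name] for f in fields(ArticleFormatDC)})


good = parse_article('{"title": "Untangling Dallas", "subtitle": "A short guide", "body": "..."}')
print(good.title)  # Untangling Dallas

try:
    parse_article('{"title": "Only a title"}')
except ValueError as err:
    print(err)  # missing required field: subtitle
```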

---

#### 3. Creating the AI agent

```python
agent = create_agent(
```

* This creates the AI agent instance.
* From now on, `agent` is your AI journalist.

---

```python
    model='gpt-4o-mini',
```

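* This tells LangChain which AI model to use.
* `gpt-4o-mini` is a lightweight, fast model from OpenAI. LangChain infers the OpenAI provider from this name, so the `langchain-openai` package and an `OPENAI_API_KEY` are required.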

---

```python
    system_prompt="You are an investigative journalist.",
```

* This is a **system instruction**.
* It sets the AI’s role and behavior.
* The AI will try to answer **like a journalist**, not a chatbot.

---

```python
    response_format=ArticleFormat
)
```

* This forces the AI’s response to match the `ArticleFormat` schema.
* LangChain automatically validates the output and converts it into an `ArticleFormat` object.

👉 This is powerful because:

* No guessing
* No text parsing
* No messy JSON handling

---

#### 4. Creating the user question

```python
question = HumanMessage(
    content="Write a short article explaining briefly the top conspiracy theories about who killed JFK?"
)
```

* This creates a **human message**.
* `content` is what the user is asking the AI.
* Wrapping it in `HumanMessage` makes it compatible with LangChain’s message system.

---

#### 5. Sending the message to the agent

```python
response = agent.invoke(
    {"messages": [question]}
)
```

* This sends the message to the AI agent.
* `invoke()` runs the agent and waits for the response.
* The input is a dictionary containing a list of messages.

👉 Even for one question, LangChain expects a **list of messages**.
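The same `{"messages": [...]}` shape carries longer conversations: each earlier turn is simply another entry in the list. A minimal sketch, assuming the role/content dictionary form used in the `ContactInfo` example earlier (LangChain also accepts message objects such as `HumanMessage`). `build_agent_input` is a hypothetical helper, and the `agent.invoke` call is left commented out because it needs an API key.

```python
def build_agent_input(history: list[tuple[str, str]], question: str) -> dict:
    """Build the input dict for agent.invoke() from earlier turns plus a new question.

    `history` holds (role, content) pairs such as ("user", "...") or ("assistant", "...").
    """
    messages = [{"role": role, "content": content} for role, content in history]
    messages.append({"role": "user", "content": question})
    return {"messages": messages}


payload = build_agent_input(
    history=[
        ("user", "Write a short article about the top JFK assassination theories."),
        ("assistant", "Untangling Dallas: A Short Guide to the JFK Assassination Theories ..."),
    ],
    question="Now rewrite it for a younger audience.",
)
# response = agent.invoke(payload)
# response["structured_response"] should again be an ArticleFormat instance.
```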

---

#### 6. Extracting the structured response

```python
article = response["structured_response"]
```

* `invoke()` returns the agent’s final state as a dictionary, and the parsed result is stored under its `structured_response` key.
* Because we used `response_format=ArticleFormat`, LangChain has already converted the AI output into an `ArticleFormat` object.

👉 At this point:

* `article` is **not text**
* It’s a Python object with fields: `title`, `subtitle`, and `body`

---

#### 7. Accessing each article field

```python
article_title = article.title
article_subtitle = article.subtitle
article_body = article.body
```

* These lines extract each part of the article.
* This works just like accessing attributes on any Python object.
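A common next step is to save or send the article as JSON. For the Pydantic `ArticleFormat` used here, Pydantic v2’s `article.model_dump()` and `article.model_dump_json()` do this directly. If the schema is a dataclass instead, the standard library covers it, as in this minimal sketch (the dataclass stands in for `ArticleFormat`, and `article_to_json` is a hypothetical helper name).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ArticleFormatDC:
    """Dataclass stand-in for the Pydantic ArticleFormat model."""
    title: str
    subtitle: str
    body: str


def article_to_json(article: ArticleFormatDC) -> str:
    """Serialize the structured article to a JSON string, keeping non-ASCII readable."""
    return json.dumps(asdict(article), ensure_ascii=False, indent=2)


article = ArticleFormatDC(
    title="Untangling Dallas",
    subtitle="Lone gunman, conspiracies, and a decades-long puzzle",
    body="On November 22, 1963, ...",
)
print(article_to_json(article))

# Round trip: the JSON loads back into the same structure.
restored = ArticleFormatDC(**json.loads(article_to_json(article)))
print(restored == article)  # True
```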

---

#### 8. Printing the results

```python
print(f"The journalist wrote an article called {article_title}")
```

* Prints the article title.

---

```python
print(f"The article was about {article_subtitle}")
```

* Prints the subtitle (what the article is about).

---

```python
print(f"This is the body of the article:\n {article_body}")
```

* Prints the article body.
* `\n` adds a new line before the text.

---

#### 🧠 Big picture summary

This code:

1. Defines a **structured article format**
2. Creates an **AI journalist**
3. Asks it a question
4. Forces the AI to reply in a **clean, predictable structure**
5. Extracts and prints each part of the article

---

#### 🚀 Why this approach is powerful

* No messy string parsing
* Strong typing and validation

## Main options to structure the output in LangChain
* Pydantic models and dataclasses are the most common ways to define the output structure in LangChain. See this [LangChain Documentation Page](https://docs.langchain.com/oss/python/langchain/structured-output) for more details and examples of these and the other options.

## How to run this code from Visual Studio Code
* Open a terminal (Terminal > New Terminal).
* Make sure you are in the project folder.
* Make sure the Poetry environment is activated.
* Run the following command:
    * `python 005-structured-output.py`