### Output Parsers
- Output Parsers in LangChain are responsible for taking the raw output of an LLM and transforming it into a more suitable format.

- Here is a breakdown of their role:

    - Structure from Chaos: LLMs natively output text (strings). But in programming, we often need structured data like lists, dictionaries, or database records. Output Parsers bridge this gap.

    - Two Main Jobs:
        - Formatting Instructions: They can inject instructions into the prompt that tell the LLM how to format its response (e.g., "Return valid JSON").
        - Parsing: They take the final text response and convert it into a Python object (e.g., parsing a stringified JSON into a real Python dict).

- Common Types:
    - StrOutputParser: The simplest one. It just extracts the text content from the message object, converting AIMessage(content="Hello") $\to$ "Hello".
    - JsonOutputParser: Parses the output as JSON and converts it into a Python dictionary. It cannot force the model to emit valid JSON; if the text is not valid JSON, parsing fails with an error.
    - PydanticOutputParser: The most powerful of the three. It uses a Pydantic model (a Python class) to define exactly which fields, data types, and validation rules the output must satisfy.
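
The two jobs described above can be sketched without any LangChain dependency. This is a simplified illustration, not LangChain's implementation: `TinyJsonParser`, the instruction text, and the hard-coded `raw_reply` are all hypothetical stand-ins.

```python
import json


class TinyJsonParser:
    """Hypothetical stand-in for an output parser; not LangChain's implementation."""

    def get_format_instructions(self) -> str:
        # Job 1: text appended to the prompt so the model knows the target format.
        return "Return only a valid JSON object, with no extra text."

    def parse(self, text: str) -> dict:
        # Job 2: turn the model's raw string into a Python object.
        return json.loads(text)


parser = TinyJsonParser()

# Build the prompt with the formatting instructions injected.
prompt = (
    "Give me the name and age of a fictional character.\n"
    + parser.get_format_instructions()
)

# Hard-coded stand-in for the text an LLM might send back.
raw_reply = '{"name": "Ada", "age": 36}'

parsed = parser.parse(raw_reply)
print(type(parsed).__name__, parsed["name"], parsed["age"])
```

The real parsers follow the same split: one method produces instructions for the prompt, another converts the reply.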


### Theory: StrOutputParser and Chaining

1. **StrOutputParser**: 
   - This is the simplest output parser in LangChain. 
   - Chat models return message objects (such as `AIMessage`, which carries the content plus metadata) rather than plain strings.
   - `StrOutputParser` extracts just the string content (the text body) from the response, converting it into a standard Python string. This makes it ready to be passed into the next step of a chain.

2. **Sequential Chaining (Multi-step Logic)**:
   - The chain `template1 | model | parser | template2 | model | parser` demonstrates a pipeline.
   - **Step 1**: `template1` generates a prompt asking for a "detailed report".
   - **Step 2**: `model` generates the report (as an `AIMessage`).
   - **Step 3**: `parser` converts the `AIMessage` to a pure string.
   - **Step 4**: This string is passed as input to `template2`, which asks for a summary. Because `template2` has exactly one input variable, the plain string is mapped to `{text}` automatically.
   - **Step 5 & 6**: The model generates the summary, and the final parser converts it to a plain string for printing.

This pattern is crucial for complex workflows where the output of one step becomes the input of the next.
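
To make the data flow concrete, here is a dependency-free sketch of how a `|` pipeline can hand each step's output to the next. The `Step` class and the scripted lambdas are hypothetical stand-ins; LangChain's own runnables do considerably more (batching, streaming, input validation).

```python
class Step:
    """Hypothetical minimal pipeline step; not LangChain's Runnable."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Scripted stand-ins for the prompt templates, the model, and the parser.
template1 = Step(lambda d: f"Write a detailed report on {d['topic']}.")
template2 = Step(lambda text: f"Summarize the following text:\n{text}")
model = Step(lambda prompt: {"content": f"<reply to: {prompt}>"})  # message-like dict
parser = Step(lambda message: message["content"])  # keep only the text

chain = template1 | model | parser | template2 | model | parser
result = chain.invoke({"topic": "LLM"})
print(result)
```

The printed result nests the first reply inside the second prompt, which shows that the string produced by the first `parser` became the input of `template2`.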

In [1]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from dotenv import load_dotenv
load_dotenv()

llm = HuggingFaceEndpoint(
    repo_id="Qwen/Qwen2.5-7B-Instruct",
    task="text-generation",
    temperature=0.3)
model = ChatHuggingFace(llm=llm)

# StrOutputParser example: a two-step chain (report, then summary)

# Prompt 1 → detailed report
template1 = PromptTemplate(
    template="""
You are a helpful AI tutor.
Write a beginner friendly detailed report on {topic}.
""",
    input_variables=["topic"]
)

# Prompt 2 → summary
template2 = PromptTemplate(
    template="""
Write a short point-wise summary of the following text:
{text}
""",
    input_variables=["text"]
)

parser = StrOutputParser()

chain = template1 | model | parser | template2 | model | parser

result = chain.invoke({"topic": "LLM"})
print(result)

### Point-wise Summary of the Beginner's Guide to Large Language Models (LLMs)

1. **Introduction**:
   - Large Language Models (LLMs) are AI models designed to process and generate human-like text.
   - They are trained on vast amounts of text data, enabling them to understand and generate text across multiple languages and contexts.
   - Applications include customer service, content creation, and language translation.

2. **Definition of LLMs**:
   - LLMs predict the next word in a sequence of text based on patterns learned from large datasets.
   - The goal is to generate coherent and contextually relevant text.

3. **Key Features of LLMs**:
   - **Massive Size**: Contain billions or trillions of parameters.
   - **Contextual Understanding**: Can understand the context of sentences and conversations.
   - **Multilingual Support**: Capable of processing and generating text in multiple languages.
   - **Fine-Tuning**: Can be adapted for specific tasks through fine-tuning.

4. **How L

### Theory: JsonOutputParser and Structure Enforcement

1. **JsonOutputParser**: 
   - LLMs natively generate text. Getting them to generate structured data (like JSON) can be tricky.
   - This parser does two things: 
      - Provides instructions that tell the model how to format the JSON.
      - Parses the resulting string into a Python dictionary (`dict`).

2. **Injecting Instructions**:
   - `parser.get_format_instructions()` returns a string of formatting instructions for the model. With no schema supplied, as in this example, the instruction is short and generic (along the lines of "Return a JSON object.").
   - `partial_variables`: This feature pre-fills variables when the prompt template is created. Here it fills `{format_instruction}`, so the caller does not have to supply the instructions on every `invoke`.
   - Note: The template must contain the `{format_instruction}` placeholder; otherwise the instructions never reach the model.
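
The idea behind `partial_variables` can be sketched in plain Python: some template slots are bound once, up front, and the rest are filled at call time. `make_prompt`, the instruction text, and `raw_reply` are hypothetical and only mirror the behavior described above; they are not LangChain APIs.

```python
import json

# Assumed instruction text; a real parser generates its own.
FORMAT_INSTRUCTIONS = "Return a JSON object."

template = (
    "Give me a name, age, and address of a fictional character.\n"
    "{format_instruction}"
)


def make_prompt(template: str, partial_variables: dict):
    """Bind some variables now; return a function that fills the rest later."""

    def fill(**input_variables) -> str:
        return template.format(**partial_variables, **input_variables)

    return fill


fill_prompt = make_prompt(template, {"format_instruction": FORMAT_INSTRUCTIONS})

# No runtime variables are needed, mirroring chain.invoke({}).
prompt_text = fill_prompt()
print(prompt_text)

# Stand-in for the model's reply, parsed into a dict as the second job.
raw_reply = '{"name": "Mira Holt", "age": 32, "address": "12 Elm Street"}'
record = json.loads(raw_reply)
print(record["age"] + 1)  # the age is a real int, not a string
```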

In [2]:
# JsonOutputParser
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from dotenv import load_dotenv

load_dotenv()

#model 
llm = HuggingFaceEndpoint(
    repo_id="MiniMaxAI/MiniMax-M2.1",
    task="text-generation")
model = ChatHuggingFace(llm = llm)

#OutputParser
parser = JsonOutputParser()

#template
template = PromptTemplate(
    template="give me a name, age, address of a fictional character.\n {format_instruction}",
    input_variables=[],
    partial_variables={"format_instruction": parser.get_format_instructions()}
)

chain = template | model | parser
result = chain.invoke({})
print(result)

{'name': 'Sarah Mitchell', 'age': 32, 'address': '742 Evergreen Terrace, Springfield, IL 62701'}


### Theory: PydanticOutputParser and Data Validation

1. **PydanticOutputParser**: 
   - This is the strictest of the three parsers covered here, and usually the most robust choice for structured data.
   - It uses the **Pydantic** library, which lets you define strict schemas (blueprints) for your data.

2. **Schema Definition**:
   - `class Person(BaseModel)`: We define a class that represents the structure we want.
   - `Field(description=...)`: We provide descriptions for each field. The LLM uses these descriptions to understand *what* to put in that field.
   - **Validation**: Notice `gt=18` in the `age` field. This enforces a rule: the age must be greater than 18. If the LLM generates an age of 10, parsing fails with a validation error instead of silently passing bad data along (in more advanced setups, a retry chain can ask the LLM to fix it).

3. **Type Safety**:
   - The output `result` is not a dictionary but a validated instance of the `Person` class, so fields are read as attributes (`result.name`, `result.age`). This matters for reliable applications: once parsing succeeds, the data has already been checked against the expected types and constraints.
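
The validate-then-retry idea mentioned above can be sketched as follows. To stay dependency-free, this uses a hand-written `parse_person` check in place of Pydantic and a scripted `fake_llm` in place of a real model; both are hypothetical, and `ask_with_retry` is an illustrative helper rather than a LangChain API.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    address: str


def parse_person(text: str) -> Person:
    """Parse JSON text and enforce the schema; raise ValueError on any violation."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    try:
        person = Person(**data)
    except TypeError as exc:  # missing keys, unexpected keys, or non-dict JSON
        raise ValueError(str(exc)) from exc
    if not isinstance(person.name, str) or not isinstance(person.address, str):
        raise ValueError("name and address must be strings")
    if not isinstance(person.age, int) or person.age <= 18:
        raise ValueError("age must be an integer greater than 18")
    return person


def fake_llm(prompt: str) -> str:
    """Scripted stand-in for a model: fixes the age only after seeing the error."""
    if "greater than 18" in prompt:
        return '{"name": "Asha Rao", "age": 28, "address": "Pune, India"}'
    return '{"name": "Asha Rao", "age": 10, "address": "Pune, India"}'


def ask_with_retry(prompt: str, max_attempts: int = 2) -> Person:
    for _ in range(max_attempts):
        reply = fake_llm(prompt)
        try:
            return parse_person(reply)
        except ValueError as error:
            # Feed the validation error back so the model can correct itself.
            prompt = f"{prompt}\nYour last answer was rejected: {error}. Please fix it."
    raise ValueError(f"no valid output after {max_attempts} attempts")


person = ask_with_retry("Give me a fictional Indian person as JSON.")
print(person)
```

In this scripted run the first reply (age 10) is rejected and the second (age 28) passes, so the caller only ever sees a `Person` that satisfies the rules.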

In [3]:
# PydanticOutputParser
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from dotenv import load_dotenv

load_dotenv()

#model 
llm = HuggingFaceEndpoint(
    repo_id="MiniMaxAI/MiniMax-M2.1",
    task="text-generation")
model = ChatHuggingFace(llm = llm)

class Person(BaseModel):
    name: str = Field(description="Person's name")
    age : int = Field(gt=18, description="Age of Person")
    address : str = Field(description="Place where person belongs to")


parser = PydanticOutputParser(pydantic_object=Person)

template = PromptTemplate(
    template= "get me name, age, address of a fictional {type} Person.\n {format_instruction}",
    input_variables=["type"],
    partial_variables={"format_instruction": parser.get_format_instructions()}
)

chain = template | model | parser
result = chain.invoke({"type": "indian"})
print(result)

name='Rajiv Menon' age=28 address='24, Gandhi Nagar, Chennai, Tamil Nadu, India'
