#### Pydantic Basics: Creating and Using Models
Pydantic models are the foundation of data validation with Pydantic. They use Python type annotations to define the structure of your data and validate it at runtime. The examples below walk through basic model creation, starting with a plain dataclass for comparison.

In [2]:
## The dataclass decorator automatically generates special methods
# such as __init__(), __repr__(), and __eq__()
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    city: str
    
person = Person(name="John Doe", age=30, city=35)
print(person)

Person(name='John Doe', age=30, city=35)


Even though the Person class declares city as a str, a dataclass performs no validation, so it accepts an integer without complaint.
A Pydantic BaseModel, by contrast, validates the data types.

In [3]:
from pydantic import BaseModel

class PersonModel(BaseModel):
    name: str
    age: int
    city: str

# Pydantic will validate the data types
# and raise an error if they do not match    
person1 = PersonModel(name="John Doe", age=30, city="Kelowna")
print(person1,'\n')

try:
    person2 = PersonModel(name="John Doe", age="thirty", city=35)
except ValueError as e:
    print(f"Error: {e}")

name='John Doe' age=30 city='Kelowna' 

Error: 2 validation errors for PersonModel
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='thirty', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/int_parsing
city
  Input should be a valid string [type=string_type, input_value=35, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
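
Validation in Pydantic is not purely a pass/fail type check. As a minimal sketch, assuming Pydantic v2 in its default (lax) mode, the example below shows that a numeric string such as "30" is converted to an int, while a non-numeric string is rejected. It also reads the error details from `ValidationError.errors()` instead of printing the whole message.

```python
from pydantic import BaseModel, ValidationError


class PersonModel(BaseModel):
    name: str
    age: int
    city: str


# A numeric string is coerced to int in Pydantic's default (lax) mode.
person = PersonModel(name="John Doe", age="30", city="Kelowna")
print(type(person.age).__name__, person.age)

# A non-numeric string cannot be coerced, so validation fails.
try:
    PersonModel(name="John Doe", age="thirty", city="Kelowna")
except ValidationError as exc:
    first = exc.errors()[0]  # each error is a dict with 'loc', 'type', 'msg', ...
    print(first["loc"], first["type"])
```

The coercion only runs in the safe direction shown in the output above: the integer 35 is not silently turned into a string for city.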


#### 2. Model with Optional Fields
Add optional fields using Python's Optional type:

In [4]:
from typing import Optional

class Employee(BaseModel):
    id: int
    name: str
    department: str
    salary: Optional[float] = None  # optional field with default value
    is_active: Optional[bool] = True  # optional field with default value

employee = Employee(id=1, name="John Doe", department="HR")
print(employee)
employee1 = Employee(id=2, name="Jane Doe", department="IT", salary=50000.0)
print(employee1)


id=1 name='John Doe' department='HR' salary=None is_active=True
id=2 name='Jane Doe' department='IT' salary=50000.0 is_active=True


Key points:
- Optional[type]: Indicates the field can be None

- Default value (= None or = True): Makes the field optional, so it can be omitted

- Fields without a default value are required and must still be provided

- Pydantic validates types even for optional fields when values are provided
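
The difference between "can be None" and "can be omitted" is easy to miss. The sketch below, assuming Pydantic v2, uses a trimmed Employee model with a hypothetical `bonus` field added purely for illustration. It shows that `Optional[float]` without a default is still a required field.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Employee(BaseModel):
    id: int
    # Nullable but still required: no default value is given.
    salary: Optional[float]
    # Nullable and optional: None is used when the field is omitted.
    # (bonus is a hypothetical field, added only for this illustration.)
    bonus: Optional[float] = None


# Passing None explicitly satisfies the required nullable field.
employee = Employee(id=1, salary=None)
print(employee)

# Omitting salary raises a "missing" error even though it accepts None.
try:
    Employee(id=2)
except ValidationError as exc:
    missing = [err["loc"] for err in exc.errors() if err["type"] == "missing"]
    print(missing)
```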

#### 3. Model with List Fields
Declare a field that holds a list of values using List[type]:

In [5]:
from typing import List

class Classroom(BaseModel):
    room_no: str
    students: List[str]
    capacity: int
    
# A tuple is passed here; Pydantic's default (lax) mode converts it to a list
classroom = Classroom(room_no="A101", students=("Ash", "Shanu", "Krish"), capacity=40)
print(classroom)

room_no='A101' students=['Ash', 'Shanu', 'Krish'] capacity=40
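
Pydantic also validates every element of the list against the declared item type. As a rough sketch, assuming Pydantic v2, the example below passes an integer among the student names and inspects where the error is reported.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Classroom(BaseModel):
    room_no: str
    students: List[str]
    capacity: int


# Each element is validated against str; the integer at index 1 is rejected.
try:
    Classroom(room_no="A101", students=["Ash", 42, "Krish"], capacity=40)
except ValidationError as exc:
    # 'loc' is a tuple of (field name, list index) for list items.
    bad_locations = [err["loc"] for err in exc.errors()]
    print(bad_locations)
```

The location tuple identifies the field and the index of the offending item, which helps when the list is long.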


#### 4. Model with Nested Models
Create complex structures with nested models:

In [6]:
class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class Customer(BaseModel):
    customer_id: int
    name: str
    address: Address  ## nested model

address1 = Address(street="123 main st", city="Kelowna", zip_code="V1Y 1A1")
customer = Customer(customer_id=1, name="Ash", address=address1)
print(customer)

customer_id=1 name='Ash' address=Address(street='123 main st', city='Kelowna', zip_code='V1Y 1A1')
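
Nested models do not have to be built by hand first. The sketch below, assuming Pydantic v2, passes a plain nested dict (the shape you would get from parsed JSON) and lets Pydantic construct the Address instance. It then converts the model back to a dict with `model_dump()`.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class Customer(BaseModel):
    customer_id: int
    name: str
    address: Address  # nested model


raw = {
    "customer_id": 1,
    "name": "Ash",
    "address": {"street": "123 main st", "city": "Kelowna", "zip_code": "V1Y 1A1"},
}

# The inner dict is validated and converted into an Address instance.
customer = Customer(**raw)
print(type(customer.address).__name__)

# model_dump() turns the model, including nested models, back into plain dicts.
print(customer.model_dump() == raw)
```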


#### Pydantic Fields: Customization and Constraints

The Field function in Pydantic enhances model fields beyond basic type hints by allowing you to specify validation rules, default values, aliases, and more. Here's a comprehensive tutorial with examples.
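
As a small illustration of those options, the sketch below, assuming Pydantic v2, defines a hypothetical Product model (not part of the original notebook) that combines length and numeric constraints, a default value, and an alias.

```python
from pydantic import BaseModel, Field, ValidationError


# Product and its fields are hypothetical, used only to illustrate Field options.
class Product(BaseModel):
    name: str = Field(min_length=1, max_length=50)  # length constraints
    price: float = Field(gt=0)                      # must be strictly positive
    quantity: int = Field(default=1, ge=0)          # default value plus a lower bound
    sku: str = Field(alias="SKU")                   # populated from the key "SKU"


# quantity is omitted, so its default is used; sku is supplied via its alias.
product = Product(name="Notebook", price=4.5, SKU="NB-001")
print(product)

# An empty name and a negative price each violate a constraint.
try:
    Product(name="", price=-1, SKU="NB-002")
except ValidationError as exc:
    error_types = {err["loc"][0]: err["type"] for err in exc.errors()}
    print(error_types)
```

Note that with an alias, the input key is the alias ("SKU") while the attribute on the model is still `sku`.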

Consider a practical example. Suppose an LLM returns output containing a name, an age, and an email, and we need to validate that structure. Without Pydantic, we have to write a chain of if/else checks by hand, one per validation rule, as shown below.

In [5]:
## We will check for name, age and email in the JSON output
import json
import re

def process_json_output_manual(json_output_string):
    """Parse and validate LLM JSON output by hand; return the dict, or None on failure."""
    try:
        data = json.loads(json_output_string)
        if not isinstance(data.get('name'), str):
            raise ValueError("Name is missing or not a string")
        if not isinstance(data.get('age'), int):
            ## In case the LLM returns "30" instead of 30, we have to convert it manually.
            data['age'] = int(data['age'])
        ## In case the LLM returns an age that is out of range
        if not (18 <= data['age'] <= 99):
            raise ValueError("Age must be between 18 and 99")
        if not isinstance(data.get('email'), str):
            raise ValueError("Email is missing or not a string")
        if not re.match(r"[^@]+@[^@]+\.[^@]+", data['email']):
            raise ValueError("Email is not valid")
        ## All manual checks passed
        return data
    except (json.JSONDecodeError, ValueError, KeyError, TypeError) as e:
        print(f'Error:{e}')
    return None
        

In [11]:
process_json_output_manual('''{
    "name":"Ash",
    "age":30,
    "email":"ashumail.com"}
''')

# This approach is brittle and error-prone for LLM outputs

Error:Email is not valid


In [9]:
from pydantic import BaseModel, Field
from typing import Optional, Literal

## Define the structured data we expect from the LLM or as tool input
class UserProfile(BaseModel):
    name: str = Field(..., description="The name of the user")
    age: int = Field(..., ge=18, le=99, description="The age of the user must be between 18 and 99")
    email: str = Field(..., pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", description="The user's email address.")
    status: Literal["active","inactive"] = Field("active", description="The status of the user, either 'active' or 'inactive'.")
    notes: Optional[str] = Field(None, description="Any additional notes about the user.")
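
The descriptions and constraints are not only used for validation. As a sketch, assuming Pydantic v2 and a trimmed copy of the model above (email omitted for brevity), `model_json_schema()` exports them as a JSON Schema, which is the kind of structure typically handed to an LLM as a tool or output specification.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


# Trimmed copy of UserProfile so this block runs on its own.
class UserProfile(BaseModel):
    name: str = Field(..., description="The name of the user")
    age: int = Field(..., ge=18, le=99, description="The age of the user must be between 18 and 99")
    status: Literal["active", "inactive"] = Field("active", description="The status of the user.")
    notes: Optional[str] = Field(None, description="Any additional notes about the user.")


schema = UserProfile.model_json_schema()

# Only fields without a default are listed as required.
print(schema["required"])

# ge/le constraints and the description appear in the property's schema.
print(schema["properties"]["age"])
```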
    

    

In [None]:
## Scenario 1: LLM generates a valid JSON output / Valid Tool Input
valid_json_output = {
    "name":"Ash",
    "age":30,
    "email":"ashu@gmail.com",
    "status":"active"
}


try:
    ## adding ** to unpack the dictionary into keyword arguments
    user_profile = UserProfile(**valid_json_output)
    print("Successfully parsed data (LLM output or Tool Input):")
    print(user_profile)
except Exception as e:
    print(f"Error: {e}")

Successfully parsed data (LLM output or Tool Input):
name='Ash' age=30 email='ashu@gmail.com' status='active' notes=None
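
In practice an LLM usually returns JSON text rather than a Python dict. As a minimal sketch, assuming Pydantic v2 and a trimmed copy of the model, `model_validate_json()` parses and validates the raw string in one step, replacing both the `json.loads` call and the manual checks. The `llm_response` string is made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


# Trimmed copy of UserProfile so this block runs on its own.
class UserProfile(BaseModel):
    name: str
    age: int = Field(..., ge=18, le=99)
    email: str


# Hypothetical raw LLM response; note that age arrives as the string "30".
llm_response = '{"name": "Ash", "age": "30", "email": "ashu@gmail.com"}'

# Parsing and validation happen together; "30" is coerced to the int 30.
user = UserProfile.model_validate_json(llm_response)
print(user)

# Malformed JSON is reported as a ValidationError as well.
try:
    UserProfile.model_validate_json('{"name": "Ash", "age": 30')
except ValidationError as exc:
    json_error_count = exc.error_count()
    print(f"{json_error_count} validation error: truncated JSON was rejected")
```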


In [19]:
## Scenario 2: LLM generates an invalid JSON output / Invalid Tool Input
invalid_json_output = {
    "name":"Ash",
    "age":17,  # Invalid age
    "email":"ashumail.com",  # Invalid email format
    "status":"active"
}

try:
    user_profile_invalid = UserProfile(**invalid_json_output)
except Exception as e:
    print(f'\n Error parsing data : {e}')


 Error parsing data : 2 validation errors for UserProfile
age
  Input should be greater than or equal to 18 [type=greater_than_equal, input_value=17, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/greater_than_equal
email
  String should match pattern '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' [type=string_pattern_mismatch, input_value='ashumail.com', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/string_pattern_mismatch


In [20]:
## Scenario 3: LLM generates JSON output with a missing required field / Invalid Tool Input
malformed_json_output = {
    "name":"Ash",
    "email":"ashu@gmail.com",  # Valid email, but the required "age" field is missing
    "status":"active"
}

try:
    user_profile_missingdata = UserProfile(**malformed_json_output)
except Exception as e:
    print(f'\n Error parsing data : {e}')


 Error parsing data : 1 validation error for UserProfile
age
  Field required [type=missing, input_value={'name': 'Ash', 'email': ...om', 'status': 'active'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
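
Printing the exception is fine for debugging, but an application usually needs the errors in a structured form, for example to send feedback to the LLM and ask it to retry. The sketch below, assuming Pydantic v2, uses a hypothetical helper `summarize_errors` (not part of Pydantic) built on `ValidationError.errors()`.

```python
from pydantic import BaseModel, Field, ValidationError


# Trimmed copy of UserProfile so this block runs on its own.
class UserProfile(BaseModel):
    name: str
    age: int = Field(..., ge=18, le=99)
    email: str = Field(..., pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def summarize_errors(exc: ValidationError) -> list[str]:
    """Hypothetical helper: turn each error into a short 'field: message' line."""
    return [
        f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
        for err in exc.errors()
    ]


try:
    UserProfile(name="Ash", age=17, email="ashumail.com")
except ValidationError as exc:
    feedback = summarize_errors(exc)
    for line in feedback:
        print(line)
```

Each line names the failing field and the reason, so the list can be appended to a retry prompt or returned to the caller of a tool.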
