# ✅ When/Why Do You Need `BaseModel`?

#### `BaseModel` is used when you expect **FastAPI to receive structured JSON data** in the **request body** (typically `POST`)


| Situation | Need `BaseModel`? | Use instead | Example |
|---|---|---|---|
| `POST` with **JSON data** | ✅ Yes | — | `{ "name": "Alice", "age": 30 }` |
| `POST` with **form data / files** | ❌ No | `Form(...)` for form fields, `UploadFile = File(...)` for files | Uploading an image; sending form fields |
| `GET` request with **query parameters** | ❌ No | `def example(text: str)` | Like `?text=hello` in the URL |


# ✅ When/Why Do You Need `@model_validator(mode='after')`?

#### Use it for rules that involve **two or more fields at the same time**, checked after each field has already passed its basic type checks.

| Example Name                         | Fields Involved                     | Rule Description                                              | When to Use |
|---------------------------------------|--------------------------------------|---------------------------------------------------------------|-------------|
| Password confirmation match           | `password`, `password_confirm`       | Both passwords must be exactly the same                       | Signup / account creation |
| End date after start date              | `start_date`, `end_date`              | End date must be later than start date                         | Scheduling / events |
| At least one contact                   | `email`, `phone`                      | Must provide at least one contact method                       | Forms / registration |
| Discount less than price               | `price`, `discount_price`             | Discount must be less than the original price                  | E-commerce / product listing |
| Min less than max                      | `min_value`, `max_value`              | Min must be strictly less than max                             | Data ranges / numeric settings |
| Coordinates validity                   | `latitude`, `longitude`               | Latitude between -90 and 90, longitude between -180 and 180    | Geo data validation |
| Mutually exclusive options             | `option_a`, `option_b`                 | Only one of these fields can be provided                       | Config / settings forms |


## 1. Password confirmation matches

In [None]:
from pydantic import BaseModel, model_validator, ValidationError

class UserSignup(BaseModel):
    password: str
    password_confirm: str

    @model_validator(mode='after')
    def passwords_match(self):
        # mode='after' validators are instance methods: self is the validated model
        if self.password != self.password_confirm:
            raise ValueError("Passwords do not match")
        return self

print('-------Unmatched case--------')
try:
    UserSignup(password="secret123", password_confirm="wrong")
except ValidationError as e:
    print(e)

print('\n')
print('---------matched case---------')

try:
    UserSignup(password="secret123", password_confirm="secret123")
    print("password and confirmed-password matched!")
except ValidationError as e:
    print(e)

-------Unmatched case--------
1 validation error for UserSignup
  Value error, Passwords do not match [type=value_error, input_value={'password': 'secret123',...sword_confirm': 'wrong'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error


---------matched case---------
password and confirmed-password matched!


## 2. End date must be after start date

In [24]:
from datetime import date

class Event(BaseModel):
    start_date: date
    end_date: date

    @model_validator(mode='after')
    def check_dates(self):
        # Both fields are already parsed into date objects here
        if self.end_date <= self.start_date:
            raise ValueError("Invalidation : End date must be after start date")
        return self

print('-------Invalidation Case--------')
try:
    Event(start_date="2025-01-02", end_date="2025-01-01")
except ValidationError as e:
    print(e)

print('\n')
print('---------Validation Case----------')

try:
    Event(start_date="2025-01-01", end_date="2025-01-02")
    print("validation PASSED : End date is after start date, ")
except ValidationError as e:
    print(e)

-------Invalidation Case--------
1 validation error for Event
  Value error, Invalidation : End date must be after start date [type=value_error, input_value={'start_date': '2025-01-0...end_date': '2025-01-01'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error


---------Validation Case----------
validation PASSED : End date is after start date, 


## 3. Phone or email: at least one of the two fields must be provided

In [39]:
from pydantic import BaseModel, model_validator, ValidationError

class Contact(BaseModel):
    email: str | None = None
    phone: str | None = None

    @model_validator(mode='after')
    def require_one(self):

        """
        self : the fully validated instance. mode='after' validators are
        instance methods, so every field is available as an attribute,
        for example self.email and self.phone.

        self vs cls:
        self -> "this one dog" (an object) : self.email  # data for this one object
        cls  -> "the idea of dogs" (the blueprint/class) : type(self).__name__  # name of the blueprint

        """
        if not self.email and not self.phone:
            raise ValueError("You must provide either email or phone")
        print('Validation PASSED! either email or phone is provided')
        return self

# Whenever a model instance (e.g., Contact(...) or Contact.model_validate(data)) is created/validated,
# Pydantic runs and AUTOMATICALLY calls the validator at the right time.
# mode='after' : Pydantic first validates each field, builds the instance, then calls your function.

# ✅ Works — has email
c1 = Contact(email="test@example.com")
print("Contact 1:", c1)

# ✅ Works — has phone
# model_validate() takes a single object to validate, usually a dict
c2 = Contact.model_validate({"phone": "123-456-7890"})
print("Contact 2:", c2)

# pass both fields in one dict
c4 = Contact.model_validate({"phone": "123-456-7890", 'email':'xxxx@gmail.com'})
print("Contact 4:", c4)

print('--------------------------')
print('❌ Fails — neither provided')
try:
    c3 = Contact()
except ValidationError as e:
    print("Validation error:")
    print(e)


Validation PASSED! either email or phone is provided
Contact 1: email='test@example.com' phone=None
Validation PASSED! either email or phone is provided
Contact 2: email=None phone='123-456-7890'
Validation PASSED! either email or phone is provided
Contact 4: email='xxxx@gmail.com' phone='123-456-7890'
--------------------------
❌ Fails — neither provided
Validation error:
1 validation error for Contact
  Value error, You must provide either email or phone [type=value_error, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error


## 4. Discount price must be less than normal price

In [None]:
class Product(BaseModel):
    price: float
    discount_price: float | None = None

    @model_validator(mode='after')
    def check_discount(self):

        # invalid case
        if self.discount_price is not None and self.discount_price >= self.price:
            raise ValueError("Discount must be less than the original price")

        # valid case: a discount was given and it is strictly below the price
        elif self.discount_price is not None:
            print('Validation PASSED, discount_price is smaller than price')

        return self

print('-------Validation Case--------')
try:
    Product(price=10, discount_price=5)
except ValidationError as e:
    print(e)


print('--------------------------')
print('❌ Fails: discount_price is not below price')
try:
    Product(price=5, discount_price=10)
except ValidationError as e:
    print("Validation error:")
    print(e)


-------Validation Case--------
Validation PASSED, discount_price is smaller than price
--------------------------
❌ Fails: discount_price is not below price
Validation error:
1 validation error for Product
  Value error, Discount must be less than the original price [type=value_error, input_value={'price': 5, 'discount_price': 10}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error


## 4-1. Bonus: the same discount rule with `@field_validator`

In [None]:
from pydantic import BaseModel, field_validator, ValidationInfo

class Product(BaseModel):
    price: float
    discount_price: float | None = None

    @field_validator('discount_price', mode='after')
    @classmethod
    def ensure_discount_is_valid(cls, discount_price, validated_field: ValidationInfo):

        """
        cls : the model class (field validators are class methods, so no instance exists yet)
        discount_price : value of the field being validated, because that field
        is named in @field_validator('discount_price', mode='after')

        validated_field : validation context; validated_field.data holds the fields that were
        validated before this one, so 'price' must be declared above 'discount_price'
        """
        # Step 1: If discount_price is missing or None, skip the check
        if discount_price is None:
            return discount_price

        # Step 2: Get the "price" from the data already validated
        price = validated_field.data.get('price')

        # Step 3: Compare prices
        if price is not None and discount_price >= price:
            raise ValueError('discount_price must be < price')

        # Step 4: If everything is fine, return the value as-is
        return discount_price


## Why is this nicer?
- The user (or API client) sees **exactly which field is wrong.**
- You don’t have to explain “the whole model is invalid” when the problem is just one field.


## 🔑 Key differences 
`@field_validator("field_name")` vs `@model_validator(mode="after")`

#### @field_validator("field_name")

- Runs on **one field only** (the value argument is that field’s value).

- Can still peek at other fields via `validated_field.data.get('price')`, but only at fields declared earlier in the class.

- Best when the rule is mostly about that one field.

- **Example: “discount_price must be less than price”.**

#### @model_validator(mode="after")

- Runs after **all fields are validated.**

- Works on the whole object (model = the instance).

- Best when the rule needs to check relationships between multiple fields.

- **Example: “end_date must be after start_date”.**

Input Data  ─────────────►  Pydantic
                             │
                             ▼
                  ┌─────────────────────┐
                  │ Field validation    │
                  │ (one field at a     │
                  │ time)               │
                  └─────────────────────┘
                             │
        Each field value --->│ value = the field being validated
                             │ info.data = fields validated earlier
                             ▼
                 @field_validator("field_name")
                             │
                             ▼
            ✅ return value     OR     ❌ raise ValueError
                             │
                             ▼
                  ┌─────────────────────┐
                  │ Model validation    │
                  │ (whole object,      │
                  │ all fields ready)   │
                  └─────────────────────┘
                             │
            model = instance with all fields validated
                             ▼
                 @model_validator(mode="after")
                             │
                             ▼
            ✅ return model     OR     ❌ raise ValueError
                             │
                             ▼
                    Final Valid Model


# ⚙️ In Data Engineering

## Pydantic helps ensure data contracts are respected

Data engineers deal with pipelines, streaming data, and APIs.



Common uses:

#### 1. ETL / ELT pipelines

- Validate raw data before loading into a warehouse (Snowflake, BigQuery, etc.).

- Catch bad rows early → save downstream headaches.

In [None]:
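from datetime import datetime
from pydantic import BaseModel

class Transaction(BaseModel):
    id: str
    amount: float
    timestamp: datetime  # ISO 8601 strings are parsed and validated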
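
One way to "catch bad rows early" is to validate every row and route failures to a reject list instead of stopping the load. The sketch below repeats the `Transaction` class so it runs on its own; typing `timestamp` as `datetime` is an assumption made here so that malformed timestamps are rejected, and the sample rows are invented.

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError


class Transaction(BaseModel):
    id: str
    amount: float
    timestamp: datetime  # assumption: ISO 8601 timestamps


# Hypothetical raw rows, e.g. as read from a CSV file
raw_rows = [
    {"id": "t1", "amount": "19.99", "timestamp": "2025-01-01T10:00:00"},
    {"id": "t2", "amount": "not-a-number", "timestamp": "2025-01-01T10:05:00"},
    {"id": "t3", "amount": 5, "timestamp": "yesterday"},
]

good, bad = [], []
for row in raw_rows:
    try:
        good.append(Transaction.model_validate(row))
    except ValidationError as e:
        # Keep the original row with its errors for later inspection
        bad.append((row, e.errors()))

print(len(good), "valid,", len(bad), "rejected")
```

Only `good` would be loaded into the warehouse; `bad` can be written to a quarantine table or log.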


#### 2. API ingestion

- When consuming REST APIs or Kafka streams, schemas often drift.

- Pydantic can validate messages so only “good” ones get processed.
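
A rough sketch of that filtering step, using `model_validate_json` on raw payloads. The `ClickEvent` model and the message contents are hypothetical stand-ins for whatever a real consumer or HTTP client would deliver.

```python
from pydantic import BaseModel, ValidationError


class ClickEvent(BaseModel):
    user_id: int
    url: str


# Hypothetical payloads, standing in for bytes read from a stream or an API response
messages = [
    b'{"user_id": 1, "url": "/home"}',
    b'{"user": 2, "url": "/cart"}',      # field renamed upstream (schema drift)
    b'{"user_id": "abc", "url": "/x"}',  # wrong type
    b'not json at all',
]

processed = []
for raw in messages:
    try:
        event = ClickEvent.model_validate_json(raw)
    except ValidationError as e:
        # Malformed JSON and schema violations both surface as ValidationError
        print("skipped:", e.errors()[0]["type"])
        continue
    processed.append(event)

print(len(processed), "message(s) processed")
```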

#### 3. Data contracts between teams

- Define shared models in Pydantic to make sure upstream and downstream teams agree on the format.

  - Example: An upstream service promises event_time is an ISO datetime — Pydantic enforces that.
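
As an illustration of such a contract, the sketch below types `event_time` as `datetime` and also exports a JSON Schema that both teams could review. The `OrderEvent` model and its `order_id` field are hypothetical.

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError


class OrderEvent(BaseModel):
    order_id: str
    event_time: datetime


# A conforming payload: the ISO 8601 string is parsed into a datetime
ok = OrderEvent.model_validate(
    {"order_id": "o-1", "event_time": "2025-01-01T12:30:00Z"}
)
print(ok.event_time.isoformat())

# A payload that breaks the contract
try:
    OrderEvent.model_validate({"order_id": "o-2", "event_time": "01/02/2025 noon"})
    contract_error = None
except ValidationError as e:
    contract_error = e.errors()[0]
    print(contract_error["loc"], contract_error["type"])

# The same model can be exported as JSON Schema for teams that do not use Python
schema = OrderEvent.model_json_schema()
print(schema["properties"]["event_time"])
```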

#### 4. Configuration management

- Pipelines have lots of configs (file paths, connection strings).

- Pydantic’s `BaseSettings` (provided by the separate `pydantic-settings` package since Pydantic v2) is well suited to loading and validating environment variables.

# 🔬 In Data Science

## Pydantic helps by validating and cleaning inputs (messy sources like CSV, Excel, APIs, scrapers) before analysis

#### Common uses:

#### 1. Dataset schemas

- Define what each row should look like (types, required/optional fields, ranges).

  - Example: validating that a “price” column has positive floats and “date” column has valid ISO strings.

In [None]:
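import datetime
from pydantic import BaseModel, Field

class Row(BaseModel):
    user_id: int
    age: int = Field(gt=0)
    purchase_amount: float = Field(gt=0)  # positive amounts only
    date: datetime.date  # parsed from an ISO string such as "2025-01-31"

# Hypothetical sample rows; in practice these come from a CSV, an API, etc.
raw_data = [
    {"user_id": 1, "age": 34, "purchase_amount": 19.99, "date": "2025-01-31"},
    {"user_id": 2, "age": 27, "purchase_amount": 5.0, "date": "2025-02-01"},
]

# Validate each row before using it (raises ValidationError on the first bad row)
clean = [Row(**r) for r in raw_data]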


#### 2. Feature engineering sanity checks

- Before training a model, validate that engineered features are in the expected ranges.

  - Example: probability must be between 0 and 1.
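
Range constraints like this can be declared with `Field`. In the sketch below the `Features` model and its field names are hypothetical.

```python
from pydantic import BaseModel, Field, ValidationError


class Features(BaseModel):
    click_probability: float = Field(ge=0, le=1)  # must lie in [0, 1]
    days_since_signup: int = Field(ge=0)          # cannot be negative


valid = Features(click_probability=0.37, days_since_signup=12)

try:
    Features(click_probability=1.4, days_since_signup=12)
    range_error = None
except ValidationError as e:
    range_error = e.errors()[0]
    print(range_error["loc"], range_error["msg"])
```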

#### 3. Config files for ML experiments

- Hyperparameters often come from YAML/JSON.

- Pydantic can validate them so you don’t waste a 10-hour run because of a typo.
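
One possible setup is shown below. By default Pydantic ignores unknown keys, so this sketch sets `extra='forbid'` to turn a misspelled hyperparameter into an error. The `TrainConfig` model, its defaults, and the config text are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class TrainConfig(BaseModel):
    # Unknown keys (often typos) become errors instead of being silently ignored
    model_config = ConfigDict(extra='forbid')

    learning_rate: float = Field(default=1e-3, gt=0)
    epochs: int = Field(default=10, ge=1)
    optimizer: str = "adam"


# Hypothetical config text; note the misspelled key "learing_rate"
config_text = '{"learing_rate": 0.01, "epochs": 20}'

errors = []
try:
    config = TrainConfig.model_validate_json(config_text)
except ValidationError as e:
    errors = e.errors()
    for err in errors:
        print(err["loc"], err["type"])
```

Without `extra='forbid'`, the typo would pass validation and training would quietly run with the default learning rate.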

# ⚠️ Why avoid global variables like `adjusted_recipe`?

| Problem                        | Explanation                                                                                               |
| ------------------------------ | --------------------------------------------------------------------------------------------------------- |
| 🔁 Data overlap                | Multiple users using the app can overwrite each other's adjusted recipes                                  |
| 🧪 Debugging nightmares        | You can’t track which user triggered which recipe                                                         |
| 🚫 Not thread-safe             | FastAPI runs multiple requests in parallel — global variables cause race conditions                       |
| 🚀 Won’t scale with Docker/GCP | In production (like GCP Cloud Run), your app may run across multiple containers — they don’t share memory |

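One alternative, sketched under assumptions: compute the adjusted recipe inside a function and return it, so each request gets its own object and nothing is stored at module level. The `Recipe` model, its fields, and `adjust_recipe` are hypothetical, since the original app's code is not shown here.

```python
from pydantic import BaseModel


class Recipe(BaseModel):
    name: str
    servings: int
    flour_grams: float


def adjust_recipe(recipe: Recipe, servings: int) -> Recipe:
    # Build and return a new object; no module-level state is touched,
    # so concurrent requests cannot overwrite each other's result
    factor = servings / recipe.servings
    return recipe.model_copy(
        update={"servings": servings, "flour_grams": recipe.flour_grams * factor}
    )


base = Recipe(name="bread", servings=4, flour_grams=500)
for_alice = adjust_recipe(base, servings=8)
for_bob = adjust_recipe(base, servings=2)

print(for_alice.flour_grams, for_bob.flour_grams, base.flour_grams)
```

In a FastAPI handler, the returned object would be sent straight back in the response. Anything that must outlive the request belongs in a database or cache, not in process memory.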