# `pandantic` v1: Solving the issue of black-box `DataFrames` with `pydantic`

## Background

In recent years I have grown to favor a "walled garden" approach to writing Python applications. The concept is simple: instead of type-checking values with conditional if/else logic throughout my program, I focus on strictly validating data at its entry points, which lets the rest of the code operate safely without tedium.

The fine-grained validation control that `pydantic` offers has made this approach more powerful than ever…that is, if you are working with JSON or dBase data that gets stored in memory as a `dict`. However, in the Python data world, `DataFrame` objects have become a nearly ubiquitous in-memory representation of data, for a variety of convenience and efficiency reasons. This created a conflict between the `pydantic`-model-focused code I would write and the black-box-esque `DataFrame` objects I also had to use.
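Here is a minimal sketch of the walled-garden idea applied to plain `dict` input. It assumes `pydantic` is installed, and `Reading`, `load_reading`, and `scaled` are hypothetical names. Validation happens once, at the entry point, and the downstream function simply trusts the model.

```python
from pydantic import BaseModel, Field, ValidationError


class Reading(BaseModel):
    """Hypothetical schema for one incoming record."""

    sensor_id: int
    value: float = Field(ge=-1, le=1)  # e.g. a point on a sine wave


def load_reading(raw: dict) -> Reading:
    """Entry point: the only place where raw input gets checked."""
    return Reading(**raw)


def scaled(reading: Reading, factor: float) -> float:
    # Inside the wall: no isinstance() or range checks needed here.
    return reading.value * factor


ok = load_reading({"sensor_id": "7", "value": 0.5})  # numeric string coerced to int
print(scaled(ok, 2))  # 1.0

try:
    load_reading({"sensor_id": 8, "value": 3.0})  # out of range
except ValidationError as exc:
    print("rejected at the wall:", exc.errors()[0]["loc"])
```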


While this is the status quo for most, I began resenting it for a few reasons:

- You have no way of knowing, just by reading the code, whether a column exists in a `DataFrame` without statements like `assert hasattr(df, "my_column")`. This slows down reasoning about an unfamiliar repo.
- An anomaly in the data can cause `pandas` to infer a data type incorrectly (e.g., `object` instead of `int64`), which goes uncaught but causes bugs downstream.
- Some invalid values cannot be identified by data type alone. For example, you could have a float column that represents values along a sine wave from -1 to 1. While the data type may be correct, you would need to use `df.loc[(df.my_column >= -1) & (df.my_column <= 1)]` to sub-select only valid rows.
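The three pain points above can be reproduced with a few lines of plain `pandas`. The data is hypothetical. The column check uses membership in `df.columns`, which, unlike `hasattr`, is not fooled by `DataFrame` methods that share a name with a column (`hasattr(df, "count")` is true for every `DataFrame`).

```python
import pandas as pd

# Hypothetical data: one stray string sneaks into an integer column,
# and one float falls outside the sine-wave range [-1, 1].
df = pd.DataFrame(
    {
        "quantity": [1, 2, "3"],
        "my_column": [0.5, 1.2, -0.8],
    }
)

# 1. Column existence has to be checked explicitly.
assert "my_column" in df.columns

# 2. The stray string silently changes the inferred dtype.
print(df["quantity"].dtype)  # object, not int64

# 3. my_column has a correct dtype, yet one row is still invalid.
print(df["my_column"].dtype)  # float64
valid = df.loc[(df.my_column >= -1) & (df.my_column <= 1)]
same = df.loc[df.my_column.between(-1, 1)]  # inclusive on both ends by default
print(len(valid))  # 2
```

`Series.between` is just a more compact spelling of the same mask. Neither version scales gracefully once many columns need rules like this.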


These issues are easy enough to remedy at a small scale. However, when many columns needed custom logic, I felt forced to choose between safe but verbose and risky but simple. I found myself wishing there were an easy way to reuse my `pydantic` models in my `pandas` code…enter `pandantic`, which allows you to do exactly that!
After some experimenting, I concluded that with a few additional features and some refinement, I would be ready to use `pandantic` in much of my production code. I got in contact with the repo's maintainer, Wessel Huising, and we teamed up to make these improvements and publish a proper `pandantic` v1 release, which is available today on PyPI.
Not quite convinced? Let's walk through a couple of illustrative examples. Note that both examples are part of a notebook here.

## Setup for examples

```python
import pandas as pd
import pydantic
import pandantic
```

## Example 1: filter invalid rows
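Here is a rough sketch of what filtering invalid rows means, written with plain `pandas` and `pydantic` rather than `pandantic`'s own API. `WaveSchema` and `filter_invalid_rows` are hypothetical names. Each row is turned into a `dict` and validated like any other input, and only the rows the model accepts are kept. The sketch assumes the column labels are strings that match the model's field names.

```python
import pandas as pd
from pydantic import BaseModel, Field, ValidationError


class WaveSchema(BaseModel):
    """Hypothetical row schema; field names match the DataFrame's columns."""

    sensor: str
    my_column: float = Field(ge=-1, le=1)  # a point on a sine wave


def filter_invalid_rows(df: pd.DataFrame, schema: type[BaseModel]) -> pd.DataFrame:
    """Return only the rows of `df` that `schema` accepts."""
    keep = []
    for record in df.to_dict(orient="records"):
        try:
            schema(**record)  # one dict per row, validated like any other input
            keep.append(True)
        except ValidationError:
            keep.append(False)
    return df.loc[keep]  # boolean mask, one entry per row


df = pd.DataFrame(
    {
        "sensor": ["a", "b", "c", "d"],
        # 1.2 is out of range; "oops" is not a float (and makes the column object dtype)
        "my_column": [0.5, 1.2, -0.8, "oops"],
    }
)
clean = filter_invalid_rows(df, WaveSchema)
print(clean)
```

Validating row by row in Python is simple to reason about but slower than vectorized masks on very large frames. The payoff is that the rules live in one reusable model instead of scattered `.loc` expressions.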

## Example 2: validate upon iteration with the `pandas` plugin
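To make "validate upon iteration" concrete, here is a minimal sketch in plain `pandas` and `pydantic`. It is not the `pandantic` plugin's API, and `iter_validated` and `WaveRow` are hypothetical names. Each row becomes a model instance only when the loop reaches it, so the loop body gets typed attribute access instead of string column lookups.

```python
from typing import Iterator, Type, TypeVar

import pandas as pd
from pydantic import BaseModel, Field, ValidationError

M = TypeVar("M", bound=BaseModel)


class WaveRow(BaseModel):
    """Hypothetical row schema."""

    sensor: str
    my_column: float = Field(ge=-1, le=1)


def iter_validated(
    df: pd.DataFrame, schema: Type[M], skip_invalid: bool = False
) -> Iterator[M]:
    """Yield one validated model per row, validating each row when it is reached."""
    records = df.to_dict(orient="records")
    for index, record in zip(df.index, records):
        try:
            row = schema(**record)
        except ValidationError as exc:
            if skip_invalid:
                continue  # drop the bad row and keep iterating
            raise ValueError(f"row {index} failed validation") from exc
        yield row


df = pd.DataFrame({"sensor": ["a", "b", "c"], "my_column": [0.5, 1.2, -0.8]})

for row in iter_validated(df, WaveRow, skip_invalid=True):
    # row is a WaveRow, so editors and type checkers know its attributes
    print(row.sensor, row.my_column)
```

Wrapping failures in a `ValueError` that names the row index is one design choice among several. It keeps the original `ValidationError` attached as the cause, while `skip_invalid=True` drops such rows instead.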