support SchemaPipelines #162

Closed
cosmicBboy opened this issue Jan 15, 2020 · 1 comment

Comments

@cosmicBboy
Collaborator

Schema Pipeline

This would be a convenience class: an in-memory object that holds a registry of the transformations a data frame is expected to undergo, with one schema per stage. The example above would look something like:

import pandas as pd
from pandera import Check, Column, DataFrameSchema, Int

# SchemaPipeline is the proposed class; it does not exist in pandera yet.
schema_pipeline = (
    SchemaPipeline(
        base_schema=DataFrameSchema({
            "col1": Column(Int, Check(lambda s: s >= 0)),
            "col2": Column(Int, Check(lambda s: s >= 0)),
        })
    )
    .pipe("combine", lambda schema: schema.add_columns({
        "col3": Column(Int, Check(lambda s: s >= 0))
    }))
    .pipe("remove", lambda schema: schema.remove_columns(["col1"]))
)

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [1, 2, 3],
})
df = schema_pipeline["base_schema"].validate(df)

df["col3"] = df["col1"] + df["col2"]
df = schema_pipeline["combine"].validate(df)

del df["col1"]
df = schema_pipeline["remove"].validate(df)
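
For reference, the registry behaviour in the example could be sketched roughly as below. This is only an assumption about how such a class might work, not an existing pandera API: the method names mirror the proposal, `stage_names` is a hypothetical helper, and the demo uses a plain `frozenset` of column names as a stand-in for `DataFrameSchema` so that it runs without pandera installed. It assumes each transformation returns a new schema rather than mutating its input.

```python
from typing import Callable, Dict, Generic, List, TypeVar

S = TypeVar("S")  # any schema-like object, e.g. a pandera DataFrameSchema


class SchemaPipeline(Generic[S]):
    """Hypothetical registry of named schema stages (not part of pandera)."""

    def __init__(self, base_schema: S) -> None:
        # Dicts preserve insertion order, so stages stay in pipeline order.
        self._stages: Dict[str, S] = {"base_schema": base_schema}
        self._latest: S = base_schema

    def pipe(self, name: str, transform: Callable[[S], S]) -> "SchemaPipeline[S]":
        """Apply `transform` to the latest schema and register the result."""
        if name in self._stages:
            raise ValueError(f"stage {name!r} is already registered")
        self._latest = transform(self._latest)
        self._stages[name] = self._latest
        return self  # returning self allows chained .pipe(...) calls

    def __getitem__(self, name: str) -> S:
        """Look up the schema registered for a given stage."""
        return self._stages[name]

    def stage_names(self) -> List[str]:
        """Hypothetical helper: stage names in registration order."""
        return list(self._stages)


# Demo with a stand-in schema: a frozenset of expected column names.
pipeline = (
    SchemaPipeline(base_schema=frozenset({"col1", "col2"}))
    .pipe("combine", lambda schema: schema | {"col3"})
    .pipe("remove", lambda schema: schema - {"col1"})
)
```

With real pandera schemas, each lambda would call `schema.add_columns(...)` or `schema.remove_columns(...)` as in the example, and `pipeline["combine"].validate(df)` would check the data frame against the schema expected at that stage.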
@cosmicBboy
Collaborator Author

closing this issue for now, can re-visit later if it still seems like a good idea
