add support for dataframe schema transformations: add_column, remove_column #6

cosmicBboy · 2018-11-19T02:50:01Z

These should be methods that correspond to pandas DataFrame operations.

For example, if the user adds a column to a dataframe, the corresponding schema should also be updatable to account for that change:

```python
import pandas as pd
from pandera import Column, DataFrameSchema, PandasDtype

df = pd.DataFrame({"a": [1, 2, 3]})

schema = DataFrameSchema([Column("a", PandasDtype.Int)])
df = schema.validate(df)

# add a column to the dataframe
df["b"] = ["x", "y", "z"]

# add the column to the dataframe schema (proposed method)
schema = schema.add_column(Column("b", PandasDtype.String))
df = schema.validate(df)

# or reflect a change in an existing column (proposed method)
df["a"] = df["a"].astype(float)
schema = schema.change_column(Column("a", PandasDtype.Float))
df = schema.validate(df)

# same with removing columns (proposed method)
df = df.drop(columns="a")
schema = schema.remove_column("a")
df = schema.validate(df)
```
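To make the proposal concrete, here is a minimal, stdlib-only sketch of how such methods could behave, assuming each method returns a new schema rather than mutating the original (as the `schema = schema.add_column(...)` reassignments above suggest). The `Column` and `DataFrameSchema` classes here are hypothetical stand-ins, not pandera's classes: a plain Python type stands in for `PandasDtype`, and a dict of lists stands in for a DataFrame.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type  # hypothetical: a plain Python type stands in for PandasDtype


@dataclass(frozen=True)
class DataFrameSchema:
    # hypothetical stand-in for a schema; not pandera's class
    columns: Tuple[Column, ...]

    def column_names(self) -> List[str]:
        return [c.name for c in self.columns]

    def add_column(self, column: Column) -> "DataFrameSchema":
        # return a new schema; the original is left unchanged
        if column.name in self.column_names():
            raise ValueError(f"column {column.name!r} is already in the schema")
        return replace(self, columns=self.columns + (column,))

    def remove_column(self, name: str) -> "DataFrameSchema":
        if name not in self.column_names():
            raise KeyError(name)
        return replace(self, columns=tuple(c for c in self.columns if c.name != name))

    def change_column(self, column: Column) -> "DataFrameSchema":
        # swap in the new definition for the column with the same name
        if column.name not in self.column_names():
            raise KeyError(column.name)
        return replace(
            self,
            columns=tuple(column if c.name == column.name else c for c in self.columns),
        )

    def validate(self, data: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
        # the data must have exactly the schema's columns ...
        if set(data) != set(self.column_names()):
            raise ValueError(f"expected columns {self.column_names()}, got {list(data)}")
        # ... and every value must be an instance of its column's type
        for col in self.columns:
            for value in data[col.name]:
                if not isinstance(value, col.dtype):
                    raise TypeError(f"column {col.name!r}: {value!r} is not {col.dtype.__name__}")
        return data


# mirror the steps of the proposal on a dict of lists
data = {"a": [1, 2, 3]}
schema = DataFrameSchema((Column("a", int),))
schema.validate(data)

data["b"] = ["x", "y", "z"]
schema = schema.add_column(Column("b", str))
schema.validate(data)

data["a"] = [float(v) for v in data["a"]]
schema = schema.change_column(Column("a", float))
schema.validate(data)

del data["a"]
schema = schema.remove_column("a")
schema.validate(data)
print(schema.column_names())  # ['b']
```

Because each method returns a new object, an earlier schema stays available to validate data from an earlier stage of a pipeline.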


cosmicBboy · 2019-05-01T00:43:36Z

This may obfuscate the code and be counterproductive to the whole point of pandera, which is to make code more readable.


cosmicBboy mentioned this issue Nov 21, 2018 in "add support for schema merging #7" (Closed)

cosmicBboy closed this as completed May 1, 2019

cosmicBboy pushed a commit that referenced this issue May 12, 2023

Merge pull request #6 from NeerajMalhotra-QB/feature_pyspark_backend

0778e36

Feature pyspark backend
