Is your feature request related to a problem? Please describe.
I like that I can define my required data quality and feature behaviour directly in Pandera schema classes, i.e. I want all the logic for how my features should behave to live in a schema class. In particular, I want a schema class whose attributes can be separated into groups based on how they were defined. An example:
import pandera as pa
from pandera.typing import Series


class TestSchema(pa.DataFrameModel):
    # Feature group 1
    feature_1_g1: Series[int] = pa.Field()
    feature_2_g1: Series[int] = pa.Field()
    # Feature group 2
    feature_1_g2: Series[int] = pa.Field()
    feature_2_g2: Series[int] = pa.Field()
Here I would like to be able to separate these features into their groups without hardcoding their names. The grouping cannot always be inferred directly from e.g. the annotation.
Describe the solution you'd like
One way to solve this would be to allow a schema class to be defined like this:
class FeatureGroup1Series(Series, Generic[GenericDtype]):
    pass


class FeatureGroup2Series(Series, Generic[GenericDtype]):
    pass


class TestSchema(pa.DataFrameModel):
    # Feature group 1
    feature_1_g1: FeatureGroup1Series[int] = pa.Field()
    feature_2_g1: FeatureGroup1Series[int] = pa.Field()
    # Feature group 2
    feature_1_g2: FeatureGroup2Series[int] = pa.Field()
    feature_2_g2: FeatureGroup2Series[int] = pa.Field()
Then it should be simple to separate the different attributes by looking at the annotations.
The current problem with this solution is that TestSchema.to_schema() no longer works, because it requires every column to be annotated with Series[T] itself and not with a subclass of Series.
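To illustrate what "looking at the annotations" could mean, here is a minimal standard-library sketch. It does not use Pandera: `Series` below is a stand-in generic class and `TestSchema` is a plain class rather than a `pa.DataFrameModel`, and the helper `columns_in_group` is a hypothetical name. It only shows that, once subclass annotations are accepted, the grouping can be read back from the type hints.

```python
from typing import Generic, TypeVar, get_origin, get_type_hints

T = TypeVar("T")


class Series(Generic[T]):
    """Stand-in for pandera.typing.Series (assumption, not the real class)."""


class FeatureGroup1Series(Series[T]):
    pass


class FeatureGroup2Series(Series[T]):
    pass


class TestSchema:
    # Plain class used only to carry annotations in this sketch.
    feature_1_g1: FeatureGroup1Series[int]
    feature_2_g1: FeatureGroup1Series[int]
    feature_1_g2: FeatureGroup2Series[int]
    feature_2_g2: FeatureGroup2Series[int]


def columns_in_group(schema: type, group: type) -> list:
    """Return the attribute names whose annotation is `group[...]` or a subclass of it."""
    names = []
    for name, annotation in get_type_hints(schema).items():
        origin = get_origin(annotation)  # e.g. FeatureGroup1Series for FeatureGroup1Series[int]
        if isinstance(origin, type) and issubclass(origin, group):
            names.append(name)
    return names


group_1 = columns_in_group(TestSchema, FeatureGroup1Series)
group_2 = columns_in_group(TestSchema, FeatureGroup2Series)
all_columns = columns_in_group(TestSchema, Series)
```

Because the check uses `issubclass`, asking for the base `Series` returns every column, while asking for a specific group class returns only that group.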
Describe alternatives you've considered
I've considered adding more information to e.g. a field object, but have not come up with a good solution.
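For comparison, the "more information on the field object" idea might look roughly like the following. This is an analogy built on standard-library dataclasses, not on Pandera's `Field`; the `"group"` metadata key, the `FeatureRow` class, and the `group_fields` helper are all hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field, fields


@dataclass
class FeatureRow:
    # Each field carries its group as metadata instead of encoding it in the type.
    feature_1_g1: int = field(default=0, metadata={"group": "g1"})
    feature_2_g1: int = field(default=0, metadata={"group": "g1"})
    feature_1_g2: int = field(default=0, metadata={"group": "g2"})
    feature_2_g2: int = field(default=0, metadata={"group": "g2"})


def group_fields(cls: type) -> dict:
    """Map each group label to the field names tagged with it."""
    groups = defaultdict(list)
    for f in fields(cls):
        groups[f.metadata.get("group")].append(f.name)
    return dict(groups)


grouped = group_fields(FeatureRow)
```

The trade-off is that the group becomes a free-form label on each field rather than part of the annotation, so nothing checks that the labels are spelled consistently.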
Additional context
I'll put up a PR with a potential solution.