Replies: 3 comments
-
Is there an easier way to exclude other defined columns from a regex column? If I change the It also matches the 'some_col' and 'another_col' columns and raises an error for them:
Maybe I am missing a way to use data frame checks to check all columns except "some_col", "another_col", or any other column already defined in the model, without hardcoding the column names in the regex or getting them from
-
I found a workaround with a custom data frame check that does the job, but it also seems like a hacky solution:
Maybe there is still a better way to handle "other columns"?
-
hi @SebbanSms, good question! Currently I'd recommend your first approach; here's a slightly modified version that excludes private variables:

    import pandera as pa
    from pandera.typing import Series

    class Model(pa.SchemaModel):
        """Example Model"""
        some_col: Series[int] = pa.Field(coerce=True)
        another_col: Series[int] = pa.Field(coerce=True)
        # Negative lookahead: match every column whose name does not start
        # with one of the non-private names annotated above.
        all_other_cols: Series[float] = pa.Field(
            gt=0,
            coerce=True,
            alias=rf"^(?!{'|'.join(k for k in __annotations__ if not k.startswith('_'))}).*$",
            regex=True,
        )

        class Config:
            strict = True

In looking into other solutions I realized that the

    from typing import Any

    import pandas as pd
    import pandera as pa
    from pandera.typing import Series

    class Model(pa.SchemaModel):
        """Example Model"""
        some_col: Series[int] = pa.Field(coerce=True)
        another_col: Series[int] = pa.Field(coerce=True)
        undefined_col: Series[Any] = pa.Field(alias=".*", regex=True)

        class Config:
            strict = True

        @pa.check(".*")
        def undefined_column(cls, series: pd.Series) -> pd.Series:
            # Explicitly defined columns pass; every other column must be > 0.
            if series.name in cls.__fields__.keys():
                return True
            return series > 0

Depending on the demand for a use case like this, we might consider adding first-class support for a notion of "apply these checks to undefined columns" or "exclude these patterns from regex", but for now I think these two solutions should suffice.
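One caveat with the negative-lookahead alias in the first approach: as written, it also excludes any column whose name merely starts with a defined name (for example "some_col_2"). The sketch below shows one way to make the exclusion exact, using only the standard `re` module. The helper name `other_columns_pattern` and the sample column names are hypothetical, and the sketch assumes the resulting pattern is matched from the start of each column name.

```python
import re


def other_columns_pattern(defined_names):
    """Build a regex matching any name that is not exactly one of defined_names."""
    # re.escape guards against names that contain regex metacharacters.
    alternatives = "|".join(re.escape(name) for name in defined_names)
    # The `$` inside the lookahead makes the exclusion exact, so a name that
    # only starts with a defined name is still matched.
    return rf"^(?!(?:{alternatives})$).*$"


pattern = other_columns_pattern(["some_col", "another_col"])
columns = ["some_col", "another_col", "some_col_2", "x"]
matched = [c for c in columns if re.match(pattern, c)]
print(matched)
```

A pattern built this way could in principle be passed as the `alias` of the regex field in place of the inline f-string, though that combination has not been tested here.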
-
I need to define a Schema Model where I expect some columns to have defined names and defined checks and all other columns should have different checks.
So far, I only made it work with some regex hack:
What is the most convenient way to apply some checks to all columns in the data frame except the explicitly defined ones?