Pandera DataFrameModels do not support parameterized types for polars, while DataFrameSchemas do.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the master branch of pandera.
Example
Here is an example of a working DataFrameSchema and several variations of broken DataFrameModels.
from typing import Annotated

import pandera.polars as pa
import polars as pl
from pandera.typing import Series
from pandera.errors import SchemaInitError

df = pl.DataFrame({
    "id": [1, 2, 3],
    "lists": [["a"], ["a", "b"], ["a", "b", "c"]],
})

# works!
schema = pa.DataFrameSchema(
    columns={
        "id": pa.Column(int),
        "lists": pa.Column(list[str]),
    }
)
schema.validate(df)
print("DataFrameSchema validation passed")

class Lists(pa.DataFrameModel):
    """Most basic, expected form given the working schema above."""

    id: int
    lists: list[str]

try:
    Lists.validate(df)
except SchemaInitError as e:
    print("\nLists validation failed")
    print(e)
else:
    print("\nLists validation passed")

class ListsSeries(pa.DataFrameModel):
    """Using series as a wrapper around basic data types like the id column here
    will not work. Examples of this appear in the pandera documentation.
    https://pandera.readthedocs.io/en/latest/dataframe_models.html#dtype-aliases
    """

    id: Series[int]
    lists: Series[list[str]]

try:
    ListsSeries.validate(df)
except SchemaInitError as e:
    print("\nListsSeries validation failed")
    print(e)
else:
    print("\nListsSeries validation passed")

class AlternateListsSeries(pa.DataFrameModel):
    """Demonstrating using Series as a type wrapper around only lists to avoid
    the initialization error on id.
    """

    id: int
    lists: Series[list[str]]

try:
    AlternateListsSeries.validate(df)
except SchemaInitError as e:
    print("\nAlternateListsSeries validation failed")
    print(e)
else:
    print("\nAlternateListsSeries validation passed")

class ListsAnnotated(pa.DataFrameModel):
    """Parameterized form using Annotated as suggested at
    https://pandera.readthedocs.io/en/latest/polars.html#nested-types
    """

    id: int
    lists: Series[Annotated[list, str]]

try:
    ListsAnnotated.validate(df)
except TypeError as e:
    print("\nListsAnnotated validation failed")
    print(e)
else:
    print("\nListsAnnotated validation passed")

class ListsAnnotatedStr(pa.DataFrameModel):
    """Alternate parameterized form using Annotated as seen in the examples here:
    https://pandera.readthedocs.io/en/latest/dataframe_models.html#annotated
    """

    id: int
    lists: Series[Annotated[list, "str"]]

try:
    ListsAnnotatedStr.validate(df)
except TypeError as e:
    print("\nListsAnnotatedStr validation failed")
    print(e)
else:
    print("\nListsAnnotatedStr validation passed")
When run with the following python / library versions:
Thanks for reporting this, @r-bar. FYI, Series[Type] annotations are currently not supported in the polars API; see #1588 and the ongoing discussion in #1594.
Looking into this, planning on supporting:
class Lists(pa.DataFrameModel):
    """Most basic, expected form given the working schema above."""

    id: int
    lists: list[str]
When run with the following python / library versions:
the above script produces:
Expected behavior
I would expect any column types that are valid to pass to DataFrameSchema's constructor to also be valid as annotations for DataFrameModel.
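To make that expectation concrete, here is a minimal sketch that uses only the standard library and does not reflect pandera's internals. `ListsSpec` is a hypothetical plain class standing in for a DataFrameModel, and `column_dtypes` is a hypothetical helper. The sketch suggests that the class annotations already carry the same dtype objects that the working DataFrameSchema passes to `pa.Column`, including the parameterized `list[str]`, and that `Annotated[list, str]` has a different shape from `list[str]`.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class ListsSpec:
    """Hypothetical stand-in for a DataFrameModel; a plain class, no pandera."""

    id: int
    lists: list[str]


def column_dtypes(spec: type) -> dict:
    """Hypothetical helper: map each annotated field name to its annotation."""
    return get_type_hints(spec)


dtypes = column_dtypes(ListsSpec)

# These are the same objects the working DataFrameSchema passes to pa.Column(...).
print(dtypes["id"] is int)
print(dtypes["lists"] == list[str])

# A parameterized annotation can be split into its container and element type.
print(get_origin(dtypes["lists"]), get_args(dtypes["lists"]))

# Annotated[list, str] is a different shape: str is metadata attached to list,
# not a type parameter of list.
annotated = Annotated[list, str]
print(get_origin(annotated) is Annotated, get_args(annotated))
```

Since the annotation objects are identical in both styles, the difference reported here would come from how the model layer interprets them rather than from the types themselves.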