Missing from_records method which returns DataFrame[Schema] #850
Comments
Another improvement might be to have a typed 'record' constructor which matches the columns, but I'm not sure how you can make the IDE pick this up. |
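One way an IDE or type checker could follow a typed record is the standard library's `typing.TypedDict`. The sketch below is independent of pandera and only illustrates the idea: `SummaryRow` and `make_row` are hypothetical names, and the column names are borrowed from the example further down in this thread.

```python
from typing import TypedDict


class SummaryRow(TypedDict):
    # Hypothetical row type: one key per column, with its expected type.
    month: int
    bruto_revenue: float
    expenses: float
    netto_revenue: float


def make_row(
    *, month: int, bruto_revenue: float, expenses: float, netto_revenue: float
) -> SummaryRow:
    """Keyword-only constructor, so editors can complete and check column names."""
    return {
        "month": month,
        "bruto_revenue": bruto_revenue,
        "expenses": expenses,
        "netto_revenue": netto_revenue,
    }


row = make_row(month=1, bruto_revenue=1.0, expenses=2.0, netto_revenue=-1.0)
```

Leaving out a column in the call to `make_row` raises a `TypeError` at runtime, and a static type checker can flag it before the code runs.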
I'm open to supporting this! Basically the pandera Let me know if you have the capacity to make a PR for this! Would be happy to help guide further. In the meantime, a workaround would be something like: pa.typing.DataFrame[Schema](pd.DataFrame.from_records(...)) |
I made a workaround but it isn't ideal, since the schemas by default don't keep the names of the indexes.
OffTopic:
|
I'm not quite clear on your use case here... would you mind elaborating on that and why you need strictly typed dataframes?
You can supply
This is a known limitation of pandera... we haven't yet explored ways of modifying the
Can you provide a minimally reproducible code snippet for this in a bug issue? This test makes sure validation of columns does indeed happen |
Yes, I like typed data frames because they are really good for documenting the code, so you don't make errors in column names or types. They also catch a lot of problems in case of missing data.

Another use I made of them is as a definition of my xlsx report output. I use the title in the field to actually set the title in the xlsx report output, and use reflection on the schema to get the right columns for serialization. This makes it very easy to change the order of columns in the output format and change the output itself. In case of a missing column, the code would fail at the function that calculates the data instead of my having to manually check the output file. For example:

    # Just extends the SchemaModel
    class MonthlySummary(SchemaModelXlsx):
        @classmethod
        @property
        def sheet_name(cls) -> str:  # this could be title in the config instead
            return "Summary"

        month: pat.Index[pat.DateTime] = pa.Field(check_name=True, title="Month")
        bruto_revenue: pat.Series[float] = pa.Field(title="Bruto revenue")
        expenses: pat.Series[float] = pa.Field(title="expenses")
        netto_revenue: pat.Series[float] = pa.Field(title="Netto revenue")

    def revenue(sales: pat.DataFrame[ProductSales], services: pat.DataFrame[ServiceSales]) -> pat.DataFrame[Revenue]:
        pass

    def monthly_summary(bruto_revenue: pat.DataFrame[Revenue], expenses_per_day: pat.DataFrame[Expenses]):
        # reindexes and then makes a difference between the different types of revenue and expenses.
        # (total_revenue and total_expenses come from that reindexing step, which is left out here)
        return pat.DataFrame[MonthlySummary](
            {
                "bruto_revenue": total_revenue,
                "expenses": total_expenses,
                "netto_revenue": total_revenue - total_expenses,
            }
        )

Ideally a typed API dataframe should have a constructor so you could call:

    # each field should be typed to make construction easy
    MonthlySummary(month, bruto_revenue, expenses, netto_revenue)

Or, if you want to extract it from a df, this could drop the 'unstated columns' to force you not to just add some data:

    MonthlySummary.from_df(df)

Having these specialized types could also add the opportunity to add methods and properties to the data frames, to make calculating aggregated data easy with the defined types.
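The two ideas in this comment, reading column titles by reflection and dropping undeclared columns on construction, can be sketched without pandera using standard-library dataclasses. This is an illustration under assumptions, not how pandera or the `SchemaModelXlsx` class above works: `MonthlySummaryRow`, `report_header` and `from_mapping` are hypothetical names.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MonthlySummaryRow:
    # Hypothetical stand-in for a schema: the metadata carries the report title.
    month: int = field(metadata={"title": "Month"})
    bruto_revenue: float = field(metadata={"title": "Bruto revenue"})
    expenses: float = field(metadata={"title": "expenses"})
    netto_revenue: float = field(metadata={"title": "Netto revenue"})


def report_header(schema) -> list[str]:
    """Column titles in declaration order, read by reflection on the schema."""
    return [f.metadata["title"] for f in fields(schema)]


def from_mapping(schema, data: dict):
    """Build a typed row, failing on missing columns and dropping undeclared ones."""
    names = [f.name for f in fields(schema)]
    missing = [name for name in names if name not in data]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return schema(**{name: data[name] for name in names})


raw = {
    "month": 1,
    "bruto_revenue": 1.0,
    "expenses": 2.0,
    "netto_revenue": -1.0,
    "note": "not part of the schema",
}
summary = from_mapping(MonthlySummaryRow, raw)
```

Reordering the fields in the class changes the order returned by `report_header`, which is the property the comment relies on for the report output.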
Looking at the test, it doesn't check for missing columns. I'll try to spend some time today to make a sample to double-check the problem. |
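OK, it is only the from_records that doesn't do any checks. But I only use it in my unit tests.

    # Imports assumed from the aliases used earlier in this thread
    import pandera as pa
    import pandera.typing as pat
    from pandera import SchemaModel

    class MonthlySummary2(SchemaModel):
        month: pat.Index[int] = pa.Field(check_name=True, title="Month")
        bruto_revenue: pat.Series[float] = pa.Field(title="Bruto revenue")
        expenses: pat.Series[float] = pa.Field(title="expenses")
        netto_revenue: pat.Series[float] = pa.Field(title="Netto revenue")

    # "netto_revenue" is missing from the record, yet no error is raised
    df = pat.DataFrame[MonthlySummary2].from_records(
        [
            {
                "month": 1,
                "bruto_revenue": 1.0,
                "expenses": 2.0
            }
        ],
        index=["month"]
    )
|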
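The behaviour being asked for here, failing at construction time when a record lacks a declared column, can be sketched in plain Python. This is not pandera's implementation and does not reflect the fix that was eventually merged; `checked_records` is a hypothetical helper, and the column list is copied from `MonthlySummary2` above.

```python
def checked_records(records: list[dict], required: list[str]) -> list[dict]:
    """Return the records unchanged, or raise if any record lacks a required column."""
    for position, record in enumerate(records):
        missing = [name for name in required if name not in record]
        if missing:
            raise ValueError(f"record {position} is missing columns: {missing}")
    return records


columns = ["month", "bruto_revenue", "expenses", "netto_revenue"]
records = [{"month": 1, "bruto_revenue": 1.0, "expenses": 2.0}]

try:
    checked_records(records, columns)
    error = None
except ValueError as exc:
    # The incomplete record is rejected before any dataframe would be built.
    error = str(exc)
```

In a unit test this kind of check surfaces a misspelled or forgotten column where the test data is written, rather than later in the function under test.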
Hi @borissmidt
Yeah if you can send a code snippet (or maybe a PR 🙂) to update that test that would be great!
I'm down to support this use case, but I'm currently working on other stuff (#381) so if you'd like to own that part of the codebase I can help review changes and get them merged into the core library. |
I will spend some time to make a PR.
…On Tue, 10 May 2022, 03:24 Niels Bantilan, ***@***.***> wrote:
Hi @borissmidt
Looking at the test it doesn't check for missing columns i'll try to spend some time today to make an sample to double check the problem.
Yeah if you can send a code snippet (or maybe a PR 🙂) to update that test that would be great!
Oke it is only the from_records that doesn't do any checks. But i only use it in my unit tests.
I'm down to support this use case, but I'm currently working on other stuff (#381) so if you'd like to own that part of the codebase I can help review changes and get them merged into the core library.
|
* Add a from record that checks the schema for a pandas dataframe
* handle nox session.install issue
* fix lint
* fix noxfile issue
* remove unneeded types
* update type annotation

Co-authored-by: cosmicBboy <niels.bantilan@gmail.com>
fixed by #859 |
Is your feature request related to a problem? Please describe.
When I do
pandera.typing.DataFrame[T].from_records()
I get an untyped DataFrame back and not a pandera DataFrame[T].
Describe the solution you'd like
A from_records method which allows you to create a typed dataframe. This is especially useful for writing unit tests.