Quantiles_df method speed optimization with wrap class of Quantile_timeseries #1351
Fix mistake with bool toggle
Thanks for giving this a shot, it looks interesting. We have to get rid of quantile_timeseries_to_df() as it's basically a copy of quantile_timeseries() (I made a proposal - let me know).
Two more things:
- Could you check whether the results are the same (modulo some numeric rounding)? If yes, I don't see any reason for not removing fast_mode (and basically always have it to True). If there are differences, we have another problem :) Either way, I think we should remove fast_mode and always use whichever correct method is the fastest.
- Could you add some unit tests for this method in test_timeseries.py? I think we're missing them currently... that would be a perfect occasion.
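One way to run the equivalence check requested above is sketched here. It does not use darts itself: a random NumPy array stands in for a stochastic series, and the two helper functions, the component names, and the "{comp}_{quantile}" column naming are all assumptions made for illustration. The idea is only to compare a one-quantile-at-a-time result against a batched result with a small numeric tolerance.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a stochastic series: (time, component, sample) values.
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 2, 200))
index = pd.RangeIndex(50)  # time steps
components = ["a", "b"]    # assumed component names
quantiles = [0.1, 0.5, 0.9]


def quantiles_one_by_one(values, qs):
    """Build one frame per quantile, then concatenate column-wise."""
    frames = [
        pd.DataFrame(
            np.quantile(values, q, axis=2),
            index=index,
            columns=[f"{c}_{q}" for c in components],
        )
        for q in qs
    ]
    return pd.concat(frames, axis=1)


def quantiles_batched(values, qs):
    """Compute all quantiles in a single call, then lay out the same columns."""
    stacked = np.quantile(values, qs, axis=2)  # shape: (len(qs), time, component)
    columns = [f"{c}_{q}" for q in qs for c in components]
    data = np.concatenate(list(stacked), axis=1)
    return pd.DataFrame(data, index=index, columns=columns)


slow = quantiles_one_by_one(samples, quantiles)
fast = quantiles_batched(samples, quantiles)

# Raises AssertionError if any value differs beyond the relative tolerance.
pd.testing.assert_frame_equal(slow, fast, check_exact=False, rtol=1e-9)
```

The same pattern could be turned into a unit test by replacing the stand-in helpers with the old and new code paths of the method under review.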
darts/timeseries.py
# TODO: there might be a slightly more efficient way to do it for several quantiles at once with xarray...
return pd.concat([self.quantile_df(quantile) for quantile in quantiles], axis=1)
if fast_mode == True:
return pd.concat([self.quantile_timeseries_to_df(quantile) for quantile in quantiles], axis=1)
If I understand well, quantile_timeseries_to_df() is exactly the same method as quantile_timeseries(), save for the dataframe translation at the end. Couldn't we simply replace
return pd.concat([self.quantile_timeseries_to_df(quantile) for quantile in quantiles], axis=1)
with
return pd.concat([self.quantile_timeseries(quantile).pd_dataframe() for quantile in quantiles], axis=1)
Or am I missing something?
@tranquilitysmile could you also fix the linting issues? In case of doubt, check here.
Hi!
I couldn't find any numeric rounding differences in the resulting dataframes. That's why I followed your advice and removed the fast_mode toggle and the quantile_timeseries_to_df() method.
One small thing is different: the naming of the resulting column in quantile_timeseries(), which passed the static name "{comp}quantiles". I modified it to return "{comp}{quantile}" to avoid identical names for the resulting columns in quantiles_df().
If I did everything right, the problem should be solved.
I don't have experience writing unit tests right now; I'll try to figure it out soon.
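The column naming issue described above can be illustrated with plain pandas. This is a minimal sketch, not the darts code: the helper quantile_frame, the component name "a", and the exact label strings are hypothetical. It shows that concatenating frames that share a static column name yields duplicate labels, while a per-quantile name keeps them unique.

```python
import pandas as pd

index = pd.RangeIndex(3)
quantiles = [0.1, 0.9]


def quantile_frame(q, column_name):
    # Placeholder values; only the column labels matter here.
    return pd.DataFrame({column_name: [q, q, q]}, index=index)


# Static name: every frame carries the same label, so the concat has duplicates.
static = pd.concat([quantile_frame(q, "a_quantiles") for q in quantiles], axis=1)

# Per-quantile name: labels stay unique after the concat.
unique = pd.concat([quantile_frame(q, f"a_{q}") for q in quantiles], axis=1)

print(list(static.columns))  # ['a_quantiles', 'a_quantiles']
print(list(unique.columns))  # ['a_0.1', 'a_0.9']
print(static.columns.is_unique, unique.columns.is_unique)  # False True
```

With duplicate labels, selecting a column by name returns several columns at once, which is why unique per-quantile names are easier to work with downstream.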
Codecov Report
Base: 93.87% // Head: 93.87% // Increases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## master #1351 +/- ##
=======================================
Coverage 93.87% 93.87%
=======================================
Files 78 78
Lines 8523 8511 -12
=======================================
- Hits 8001 7990 -11
+ Misses 522 521 -1
Looks good @tranquilitysmile ! Do you think you could still add a unit test?
4031a52 to 411e1be
Add unit test for the method quantiles_df, please check the correctness
LGTM, thanks!
This PR tries to speed up the quantiles_df method by wrapping quantile_timeseries and converting its result to a pd.DataFrame.
To keep the older method available, a "fast_mode" toggle was added; it defaults to False, which uses the older method.
With this optimization, the speed-up is roughly 80-450%+ (depending on the size of the dataframe).
Example: 5-minute resolution data, forecast one week into the future.
On big datasets, the estimation time grows dramatically.