Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantiles_df method speed optimization with wrap class of Quantile_timeseries #1351

Merged
merged 8 commits into from
Nov 17, 2022

Conversation

tranquilitysmile
Copy link
Contributor

@tranquilitysmile tranquilitysmile commented Nov 11, 2022

Trying to optimization of estimate Quantiles_df method, with wraping Quantile_timeseries to pd.dataframe

For saving older method, added toggle "fast_mode", by default using older method with False

With this optimization, process speed up to ~80-450%+ of time(depends on size of dataframe).

Example on 5 minutes resolution data, forecasted one week in to future.
On big datasets time for estimation extremely increased

photo_2022-11-11_19-08-02

Fix mistake with bool toggle
Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for giving this a shot, it looks interesting. We have to get rid of quantile_timeseries_to_df() as it's basically a copy of quantile_timeseries() (I made a proposal - let me know).

Two more things:

  • Could you check whether the results are the same (modulo some numeric rounding)? If yes, I don't see any reason for not removing fast_mode (and basically always have it to True). If there are differences, we have another problem :) Either way, I think we should remove fast_mode and always use whichever correct method is the fastest.
  • Could you add some unit tests for this method in test_timeseries.py ? I think we're missing them currently... that would be a perfect occasion.

# TODO: there might be a slightly more efficient way to do it for several quantiles at once with xarray...
return pd.concat([self.quantile_df(quantile) for quantile in quantiles], axis=1)
if fast_mode == True:
return pd.concat([self.quantile_timeseries_to_df(quantile) for quantile in quantiles], axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand well, quantile_timeseries_to_df() is exactly the same method as quantile_timeseries(), save for the dataframe translation at the end. Couldn't we simply do:

Suggested change
return pd.concat([self.quantile_timeseries_to_df(quantile) for quantile in quantiles], axis=1)
return pd.concat([self.quantile_timeseries(quantile).pd_dataframe() for quantile in quantiles], axis=1)

Or am I missing something?

@hrzn
Copy link
Contributor

hrzn commented Nov 13, 2022

Thanks for giving this a shot, it looks interesting. We have to get rid of quantile_timeseries_to_df() as it's basically a copy of quantile_timeseries() (I made a proposal - let me know).

Two more things:

  • Could you check whether the results are the same (modulo some numeric rounding)? If yes, I don't see any reason for not removing fast_mode (and basically always have it to True). If there are differences, we have another problem :) Either way, I think we should remove fast_mode and always use whichever correct method is the fastest.
  • Could you add some unit tests for this method in test_timeseries.py ? I think we're missing them currently... that would be a perfect occasion.

@tranquilitysmile could you also fix the linting issues? In case of doubt, check here.

@tranquilitysmile
Copy link
Contributor Author

tranquilitysmile commented Nov 14, 2022

Hi!

Could you check whether the results are the same (modulo some numeric rounding)? If yes, I don't see any reason for not removing fast_mode (and basically always have it to True). If there are differences, we have another problem :) Either way, I think we should remove fast_mode and always use whichever correct method is the fastest.

I couldn't find any difference in numeric rounding in result dataframes. Thats why i follow you advice and remove fast_mode toggle, and method quantile_timeseries_to_df().

If I understand well, quantile_timeseries_to_df() is exactly the same method as quantile_timeseries(), save for the dataframe translation at the end. Couldn't we simply do

One small thing is different, it's naming of resulting column in quantile_timeseries(), which passes the static name "{comp}quantiles", i modify it to return "{comp}{quantile}" to avoid identical name of resulting columns in quantiles_df()

@tranquilitysmile could you also fix the linting issues? In case of doubt, check here.

if I did everything right the problem should be solved

Could you add some unit tests for this method in test_timeseries.py ? I think we're missing them currently... that would be a perfect occasion.

I don't have experience writing unit tests right now, I'll try to figure it out soon

@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2022

Codecov Report

Base: 93.87% // Head: 93.87% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (7729d4b) compared to base (16454ef).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1351   +/-   ##
=======================================
  Coverage   93.87%   93.87%           
=======================================
  Files          78       78           
  Lines        8523     8511   -12     
=======================================
- Hits         8001     7990   -11     
+ Misses        522      521    -1     
Impacted Files Coverage Δ
darts/timeseries.py 92.44% <100.00%> (+0.03%) ⬆️
...arts/models/forecasting/torch_forecasting_model.py 87.08% <0.00%> (-0.06%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/models/forecasting/nhits.py 99.27% <0.00%> (-0.01%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @tranquilitysmile ! Do you think you could still add a unit test?

@tranquilitysmile
Copy link
Contributor Author

Add unit test for the method quantiles_df, please check the correctness

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@hrzn hrzn merged commit ec78594 into unit8co:master Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants