Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new datasets to darts.dataset #1298

Merged
merged 25 commits into from
Oct 28, 2022
Merged

Conversation

FEJTWOW
Copy link
Contributor

@FEJTWOW FEJTWOW commented Oct 18, 2022

Fixes #617.

Summary

  • Add new dataset to dataset catalog
  • Specify the class in darts.datasets.__init__.py to correctly load new dataset

@codecov-commenter
Copy link

codecov-commenter commented Oct 18, 2022

Codecov Report

Base: 93.84% // Head: 93.83% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (bfb54e9) compared to base (d466815).
Patch coverage: 20.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1298      +/-   ##
==========================================
- Coverage   93.84%   93.83%   -0.01%     
==========================================
  Files          78       78              
  Lines        8500     8488      -12     
==========================================
- Hits         7977     7965      -12     
  Misses        523      523              
Impacted Files Coverage Δ
darts/datasets/dataset_loaders.py 91.48% <20.00%> (ø)
darts/timeseries.py 92.29% <0.00%> (-0.06%) ⬇️
...arts/models/forecasting/torch_forecasting_model.py 87.08% <0.00%> (-0.06%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/models/forecasting/nhits.py 99.27% <0.00%> (-0.01%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@gdevos010
Copy link
Contributor

@FEJTWOW Thanks for adding a new dataset. I know many other repos have used this dataset for benchmarking. I believe Autoformer started using it. Are the time ranges the same? And if there not, can we add an arg to match it?

@FEJTWOW
Copy link
Contributor Author

FEJTWOW commented Oct 19, 2022

@FEJTWOW Thanks for adding a new dataset. I know many other repos have used this dataset for benchmarking. I believe Autoformer started using it. Are the time ranges the same? And if there not, can we add an arg to match it?

@gdevos010 Can you specify what do you mean by time ranges. Do you want to specify the start date and end date or do you want to have an arg to specify the day of the week? For instance right now I've assumed that the data was updated every week on Sunday at 00:00:00, but it is possible to change the day while parsing.

I've checked this dataset and their data covers time from 2002-01-01 to 2020-06-30. The dataset that I've just added covers data from 1997-10-12 to 2022-10-09. If you want to change your start day you can always do TimeSeries.drop_before()

@FEJTWOW FEJTWOW marked this pull request as ready for review October 19, 2022 11:41
@FEJTWOW FEJTWOW force-pushed the Improvement/Add_new_datasets_617 branch from 13bc38d to f84b344 Compare October 19, 2022 11:52
@FEJTWOW FEJTWOW marked this pull request as draft October 19, 2022 14:05
@hrzn
Copy link
Contributor

hrzn commented Oct 21, 2022

@FEJTWOW Thanks for adding a new dataset. I know many other repos have used this dataset for benchmarking. I believe Autoformer started using it. Are the time ranges the same? And if there not, can we add an arg to match it?

I think it'd be overkill to add supporting time ranges here, as IMO it would mix concerns quite a bit.
Users could just do for instance: my_ts = WeatherDataset().load()[pd.Timestamp("2020-06-01T00:10:00"):].

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @FEJTWOW ! I've added a few comments.
Regarding the traffic CSV file: it's quite big (100 MB). Could you check whether it could be better to use git large files for this one?

darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
darts/datasets/__init__.py Outdated Show resolved Hide resolved
@FEJTWOW FEJTWOW marked this pull request as ready for review October 24, 2022 14:10
@FEJTWOW FEJTWOW requested a review from hrzn October 26, 2022 11:04
Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work @FEJTWOW

@FEJTWOW FEJTWOW merged commit 979491b into master Oct 28, 2022
@madtoinou madtoinou deleted the Improvement/Add_new_datasets_617 branch July 5, 2023 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new datasets to darts.datasets
4 participants