[Documentation] TimeSeries parameter: frequency- what is it? #128

Closed
woj-i opened this issue Jul 7, 2020 · 8 comments
Labels
question Further information is requested

Comments

@woj-i

woj-i commented Jul 7, 2020

I tried to create a TimeSeries object with either the constructor or the from_dataframe method, but I get:

ValueError: Could not infer explicit frequency. Observed frequencies: {'D', 'B'}. Is Series too short (n=2)?

Could you please describe the frequency parameter in more detail:

  1. What are the possible values?
  2. How is this parameter used?
  3. An example would be nice.
  4. Is it possible to pass data with irregular time values?
  5. Can I pass [1:N] for a sequence of data that is not bound to dates? If so, what frequency should I use then?

woj-i added the "bug" (Something isn't working) and "triage" (Issue waiting for triaging) labels on Jul 7, 2020.
pennfranc added the "question" (Further information is requested) label and removed the "bug" and "triage" labels on Jul 8, 2020.

@pennfranc
Contributor

Hi woj-i, thanks for letting us know about the issue!

To answer your questions about the freq parameter:

  1. Any pandas offset alias can be used; they are listed here: https://stackoverflow.com/a/35339226
  2. This parameter is used when we are dealing with very short (n < 3) TimeSeries instances where the frequency cannot be inferred.

This will raise a ValueError:

times = pd.date_range('20130101', '20130102')
series = TimeSeries.from_times_and_values(times, range(2))

This will work:

times = pd.date_range('20130101', '20130102')
series = TimeSeries.from_times_and_values(times, range(2), freq='D')

  4. Sometimes the frequency can be detected, and missing entries are then filled with NaNs. But as you can see, this functionality is not bulletproof.
  5. TimeSeries instances always require a pandas.DatetimeIndex; there is no way around that at this time.
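
The point about very short series can be illustrated without Darts. The sketch below is an assumption-laden stand-in: it uses `pandas.infer_freq` in place of whatever inference Darts performs internally, which may differ in detail. It shows that two timestamps are not enough to infer a frequency, that three evenly spaced ones are, and that an explicit offset alias removes the guesswork.

```python
import pandas as pd

# Two timestamps: pandas refuses to infer a frequency from so little data.
short_index = pd.DatetimeIndex(["2013-01-01", "2013-01-02"])
try:
    pd.infer_freq(short_index)
    short_inferred = True
except ValueError as err:
    short_inferred = False
    print("cannot infer:", err)

# Three evenly spaced timestamps are enough for inference.
longer_index = pd.DatetimeIndex(["2013-01-01", "2013-01-02", "2013-01-03"])
inferred = pd.infer_freq(longer_index)
print(inferred)

# Passing an offset alias explicitly fixes the frequency regardless of length.
explicit = pd.date_range("2013-01-01", periods=2, freq="D")
print(explicit.freqstr)
```

Other aliases such as 'B' (business day) or 'W' (weekly) can be passed to `freq` in the same way.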

That being said, as far as I can tell the frequency parameter is not really the problem here. Instead, it probably has to do with your input data. More specifically, the most likely cause is that your time index does not have a consistent frequency, meaning that the time difference between two consecutive indices is not constant.
If this is the case, the TimeSeries constructor tries to detect the frequency from subsequences of the time series. If only one such frequency is detected, it fills the missing dates with NaN values so that the frequency is consistent. However, if more than one frequency is detected, as in your case (a calendar day and a business day frequency were detected), an error is raised.
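
To see how a weekday-only index can produce both 'D' and 'B', here is a rough sketch of subsequence-based detection. It is not Darts's actual code: the helper `observed_frequencies` and the window size of 3 are hypothetical choices for illustration, and `pandas.infer_freq` stands in for the inference step.

```python
import pandas as pd

# Weekday-only timestamps covering two weeks (2020-07-06 is a Monday).
index = pd.DatetimeIndex([
    "2020-07-06", "2020-07-07", "2020-07-08", "2020-07-09", "2020-07-10",
    "2020-07-13", "2020-07-14", "2020-07-15", "2020-07-16", "2020-07-17",
])


def observed_frequencies(idx, window=3):
    """Collect the frequency inferred from each run of `window` timestamps.

    Hypothetical helper: the window size is an assumption, not Darts's value.
    """
    found = set()
    for start in range(len(idx) - window + 1):
        freq = pd.infer_freq(idx[start:start + window])
        if freq is not None:
            found.add(freq)
    return found


observed = observed_frequencies(index)
print(observed)
```

Windows that lie inside a single week look like calendar-day data, while windows that span the weekend look like business-day data, so two different frequencies are observed.
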
Unfortunately, it seems that in its current form Darts can't work with your (unmodified) data set. The only option I can see right now is to manually make sure that your index has a constant frequency. This is definitely something we want to improve, and we have added it to our backlog. Thanks for your feedback! Also feel free to give this a shot yourself if you think you have a good solution!
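
One way to give an index a constant frequency by hand is sketched below with plain pandas. The data values are made up for illustration, and whether NaN-filled gaps are acceptable depends on the model used afterwards.

```python
import pandas as pd

# Made-up weekday observations; 2020-07-08 and the weekend are absent.
raw = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.DatetimeIndex(
        ["2020-07-06", "2020-07-07", "2020-07-09", "2020-07-10", "2020-07-13"]
    ),
)

# Re-sample onto a constant calendar-day grid; dates without data become NaN.
daily = raw.asfreq("D")

print(daily)
print(daily.index.freqstr)
```

The resulting index has exactly one entry per calendar day between the first and last observation, so the time difference between consecutive entries is constant.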

I hope this helps. If not, please don't hesitate to reach out again!

@woj-i
Author

woj-i commented Jul 8, 2020

Thank you for explaining that!
My problem was the values in the time index: I have data for weekdays and no data for weekends.

What I would suggest is to state in the docs that filling missing dates with NaN is required for the input. As I understood from the docs, you may fill them, but it did not seem to be required.

Moreover, I've seen that the freq parameter is ignored if the size of the frame is > 2. The docs say that it must be passed for len(df) < 3, but they do not say that it is ignored in the other cases.

You could also add this reference https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases as the list of possible values of the "frequency" parameter.

@pennfranc
Contributor

I think you're right about the documentation, it could definitely be made clearer. Also, the fact that a time series with business day frequency cannot be processed should be addressed as well I believe. Thanks for all of your inputs!

@eduardoansi

I am facing the same issue with the weekends. I only have data for business days so the frequency is not consistent. Also I can't fill the weekends with null or 0 values because it would impact the model.

In this particular case seasonality is not so important for me, so what I am doing is getting all the values of my original data without the dates, and then just joining it into a dataframe with a regular interval. I lose the precise information about when each value happened, but I can at least see how past values impact future ones.
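
The workaround described above could look roughly like the following pandas sketch. The values, the placeholder start date and the daily step are all arbitrary assumptions; only the order of the observations is preserved.

```python
import pandas as pd

# Made-up business-day observations; the weekend of 2020-07-11/12 is absent.
original = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.DatetimeIndex(
        ["2020-07-09", "2020-07-10", "2020-07-13", "2020-07-14"]
    ),
)

# Keep only the ordered values and attach them to an evenly spaced
# placeholder index. The real dates are discarded on purpose.
placeholder_index = pd.date_range("2000-01-01", periods=len(original), freq="D")
regular = pd.Series(original.to_numpy(), index=placeholder_index)

print(regular)
```

Because the real dates are gone, calendar effects such as day-of-week seasonality can no longer be modelled from this series.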

@jerryan999

I ran into the same problem when processing stock data.

It's such a common problem.

@pennfranc
Contributor

Update: Time series data with a business day index should now be supported, even when incomplete. In this PR we added the option to override the automatic frequency detection in the case of inconsistent frequency by setting the freq argument of the TimeSeries constructor. To be more specific, whereas this code snippet will result in the same error as before

df = pd.read_csv('AAPL.csv', delimiter=",")
series = TimeSeries.from_dataframe(df, 'Date', ['Close'])
series.plot()

passing freq='B' to the constructor will solve this problem. This code should execute correctly:

df = pd.read_csv('AAPL.csv', delimiter=",")
series = TimeSeries.from_dataframe(df, 'Date', ['Close'], 'B')
series.plot()

(source of data set used for test: https://www.kaggle.com/jacksoncrow/stock-market-dataset?)

This patch has already been published to pip, so you can get the updated version of Darts like this:

pip install u8darts

Please let us know if this solved your issue!

@eduardoansi

Thanks for being quick to solve it!

I tried to update the package but I couldn't. pip says that every requirement is already satisfied when I try to install it again, with or without --upgrade. Does it take some time to become available?

@TheMP
Contributor

TheMP commented Jul 14, 2020

Hi, it might take some time for PyPI to pick up the changes, but in the meantime you should be able to install the new version of Darts by naming the specific version:

pip install u8darts==0.2.1
