Natural gaps in timeseries observations #284
Comments
Hm, that is an extremely interesting problem! Thanks for mentioning it. A few questions and remarks:
Let us know if any of the suggestions helped. We hope we can solve this issue somehow.
At the moment, time series with holes are not supported. As @tneuer said, we have it on our roadmap to address this. In the meantime, if you'd like to train a forecasting model on each of the sub-segments, you can still create several
Hey @rmk17, could you solve your problem? I would have two more suggestions:
pd.date_range(start=data.index[0], freq="H", periods=len(data))
Then use darts to fit any model you like, back-transform at the end, and introduce the gaps again. It sounds a little bit tedious (and it is), but the transformation only needs to be done once at the beginning and once at the end.
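The re-indexing idea above can be sketched roughly as follows. This is a minimal sketch using only pandas, assuming `data` is a DataFrame with a sorted `DatetimeIndex` that contains gaps. The helper names `compress_index` and `restore_index` are hypothetical and are not part of darts or pandas.

```python
import pandas as pd


def compress_index(data: pd.DataFrame):
    """Swap a gapped DatetimeIndex for a gap-free hourly surrogate index.

    Returns the re-indexed frame and a mapping surrogate -> real timestamp.
    (Hypothetical helper, for illustration only.)
    """
    surrogate = pd.date_range(
        start=data.index[0], periods=len(data), freq=pd.offsets.Hour()
    )
    # Remember which real timestamp each surrogate timestamp stands for.
    mapping = pd.Series(data.index.to_numpy(), index=surrogate)
    compressed = data.copy()
    compressed.index = surrogate
    return compressed, mapping


def restore_index(compressed: pd.DataFrame, mapping: pd.Series) -> pd.DataFrame:
    """Put the original (gapped) timestamps back on a compressed frame."""
    restored = compressed.copy()
    restored.index = pd.DatetimeIndex(mapping.loc[compressed.index].to_numpy())
    return restored
```

The compressed frame has a regular hourly index, so it can be handed to darts. Forecasts made on the surrogate timeline still need their own mapping onto real future timestamps, which this sketch does not attempt.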
Thank you for expressing interest in my issue. At the moment I was able to make a custom index with the dates I need by passing a list of holidays to
So it does create an index with a freq attribute, but since pandas does not allow for gaps, the index will end prematurely in time. Also in case of gaps introduced with data at some later point,
Whereas
Many thanks for all the input you gave. P.S. Sorry, I accidentally pressed the "Close issue" button.
@rmk17 Not sure if the problem still persists, but what you can do is to impute the missing values with
Regarding the strong seasonality problem: the method suggested above wouldn't introduce any artificial data into the model, so the seasonality is preserved. However, depending on the choice of model (e.g., an RNN), you might want to add some seasonally lagged features (depending on the frequency and seasonality of your data) or add an attention block. As far as the TCN network is concerned, since it has access to the full history, it might pick up the seasonality by itself.
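As a rough illustration of the seasonally lagged features mentioned above, here is a minimal pandas-only sketch. It assumes the gaps have already been removed so that lags are counted in observations rather than in clock time: with 24 observations per business day and 120 per business week (the figures from the original post), the daily and weekly lags are 24 and 120 steps. The function name `add_seasonal_lags` and the column name `load` are assumptions for illustration.

```python
import pandas as pd


def add_seasonal_lags(data: pd.DataFrame, column: str = "load", lags=(24, 120)):
    """Append positionally shifted copies of `column` as extra feature columns.

    Lags are counted in rows, so they line up with the season only while
    every business day contributes the same number of rows.
    (Hypothetical helper, for illustration only.)
    """
    out = data.copy()
    for lag in lags:
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    # The first max(lags) rows have no complete history; drop them.
    return out.dropna()
```

Holidays shorten some weeks, so a fixed 120-step lag will drift out of alignment around them. Whether that matters depends on the model and the data.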
@StatMixedML I have implemented such a strategy when building my own models (often by masking the loss function). But can you share how you would do this through the darts API? Thank you.
Any updates on the roadmap? I too have natural gaps in my data and am not sure how to proceed. Thanks!
I would also like to know if there are any updates on built-in ways to work with natural gaps. I'm trying to do stock price prediction, but the stock market only operates on weekdays, so I have a problem similar to the one stated in this issue.
Have you tried using a business day frequency ("B")?
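To show what the business day frequency changes, here is a small pandas-only sketch with made-up weekday prices. It compares a calendar-day index with a "B" index, and then builds a holiday-aware variant with `pd.offsets.CustomBusinessDay`. The dates, the values, and the holiday on 2021-03-08 are invented for illustration, and whether a given darts version accepts a custom offset as its frequency is not verified here.

```python
import pandas as pd

# Made-up daily prices observed on weekdays only (two business weeks).
idx = pd.bdate_range("2021-03-01", "2021-03-12")
prices = pd.Series(range(len(idx)), index=idx, dtype="float64")

# Calendar-day frequency: the weekend shows up as NaN rows.
calendar_daily = prices.asfreq("D")
# Business-day frequency: weekends are simply not part of the index.
business_daily = prices.asfreq("B")

print(int(calendar_daily.isna().sum()), int(business_daily.isna().sum()))

# Holidays are not covered by "B"; a custom offset can exclude them too.
# The holiday below is invented for this example.
holiday_aware = pd.offsets.CustomBusinessDay(holidays=["2021-03-08"])
custom_idx = pd.date_range("2021-03-01", "2021-03-12", freq=holiday_aware)
```

With plain "B", a holiday that falls on a weekday would still appear as a missing value, which may be why "B" alone does not help in every case.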
Oh, I actually wasn't aware of that frequency type. That might solve my problem, thank you so much!
@gabrielgcbs, can we chat? I'm trying to do the same! Probably a 'fool's errand' to many, but it seems interesting.
Hi @hrzn @rmk17, has this issue been taken care of in the latest Darts, please? I am also finding it very difficult to read a dataframe that has natural gaps into a TimeSeries object. Imputation on weekends does not make business sense. Is there a way we can tell TimeSeries to ignore the gaps? Pandas is able to look past the gaps, so I am sure Darts can too? Using 'B' for business days does not help either, BTW. Many thanks for looking.
@optionsraghu You can represent gaps or missing values in Darts: just fill missing dates with NaNs. However, that won't take you far, because handling gaps or missing dates is actually a modelling problem. How do you want a model to capture the fact that certain values are missing? There are several possible answers, which can depend on what a missing value means and on how you want to capture it with a model. Currently in Darts, all forecasting models assume that values are present (i.e. not NaN) for all dates, so feeding them series containing NaNs will result in NaNs in the output. In the future, some models might be developed and integrated in Darts that explicitly account for and model the phenomenon of missing values on certain dates, and that are also able to predict NaN values, but that is not the case today, and that would probably not change our requirements on
In pandas, you can build a DataFrame with missing dates, but it won't be time-indexed and will not represent a time series. Finally, note that you can still decide to see each of the series between the gaps as separate
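The idea of treating the stretches between gaps as separate series could look roughly like this. It is a minimal pandas-only sketch, assuming `data` has a sorted `DatetimeIndex` and a nominal step of one hour. The function name `split_on_gaps` is hypothetical and not a darts function.

```python
import pandas as pd


def split_on_gaps(data: pd.DataFrame, step: pd.Timedelta = pd.Timedelta(hours=1)):
    """Cut a frame into contiguous segments wherever the time step exceeds `step`.

    (Hypothetical helper, for illustration only.)
    """
    # True at every row that starts a new segment (the first row compares as False).
    starts_new_segment = data.index.to_series().diff() > step
    # Running count of gaps seen so far gives each row its segment number.
    segment_id = starts_new_segment.cumsum().to_numpy()
    return [segment for _, segment in data.groupby(segment_id)]
```

Each returned segment has a regular hourly index, so each one can be converted into its own series. Training one model across all segments is then only possible with models that accept several series at once.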
@hrzn Many thanks for your kind help. As I mentioned, I was able to overcome this issue by simply converting the pandas time index into plain integers (a RangeIndex) and then feeding the data into darts. That should solve the issue for the time being, as all I need is to keep the causal order intact when training the model. Again, I presume here that having a RangeIndex is sufficient for Darts to slice the appropriate training data and label data in the learning process. In that context, is there a way I can look into the individual train and label data points just to confirm my assumption? Not a big deal if not. You guys have done an excellent job by abstracting so many nuances upstream. Being at the intersection of stats, domain knowledge, and coding (a jack of many trades, in other words!), I find your package extremely useful. The documentation could be a little more explanatory, but I understand it is a work in progress. Kudos to your team.
Thanks for the kind words :)
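The integer-index workaround described above might be prepared like this. It is a minimal pandas-only sketch, assuming `data` is a DataFrame with a `DatetimeIndex`. The function name `to_integer_index` is hypothetical, and how darts windows the data internally is not shown here.

```python
import pandas as pd


def to_integer_index(data: pd.DataFrame):
    """Replace the time index by 0..n-1 while keeping the chronological order.

    Returns the integer-indexed frame and the original timestamps, so that
    position k can be mapped back to its real time via timestamps[k].
    (Hypothetical helper, for illustration only.)
    """
    ordered = data.sort_index()
    timestamps = ordered.index
    values = ordered.reset_index(drop=True)
    return values, timestamps
```

The model then only sees the order of the observations. Calendar information such as the hour of day or the day of week is lost unless it is added back as separate feature columns.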
I have hourly energy observations taken during business days only. So 120 hours per week, not 168, with Saturdays and Sundays always missing, and holidays as well. Seasonality is daily, weekly, and yearly.
I was trying to follow the samples and use TimeSeries.from_dataframe with default settings. I got a lot of NaNs inserted into the DatetimeIndex, which matches the behaviour of
pandas.asfreq('H')
So with the train/test split
train, val = series.split_before(pd.Timestamp('20200101'))
I receive
So one can see that the data has exploded with NaNs.
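The effect described above can be reproduced with pandas alone. This is a small sketch on synthetic data (two business weeks of hourly values, all invented) showing how re-indexing to a regular hourly frequency fills the weekend with NaNs. It uses `pd.offsets.Hour()` instead of the string alias so that it does not depend on how a given pandas version spells the hourly alias.

```python
import pandas as pd

hourly = pd.offsets.Hour()

# Synthetic data: every hour of two weeks, then keep Monday to Friday only.
full = pd.date_range("2021-03-01 00:00", "2021-03-12 23:00", freq=hourly)
weekdays_only = full[full.dayofweek < 5]
load = pd.Series(1.0, index=weekdays_only)

# Forcing a regular hourly frequency re-inserts the weekend hours as NaN.
regular = load.asfreq(hourly)

print(len(load), len(regular), int(regular.isna().sum()))  # 240 288 48
```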
Obviously,
darts.utils.statistics.plot_acf()
and
darts.utils.statistics.check_seasonality()
do not work with NaNs.
plot_acf() gets me a straight line at zero, where there should be AR lags up until 192.
check_seasonality() reports
[2021-03-05 17:39:16,120] INFO | darts.utils.statistics | The ACF has no local maximum for m < max_lag = 24.
If I supply
fill_missing_dates=False
to TimeSeries.from_dataframe():
series = TimeSeries.from_dataframe(data, time_col='date', value_cols=['load'], fill_missing_dates=False)
I get
Could not infer frequency. Are some dates missing? Try specifying 'fill_missing_dates=True'
Passing the
freq='H'
parameter to the function above does not help either. It is the same problem as with a pandas DatetimeIndex with freq='H'.
In statsmodels SARIMAX models I was able to overcome the datetime freq warning by converting the index to a PeriodIndex, which supports gaps:
data.index = pd.DatetimeIndex(data.index).to_period('H')
Please advise what my options are with Darts for working with time series data that has natural gaps.
Thanks a lot in advance for the attention.
P.S. Some pictures to illustrate time series and gaps