gaps in df #1433
Comments
Hi @hrzn @rmk17, has this issue been taken care of in the latest Darts, please? I am also finding it very difficult to read a dataframe that has natural gaps into a TimeSeries object. Imputation on weekends does not make business sense. Is there a way we can tell TimeSeries to ignore the gaps? Pandas is able to look past the gaps, so I am sure Darts can too? Using 'B' for business days does not help either, BTW. Many thanks for looking. Sorry for re-posting.
I am not one of the people you tagged, but I have used some similar workarounds. As @hrzn mentioned in #284, Darts doesn't support gaps in observations; however, it is a relatively easy problem to approach. Darts is really strong once you have continuous (or MOSTLY continuous) data, but doing these transformations in Darts is tricky, so I suggest you do them in pandas. Consider your pandas DataFrame with the observations and gaps. I am not sure whether you are looking at data from multiple entities, so I will explain as if you were. Group your dataframe by week and entity (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html). Convert the grouped dataframe into a list of dataframes. Now feed each new sub-dataframe into Darts. If you need metadata or static covariates, I suggest you add them before grouping. If you do not, you will lose track of which entity a sub-dataframe (and thus a sub-TimeSeries) belongs to.
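The grouping step described above can be sketched in pandas as follows. This is only an illustration: the column names `date`, `entity` and `value` and the data itself are made up, and the final Darts call is left as a comment because it assumes an installed Darts version whose `TimeSeries.from_dataframe` accepts these arguments.

```python
import pandas as pd

# Hypothetical data: two entities observed on business days only (weekend gaps).
dates = pd.bdate_range("2023-01-02", periods=10)  # Mon 2 Jan .. Fri 13 Jan 2023
df = pd.concat(
    [pd.DataFrame({"date": dates, "entity": name, "value": range(10)})
     for name in ("A", "B")],
    ignore_index=True,
)

# Group by calendar week and entity; each group is one gap-free run of weekdays.
grouped = df.groupby([pd.Grouper(key="date", freq="W"), "entity"])

# Convert the groups into a list of sub-dataframes. The "entity" column is kept,
# so every chunk still records which entity it belongs to.
chunks = [g.reset_index(drop=True) for _, g in grouped if not g.empty]

# Each chunk could then be handed to Darts, for example (assumption, not run here):
# TimeSeries.from_dataframe(chunk, time_col="date", value_cols="value", freq="B")
```

With this toy data the result is four chunks of five rows each (two weeks times two entities). Note that weekly chunks are short, so whether they are long enough for a given model is a separate question.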
Hi @optionsraghu, have a look at #1420 and #1418, that might help solve this issue.
Thank you @dennisbader. I will check them out.
Hi @dennisbader, unfortunately it seems more complicated than your solution, as there seem to be custom holidays in the dataset that I am unaware of, and there are thousands of rows. The one workaround I am using is to sort the df by time with pandas, remove the date/time column, and then read it into a series for modeling. That way there is no conflict between the frequency and the gaps in the data. I hope this does not change the modeling behaviour, as the df is sorted anyway...
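A minimal sketch of that workaround in pandas, assuming a frame with hypothetical columns `date` and `value` (the values are invented). The Darts call is shown only as a comment and assumes that `TimeSeries.from_dataframe` builds an integer-indexed series from a `RangeIndex` in the installed version.

```python
import pandas as pd

# Hypothetical frame with irregular gaps (a weekend plus an unknown holiday),
# deliberately out of order.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-01-06", "2023-01-03", "2023-01-09", "2023-01-02"]),
        "value": [4.0, 2.0, 5.0, 1.0],
    }
)

# Sort chronologically, then replace the dates with a 0..n-1 integer index.
ordered = df.sort_values("date").reset_index(drop=True)

# Remove the date column but keep it aside, so positions can be mapped back later.
dates = ordered.pop("date")

# `ordered` now has a RangeIndex and only the value column, e.g. (assumption):
# series = TimeSeries.from_dataframe(ordered, value_cols="value")
```

Keeping `dates` around costs nothing and makes it possible to look up which date a given position corresponds to.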
I've made a comment on #284 explaining why you, @optionsraghu, can try to do as you suggest and remove the date column from your DF before loading it as an (integer-indexed) TimeSeries.
@hrzn I am not sure I understand the question about the future dates? A lot of real-world data has breaks and missing values. Employee attendance, car park traffic, etc. all have weekend breaks. While the date is an interesting and important attribute, what matters is the causality between the various features, taken in order. Interpolation and imputation can skew the model in these cases.
Maybe what I mean is best explained by an example. Assume you have the following date/value pairs corresponding to a business day frequency, with one missing entry (the second Monday), with some arbitrary values:
You basically have two choices now.
The second Monday has been filled with a NaN, which will cause an issue with all forecasting models currently in Darts. A model handling NaNs would have to be a model specially designed not just for forecasting, but also to handle missing observations.
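To make the NaN case concrete, here is a small pandas sketch. The dates and values are arbitrary placeholders chosen for illustration, not the ones from the original example: a business-day series whose second Monday is missing gets a NaN at that date once it is forced onto a regular `B` frequency.

```python
import pandas as pd

# Business days from Mon 2 Jan to Wed 11 Jan 2023, with the second Monday removed.
days = pd.bdate_range("2023-01-02", "2023-01-11")
days = days[days != pd.Timestamp("2023-01-09")]

# Arbitrary placeholder values, one per remaining day.
s = pd.Series(range(1, len(days) + 1), index=days, dtype=float)

# Forcing a regular business-day frequency reinserts the missing Monday as NaN.
filled = s.asfreq("B")
print(filled)
```

The weekend is skipped cleanly by the `B` frequency, but the missing Monday is not, which is why the NaN appears.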
Of course, this way of doing things removes missing-value concerns when building the TimeSeries. Furthermore, the point I was making about the future dates is this. Assume that you forecast the next 3 values of this series. You'll get something like this:
How will you know what date |
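The concern about future dates can be illustrated with plain pandas. This is a hedged sketch, not Darts output: the history values, the last observed date and the weekends-only calendar are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical history: 7 observations stored without dates (positions 0..6).
history = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# A 3-step forecast on such a series can only be labelled with the next positions.
horizon = 3
forecast_index = pd.RangeIndex(len(history), len(history) + horizon)

# Mapping positions back to dates needs a calendar. Assuming (hypothetically) the
# last observation was Wed 2023-01-11 and only weekends are skipped:
last_date = pd.Timestamp("2023-01-11")
forecast_dates = pd.bdate_range(last_date, periods=horizon + 1)[1:]

print(list(forecast_index))   # integer positions of the forecast
print(list(forecast_dates))   # dates under the weekends-only assumption
```

If the data also skips custom holidays that are not known in advance, this mapping can be wrong, so positions cannot be translated to dates without extra information.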
@hrzn I agree with you on the date requirements. I think it also depends on the business case at hand. For critical processes, yes, it is needed. Thanks.
Originally posted by @optionsraghu in #284 (comment)