'skip' in createTimeSlices() #491
I can add it to the control function.
We're going to disagree on that. It is a little math, but it makes sense for week and year data, e.g.:
> library(caret)
> library(lubridate)
> dates <- today() + days(1:17)
>
> ind <- createTimeSlices(1:length(dates), initialWindow = 3, skip = 6)
> lapply(ind$test, function(x, y) wday(ymd(y[x])), y = dates)
$Testing01
[1] 6
$Testing08
[1] 6
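The "little math" can be checked without building the slices. This is a minimal sketch, assuming one candidate slice per admissible window position and that `skip` keeps every `(skip + 1)`-th slice, which is what the session above suggests (`Testing01`, then `Testing08`). `n_slices` is a hypothetical helper, not part of caret.

```r
# Hypothetical helper: number of slices createTimeSlices() is expected to
# return, assuming every (skip + 1)-th candidate slice is kept.
n_slices <- function(n, initialWindow, horizon = 1, skip = 0) {
  total <- n - initialWindow - horizon + 1   # candidate slices before thinning
  if (total < 1) return(0)
  length(seq(1, total, by = skip + 1))
}

# Same settings as the session above: 17 dates, initialWindow = 3, skip = 6
n_slices(17, initialWindow = 3, skip = 6)
```

With these settings the helper returns 2, in line with the two test sets shown above. Because the kept slices are `skip + 1 = 7` positions apart, the test days fall on the same weekday.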
I see your point about random sampling vs. skip. Thanks for building in this feature - it's very useful for time series cross-validation, and I appreciate it.
Minor request here.
The 'skip' arg for createTimeSlices is not available when passing trainControl(method = "timeslice"...) to train(), which means a user must create the time slices themselves if they need to reduce the sample space. That's a very common need for time series analysis. For example, if doing a time series analysis with 1-minute data over a one-month period, you can easily end up with thousands of train/test samples...
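For reference, the workaround of creating the slices by hand might look like the sketch below. It assumes the precomputed indices can be handed to train() through the `index` and `indexOut` arguments of trainControl(). The toy series, the lag-1 predictor, and the window sizes are made up for illustration.

```r
library(caret)

set.seed(1)
y <- cumsum(rnorm(200))                        # toy random-walk series
dat <- data.frame(y = y[-1], lag1 = y[-200])   # predict y from its previous value

# Build the slices directly so that 'skip' can be used
slices <- createTimeSlices(dat$y, initialWindow = 50, horizon = 10,
                           fixedWindow = TRUE, skip = 9)

# Hand the precomputed training and hold-out indices to train()
ctrl <- trainControl(index = slices$train, indexOut = slices$test)
fit <- train(y ~ lag1, data = dat, method = "lm", trControl = ctrl)
```

The cost of this route is that the user, not trainControl(), owns the window settings, which is the inconvenience the request is about.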
The 'skip' parameter itself is not that useful, as a user has to do some math to figure out just how many samples will be generated when creating the slices. It might be more useful to specify the total number of test samples (n_samples) and randomly select them without replacement from the possible samples.
i.e. replace line 25 in createTimeSlices.R:
x[seq(1, n, by = skip)]
with
sample(x, n_samples, replace = FALSE)
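The same effect can be approximated today without changing createTimeSlices(). This is a rough sketch of the idea, not the proposed patch: it draws the slice positions once and applies them to both lists. `n_samples` is the hypothetical parameter from the request, and the series and window sizes are placeholders.

```r
library(caret)

set.seed(42)                       # make the random selection reproducible
y <- seq_len(500)                  # stand-in for a long series
slices <- createTimeSlices(y, initialWindow = 100, horizon = 10)

n_samples <- 20                    # hypothetical: number of slices to keep
keep <- sort(sample(seq_along(slices$train), n_samples, replace = FALSE))

# Index train and test with the same positions so the pairs stay aligned
sub <- list(train = slices$train[keep], test = slices$test[keep])
```

Drawing the positions once matters: if sample() were called separately on the training and test lists, each call would pick different slices and the training windows would no longer line up with their test windows.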