'skip' in createTimeSlices() #491

Closed
praftery opened this issue Sep 20, 2016 · 2 comments

@praftery praftery commented Sep 20, 2016

Minor request here.

  1. The 'skip' argument of createTimeSlices() is not available when passing trainControl(method = "timeslice", ...) to train(), which means a user must create the time slices themselves if they need to reduce the number of resamples. That's a very common need in time series analysis. For example, with 1-minute data over a one-month period, you can easily end up with thousands of train/test samples...

  2. The 'skip' parameter itself is not that useful, as a user has to do some math to figure out just how many samples will be generated when creating the slices. It might be more useful to specify the total number of test samples (n_samples) and randomly select them without replacement from the possible samples.

i.e. replace line 25 in createTimeSlices.R:
x[seq(1, n, by = skip)]
with
sample(x, n_samples, replace = FALSE)
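
For reference, the arithmetic mentioned in point 2 is short. The following is a rough sketch in base R, assuming the first training window ends at `initialWindow` and each later one ends `skip + 1` observations further on. The helper name `n_slices` is hypothetical and not part of caret.

```r
# Hypothetical helper (not part of caret): number of train/test slices
# for a series of length n, assuming the first training window ends at
# `initial_window` and each later one ends `skip + 1` steps further on.
n_slices <- function(n, initial_window, horizon = 1, skip = 0) {
  last_stop <- n - horizon            # last index a training window may end at
  if (last_stop < initial_window) return(0)
  floor((last_stop - initial_window) / (skip + 1)) + 1
}

# One month of 1-minute data, one-day initial window, one-hour horizon
n <- 30 * 24 * 60
n_slices(n, initial_window = 1440, horizon = 60)               # every slice
n_slices(n, initial_window = 1440, horizon = 60, skip = 1439)  # one slice per day
```

With `skip = 0` this counts every possible slice, which is where the "thousands of train/test samples" in point 1 come from.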

@topepo topepo (Owner) commented Sep 20, 2016

I can add it to the control function.
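
Until `skip` is available through trainControl(), one possible workaround is to build the thinned slices by hand and pass them in through the `index` and `indexOut` arguments of trainControl(). This is a minimal sketch, assuming caret is installed; the simulated data and the choice of `lm` are for illustration only.

```r
library(caret)

# Simulated series, for illustration only
set.seed(1)
n   <- 200
dat <- data.frame(x = seq_len(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Keep every 10th slice (skip = 9) instead of all of them
slices <- createTimeSlices(dat$y, initialWindow = 50, horizon = 5,
                           fixedWindow = TRUE, skip = 9)

# Hand the precomputed training/holdout indices to train()
ctrl <- trainControl(index = slices$train, indexOut = slices$test)
fit  <- train(y ~ x, data = dat, method = "lm", trControl = ctrl)

length(slices$train)  # number of resamples actually fitted
```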

  2. The 'skip' parameter itself is not that useful

We're going to disagree on that. It takes a little math, but it makes sense for weekly and yearly data (e.g. skip = 6 when you want the same day across weeks).

> library(caret)
> library(lubridate)
> dates <- today() + days(1:17)
> 
> ind <- createTimeSlices(1:length(dates), initialWindow = 3, skip = 6)
> lapply(ind$test, function(x, y) wday(ymd(y[x])), y = dates)
$Testing01
[1] 6

$Testing08
[1] 6

Using sample is not a good option since you probably want equally spaced gaps.
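
If the goal is roughly a fixed number of resamples, `skip` can be derived from that target while keeping the gaps equal. This is a rough sketch in base R; `skip_for` is a hypothetical helper, not a caret function, and it assumes slices advance by `skip + 1` observations.

```r
# Hypothetical helper (not part of caret): largest `skip` that still
# yields at least `n_samples` equally spaced slices.
skip_for <- function(n, initial_window, horizon = 1, n_samples) {
  n_all <- n - horizon - initial_window + 1   # slices available with skip = 0
  stopifnot(n_samples >= 2, n_samples <= n_all)
  floor((n_all - 1) / (n_samples - 1)) - 1
}

skip <- skip_for(n = 100, initial_window = 10, horizon = 1, n_samples = 10)

# End points of the training windows: evenly spaced by construction
stops_regular <- seq(10, 100 - 1, by = skip + 1)

# Random selection of the same number of end points: gaps are uneven
set.seed(42)
stops_random <- sort(sample(10:99, length(stops_regular), replace = FALSE))

diff(stops_regular)
diff(stops_random)
```

The regular end points always differ by `skip + 1`, whereas the randomly sampled ones generally do not, which is the concern raised above.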

@praftery praftery (Author) commented Sep 20, 2016

I see your point about random sampling vs. skip. Thanks for building in this feature - it's very useful for time series cross-validation, and I appreciate it.

topepo added a commit that referenced this issue Oct 28, 2016
@topepo topepo closed this Oct 28, 2016