TimeSlice feature request #346

Closed
jbheman opened this issue Jan 5, 2016 · 8 comments

@jbheman jbheman commented Jan 5, 2016

hi,

Regarding time-slice cross-validation: I think it would be useful to have an option NOT to fit a 'final' model on the full dataset. In many contexts the whole point of time-slice cross-validation is to respect temporal ordering under non-stationarity: we fit a model using only X days of lookback and test on the next Y days, partly to determine whether that X/Y scheme is effective. In that case I'd rather not waste cycles training the model on the full dataset, because in the end I'm only interested in the most recent fit and the hyperparameters.

best, Jason
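
For readers unfamiliar with the scheme described above, here is a minimal base-R sketch of rolling "X days lookback, next Y days test" indices. The helper name `make_time_slices` and its arguments are hypothetical and are not part of caret; caret's own `createTimeSlices()` and `trainControl(method = "timeslice")` generate slices of this kind.

```r
# Hypothetical helper (not a caret function): build rolling train/test
# index sets where each model sees `lookback` rows and is tested on the
# following `horizon` rows.
make_time_slices <- function(n, lookback, horizon) {
  stopifnot(n >= lookback + horizon)
  starts <- seq_len(n - lookback - horizon + 1)
  list(
    train = lapply(starts, function(s) s:(s + lookback - 1)),
    test  = lapply(starts, function(s) (s + lookback):(s + lookback + horizon - 1))
  )
}

slices <- make_time_slices(n = 10, lookback = 5, horizon = 2)

# The last slice is the "most recent fit" referred to in the request.
last <- length(slices$train)
print(slices$train[[last]])
print(slices$test[[last]])
```

Under this scheme a final fit on all rows is never part of the evaluation, which is why skipping it (or restricting it to the most recent window) saves time.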

Collaborator

@zachmayer zachmayer commented Jan 5, 2016

This would also be useful for non-time slice models too. If all of the models in the tuneGrid are bad, it might not be worth the time to train the final model.

@topepo topepo added the enhancement label Jan 12, 2016
Owner

@topepo topepo commented Jan 12, 2016

Good point. I'll think of a good way to do that. trainControl is getting pretty "full" but that might be the best place for it.

Author

@jbheman jbheman commented Jan 15, 2016

great. fyi i made the following modification to train.default :

## Make the final model based on the tuning results
## ######################################################
if (!trControl$fitFinalModel) {
    ## Skip the final fit and return empty placeholders
    finalTime <- NULL
    finalModel <- list()
    pp <- NULL
} else {
    if (!(length(trControl$seeds) == 1 && is.na(trControl$seeds))) set.seed(trControl$seeds[[length(trControl$seeds)]][1])
    ## (excerpt ends here; the existing final-model code in train.default follows inside this else branch)

then i also modify the output of trainControl as follows :

Control <- trainControl(....)
Control$fitFinalModel <- FALSE

thanks for all your great and valuable work on caret!!

Jason


Owner

@topepo topepo commented Jan 15, 2016

To solve some unknown future issues, I think that I will make an option
called indexFinal to trainControl that will index the samples used to
fit the final model. By default, this will encompass all the data.
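
A minimal sketch of how the proposed option might be used for the original time-slice use case, assuming a caret version whose trainControl accepts indexFinal. The simulated data, the window sizes, and the choice of the last `lookback` rows for the final fit are assumptions made only for illustration.

```r
library(caret)

set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

lookback <- 50  # rows used to fit each time-slice model (illustrative)
horizon  <- 10  # rows used to test each time-slice model (illustrative)

ctrl <- trainControl(
  method = "timeslice",
  initialWindow = lookback,
  horizon = horizon,
  fixedWindow = TRUE,
  skip = horizon - 1,                 # non-overlapping test windows
  indexFinal = (n - lookback + 1):n   # final fit on the most recent window only
)

fit <- train(y ~ x, data = dat, method = "lm", trControl = ctrl)

# Number of rows the final lm was fitted on; this should match `lookback`
# if indexFinal is honoured.
print(nobs(fit$finalModel))
```

This does not skip the final fit entirely, as the fitFinalModel patch does, but it keeps the final model consistent with the lookback window used during resampling.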


topepo added a commit that referenced this issue Apr 5, 2016
Owner

@topepo topepo commented Apr 5, 2016

I just checked in a change for this if you would like to give it a try.

Author

@jbheman jbheman commented Apr 5, 2016

Will do thanks!


Owner

@topepo topepo commented Apr 18, 2016

I'm going to assume that these changes are good for your problem.

@topepo topepo closed this Apr 18, 2016

@slfan2013 slfan2013 commented Mar 19, 2017

Hi @topepo, I am not sure indexFinal is really working.

  1. My log says "Fitting xxx on full training set" even when I set indexFinal to a subset of the training set.

  2. In addition, indexFinal also influences the pre-processing. I expected the pre-processing to be estimated on the full data, not only on data[indexFinal,].

Please see the following code:

### preparing
{
pacman::p_load(R.utils, caret, doParallel)
cl <- makeCluster(8, outfile = "")
registerDoParallel(cl)
set.seed(2)
dat <- twoClassSim(48842, noiseVars = 10) # 25 variables
## add a 26-level factor and a highly imbalanced two-level factor
dat <- cbind(dat, data.frame(categorical = sample(LETTERS, nrow(dat), replace = TRUE)))
dat <- cbind(dat, data.frame(categorical2 = sample(LETTERS[1:2], nrow(dat), replace = TRUE, prob = c(0.001, 0.999))))
fit_on <- list(rs1 = 1:1300)      # rows used for fitting during resampling
pred_on <- list(rs1 = 1330:10000) # rows used for hold-out prediction
}


fitControl <- trainControl(method = "cv", number = 1, repeats = 1,
                           classProbs = TRUE, summaryFunction = twoClassSummary,
                           index = fit_on, indexOut = pred_on,
                           indexFinal = 1:2, # changing this range changes the pre-processing, which I think is strange
                           verboseIter = TRUE,
                           savePredictions = TRUE,
                           returnData = FALSE
                           # ,returnResamp = "none"
)

model = train(Class ~ ., data = dat,  
              method = "glm",
              preProcess=c("center","scale", "zv"),
              trControl = fitControl,
              tuneLength = 2)

model

The code above produces this output:

Aggregating results
Fitting final model on full training set
...
Pre-processing: centered (27), scaled (27), remove (24) 
...

It says it fitted the final model on the full training set. I expected caret to return an error when using only two samples!

In addition, when indexFinal is changed from 1:2 to 1:200, the pre-processing result differs:

fitControl <- trainControl(method = "cv", number = 1, repeats = 1,
                           classProbs = TRUE, summaryFunction = twoClassSummary,
                           index = fit_on, indexOut = pred_on,
                           indexFinal = 1:200,
                           verboseIter = TRUE,
                           savePredictions = TRUE,
                           returnData = FALSE
                           # ,returnResamp = "none"
)

model = train(Class ~ ., data = dat,  
              method = "glm",
              preProcess=c("center","scale", "zv"),
              trControl = fitControl,
              tuneLength = 2)

model

Pre-processing: centered (50), scaled (50), remove (1) 

Thanks for your help!
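
One way to check which rows the final model and its pre-processing actually used is to inspect the fitted object instead of the verbose log. The following is a small diagnostic sketch, not a statement about caret's internals; the reduced data size, the default accuracy metric, and the name `final_rows` are assumptions chosen to keep the example quick.

```r
library(caret)

set.seed(2)
dat <- twoClassSim(500, noiseVars = 2)
final_rows <- 1:200  # illustrative subset for the final fit

ctrl <- trainControl(method = "cv", number = 5, indexFinal = final_rows)

fit <- train(Class ~ ., data = dat,
             method = "glm",
             preProcess = c("center", "scale"),
             trControl = ctrl)

# Rows used by the final glm; compare with length(final_rows) and nrow(dat).
print(nobs(fit$finalModel))

# Centering means stored by the pre-processing step, next to the column
# means of the subset and of the full data, to see which one they match.
x <- model.matrix(Class ~ ., data = dat)[, -1]
print(head(cbind(
  stored = fit$preProcess$mean,
  subset = colMeans(x[final_rows, ]),
  full   = colMeans(x)
)))
```

If the stored means follow the subset column, the pre-processing was estimated on data[indexFinal,] only, which would be consistent with the differing "centered/scaled/remove" counts reported above.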
