Invalid seeds for boot632 train control method #349

Closed
asardaes opened this issue Jan 10, 2016 · 2 comments
Labels
bug

Comments

@asardaes asardaes commented Jan 10, 2016

Simple example:

library(caret)
data(iris)
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "boot632"))

Results in:

Error in { : task 2 failed - "supplied seed is not a valid integer"

Cause:

Within the nominalTrainWorkflow function, the nested foreach loops iterate over iter = seq(along = resampleIndex) and parm = 1:nrow(info$loop). However, when ctrl$method is "boot632", resampleIndex is modified: an AllData element is prepended to it. This causes the loop over iter to run one extra iteration.

The last element of the seeds list is a single numeric value, so when parm takes values greater than 1 (that is, nrow(info$loop) > 1), the following code:

if (!(length(ctrl$seeds) == 1 && is.na(ctrl$seeds))) 
            set.seed(ctrl$seeds[[iter]][parm])

... attempts to access a non-existent value in that last element, which yields NA. The call therefore becomes set.seed(NA), which raises the error shown above.
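The failure can be reproduced in base R without caret. This is only a minimal sketch: the seed values and the sizes (2 resamples, 3 models) are made up for illustration, and the loop variables iter and parm are set by hand rather than by foreach.

```r
# Hypothetical seeds list for B = 2 resamples and M = 3 models:
# two vectors of length M, then a single seed for the final model.
seeds <- list(c(11L, 12L, 13L), c(21L, 22L, 23L), 99L)

# Simulate the extra iteration: iter points at the last element,
# while parm is still allowed to be larger than 1.
iter <- length(seeds)
parm <- 2
bad_seed <- seeds[[iter]][parm]  # indexing past the end of a vector gives NA

# set.seed() rejects NA; capture the message instead of stopping.
msg <- tryCatch({
  set.seed(bad_seed)
  "no error"
}, error = function(e) conditionMessage(e))

print(bad_seed)
print(msg)
```

With parm = 1 the same lookup returns 99 and set.seed succeeds, which is consistent with the error only appearing from the second task onwards.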


Solution:

I'm not sure about this, but I think the last element of the seeds list must also contain M values when ctrl$method == "boot632", where M is the number of models, as specified in the documentation.
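As a rough sketch of that proposal (not the actual fix in caret), the last element could be filled with M seeds instead of one. The sizes below and the pool of 1000 candidate seeds are assumptions chosen for illustration.

```r
num_resamples <- 5   # B, assumed for illustration
num_models <- 10     # M, assumed for illustration

set.seed(1)
seeds <- vector("list", num_resamples + 1)
for (i in seq_len(num_resamples)) {
  seeds[[i]] <- sample.int(1000, num_models)
}
# Proposed change: M values in the last element instead of a single one
seeds[[num_resamples + 1]] <- sample.int(1000, num_models)

# Every (iter, parm) pair now resolves to a real seed, never NA
all_ok <- all(vapply(seeds,
                     function(s) !anyNA(s[seq_len(num_models)]),
                     logical(1)))
print(all_ok)
```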

@topepo topepo commented Jan 12, 2016

You hit the nail on the head. It does an extra model fit on the entire data set for that method. I'll fix that and update this thread.

Thanks,

Max

@topepo topepo added the bug label Jan 12, 2016
topepo added a commit that referenced this issue Jan 13, 2016

@asardaes asardaes commented Jan 13, 2016

I checked this and it seems to work if the seeds parameter is not provided. However, while working around the problem I noticed that the following still doesn't work:

library(caret)
data(iris)
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

num_resamples <- 5
num_models <- 5

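# specify seeds: one vector of num_models seeds per resample, plus one final seed
set.seed(329)
seeds <- sample(1000, num_resamples*num_models + 1)
# split into consecutive, non-overlapping chunks of num_models seeds
seeds <- lapply(seq(from = 1, by = num_models, to = length(seeds)), 
                function(i_seed) seeds[i_seed:(i_seed + num_models - 1)])
# the last element only needs a single seed (for the final model)
seeds[[length(seeds)]] <- seeds[[length(seeds)]][1]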

Working:

# this one expects a list of length 'num_resamples + 1' (call it L)
knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = num_models,
                 trControl = trainControl(method = "boot", 
                                          number = num_resamples, 
                                          seeds = seeds))

Not working:

# this one fails, it expects a list of length L+1
knnFit2 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = num_models,
                 trControl = trainControl(method = "boot632", 
                                          number = num_resamples, 
                                          seeds = seeds))

If the second case should indeed receive a longer seeds list, then I guess the documentation should reflect it.
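If a longer list is indeed what "boot632" needs, one way to build both shapes is a small helper like the one below. This is only a sketch: make_seeds and its extra_fit argument are hypothetical names, not part of caret, and the B + 2 layout for "boot632" is an assumption based on the behaviour described above rather than on the documentation.

```r
# Hypothetical helper (not a caret function): builds a seeds list with
# n_resamples vectors of n_models seeds, an optional extra vector for the
# additional fit on the entire data set, and a single final-model seed.
make_seeds <- function(n_resamples, n_models, extra_fit = FALSE, pool = 1000L) {
  n_vectors <- n_resamples + if (extra_fit) 1L else 0L
  seeds <- replicate(n_vectors, sample.int(pool, n_models), simplify = FALSE)
  c(seeds, list(sample.int(pool, 1L)))
}

set.seed(329)
seeds_boot <- make_seeds(5, 5)                     # length B + 1
seeds_632  <- make_seeds(5, 5, extra_fit = TRUE)   # length B + 2 (assumed)

print(lengths(seeds_boot))
print(lengths(seeds_632))
```

Either list can then be passed as the seeds argument of trainControl, as in the two calls above.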

topepo added a commit that referenced this issue Jan 15, 2016
Better seeds (related to #349)
@asardaes asardaes closed this Jan 15, 2016