Add empty label drop from factors to caret WIKI #260

tobigithub · 2015-10-08T07:23:03Z

Hi,
there are a number of cases where I (and others) observed the error "Something is wrong; all the Accuracy metric values are missing:".

Could a note on dropping unused factor levels, e.g. factor(myFactor, exclude=NULL) or droplevels(myFactor), be added to the caret pre-processing WIKI? I think that would be helpful; the WIKI is clearly excellent and the best starting point (besides the book) for getting a quick taste of caret.

There could also be a reference to the PDF "An introduction to data cleaning with R".
I am not sure how many of the ideas in that article could be added to the convenience method preProcess {caret}, but data pre-processing is surely one of the main time sinks when dealing with real-world ML data, because so many kinds of failure are possible.

Example code:

# Random Forest
# caret method = "rf"

require(caret)
data(iris)

# restrict iris dataset to two classes, setosa and versicolor
TrainData <- iris[1:100,1:4]
TrainClasses <- iris[1:100,5]

# unfortunately the unused factor level (virginica) still exists
summary(TrainClasses)
print(TrainClasses)

rfFit <- train(TrainData, TrainClasses,
                method = "rf",
                trControl = trainControl(method = "boot632"))

##Something is wrong; all the Accuracy metric values are missing:
##    Accuracy       Kappa     AccuracyApparent KappaApparent
## Min.   : NA   Min.   : NA   Min.   : NA      Min.   : NA  
## 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA      1st Qu.: NA  
## Median : NA   Median : NA   Median : NA      Median : NA  
## Mean   :NaN   Mean   :NaN   Mean   :NaN      Mean   :NaN  
## 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA      3rd Qu.: NA  
## Max.   : NA   Max.   : NA   Max.   : NA      Max.   : NA  
## NA's   :3     NA's   :3     NA's   :3        NA's   :3    
##Error in train.default(TrainData, TrainClasses, method = "rf", trControl = trainControl(method = "boot632")) : 
##  Stopping
##In addition: There were 50 or more warnings (use warnings() to see the first 50)

# not reached: train() stops with the error above, so rfFit is never created
# rfFit
# confusionMatrix(rfFit)



#----------------------------------

# again restrict to two classes with iris[1:100, ], then drop the unused level
# by re-creating the factor: factor() rebuilds the levels from the values present
TrainData <- iris[1:100,1:4]
TrainClasses <- factor(iris[1:100,5],exclude=NULL)

summary(TrainClasses)
print(TrainClasses)

rfFit <- train(TrainData, TrainClasses,
                method = "rf",
                trControl = trainControl(method = "boot632"))
rfFit
confusionMatrix(rfFit)
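
The effect of the fix can be checked without caret. The following is a minimal base-R sketch, not part of the original report; the variable names `classes`, `dropped` and `refactor` are chosen here for illustration. It compares `droplevels()`, the usual base-R idiom for this, with the `factor(..., exclude=NULL)` call used above.

```r
# Base R only: inspect the unused level and two ways of dropping it
data(iris)

classes <- iris[1:100, 5]   # only setosa and versicolor rows remain
levels(classes)             # the level set still lists all three species
table(classes)              # virginica shows up with a count of zero

dropped  <- droplevels(classes)              # idiomatic way to drop unused levels
refactor <- factor(classes, exclude = NULL)  # approach used in the example above

nlevels(classes)                               # number of levels before
nlevels(dropped)                               # number of levels after
identical(levels(dropped), levels(refactor))   # both approaches agree here
```

Note that `exclude=NULL` is not what removes the level; re-creating the factor does that. `exclude=NULL` only matters when the data contain NA, in which case it keeps NA as a level of its own.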

topepo · 2015-10-08T14:11:11Z

It would be better to issue a warning that this is occurring, or to throw an error. I don't want to silently change the data objects inside the function.
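
A check of this kind might look roughly like the sketch below. This is an assumption for illustration only: `warn_unused_levels` is a hypothetical helper, not a caret function, and it is not the check that was later committed for this issue. It reports empty outcome classes and leaves the data untouched.

```r
# Hypothetical helper (not part of caret): warn about outcome levels that have
# no observations instead of silently modifying the factor.
warn_unused_levels <- function(y) {
  if (!is.factor(y)) return(invisible(character(0)))
  counts <- table(y)                     # per-level counts (NA values are ignored)
  unused <- names(counts)[counts == 0]   # levels that never occur in the data
  if (length(unused) > 0) {
    warning("Outcome has factor levels with no observations: ",
            paste(unused, collapse = ", "),
            ". Consider droplevels() before calling train().",
            call. = FALSE)
  }
  invisible(unused)                      # return the empty levels for inspection
}

data(iris)
# Emits a warning and returns the names of the empty levels
unused <- warn_unused_levels(iris[1:100, 5])
```

Returning the unused level names invisibly lets the caller decide whether to drop them, stop, or carry on.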

tobigithub · 2015-10-26T02:20:00Z

Thank you.
Tobias

topepo added a commit that referenced this issue Oct 11, 2015

added a check on the outcome for issue #260

d8d986c

topepo added a commit that referenced this issue Oct 13, 2015

methods for excluding predictors during pre-processing for issue #260

dab0dca

topepo closed this as completed Oct 22, 2015
