Add empty label drop from factors to caret WIKI #260

Closed
tobigithub opened this issue Oct 8, 2015 · 2 comments

@tobigithub

Hi,
there are a number of cases where I (and others) have run into the error "Something is wrong; all the Accuracy metric values are missing:". One cause is a class factor that still carries unused levels.

Could a note on dropping unused factor levels, e.g. with factor(myFactor,exclude=NULL) or droplevels(myFactor), be added to the caret pre-processing WIKI? I think that would be helpful. The WIKI is excellent and, besides the book, the best starting point for getting a quick taste of caret.

There could also be a reference to the PDF "An introduction to data cleaning with R".
I am not sure how many of the ideas in that article could be added to the convenience method preProcess {caret}, but data pre-processing is surely one of the main time sinks when dealing with real-world ML data, because so many kinds of failure are possible.

Example code:

# Random Forest
# caret method = "rf"

library(caret)
data(iris)

# restrict iris dataset to two classes: setosa and versicolor
TrainData <- iris[1:100,1:4]
TrainClasses <- iris[1:100,5]

# unfortunately the unused level "virginica" is still present in the factor
summary(TrainClasses)
print(TrainClasses)

rfFit <- train(TrainData, TrainClasses,
                method = "rf",
                trControl = trainControl(method = "boot632"))

##Something is wrong; all the Accuracy metric values are missing:
##    Accuracy       Kappa     AccuracyApparent KappaApparent
## Min.   : NA   Min.   : NA   Min.   : NA      Min.   : NA  
## 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA      1st Qu.: NA  
## Median : NA   Median : NA   Median : NA      Median : NA  
## Mean   :NaN   Mean   :NaN   Mean   :NaN      Mean   :NaN  
## 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA      3rd Qu.: NA  
## Max.   : NA   Max.   : NA   Max.   : NA      Max.   : NA  
## NA's   :3     NA's   :3     NA's   :3        NA's   :3    
##Error in train.default(TrainData, TrainClasses, method = "rf", trControl = trainControl(method = "boot632")) : 
##  Stopping
##In addition: There were 50 or more warnings (use warnings() to see the first 50)

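# in a fresh session rfFit was never created because train() stopped,
# so the next two calls fail as well
rfFit
confusionMatrix(rfFit)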



#----------------------------------

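# use TrainData <- iris[1:100,1:4] for two classes, then drop the unused level
# factor() rebuilds the levels from the values present; exclude=NULL only
# changes how NA is handled (droplevels(iris[1:100,5]) is the dedicated alternative)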
TrainData <- iris[1:100,1:4]
TrainClasses <- factor(iris[1:100,5],exclude=NULL)

summary(TrainClasses)
print(TrainClasses)

rfFit <- train(TrainData, TrainClasses,
                method = "rf",
                trControl = trainControl(method = "boot632"))
rfFit
confusionMatrix(rfFit)
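
The level problem can also be seen without caret. The following is a minimal sketch using only base R; the variable names are illustrative and not taken from the wiki.

```r
data(iris)

# Subsetting rows keeps the original level set, including the now-empty "virginica"
two_class <- iris[1:100, 5]
levels(two_class)   # "setosa" "versicolor" "virginica"
table(two_class)    # virginica has a count of 0

# droplevels() is the dedicated base R function for removing unused levels
cleaned <- droplevels(two_class)
levels(cleaned)     # "setosa" "versicolor"

# Re-calling factor() has the same effect here, because the levels are
# rebuilt from the values that are actually present
identical(levels(factor(two_class)), levels(cleaned))   # TRUE
```

Checking table() on the outcome before calling train() is a quick way to spot an empty class.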
@topepo
Owner

topepo commented Oct 8, 2015

It would be better to issue a warning that this is occurring, or to throw an error. I don't want to silently change the data objects inside the function.
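
A minimal sketch of what such a check could look like, assuming it runs on the outcome before model fitting. The helper name check_unused_levels and its action argument are hypothetical and not part of caret; the input is reported on, never modified.

```r
# Hypothetical helper, not part of caret: reports factor levels that have
# no observations and leaves the input untouched.
check_unused_levels <- function(y, action = c("warn", "stop")) {
  action <- match.arg(action)
  if (!is.factor(y)) return(invisible(character(0)))

  counts <- table(y)                     # one entry per level, NA values ignored
  unused <- names(counts)[counts == 0]   # levels that never occur in y

  if (length(unused) > 0) {
    msg <- paste0("Outcome factor has levels with no observations: ",
                  paste(unused, collapse = ", "))
    if (action == "stop") stop(msg, call. = FALSE) else warning(msg, call. = FALSE)
  }
  invisible(unused)
}

# Warns about "virginica" for the two-class subset used in this issue
check_unused_levels(iris[1:100, 5])
```

With action = "stop" the same condition becomes an error, which leaves the decision to drop the level with the user.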

@tobigithub
Author

Thank you.
Tobias
