Unnamed vector returned by mnLogLoss breaks model selection in training #514

Closed
jayqi opened this issue Nov 2, 2016 · 4 comments

@jayqi commented Nov 2, 2016

The latest CRAN version 6.0-72 has a bug in the summary function mnLogLoss that prevents logLoss from being used as a metric in training resampling.

The previous version (6.0-71) returned a named vector where the computed logloss values are assigned the name logLoss.

The new version in 6.0-72 that uses ModelMetrics::mlogLoss returns an unnamed vector.

In addition, train.default is not robust to the non-parameter columns of performance being unnamed. Because the logLoss column has no name, train.default fails when it tries to extract that column from performance, and it triggers the following error:

cat(paste("Something is wrong; all the", metric, 
    "metric values are missing:\n"))
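A possible stopgap for anyone pinned to the affected release is to wrap the summary function so that its result always carries a name. The following is a minimal sketch, not caret's own fix: `ensure_named` is a hypothetical helper that is not part of caret, and it assumes the wrapped function takes the usual `(data, lev, model)` arguments and returns a single value.

```r
# Hypothetical helper (not part of caret): wrap a summary function so that
# a single unnamed result is given the name that `metric` will look up.
ensure_named <- function(summary_fun, metric_name) {
  function(data, lev = NULL, model = NULL) {
    out <- summary_fun(data, lev = lev, model = model)
    # Only fill in a name when there is exactly one unnamed value;
    # anything else is returned untouched.
    if (is.null(names(out)) && length(out) == 1) {
      names(out) <- metric_name
    }
    out
  }
}

# Base R illustration of why the name matters: a lookup by name cannot
# succeed on an unnamed vector, because names() returns NULL for it.
unnamed <- c(0.48)
named   <- c(logLoss = 0.48)
"logLoss" %in% names(unnamed)
"logLoss" %in% names(named)
```

Under those assumptions, passing `summaryFunction = ensure_named(mnLogLoss, "logLoss")` to `trainControl` should give `train` a column named `logLoss` to select on.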

@jayqi (Author) commented Nov 2, 2016

In case it's useful, here's the output using the minimal model code in the issue template:

> library(caret)
> set.seed(1)
> dat <- twoClassSim(100)
> X <- dat[,1:5]
> y <- dat[["Class"]]
> 
> model_class <- train(
+   X, y, 
+   metric='logLoss',
+   trControl=trainControl(
+     method="cv", 
+     number=5,
+     classProbs=TRUE, 
+     summaryFunction=mnLogLoss,
+     savePredictions="final")
+ )
Something is wrong; all the  metric values are missing:
       V1        
 Min.   :0.4827  
 1st Qu.:0.4843  
 Median :0.4860  
 Mean   :0.4890  
 3rd Qu.:0.4922  
 Max.   :0.4984  
Error in train.default(X, y, metric = "logLoss", trControl = trainControl(method = "cv",  : 
  Stopping
In addition: Warning message:
In train.default(X, y, metric = "logLoss", trControl = trainControl(method = "cv",  :
  The metric "logLoss" was not in the result set.  will be used instead.

@JackStat (Contributor) commented Nov 3, 2016

I am unable to reproduce the error.

> set.seed(1)
> dat <- twoClassSim(100)
> X <- dat[,1:5]
> y <- dat[["Class"]]
> 
> model_class <- train(
+  X, y,
+  metric='logLoss',
+  trControl=trainControl(
+    method="cv",
+    number=5,
+    classProbs=TRUE,
+    summaryFunction=mnLogLoss,
+    savePredictions="final")
+ )
> model_class
Random Forest 

100 samples
  5 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 80, 80, 81, 80, 79 
Resampling results across tuning parameters:

  mtry  logLoss  
  2     0.4826553
  3     0.4859825
  5     0.4983792

logLoss was used to select the optimal model using  the smallest value.
The final value used for the model was mtry = 2. 

Can you share your sessionInfo()?

@JackStat (Contributor) commented Nov 3, 2016

That would be because @topepo already fixed the issue in master :) 9d1c56b

@JackStat (Contributor) commented Nov 7, 2016

I think we can close this now.
