
Fixing bug in positive class selection used by makePredObsMatrix #190

Merged (13 commits) — Feb 16, 2016

Conversation

eric-czech
Contributor

This commit contains a fix for an error in makePredObsMatrix (line 241) where the following code was being used to select the positive class for classification problems:

positive <- as.character(unique(modelLibrary$obs)[2]) #IMPROVE THIS!

The problem is that unique returns values in order of first appearance, so a different positive class was potentially being selected for each resample (which seems like a rather large bug if I'm not mistaken). For example:

unique(factor(c('positive', 'negative')))[2] # = 'negative'
unique(factor(c('negative', 'positive')))[2] # = 'positive'
# Given the above, the choice of positive class would be dependent on the order of the response data itself

I made a change to choose that class like this instead: positive <- levels(modelLibrary$obs)[2]
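By contrast, levels() returns the factor's level set, which by default is alphabetically sorted and independent of the order values appear in the data; a quick sketch in plain R:

```r
# levels() is stable under reordering of the data, unlike unique():
levels(factor(c("positive", "negative")))[2]  # "positive"
levels(factor(c("negative", "positive")))[2]  # "positive"
```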

From a broader perspective, it would be nice if caretEnsemble maintained consistency with caret in that the first level in binary outcomes is treated as the positive class by default. I see hard-coded selections for the second class's probability predictions in predict.{caretList, caretStack} which might not be too hard to override with a more explicit selection of the positive class (or to change to the first class), but I'd be curious to hear what your thoughts are on that. Perhaps it could be some sort of global configuration option rather than a parameter that needs to be passed in multiple places? I dunno. I'm happy to help with any of it if you have suggestions.
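As a sketch of what a more explicit selection could look like (fit, test_data, and train_data here are hypothetical placeholders, not the actual caretEnsemble internals):

```r
# Hypothetical sketch: pick the positive class explicitly, then select the
# matching probability column by name rather than via a hard-coded [, 2]:
positive <- levels(train_data$outcome)[1]  # caret treats the first level as positive
probs <- predict(fit, newdata = test_data, type = "prob")
pred <- probs[, positive]
```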

@zachmayer
Owner

From a broader perspective, it would be nice if caretEnsemble maintained consistency with caret in that the first level in binary outcomes

Overall, this seems like a good idea. A few comments:

  1. Why don't you change predict.caretList and predict.caretStack to use the first level as the positive level?
  2. Please add some unit tests to ensure the makePredObsMatrix and the predict functions are using the correct positive levels.
  3. Please fix the lint that the lintr bot found.

Other than that, this seems like a good idea. That todo has been on the list for a while...

@eric-czech
Contributor Author

Alright, sorry for the build errors. I didn't realize lintr wasn't running on my machine during the unit tests (I had to install it first, whoops).

Anyway, the changes to caretList and caretEnsemble were pretty minor, and the only previous test they conflicted with was in test-ensemble.R, where an expected probability prediction is now for a different class (i.e. the new value is approximately 1 minus the old value).

Let me know if you see any issues with those changes or with the new unit test (I'd be happy to move it into an existing test if you don't think it makes sense on its own).

@eric-czech
Contributor Author

@zachmayer What would you think of making the above configurable with a global option like: options(caret.ensemble.def.bin.level=1) # or 2

Now that it's clear what needs to be changed to make the default one or the other, it would be easy to add that switch. But I don't want to do that unless you're OK with the changes so far and don't know of a better way to handle global configuration settings.

@zachmayer
Owner

@eric-czech sure, making it a global option works for me. Make the default be the same one as the caret package uses by default

data(Y.class)

#############################################################################
context("Do classifier predictions use the correct outcome classes?")
Owner

I would run this test twice, once where the positive level is "yes" and once where the positive level is "no", and check that no re-arrangement happens, regardless of the alphabetical sorting of the target.

@zachmayer
Owner

@eric-czech Actually, I'm happy with the PR as-is. No need to add a global option, but I'd like you to consider adding 1 more test.

@eric-czech
Contributor Author

@zachmayer No problem, I added the configuration option as well as some extra tests. I didn't see any reason to test problems with alphabetical ordering at first, but I'm glad you suggested it. I ultimately found that while the alphabetical ordering of the level names doesn't affect the results of caret or caretEnsemble models directly, it does change results indirectly through the stratified CV splits caret creates (those ARE dependent on the factor names ... which is annoying).

In other words, if you let the models use caret::createFolds, they produce different results depending on what names you give the factor levels, because the stratification is based on the output of base::table, which orders factor names alphabetically. The results initially differed because of that (not by much), but by creating the CV splits manually I was able to make sure the factor names don't affect the results in the tests.
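The base::table behavior is easy to see in plain R:

```r
# table() tabulates by factor level, and levels default to alphabetical order
# regardless of the order values appear in the data:
table(factor(c("b", "a", "b")))
# a b
# 1 2
```

Creating the resampling indices once and passing them via trainControl(index = ...) pins the splits down, so relabeling the factor cannot change them.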

#' but that value can be overridden using global options (e.g.
#' \code{options(caret.ensemble.target.bin.level=2)}).
getBinaryLevel <- function() {
  value <- as.numeric(getOption("caret.ensemble.target.bin.level", default = 1))
Owner

I'd do default = 1L.

@zachmayer
Owner

I like getBinaryLevel, thanks for writing that. What do you think about an exported function called setBinaryLevel that's a friendly way to set the option in caretEnsemble? This function could coerce the input to an integer (with a warning if it's not already an integer), and then check that it's equal to 1 or 2.
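A possible sketch of that setter (a proposal as described above, not the merged implementation):

```r
# Proposed setBinaryLevel: coerce to integer (with a warning if coercion was
# needed), check the result is 1 or 2, then store it in the global option.
setBinaryLevel <- function(level) {
  if (!is.integer(level)) {
    warning("Coercing binary target level to an integer")
    level <- as.integer(level)
  }
  stopifnot(length(level) == 1, level %in% c(1L, 2L))
  options(caret.ensemble.target.bin.level = level)
  invisible(level)
}
```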

@zachmayer
Owner

Once we have this merged in here, it might actually make sense to add this option and logic to the caret package itself. I really like the approach.

# order of the factor levels in the response (which is what this test module
# needs to prove invariance to). This happens because createFolds uses
# the base table function to create class frequency counts and that table
# command sorts results alphabetically.
Owner

You should file a bug report with example code on the caret github repo

@zachmayer
Owner

A couple more comments, but this is looking really good.

Do you know how to squash git commits? If so, you should squash this PR into one commit. If not, don't worry about it =D.

@eric-czech
Contributor Author

Ok @zachmayer, I added a getter and setter for the target level, along with coercion of the argument to an integer (as well as a few other things, like docs and using the getter in the test cases to reset the target level).

The stratified sampling issue is up there too so we'll see if anyone has suggestions on it.

I also gave rebasing a try but kind of shot myself in the foot by changing a lot of the same things across those commits (I also kind of screwed it up by repeating some commits .. my git-fu is weak). Anyway, I have another feature enhancement I was going to submit (for using custom models), and assuming that turns into multiple commits as well, I'll be careful to rebase more frequently. My bad.

@zachmayer
Owner

No worries! This PR is fine. In the future, you can "squash" multiple commits into a single commit using an "interactive rebase":
http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html
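For reference, the basic recipe looks like this (the commit count here is illustrative):

```shell
# Interactively rebase the last 4 commits:
git rebase -i HEAD~4
# In the editor, keep "pick" on the first commit and change the rest to
# "squash" (or "s"), then save and close. Since the branch was already
# pushed, replace it on the remote afterwards:
git push --force
```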

#' and that the resulting integer is in \{1, 2\}.
#' @param arg argument to potentially be used as new target level
#' @return Binary target level (as integer equal to 1 or 2)
validateBinaryTargetLevel <- function(arg) {
Owner

This gives weird results if you pass a vector of length > 1, e.g. validateBinaryTargetLevel(1:10) or validateBinaryTargetLevel(letters). You should probably add something like stopifnot(length(arg) == 1). I'll fix this after merging, since it's very minor.
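A minimal sketch of the validator with that guard added (illustrative only; the merged version may differ):

```r
# Reject non-scalar input up front, then coerce and check the allowed values:
validateBinaryTargetLevel <- function(arg) {
  stopifnot(length(arg) == 1)
  value <- suppressWarnings(as.integer(arg))
  stopifnot(!is.na(value), value %in% c(1L, 2L))
  value
}

validateBinaryTargetLevel(2)       # 2L
# validateBinaryTargetLevel(1:10)  # now errors instead of giving weird results
```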

zachmayer added a commit that referenced this pull request Feb 16, 2016
Fixing bug in positive class selection used by makePredObsMatrix
@zachmayer zachmayer merged commit 66956d8 into zachmayer:master Feb 16, 2016
@zachmayer
Owner

Also note that after rebasing, you ALWAYS have to git push -f to replace your branch on github with your new one.

@eric-czech
Contributor Author

Cool, I think that's where I went wrong. I couldn't figure out how to recover from not having force-pushed, and doing another rebase resulted in tons of conflicts.

@zachmayer
Owner

@eric-czech I fixed the rebase on my branch, and then force pushed to my own master.

Unfortunately, this kinda messed up your fork of my repo. You'll have to replace your master branch with mine:
http://scribu.net/blog/resetting-your-github-fork.html

git remote add upstream git@github.com:zachmayer/caretEnsemble.git
git fetch upstream
git branch backup
git checkout upstream/master -B master
git push --force

@eric-czech
Contributor Author

Thanks! Looks much better now and that reset on my fork seems to have worked well.

@zachmayer
Owner

👍
