Fixing bug in positive class selection used by makePredObsMatrix #190
Conversation
Overall, this seems like a good idea. A few comments:
Other than that, this seems like a good idea. That TODO has been on the list for a while...
Alright, sorry for the build errors: I didn't realize lintr wasn't running on my machine when running unit tests (had to install it first, whoops). Anyway, the changes to caretList and caretEnsemble were pretty minor, and the only previous test they conflicted with was in test-ensemble.R, where an expected probability prediction is now for a different class (i.e. the new value is approximately equal to 1 - the old value). Let me know if you see any issues with those changes or with the new unit test (I'd be happy to move it into an existing test if you don't think it makes sense on its own).
@zachmayer What would you think of making the above configurable with a global option? Now that it's clear what all needs to be changed to make the default one or the other, it would be easy to add that switch in. But I don't want to do that unless you're OK with the changes so far and not aware of a better way to make global configuration settings.
@eric-czech sure, making it a global option works for me. Make the default be the same one as the caret package uses by default |
```r
data(Y.class)

#############################################################################
context("Do classifier predictions use the correct outcome classes?")
```
I would run this test twice, once where the positive level is "yes" and once where the positive level is "no", and check that no re-arrangement happens, regardless of the alphabetical sorting of the target.
@eric-czech Actually, I'm happy with the PR as-is. No need to add a global option, but I'd like you to consider adding 1 more test.
@zachmayer No problem, I added in the configuration option as well as some extra tests. I didn't see any reason to test problems with alphabetical ordering at first, but I'm glad you suggested it, because I ultimately found that while the alphabetical ordering of the level names doesn't affect the results of caret or caretEnsemble models directly, it does lead to different results indirectly because of the stratified CV splits caret creates (they ARE dependent on the factor names ... which is annoying). In other words, if you let the models use the
```r
#' but that value can be overridden using global options (e.g.
#' \code{options(caret.ensemble.target.bin.level=2)}).
getBinaryLevel <- function() {
  value <- as.numeric(getOption("caret.ensemble.target.bin.level", default = 1))
```
I'd do `default = 1L`.
I like it.
Once we have this merged in here, it might actually make sense to add this option and logic to the caret package itself. I really like the approach.
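As a concrete illustration of the option pattern under discussion, here is a minimal sketch, assuming a `getBinaryLevel()` along the lines of the one shown in the diff above (the coercion details are hypothetical):

```r
# Hypothetical sketch of the global-option pattern from the diff:
getBinaryLevel <- function() {
  as.integer(getOption("caret.ensemble.target.bin.level", default = 1L))
}

getBinaryLevel()                                 # caret-style default: first level
options(caret.ensemble.target.bin.level = 2)     # switch the positive level globally
getBinaryLevel()                                 # now the second level
options(caret.ensemble.target.bin.level = NULL)  # unset to restore the default
```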
```r
# order of the factor levels in the response (which is what this test module
# needs to prove invariance to). This happens because createFolds uses
# the base table function to create class frequency counts and that table
# command sorts results alphabetically.
```
You should file a bug report with example code on the caret GitHub repo.
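The alphabetical sensitivity being described is easy to reproduce in base R (illustrative only; the class names are made up):

```r
# table() reports counts in factor-level order, and factor() sorts levels
# alphabetically by default, so merely renaming a class reorders the
# frequency counts that createFolds-style stratification is built from:
y1 <- factor(c("no", "yes", "yes"))
y2 <- factor(c("zno", "yes", "yes"))  # same data, "no" renamed to "zno"
table(y1)  # counts reported in order: no, yes
table(y2)  # counts reported in order: yes, zno -- flipped by the rename
```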
A couple more comments, but this is looking really good. Do you know how to squash git commits? If so, you should squash this PR into one commit. If not, don't worry about it =D.
Ok @zachmayer, I added a getter and setter for the target level and added coercion to an integer for the argument (as well as a few other things, like docs and using the getter in the test cases to reset the target level). The stratified sampling issue is up there too, so we'll see if anyone has suggestions on it. I also gave rebasing a try but kind of shot myself in the foot by changing a lot of the same things throughout those commits (err, I also kind of screwed it up by repeating some commits .. my git-fu is weak). Anyway, I do have another feature enhancement I was going to submit (for using custom models), and assuming that turns into multiple commits as well, I'll be careful to rebase them more frequently. My bad.
No worries! This PR is fine. In the future, you can "squash" multiple commits into a single commit using an "interactive rebase":
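A minimal sketch of that workflow (the commit count of 3 is just an example):

```shell
git rebase -i HEAD~3   # opens an editor listing the last 3 commits
# change "pick" to "squash" (or "s") on every commit except the first,
# save and quit, then edit the combined commit message when prompted
git push --force       # a rebase rewrites history, so the push must be forced
```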
```r
#' and that the resulting integer is in \{1, 2\}.
#' @param arg argument to potentially be used as new target level
#' @return Binary target level (as integer equal to 1 or 2)
validateBinaryTargetLevel <- function(arg) {
```
This gives weird results if you pass a vector of length > 1, e.g. `validateBinaryTargetLevel(1:10)` or `validateBinaryTargetLevel(letters)`. You should probably add something like `stopifnot(length(arg) == 1)`. I'll fix this after merging, since it's very minor.
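A sketch of what that guard could look like (hypothetical; the body here is not the code that was actually merged):

```r
validateBinaryTargetLevel <- function(arg) {
  value <- suppressWarnings(as.integer(arg))
  # Reject vectors, NAs from failed coercion, and anything outside {1, 2}
  stopifnot(length(value) == 1L, !is.na(value), value %in% c(1L, 2L))
  value
}

validateBinaryTargetLevel(2)       # returns 2L
# validateBinaryTargetLevel(1:10)  # now errors instead of returning a vector
```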
Also note that after rebasing, you ALWAYS have to force-push.
Cool, I think that was where I went wrong .. I couldn't figure out how to recover from not having forced it, and doing another rebase was resulting in tons of conflicts.
@eric-czech I fixed the rebase on my branch, and then force-pushed to my own master. Unfortunately, this kinda messed up your fork of my repo. You'll have to replace your master branch with mine:

```shell
git remote add upstream git@github.com:zachmayer/caretEnsemble.git
git fetch upstream
git branch backup
git checkout upstream/master -B master
git push --force
```
Thanks! Looks much better now, and that reset on my fork seems to have worked well.
👍
This commit contains a fix for an error in makePredObsMatrix (line 241), where the following code was being used to select the positive class for classification problems:

```r
positive <- as.character(unique(modelLibrary$obs)[2]) # IMPROVE THIS!
```

The problem with that is that unique returns values in order of appearance, so a different positive class is potentially being selected for each resample (which seems like a rather large bug if I'm not mistaken). For example:
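A small illustration of the difference (hypothetical data):

```r
y <- factor(c("yes", "no", "no"), levels = c("no", "yes"))
as.character(unique(y)[2])  # "no"  -- order of appearance; varies with row order
levels(y)[2]                # "yes" -- fixed by the factor definition
```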
I made a change to choose that class like this instead:

```r
positive <- levels(modelLibrary$obs)[2]
```
From a broader perspective, it would be nice if caretEnsemble maintained consistency with caret in that the first level in binary outcomes is treated as the positive class by default. I see hard-coded selections for the second class's probability predictions in predict.{caretList, caretStack} which might not be too hard to override with a more explicit selection of the positive class (or to change to the first class), but I'd be curious to hear what your thoughts are on that. Perhaps it could be some sort of global configuration option rather than a parameter that needs to be passed in multiple places? I dunno. I'm happy to help with any of it if you have suggestions.