Confusion matrix normalization #355
Conversation
The special cases I was considering resulted in a normalization that is equal to "average", which is probably not the purpose of "none".
That sounds fine to me, and it would be good to make those changes more universal (i.e. for `rfe` and `sbf` as well).
I see that …
Good, this is very important; everybody uses it.

On Tue, Jan 19, 2016 at 4:31 PM, Alexis Sarda (notifications@github.com) wrote:
I took the liberty of adding the boot632 reference, mainly because I initially thought the 0.632+ was being used, so I had to look at the code to make sure which one was actually computed.
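For context on the distinction the comment checks: the plain 0.632 estimator uses fixed weights on the resubstitution and out-of-bag errors, while 0.632+ adapts the weight through the relative overfitting rate. A minimal Python sketch (my own function names; the no-information rate `gamma` is taken as an input, and the capping of the out-of-bag error at `gamma` is omitted for brevity — not caret's actual code):

```python
# Sketch of the 0.632 vs. 0.632+ bootstrap error estimates.
def boot632(err_train, err_oob):
    # Plain 0.632: fixed weights on training (resubstitution) and
    # out-of-bag (leave-one-out bootstrap) error.
    return 0.368 * err_train + 0.632 * err_oob

def boot632plus(err_train, err_oob, gamma):
    # 0.632+: the weight adapts to the relative overfitting rate R,
    # where gamma is the no-information error rate.
    if gamma == err_train:
        R = 0.0  # no overfitting signal; falls back to plain 0.632 weights
    else:
        R = (err_oob - err_train) / (gamma - err_train)
    w = 0.632 / (1 - 0.368 * R)
    return (1 - w) * err_train + w * err_oob

print(boot632(0.0, 0.30))           # 0.632 * 0.30 = 0.1896
print(boot632plus(0.0, 0.30, 0.5))  # weights the out-of-bag error more heavily
```

With a perfectly overfit training error of 0, the two estimates differ visibly, which is why it matters which one the code computes.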
Thanks!
I have been using the `caret` package for my thesis, which is why I'm messing with the code and reporting the issues I find; I hope that's ok. This time I was looking at the `confusionMatrix` function for `train` objects. I did a few tests before I changed anything.

Let's take a simple example: there are 150 observations in the `iris` dataset, so using 5-fold CV would result in hold-out sets with 30 observations each. Now let's calculate the available confusion matrices.
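As a rough illustration of that setup, here is a Python sketch with synthetic labels standing in for `iris` and a trained model (not caret's actual R code):

```python
# Sketch: 5-fold CV on 150 observations yields 5 hold-out folds of 30,
# and one confusion matrix per fold. Labels and predictions are synthetic.
import random
from collections import Counter

random.seed(42)
N, K = 150, 5
CLASSES = ["setosa", "versicolor", "virginica"]

truth = [CLASSES[i % 3] for i in range(N)]
# Fake "model": correct 90% of the time, otherwise a random class.
preds = [t if random.random() < 0.9 else random.choice(CLASSES) for t in truth]

# Shuffle indices and deal them into K folds of N / K = 30 each.
idx = list(range(N))
random.shuffle(idx)
folds = [idx[i::K] for i in range(K)]

# One confusion matrix per hold-out fold, as (truth, prediction) counts.
resampled = [Counter((truth[i], preds[i]) for i in fold) for fold in folds]

assert all(sum(cm.values()) == 30 for cm in resampled)
```

Each element of `resampled` counts exactly the 30 hold-out observations of its fold, which is the raw material the `norm` options then summarize.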
I understand `none` and `overall`, although I don't agree entirely with the description in the documentation. To me, `none` actually shows the average counts for each cell across resamples (which is supposedly what `average` does). And the `overall` one is just the same table but in percentages. But I simply don't understand what `average` does.

This is why I'm proposing these changes...
With the changes:

- `none` calculates aggregated un-normalized counts
- `average` calculates average counts across resamples
- `overall` calculates average counts as percentages (stays the same)

Let me know your thoughts.
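The proposed semantics can be sketched in a few lines of plain Python rather than caret's R (the `normalize` function and the toy matrices below are mine, illustrating the assumed behavior, not the package's implementation):

```python
# Sketch of the proposed normalizations, starting from per-resample
# count matrices stored as {(truth, prediction): count} dicts.
def normalize(resampled, norm="none"):
    # Aggregate raw counts across all resamples.
    total = {}
    for cm in resampled:
        for cell, n in cm.items():
            total[cell] = total.get(cell, 0) + n
    if norm == "none":
        # Proposed: aggregated un-normalized counts.
        return total
    if norm == "average":
        # Proposed: average counts across resamples.
        return {cell: n / len(resampled) for cell, n in total.items()}
    if norm == "overall":
        # Unchanged: the same table expressed as percentages.
        grand = sum(total.values())
        return {cell: 100 * n / grand for cell, n in total.items()}
    raise ValueError(norm)

# Two tiny single-class resamples of 30 hold-outs each:
cms = [{("a", "a"): 25, ("a", "b"): 5}, {("a", "a"): 27, ("a", "b"): 3}]
print(normalize(cms, "none"))     # aggregated counts: 52 and 8
print(normalize(cms, "average"))  # per-resample means: 26.0 and 4.0
print(normalize(cms, "overall"))  # percentages summing to 100
```

Note that `overall` is unaffected by whether you aggregate or average first, since the percentages come out the same either way.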
I don't know if this makes sense for `rfe` and `sbf`, but if it does, I could adapt the code so that those functions include the changes.