Fixing treeshap calculation when data is weighted (fixes #5095) #5097
base: master
Conversation
Apparently, tests are failing: in the test cases, the contributions don't sum up to the predictions. In my toy example they do:

```python
print(np.linalg.norm(model.predict(X, pred_contrib=True).sum(axis=1) - model.predict(X)))
# 6.018678656295655e-13
```

I will investigate why. |
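For reference, a self-contained version of that check (a minimal sketch; the toy regression setup is my assumption, not the original data):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = lgb.LGBMRegressor(n_estimators=20).fit(X, y)

# pred_contrib=True returns one column per feature plus a bias column;
# their row-wise sum should reproduce the raw prediction.
contribs = model.predict(X, pred_contrib=True)
print(np.linalg.norm(contribs.sum(axis=1) - model.predict(X)))  # ~1e-13
```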
Kindly pinging @slundberg for help. |
I've done some digging. The test I investigated is unweighted, and the error seems to come from the fact that in the case of classification the weights are not identical to the number of records. Apparently the `left_weight` and `right_weight` arguments don't always mean the summed weight of the relevant data subsets: they come from the `left_sum_hessian` and `right_sum_hessian` fields of a `SplitInfo` object. While this is indeed the summed data weight for a regression task, for a binary classification task (which the failing tests use) it's a different value.

Example tree from (unweighted) regression, where the weights match the record counts: [tree dump omitted]

Example tree from (unweighted) classification: [tree dump omitted]

Note that while the classification weights are close to the record counts, they are not equal. Unfortunately, the splitting part of the codebase is a bit esoteric to me and I'm afraid I'd need a significant time investment to understand it. I wonder if someone familiar with that part could clarify the role of hessians vs. weights? |
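To illustrate the discrepancy described above, the weights and counts can be compared in the dumped model. A minimal sketch, with the data and model settings being my assumptions:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)

clf = lgb.LGBMClassifier(n_estimators=1).fit(X, y)
root = clf.booster_.dump_model()["tree_info"][0]["tree_structure"]

# For regression, internal_weight == internal_count (the L2-loss hessian is 1);
# for binary log-loss the hessian is p * (1 - p), so the two diverge.
print(root["internal_weight"], root["internal_count"])
```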
It seems to me there hasn't been any activity regarding this. I'm wondering if I can do anything to help? |
Sorry for the delay and thank you for your help and patience! @guolinke or @shiyu1994 are you able to help review this PR? |
LGTM
@guolinke Sorry, I guess you've overlooked the part of this PR about failing tests: #5097 (comment) and #5097 (comment). |
It seems the failing tests are related to the SHAP calculation. @proto-n can you help to fix the tests? |
@guolinke I would love to, but some of the relevant code falls outside of my expertise. I've detailed the reasons for the failing tests in this comment. TL;DR: the fix works for regression, but the weight variables seem to mean something else for classification (and I'm not sure what, and whether that's correct behavior). |
I see, thank you. In classification, the weight is actually the hessian, which depends on the gradient and is not constant. What you actually need is the data sample weight, not the "weight" stored in the model file, which is the sum of hessians. |
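To make this point concrete: with binary log-loss the per-sample hessian is p(1 - p), which shrinks as the model grows confident, so the summed hessian only approximates the summed data weight. A sketch of the standard log-loss derivatives (not LightGBM's internal code):

```python
import numpy as np

def logloss_grad_hess(raw_score, label):
    # Binary log-loss derivatives w.r.t. the raw score:
    # gradient = p - y, hessian = p * (1 - p), with p = sigmoid(raw_score).
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - label, p * (1.0 - p)

# The hessian is 0.25 at p = 0.5 and approaches 0 for confident predictions,
# so the "weight" stored in the model drifts away from the true sample count.
print(logloss_grad_hess(np.array([0.0, 2.0]), np.array([1.0, 1.0])))
```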
I see, thanks for the info! Correct SHAP calculation needs the actual summed data weight belonging to the given leaf. This means that unfortunately we either have to repurpose these variables to store it, or allocate additional variables in the tree structure for this purpose. I couldn't find any reference to these variables being used internally, other than in the SHAP calculation. Still, they end up serialized, printed, and publicly accessible from e.g. the Python API, so repurposing them would be something of a breaking change (though I'm not sure the relevant documentation was very clear from the get-go). What should be the course of action here? |
It is not trivial to store data weights in the tree structure, as it would affect training efficiency. Although the hessian is not the weight, it is very close, so I prefer hessian-based SHAP. We can then treat the current output as correct and fix the ground-truth values in the tests. Documentation about the hessian-based SHAP should be added as well. |
The thing is, this way the contribution values will violate the rule that the contributions (plus the expected value) sum up to the model's prediction. |
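For reference, the property at stake is SHAP's local accuracy (efficiency) axiom: with $\phi_0$ the expected value and $\phi_j$ the contribution of feature $j$,

$$f(x) = \phi_0 + \sum_{j=1}^{M} \phi_j$$

which is exactly what `pred_contrib=True` exposes: the per-feature columns plus the bias column should sum to the raw prediction.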
I am not familiar with SHAP, so I cannot make the decision. Kindly pinging @slundberg for help with data weights. |
Maybe it's worth creating a cross-reference issue in the shap repository to make maintainers there aware of the problem? Seems that there are already a lot of issues about using sample weights with shap, e.g. shap/shap#2508, shap/shap#1328, shap/shap#2395. |
Sorry for the possibly dumb question, but how do I ping @slundberg for help? |
Not a dumb question! |
It has been 3 months since the last activity on this PR. What needs to happen for it to move forward? We need an opinion from someone who is an expert in SHAP? If that's the case, maybe we could tag in our friends from XGBoost for an opinion, or look at how XGBoost handles this? |
Unfortunately I don't think XGBoost can help here, since we know the problem and the possible solutions, it's just that both are inconvenient. The problem is that LightGBM doesn't store the exact per-leaf (and per-node) sum data weights in the model in the case of classification. Instead, it stores the hessians which are close but not the same. Storing the actual weights is not trivial according to @guolinke. However, these are required for correct SHAP calculation. Somebody (I'm not sure who) has to make a decision: either SHAP can be calculated based on hessians (which results in breaking one of the fundamental properties of SHAP), or the actual weights have to be stored somehow (which is a nontrivial modification). |
Fixing Issue #5095
The bug is described in the issue. Continuing with the same example, after this commit we have: [code snippet omitted]

resulting in: [output omitted]

which is the correct behavior, now the same as the expanded version (see the related issue for the expanded version): [output omitted]

Also, the contributions now equal those of the expanded version: [output omitted]
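Since the original snippets are omitted, here is a minimal sketch of the kind of check involved; the data, weights, and model settings are my assumptions, and exact equality also depends on both fits producing identical trees:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)
w = rng.integers(1, 5, size=200)  # integer sample weights

# Weighted fit vs. the "expanded" fit that repeats each row w[i] times.
weighted = lgb.LGBMRegressor(n_estimators=10).fit(X, y, sample_weight=w)
idx = np.repeat(np.arange(200), w)
expanded = lgb.LGBMRegressor(n_estimators=10).fit(X[idx], y[idx])

# With the fix, the pred_contrib outputs of the two models should agree.
diff = weighted.predict(X, pred_contrib=True) - expanded.predict(X, pred_contrib=True)
print(np.abs(diff).max())
```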
The commit modifies the codebase in two places. First, it modifies the SHAP calculation to reference a newly created `data_weight` function. Second, I also needed to fix the weight calculation for internal nodes and the expected value calculation, as the weights were calculated incorrectly for some reason, and the expected value didn't consider weights either.

Funny enough, beyond the SHAP calculations I couldn't find any other reference to either the expected value or the internal weights storage (other than serialization and cuda->cpu / cpu->cuda transfer). I may have missed something though.
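For intuition, the expected value the fix computes is the weight-weighted average of the leaf values, where each leaf's weight must be its summed data weight. A minimal Python sketch of that quantity over a dumped model (the helper and traversal are mine, not the PR's actual C++ changes):

```python
def weighted_expected_value(tree_structure):
    """Weighted mean of leaf values: sum(leaf_value * leaf_weight) / total weight."""
    def walk(node):
        if "leaf_value" in node:  # leaf node in dump_model() output
            return node["leaf_value"] * node["leaf_weight"], node["leaf_weight"]
        lv, lw = walk(node["left_child"])
        rv, rw = walk(node["right_child"])
        return lv + rv, lw + rw
    total, weight = walk(tree_structure)
    return total / weight

# Usage: weighted_expected_value(booster.dump_model()["tree_info"][0]["tree_structure"])
```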
Also, it's not clear to me if anything else (like tests of some kind) is required to merge the PR. Please advise.