predict does not produce the probability for glmboost #560

Closed
randel opened this issue Dec 27, 2016 · 3 comments

@randel randel commented Dec 27, 2016

This may not be a caret issue, but I want to bring it to your attention. In short, predict(train_object, type='prob') does not return the correct probability for glmboost (and probably not for gamboost in the mboost package either).

According to the mboost paper,
"For reasons of computational efficiency, the binary response y ∈ {0, 1} is converted into ỹ = 2y − 1 where ỹ ∈ {−1, 1}. ... Note that the transformation ỹ = 2y − 1 changes the interpretation of the effect estimates because now we get the half of the log-odds ratio. One implication is that the coef() output is half the estimations that result from glm(). This means that the user has to double the coefficients manually in order to get the final standard estimates of a logistic regression model. ... However, mboost automatically doubles the logits prior to the reverse probability transformation. This means that calling predict(fit, newdata, type ="response") produces the final probability estimates."

Let X be the predictor matrix and b the coefficients extracted by coef(). Unlike predict(fit, newdata, type = "response"), predict(fit, newdata, type = "link") returns only Xb, whereas the log-odds are 2Xb. It seems that caret does something like predict(fit, newdata, type = "link"), i.e., without doubling Xb, and then applies the inverse logit transformation to calculate the probability.
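For readers who want the algebra spelled out, here is a short sketch of why the factor of 2 matters. It assumes only what the quoted passage states, namely that the fitted linear predictor Xb is half the log-odds; the symbols f and p are notation introduced here for illustration.

```latex
\tilde{y} = 2y - 1 \in \{-1, 1\}, \qquad
f = Xb = \tfrac{1}{2}\,\log\frac{p}{1 - p}
\;\Longrightarrow\;
p = \frac{1}{1 + e^{-2f}} = \operatorname{logit}^{-1}(2Xb),
\qquad\text{whereas}\qquad
\operatorname{logit}^{-1}(Xb) = \frac{1}{1 + e^{-f}} \neq p \ \text{ unless } f = 0 .
```

So applying the plain inverse logit to the link-scale prediction pulls every probability toward 0.5, and the two agree only where the prediction is exactly 0.5.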

Below is an example of the relationship among:
predict(train_object$finalModel, newdata, type ="link")
predict(train_object$finalModel, newdata, type ="response")
predict(train_object, newdata, type='prob')
[image: output comparing the three calls above]

@topepo topepo commented Dec 27, 2016

Weirdly (to me at least), you get different probabilities from gamboost depending on whether you invert the link prediction yourself or ask for type = "response":

> library(caret)
> library(mboost)
> 
> set.seed(1)
> tr <- twoClassSim(500)
> te <- twoClassSim(500)
> 
> set.seed(2)
> gamb_mod <- gamboost(Class ~ ., data = tr, family = Binomial())
> predict(gamb_mod, newdata = head(te))
            [,1]
[1,] -0.80740925
[2,]  0.09624685
[3,] -0.83826408
[4,]  0.61709862
[5,]  0.01748418
[6,] -0.73133217
> predict(gamb_mod, newdata = head(te), type = "response")
          [,1]
[1,] 0.1659207
[2,] 0.5479754
[3,] 0.1575557
[4,] 0.7745523
[5,] 0.5087412
[6,] 0.1880602
> binomial()$linkinv(predict(gamb_mod, newdata = head(te)))
          [,1]
[1,] 0.3084428
[2,] 0.5240432
[3,] 0.3019005
[4,] 0.6495584
[5,] 0.5043709
[6,] 0.3249025

I'll change caret to use type = "response" instead of doing the conversion ourselves.
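The discrepancy can be reproduced without fitting a model. The sketch below uses only base R and the six link-scale values printed in the gamboost output above; the variable names are illustrative and not part of caret or mboost. It assumes, as the mboost paper states, that the link-scale prediction is half the log-odds.

```r
# Link-scale predictions copied from the gamboost output above.
link <- c(-0.80740925, 0.09624685, -0.83826408,
          0.61709862, 0.01748418, -0.73133217)

# Plain inverse logit of the link: treats Xb as the full log-odds.
naive_prob <- plogis(link)

# Inverse logit of the doubled link: treats Xb as half the log-odds.
doubled_prob <- plogis(2 * link)

# Side-by-side comparison of the two conversions.
round(cbind(link, naive_prob, doubled_prob), 7)
```

The naive_prob column should line up with the binomial()$linkinv() output above, and the doubled_prob column with the type = "response" output, up to the rounding of the printed values.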

topepo pushed a commit that referenced this issue Dec 27, 2016

@randel randel commented Dec 27, 2016

Right. mboost automatically doubles Xb for type='response' but not for type='link'. The same holds for glmboost, gamboost, and blackboost, all in the mboost package.

> binomial()$linkinv(2*predict(gamb_mod, newdata = head(te)))
          [,1]
[1,] 0.1659206
[2,] 0.5479753
[3,] 0.1575558
[4,] 0.7745522
[5,] 0.5087412
[6,] 0.1880602
> 
> set.seed(2)
> blackb_mod <- blackboost(Class ~ ., data = tr, family = Binomial())
> predict(blackb_mod, newdata = head(te))
           [,1]
[1,] -1.5346256
[2,]  0.9221135
[3,] -1.3958245
[4,]  0.9780227
[5,]  0.3766340
[6,] -1.3929598
> predict(blackb_mod, newdata = head(te), type = "response")
           [,1]
[1,] 0.04439359
[2,] 0.86344786
[3,] 0.05777712
[4,] 0.87610434
[5,] 0.67989034
[6,] 0.05808981
> binomial()$linkinv(2*predict(blackb_mod, newdata = head(te)))
           [,1]
[1,] 0.04439359
[2,] 0.86344786
[3,] 0.05777712
[4,] 0.87610434
[5,] 0.67989034
[6,] 0.05808981
topepo pushed a commit that referenced this issue Dec 27, 2016

@topepo topepo commented Dec 28, 2016

These changes should work, but please verify with your data and let me know.

Thanks
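One way to run that verification is to check that the response-scale predictions equal the inverse logit of twice the link-scale predictions. The helper below is a hypothetical sketch in base R, not a caret or mboost function, and the tolerance is an arbitrary choice; the sample values are the blackboost outputs printed earlier in this thread.

```r
# Hypothetical helper (not part of caret or mboost): TRUE when the
# response-scale values equal plogis(2 * link) within a tolerance.
matches_doubled_link <- function(link, response, tol = 1e-6) {
  isTRUE(all(abs(plogis(2 * link) - response) < tol))
}

# Values copied from the blackboost output earlier in this thread.
bb_link <- c(-1.5346256, 0.9221135, -1.3958245,
             0.9780227, 0.3766340, -1.3929598)
bb_response <- c(0.04439359, 0.86344786, 0.05777712,
                 0.87610434, 0.67989034, 0.05808981)

# Does the doubled-link conversion reproduce type = "response"?
matches_doubled_link(bb_link, bb_response)

# Largest error made by the undoubled conversion, for comparison.
max(abs(plogis(bb_link) - bb_response))
```

With a fitted model, the same check could be applied to predict(fit, newdata) and predict(fit, newdata, type = "response"), and the second column could then be compared against what predict(train_object, newdata, type = 'prob') returns.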

@topepo topepo closed this Jan 9, 2017