predict does not produce the probability for glmboost #560
Comments
Weirdly (to me at least), you get different response from
> library(caret)
> library(mboost)
>
> set.seed(1)
> tr <- twoClassSim(500)
> te <- twoClassSim(500)
>
> set.seed(2)
> gamb_mod <- gamboost(Class ~ ., data = tr, family = Binomial())
> predict(gamb_mod, newdata = head(te))
[,1]
[1,] -0.80740925
[2,] 0.09624685
[3,] -0.83826408
[4,] 0.61709862
[5,] 0.01748418
[6,] -0.73133217
> predict(gamb_mod, newdata = head(te), type = "response")
[,1]
[1,] 0.1659207
[2,] 0.5479754
[3,] 0.1575557
[4,] 0.7745523
[5,] 0.5087412
[6,] 0.1880602
> binomial()$linkinv(predict(gamb_mod, newdata = head(te)))
[,1]
[1,] 0.3084428
[2,] 0.5240432
[3,] 0.3019005
[4,] 0.6495584
[5,] 0.5043709
[6,] 0.3249025
I'll make the change that we use
Right. They automatically doubled Xb for
These changes should work, but please verify with your data and let me know. Thanks
This may not be an issue with caret, but I just want to bring it to your attention. In short, predict(train_object, type='prob') actually does not produce the probability for glmboost (and probably for gamboost in the mboost package as well).

According to the mboost paper, "For reasons of computational efficiency, the binary response y ∈ {0, 1} is converted into ỹ = 2y − 1 where ỹ ∈ {−1, 1}. ... Note that the transformation ỹ = 2y − 1 changes the interpretation of the effect estimates because now we get the half of the log-odds ratio. One implication is that the coef() output is half the estimations that result from glm(). This means that the user has to double the coefficients manually in order to get the final standard estimates of a logistic regression model. ... However, mboost automatically doubles the logits prior to the reverse probability transformation. This means that calling predict(fit, newdata, type ="response") produces the final probability estimates."

Let X be the predictor matrix and b the coefficients extracted by 'coef()'. Unlike predict(fit, newdata, type ="response"), predict(fit, newdata, type ="link") would only give Xb, while the correct value is 2Xb. It seems that caret does something like predict(fit, newdata, type ="link"), i.e., without doubling Xb, and then takes the inverse logit transformation to calculate the probability.

Below is an example of the relationship among

predict(train_object$finalModel, newdata, type ="link")
predict(train_object$finalModel, newdata, type ="response")
predict(train_object, newdata, type='prob')
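
As a rough sketch, the relationship among these calls can be written out as follows. It assumes, as the quoted paper passage states, that type = "response" doubles the linear predictor before the inverse logit; that type = 'prob' applies the plain inverse logit to the undoubled link is what this issue suspects, not something confirmed here. The symbols η, p_response and p_plain are introduced only for illustration, and the numbers are the first row of the gamboost output posted in the comments (the link value, the type = "response" value, and the binomial()$linkinv value).

```latex
\begin{align*}
\eta &= Xb
  && \text{(type = "link")} \\
p_{\text{response}} &= \frac{1}{1 + e^{-2\eta}}
  && \text{(type = "response", logit doubled)} \\
p_{\text{plain}} &= \frac{1}{1 + e^{-\eta}}
  && \text{(inverse logit of the undoubled link)} \\[6pt]
\eta = -0.80740925:\quad
p_{\text{response}} &= \frac{1}{1 + e^{1.6148185}} \approx 0.1659,
  & p_{\text{plain}} &= \frac{1}{1 + e^{0.80740925}} \approx 0.3084
\end{align*}
```

Both values agree with the posted console output (0.1659207 and 0.3084428), which is consistent with the missing factor of 2 being the source of the discrepancy.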