You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm not 100% sure. But I think in pred <- predict(fit, newdata = df, outcome = "test")$predicted.oob
the outcome argument should be 'train'. The part in the documentation is a bit cryptic:
"If outcome="test", the predictor is calculated by using y-outcomes from the test data (outcome information must be present). In this case, the terminal nodes from the grow-forest are recalculated using the y-outcomes from the test set. This yields a modified predictor in which the topology of the forest is based solely on the training data, but where the predicted value is based on the test data. Error rates and VIMP are calculated by bootstrapping the test data and using out-of-bagging to ensure unbiased estimates. See the examples for illustration"
But I don't think we want to recalculate the terminal nodes from the original forest, but just drop down the test data and get a prediction. I came to it bc i got different and counterintuitive results from it with the outcome='test' but matching and intuitive results with outcome='train'.
I changed it in the package code to do some tests.
The text was updated successfully, but these errors were encountered:
yea. i have thought about it a bit now and i think you are right in that we should not have outcome = "test. however, it is unclear to me from the docs whether predict(fit, newdata = df, outcome = "train") is the same as predict(fit, newdata = df). for one, predicted.oob is not available in the latter option. this suggests to me that they are dropping the unmodified training data down the tree. does the pd actually work in this case? should be easy to detect since you wouldn't be setting the dataframe to its unique values.
I'm not 100% sure. But I think in
pred <- predict(fit, newdata = df, outcome = "test")$predicted.oob
the outcome argument should be 'train'. The part in the documentation is a bit cryptic:
"If outcome="test", the predictor is calculated by using y-outcomes from the test data (outcome information must be present). In this case, the terminal nodes from the grow-forest are recalculated using the y-outcomes from the test set. This yields a modified predictor in which the topology of the forest is based solely on the training data, but where the predicted value is based on the test data. Error rates and VIMP are calculated by bootstrapping the test data and using out-of-bagging to ensure unbiased estimates. See the examples for illustration"
But I don't think we want to recalculate the terminal nodes from the original forest, but just drop down the test data and get a prediction. I came to it bc i got different and counterintuitive results from it with the outcome='test' but matching and intuitive results with outcome='train'.
I changed it in the package code to do some tests.
The text was updated successfully, but these errors were encountered: