Predicting new groups in dbart_vi? #17

Open
timdisher opened this issue Nov 19, 2019 · 6 comments

@timdisher

I am using dbart_vi to fit a random intercept model to allow for estimation of correlated effects across multiple outcomes. The ultimate goal of this analysis is to inform a microsimulation, so it is important that I can predict new observations. Currently it looks as if predict requires the same group levels to be present. Is it possible to allow prediction into new groups (so that, for example, I can predict into a simulated population that is twice as large as the training set)?

@vdorie
Owner

vdorie commented Nov 19, 2019

I'm rewriting that right now - hope to have something by the end of the week.

@timdisher
Author

Great! I will test it in my application as soon as you do.

@vdorie
Owner

vdorie commented Dec 3, 2019

I checked in something (9fa610e) a few days ago that adds predict for out-of-sample groups. Let me know if you encounter any issues.
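For readers who want to try this, here is a minimal sketch of predicting into unseen groups. It assumes the sampler discussed in this thread is the dbarts function `rbart_vi`, that the installed version includes the change mentioned above, and that `predict` accepts a `group.by` argument for the new data. The simulated data, the variable names (`train`, `new_data`, `new_g`), and the small sampler settings are illustrative assumptions, not part of the original report.

```r
library(dbarts)

set.seed(1)

# Simulated training data: two predictors and five groups with random intercepts.
n <- 100
n_groups <- 5
train <- data.frame(x_1 = runif(n), x_2 = runif(n))
g <- sample(n_groups, n, replace = TRUE)
b <- rnorm(n_groups, 0, 1.5)
train$y <- 10 * sin(pi * train$x_1 * train$x_2) + b[g] + rnorm(n)

# Small settings to keep the run short; trees are kept so predict() can be used.
fit <- rbart_vi(y ~ x_1 + x_2, train, group.by = g,
                n.samples = 40L, n.burn = 10L, n.thin = 2L,
                n.chains = 1L, n.trees = 25L, n.threads = 1L,
                keepTrees = TRUE, verbose = FALSE)

# A simulated population twice the size of the training set:
# the first half uses known groups, the second half uses groups 6 to 8,
# which never appeared in training.
new_data <- data.frame(x_1 = runif(2 * n), x_2 = runif(2 * n))
new_g <- c(sample(n_groups, n, replace = TRUE),
           sample(n_groups + 1:3, n, replace = TRUE))

# Posterior draws of the expected value, random effects included.
pred <- predict(fit, new_data, group.by = new_g, type = "ev")
dim(pred)
```

If the call succeeds, the last dimension of `pred` should index the rows of `new_data`, so column summaries give per-individual predictions for the simulated population.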

@bachlaw

bachlaw commented Jan 14, 2021

Vince, how would this work if, instead of using the predict function on a BART model with stored trees, we just assigned a new level of the random effects group to every row of the test data (essentially, to marginalize out the random effect in predictions) and wanted to directly access posteriors for both to see the effect this has?

Right now, after doing what I described above, I can see within the BART object that the new category was added to the ranef slot and its value is essentially 0. So far, so good. But if I take colMeans of the draws in yhat.test, they are the same as the values I get from yhat.train, even though I would think they should be different, as all rows in yhat.test had the new group level.

Should I instead be looking to yhat.train.mean for the marginal mean without random effects and just not bother adding the testing data with the new level? Hope this question makes sense.

Thanks for this great feature.

@vdorie
Owner

vdorie commented Jan 21, 2021

I'm sorry, but I don't directly follow. If you don't use predict and saved trees but have new levels in the test data, it should create random effects for them. I think the confusion is maybe that yhat.test is just the BART component, whereas if you want the full predictions for the test data you need to add in the random effects part too. The easiest thing to do would be to call fitted(fit, type = 'ev', sample = 'test').
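The distinction between the BART component and the full prediction can be sketched in base R. The matrices below are hypothetical stand-ins for posterior draws, not the actual layout of the fitted object: `bart_draws` plays the role of yhat.test (samples by test rows) and `ranef_draws` plays the role of the random intercept draws (samples by groups).

```r
set.seed(42)

n_samples <- 200
n_test <- 6
g_test <- c(1, 1, 2, 2, 3, 3)  # group membership of each test row

# Hypothetical draws of the BART component for each test row.
bart_draws <- matrix(rnorm(n_samples * n_test, mean = 5), n_samples, n_test)

# Hypothetical draws of the random intercepts for three groups.
ranef_draws <- cbind(rnorm(n_samples, -2, 0.1),
                     rnorm(n_samples,  0, 0.1),
                     rnorm(n_samples,  2, 0.1))

# Full expected value: BART component plus the intercept of each row's group.
full_draws <- bart_draws + ranef_draws[, g_test]

colMeans(bart_draws)  # BART component only
colMeans(full_draws)  # shifted by each row's group intercept
```

In this sketch the two sets of column means differ by the posterior mean intercept of each row's group. When every test row belongs to a new level whose intercept is essentially 0, the two coincide, which is consistent with yhat.test and yhat.train looking the same.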

I know all of this is confusing, which is why I'm working on a more general framework here. At the moment it can only do continuous outcomes, but it should have a cleaner interface.

@jlevy44

jlevy44 commented Jul 5, 2021

This is very helpful, thanks!
