cmdstan format #211

Closed
4 tasks done
wlandau opened this issue Nov 8, 2020 · 13 comments

@wlandau
Member

wlandau commented Nov 8, 2020

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
  • Format your code according to the tidyverse style guide.

Proposal

We could consider accommodating CmdStanFit objects with tar_target(format = "cmdstanr"). Related: stan-dev/cmdstanr#340.

@wlandau wlandau self-assigned this Nov 8, 2020
@wlandau
Member Author

wlandau commented Nov 9, 2020

On reflection, I do not think this is worth a special feature. Users can load the relevant output into memory and specify the existing rds or qs format.

self$draws() 
try(self$sampler_diagnostics(), silent = TRUE) 
try(self$init(), silent = TRUE) 
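
A minimal sketch of that approach, assuming a hypothetical Stan file "model.stan", a hypothetical helper fit_model(), and placeholder data. The three method calls mirror the lines above, with the user-facing fit object in place of self, and the existing qs format stores the result:

```r
# _targets.R (sketch; "model.stan", fit_model(), and stan_data are
# hypothetical placeholders, not part of targets or cmdstanr)
library(targets)

fit_model <- function(stan_file, data) {
  model <- cmdstanr::cmdstan_model(stan_file)
  fit <- model$sample(data = data)
  # Pull the output into memory so the saved object does not depend on
  # CmdStan's temporary CSV files.
  fit$draws()
  try(fit$sampler_diagnostics(), silent = TRUE)
  try(fit$init(), silent = TRUE)
  fit
}

list(
  tar_target(stan_data, list(n = 10L, y = rnorm(10))),
  # The existing "qs" format (or "rds") stores the in-memory fit object.
  tar_target(fit, fit_model("model.stan", stan_data), format = "qs")
)
```

The sketch does not track changes to the Stan file itself; that would need a separate target with format = "file".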

@wlandau wlandau closed this as completed Nov 9, 2020
@wlandau
Member Author

wlandau commented Nov 9, 2020

Related: @mike-lawrence and others, do you think there would be value in a tarchetypes-like package for cmdstan models? Some possible target archetypes:

  • tar_cmdstan_fit("model.stan"): Fit a Stan model, call the methods listed in #211 (comment) to load the output into memory, and save the object in qs format.
  • tar_cmdstan_fit_summary("model.stan"): Same, except the return value is a lightweight data frame of posterior summaries and diagnostics.
  • tar_cmdstan_validate("model.stan", sim_function, batches = 20, sims_per_batch = 40): define several targets to repeatedly simulate from a model and determine how often the true parameters in the data are recaptured in credible intervals.
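
To make the validation idea concrete, here is a rough base R sketch of the coverage check only. It is not a Stan workflow: a conjugate normal model with a closed-form posterior stands in for the fitted model, and simulate_once() and run_batch() are hypothetical helpers, not proposed function names.

```r
# Stand-in model (an assumption for illustration):
# mu ~ Normal(0, 10^2), y_i ~ Normal(mu, 1), so the posterior is closed form.
simulate_once <- function(n = 50) {
  mu_true <- rnorm(1, mean = 0, sd = 10)   # draw the true parameter from the prior
  y <- rnorm(n, mean = mu_true, sd = 1)    # simulate data given the truth
  post_var <- 1 / (1 / 10^2 + n / 1^2)     # posterior variance
  post_mean <- post_var * sum(y) / 1^2     # posterior mean (prior mean is 0)
  interval <- qnorm(c(0.025, 0.975), mean = post_mean, sd = sqrt(post_var))
  data.frame(
    mu_true = mu_true,
    covered = interval[1] <= mu_true && mu_true <= interval[2]
  )
}

# One batch is a fixed number of independent simulations.
run_batch <- function(sims_per_batch = 40) {
  do.call(rbind, lapply(seq_len(sims_per_batch), function(i) simulate_once()))
}

set.seed(1)
results <- do.call(rbind, lapply(seq_len(20), function(b) run_batch(40)))
coverage <- mean(results$covered)  # share of 95% intervals containing the truth
```

With 20 batches of 40 simulations, coverage should land near the nominal 95% if the model and the simulation agree. In a pipeline, each batch would be its own target.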

@mike-lawrence

That sounds useful for sure!

@wlandau
Member Author

wlandau commented Nov 23, 2020

This weekend, I implemented something like #211 (comment) in a new package called stantargets. It's not public yet, but it is fully fleshed out and working. I will post a link as soon as I get permission to share.

@wlandau
Member Author

wlandau commented Nov 30, 2020

stantargets is now open source! It is an extension to targets for Stan-powered Bayesian data analysis. stantargets makes it super easy to set up useful pipelines without having to write many functions or think about branching. It supports all the features of cmdstanr (MCMC, variational Bayes, optimization), and it supports both single-fit workflows and multi-rep simulation studies (vignettes here).

@wlandau
Member Author

wlandau commented Dec 18, 2020

Update: I am trying to continue what I started with stantargets and cultivate a whole ecosystem of these packages: https://wlandau.github.io/targetopia.html. My background is in Bayesian statistics, so that's where I am starting. But I also want to branch out to machine learning. @mattwarkentin, do you think the stuff people do with torch has enough workflow patterns to materialize in a stantargets-like package?

@mattwarkentin
Contributor

Hey @wlandau. Upon seeing targetopia, I immediately thought about whether I have anything worthwhile to contribute. My mind went to torch, but my hesitation was based on the fact that I really have yet to use or inspect stantargets et al. so I don't really know well enough what those packages do to reduce the friction of using targets. I'll try to dig into stantargets this weekend and see if it inspires any thoughts about how something similar could be warranted for torch.

@wlandau
Member Author

wlandau commented Dec 19, 2020

> I immediately thought about whether I have anything worthwhile to contribute.

I think you do. You know targets super well, and you are experienced with torch, which I have not used in a real-world project.

> My mind went to torch, but my hesitation was based on the fact that I really have yet to use or inspect stantargets et al. so I don't really know well enough what those packages do to reduce the friction of using targets.

Targetopia packages aim to automatically build in options and techniques that would either be annoying to implement manually or too advanced for the majority of users. Examples:

  • Create several targets in one go. For example, stantargets::tar_stan_mcmc() creates a target for the output object and targets for each of the friendly summaries the user might want.
  • Batched replication with dynamic branching. tarchetypes::tar_rep() is the simplest example of this. Dynamic branching is too complicated and intimidating for most users, and even for me it can be a headache to manually set up a batching scheme. stantargets functions like tar_stan_mcmc_rep_summary() use this same idea.
  • File tracking. Most users aren't going to bother learning enough about targets to know they need format = "file" in tar_target() to track files.
  • Memory management: If you know a target is going to take up a lot of memory, we can automatically set memory = "transient" and/or garbage_collection = TRUE for that target. tar_stan_mcmc_rep_draws() does this.
  • Same goes for HPC options like deployment (and maybe storage and memory for that matter). stantargets needs to be able to run on remote workers without access to the Stan model file. So stantargets works around this with a local target that reads the lines of the file and uses deployment = "main" (see the compile argument of various functions). Then, subsequent targets automatically accept those file lines in memory, write them to a temporary file, and compile the model locally (one compilation per batch).
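
For comparison, here is a rough sketch of what several of these options look like when set by hand in a plain targets pipeline. The file name "model.stan", the helper simulate_summary(), and the target names are hypothetical, and the arguments reflect targets and tarchetypes as of this thread, so they may differ in later versions:

```r
# _targets.R (sketch; file names, target names, and simulate_summary()
# are hypothetical placeholders)
library(targets)
library(tarchetypes)

simulate_summary <- function() {
  data.frame(estimate = mean(rnorm(100)))
}

list(
  # File tracking: the command returns a path, and format = "file"
  # tells targets to watch the file's contents for changes.
  tar_target(model_file, "model.stan", format = "file"),
  # HPC: read the file on the main process so remote workers only need
  # the lines in memory, not access to the file itself.
  tar_target(model_lines, readLines(model_file), deployment = "main"),
  # Memory management: unload a large target after use and run garbage
  # collection before building it.
  tar_target(
    big_draws,
    matrix(rnorm(1e6), ncol = 10),
    memory = "transient",
    garbage_collection = TRUE
  ),
  # Batched replication: 20 dynamic branches with 40 replications each.
  tar_rep(sim, simulate_summary(), batches = 20, reps = 40)
)
```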

At some point I plan to write general guidance for targetopia developers.

@mattwarkentin
Contributor

Thanks for the detailed response, Will. This is very helpful for understanding how these add-on packages contribute to the targetopia. I will mull it over a bit more and try to sketch out some torch workflow patterns that I think might lend themselves well to this ecosystem.

@wlandau
Member Author

wlandau commented Dec 21, 2020

I am glad you are interested. With some good torch workflow patterns, I think a new torch-powered targetopia package is possible. cc @skeydan and @dfalbel.

@dfalbel

dfalbel commented Dec 23, 2020

That looks really nice! I'd be happy to help! Let me know if you need any development on the torch side too.

@wlandau
Member Author

wlandau commented Dec 23, 2020

Thanks, glad to hear it! The torch vignettes are super nice, and they are helping me begin to wrap my head around usage patterns. If anything else strikes you as a particularly ubiquitous archetype, please let us know.

@wlandau
Member Author

wlandau commented Jan 4, 2021

@mattwarkentin, FYI: the R Targetopia now has its own website with a more detailed contributing guide.
