How does bibat perform with large datasets? #82

Open
teddygroves opened this issue Apr 5, 2024 · 0 comments
See reviewer comment here.

I can think of a few relevant topics:

  • Some arviz storage formats are faster than others. Bibat currently allows either zarr or json; these should be benchmarked against each other and against the remaining options.
  • There is currently no provision for running inferences in parallel. This could save a lot of time for complex projects with large datasets and would not be hard to implement.
  • There is no option to avoid loading all the data into memory at once. Implementing a 'big data' Bayesian workflow where chunks of data are processed separately would be pretty tricky, but it would be useful to find out at exactly what scale memory problems start and document this for users.
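For the first bullet, a comparison could reuse a small timing harness like the sketch below. It uses stdlib `json` and `pickle` as stand-in codecs purely for illustration; in bibat the objects being saved would be arviz `InferenceData` instances written via `to_zarr()` or `to_json()`, which this sketch does not assume are available.

```python
import json
import pickle
import time


def time_roundtrip(dump, load, data, repeats=3):
    """Return (save_seconds, load_seconds) averaged over `repeats` runs."""
    save_t = load_t = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        blob = dump(data)
        save_t += time.perf_counter() - t0
        t0 = time.perf_counter()
        load(blob)
        load_t += time.perf_counter() - t0
    return save_t / repeats, load_t / repeats


# Stand-in "draws": in bibat these would be arviz InferenceData objects,
# saved with idata.to_zarr() / idata.to_json() instead of the stdlib
# codecs timed here.
draws = {"mu": [0.1] * 10_000, "sigma": [1.0] * 10_000}
results = {
    "json": time_roundtrip(json.dumps, json.loads, draws),
    "pickle": time_roundtrip(pickle.dumps, pickle.loads, draws),
}
```

The same `time_roundtrip` function could be pointed at each arviz-supported format to produce the comparison the bullet asks for.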
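For the second bullet, one way parallel inferences might look is sketched below. The `run_inference` function is a hypothetical placeholder, not bibat's API; in practice it would wrap whatever sampling call a bibat project uses (e.g. a cmdstanpy run). Threads are used rather than processes on the assumption that the heavy lifting happens in Stan subprocesses, so the GIL is not a bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor


def run_inference(config_name):
    """Stand-in for one inference run. In a real project this would
    launch a sampler for the named inference configuration."""
    return config_name, sum(i * i for i in range(1_000))


def run_all(config_names, max_workers=4):
    """Run each configured inference concurrently and collect results
    into a dict keyed by configuration name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_inference, config_names))
```

For a project whose inferences are independent of each other, this pattern bounds total wall-clock time by the slowest single inference rather than the sum of all of them.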