Cubed arguably has enough information to give a rough estimate of the monetary cost of executing the plan before starting execution.
I'm imagining a new method `.estimate_cost(executor)` that is similar to `.compute(executor)`. Calling this, Cubed would know:
- how many arrays are to be processed, how big they all are, and which numpy functions are to be used to process them, via the `Plan` object,
- which serverless executor the functions are to be run on, via the `Executor` passed,
- the temporary intermediate bucket information, via the `Spec` object.
It would just print an estimate of the cost back to the user without running anything, and maybe raise warnings if they are planning to do something that seems obviously expensive (e.g. having their temporary bucket for intermediate data be in AWS but their executor be GCF).
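A hypothetical sketch of what such a method might look like; `estimate_cost`, `CostEstimate`, the per-task and per-GB prices, and the cloud labels are all invented for illustration and do not exist in Cubed today:

```python
import warnings
from dataclasses import dataclass


@dataclass
class CostEstimate:
    """Dry-run estimate: nothing is executed to produce this."""
    n_tasks: int
    total_bytes: float
    usd: float

    def __str__(self):
        return (f"~${self.usd:.2f} for {self.n_tasks} tasks over "
                f"{self.total_bytes / 1e9:.1f} GB of intermediate data")


def estimate_cost(n_tasks, total_bytes, bucket_cloud, executor_cloud,
                  usd_per_task=2e-6, usd_per_gb=0.023):
    # Warn on obvious misconfigurations, e.g. cross-cloud traffic
    # between the intermediate bucket and the executor.
    if bucket_cloud != executor_cloud:
        warnings.warn(
            f"Intermediate bucket is on {bucket_cloud} but the executor "
            f"runs on {executor_cloud}; cross-cloud egress will add cost."
        )
    # Placeholder pricing model: a flat per-invocation charge plus a
    # per-GB charge for intermediate storage traffic.
    usd = n_tasks * usd_per_task + (total_bytes / 1e9) * usd_per_gb
    return CostEstimate(n_tasks, total_bytes, usd)


est = estimate_cost(10_000, 50e9, bucket_cloud="aws", executor_cloud="gcp")
print(est)  # also emits the cross-cloud warning above
```

The numbers Cubed would actually feed in (`n_tasks`, `total_bytes`) are already derivable from the `Plan` and `Spec`, which is the point of the proposal.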
This means if we had a little table somewhere of e.g. AWS Lambda and S3 prices, Cubed could consult those numbers and sum them. It would require an idea of e.g. how long it takes to run `np.mean()` on a chunk of a certain size on a certain container, but this seems like something that can be discovered fairly straightforwardly.
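As a sketch, such a price table and per-task sum might look like the following; the prices are list prices that vary by region and over time, and the runtime, memory, and request counts are pure assumptions:

```python
# Illustrative-only price table; real prices vary by region and change
# over time, so a real implementation would need a maintained source.
PRICES = {
    "aws_lambda_gb_second": 0.0000166667,  # compute, per GB-second
    "s3_put_per_1000": 0.005,              # per 1,000 PUT requests
    "s3_get_per_1000": 0.0004,             # per 1,000 GET requests
}


def task_cost(runtime_s, memory_gb, puts, gets):
    """Cost of one serverless task that reads and writes chunks on S3."""
    compute = runtime_s * memory_gb * PRICES["aws_lambda_gb_second"]
    storage_ops = ((puts / 1000) * PRICES["s3_put_per_1000"]
                   + (gets / 1000) * PRICES["s3_get_per_1000"])
    return compute + storage_ops


# e.g. an np.mean over one chunk: guess ~1 s at 2 GB of memory, one PUT
# for the output chunk and four GETs for input chunks (all assumed).
print(f"${task_cost(1.0, 2.0, puts=1, gets=4):.8f}")
```

Summing `task_cost` over every task in the plan would give the headline number the issue is asking for.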
Obviously there is a long tail of cases where this wouldn't work, but often you might still be able to provide a lower-bound cost estimate. For example, if your plan had a step that applied some arbitrary function with apply_gufunc, Cubed would not know if that was some super expensive function that would run forever, but it would still be possible to estimate the minimum cost assuming that the function was very light.
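A minimal sketch of that lower-bound idea, assuming a hypothetical runtime floor of 0.1 s per task (the floor value and the per-GB-second price are both assumptions):

```python
# Assumed floor on per-task runtime; the opaque user function passed to
# apply_gufunc may run far longer, so this only yields a lower bound.
MIN_TASK_RUNTIME_S = 0.1


def lower_bound_cost(n_tasks, memory_gb, usd_per_gb_second=0.0000166667):
    """Lower-bound compute cost assuming every task finishes in
    MIN_TASK_RUNTIME_S; the actual cost can only be higher."""
    return n_tasks * MIN_TASK_RUNTIME_S * memory_gb * usd_per_gb_second


# 100,000 opaque tasks at 2 GB each cost at least this much:
print(f"${lower_bound_cost(100_000, 2.0):.2f}")
```

Reporting this as "at least $X" rather than "$X" keeps the estimate honest when the plan contains user code Cubed can't reason about.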
I had forgotten about #219, but actually I don't think this is a duplicate - I'm suggesting warning users of estimated costs before execution, whereas #219 seems to be about actual cost after execution. Though I imagine you could reuse much of the code when calculating both numbers.
Basically I think it would be useful for users to be able to see "hang on, this isn't supposed to cost that much, maybe I've not expressed the analysis I meant to..." before they actually waste that money.