Estimate monetary cost of executing plan #334

Open
TomNicholas opened this issue Dec 15, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@TomNicholas
Collaborator

TomNicholas commented Dec 15, 2023

Cubed arguably has enough information to give a rough estimate of the monetary cost of executing the plan before starting execution.

I'm imagining a new method .estimate_cost(executor) that is similar to .compute(executor). When it is called, we would know:

  • how many arrays are to be processed, how big they all are, and which numpy functions are to be used to process them, via the Plan object;
  • which serverless executor the functions are to be run on, via the Executor passed;
  • the temporary intermediate bucket information, via the Spec object.

It would just print an estimate of the cost back to the user without running anything, and maybe raise warnings if they are planning to do something that seems obviously expensive (e.g. having their temporary bucket for intermediate data in AWS but their executor on GCF).
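
To make the warning idea concrete, here is a minimal sketch, not an existing Cubed API. The function name check_colocation, the executor_cloud labels, and the scheme-to-cloud mapping are all assumptions made for illustration; a real check would need to read the cloud from the Executor and the bucket location from the Spec.

```python
import warnings
from urllib.parse import urlparse

# Hypothetical mapping from storage URL scheme to cloud provider label.
SCHEME_TO_CLOUD = {"s3": "aws", "gs": "gcp"}


def check_colocation(executor_cloud: str, bucket_url: str) -> bool:
    """Warn if the intermediate bucket and the executor are on different clouds.

    Returns True when no mismatch is detected, including when the bucket's
    cloud cannot be determined from its URL scheme.
    """
    bucket_cloud = SCHEME_TO_CLOUD.get(urlparse(bucket_url).scheme)
    if bucket_cloud is not None and bucket_cloud != executor_cloud:
        warnings.warn(
            f"Intermediate storage is on {bucket_cloud} but the executor runs "
            f"on {executor_cloud}; cross-cloud data transfer may be expensive."
        )
        return False
    return True
```

Calling check_colocation("gcp", "s3://tmp-bucket/work") would emit the warning, whereas a matching pair or a local path would pass silently.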

This means that if we had a small table somewhere of e.g. AWS Lambda and S3 prices, Cubed could consult those numbers and sum them. It would require an idea of e.g. how long it takes to run np.mean() on a chunk of a certain size on a certain container, but this seems like something that can be discovered fairly straightforwardly.
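
As a rough sketch of the "consult a table and sum" idea: the PriceTable fields, the estimate_cost signature, and the three-term cost model (compute time, per-invocation charge, data written) are assumptions for illustration, not Cubed code, and the prices in the usage note are placeholders rather than real provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTable:
    """Hypothetical price table for one executor/storage combination."""

    compute_per_gb_second: float   # price per GB of memory per second of runtime
    per_request: float             # price per function invocation
    storage_per_gb_written: float  # price per GB written to intermediate storage


def estimate_cost(
    num_tasks: int,
    seconds_per_task: float,
    memory_gb: float,
    gb_written: float,
    prices: PriceTable,
) -> float:
    """Sum the compute, invocation, and storage terms for a plan."""
    compute = num_tasks * seconds_per_task * memory_gb * prices.compute_per_gb_second
    requests = num_tasks * prices.per_request
    storage = gb_written * prices.storage_per_gb_written
    return compute + requests + storage
```

For example, with placeholder prices PriceTable(0.5, 0.25, 2.0), a plan of 4 tasks that each run for 2 seconds with 1 GB of memory and write 3 GB in total would be estimated at 4 + 1 + 6 = 11 units. The seconds_per_task input is the part that would have to come from benchmarking each operation.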

Obviously there is a long tail of cases where this wouldn't work, but often you might still be able to provide a lower-bound cost estimate. For example, if your plan had a step that applied some arbitrary function with apply_gufunc, Cubed would not know whether that was some super expensive function that would run forever, but it would still be possible to estimate the minimum cost by assuming that the function was very light.
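
One way the lower-bound behaviour could look, as a sketch under stated assumptions: each operation is represented here as a hypothetical (name, num_tasks, seconds_per_task) tuple, with seconds_per_task set to None when the runtime is unknown. Unknown operations contribute zero, and the result is flagged as a lower bound.

```python
from typing import Optional, Sequence, Tuple

# (operation name, number of tasks, seconds per task or None if unknown)
Op = Tuple[str, int, Optional[float]]


def lower_bound_seconds(ops: Sequence[Op]) -> Tuple[float, bool]:
    """Return total known runtime and whether it is only a lower bound."""
    total = 0.0
    is_lower_bound_only = False
    for _name, num_tasks, seconds_per_task in ops:
        if seconds_per_task is None:
            # Unknown cost (e.g. an arbitrary user function): assume it is free.
            is_lower_bound_only = True
            continue
        total += num_tasks * seconds_per_task
    return total, is_lower_bound_only
```

A plan such as [("mean", 10, 0.5), ("apply_gufunc", 10, None)] would give 5.0 seconds with the lower-bound flag set, which could then be converted to money using a price table.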

@TomNicholas TomNicholas added the enhancement New feature or request label Dec 15, 2023
@TomNicholas TomNicholas changed the title from "Estimate monetary cost" to "Estimate monetary cost of executing plan" Dec 15, 2023
@tomwhite
Member

Duplicate of #219?

@TomNicholas
Collaborator Author

I had forgotten about #219, but actually I don't think this is a duplicate - I'm suggesting warning users of estimated costs before execution, whereas #219 seems to be about the actual cost after execution. Though I imagine you could re-use much of the code when calculating both numbers.

Basically I think it would be useful for users to be able to see "hang on, this isn't supposed to cost that much, maybe I've not expressed the analysis I meant to..." before they actually waste that money.

@tomwhite
Member

Sounds good - let's keep both open.
