Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow resumable calls to recover after running out of fuel #696

Open
tomaka opened this issue Feb 27, 2023 · 10 comments
Open

Allow resumable calls to recover after running out of fuel #696

tomaka opened this issue Feb 27, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@tomaka
Copy link
Contributor

tomaka commented Feb 27, 2023

As we all know, because of the halting problem, it isn't possible to know whether the execution of a Wasm function will finish in a finite time.
While a Wasm function is being executed, maybe everything is okay and it is simply taking a long time, or maybe the Wasm execution is stuck in an infinite loop. It isn't possible to know in which of the two scenario we currently are.

Because the program that is interpreting the Wasm doesn't want, itself, to be stuck infinitely, it needs some sort of protection against the Wasm being stuck in an infinite loop.
While the program interpreting the Wasm can't know whether the Wasm is stuck in an infinite loop, it can use heuristics.
For example, if the Wasm has been executing for 10 seconds while normally it should take around 2ms, then it is likely to be stuck.

Wasmi provides a fuel system, making it possible to interrupt the execution of the Wasm after a certain number of instructions have been executed.
This system is however not very flexible. It's all-or-nothing: when all the fuel has been consumed, the execution completely stops and can never be resumed.

Furthermore, it is difficult to map "fuel quantities" with real world time/CPU usage. A precise fuel number means that whether the execution will run out of fuel or not doesn't depend on the speed of the machine, which has the advantage to be deterministic. However, this is also a drawback: it is not possible to stop the execution after an approximate number of seconds. If you pick a certain fuel quantity, on some platforms this fuel will expire quickly and on some others it will expire very slowly. Fuel quantities are very arbitrary anyway, and I think that it is legitimate to want to stop after a certain number of real life seconds.

In order to solve that problem, I would suggest to make it possible to combine the call_resumable feature with the fuel system.
When the fuel expires, call_resumable should return a Ok(ResumableCall::Resumable) (or something similar) instead of an Err.
This would make it possible for the API user to choose what to do: either stop the execution, or add more fuel and resume the execution. This provides maximum flexibility to this feature.
The API user could do things such as print a warning after a certain time, ask the human whether to continue or not, de-prioritize the thread executing the Wasm after a certain time, etc.

@Lucky4Luuk
Copy link

Lucky4Luuk commented Feb 27, 2023

Being able to use the fuel system together with call_resumable would be perfect. Would this be a difficult feature to implement? Perhaps I can find some time tonight to fork wasmi and implement this.

Also, if this feature is not possible for technical reasons, would it still be possible to let wasmi insert a function call to an imported function every n instructions? This way, we could at least have a system to yield from execution. This could help with using wasmi in an async context.

@Robbepop
Copy link
Member

Robbepop commented Feb 27, 2023

@tomaka Thanks for writing the issue!

I have a few questions ...

  1. In order to implement this feature we need to adjust the current API fro resumable function calls as it won't make sense to query the host function that caused the call to be interrupted in the first place. Also for efficiency reasons we might choose to diverge. What is the information that users of resumable function calls actually require? Is function type of the failed function enough? (Minimal information.)

  2. I suppose resumable calls should still return error for normal Wasm traps and this issue is basically about implementing a special rule for OutOfFuel traps, right?

To answer some questions:

At the moment fuel is somewhat arbitrary. However, there are plans to adjust our fuel costs to a similar policy as to what Wasmtime does in where basically one instruction roughly costs 1 fuel to execute. Currently we are trying to do a more precise job but since this can only be a heuristic at best it is better to make this even simpler just like Wasmtime did.

@Lucky4Luuk I think it should be technically feasible to implement this feature. The only actual question is if it is possible to implement this feature without massive performance regressions. We have been dealing with lots of those performance regressions for various reasons lately. Still trying to figure out why. At the moment most of the currently open PRs cannot be merged due to weird performance regressions.

For the same reason we probably won't implement a feature to call a host function every X instructions. First this is heavily underspecified and secondly host function calls are among the most expensive things you can do in wasmi. So we usually try to minimize those if possible.

@tomaka
Copy link
Contributor Author

tomaka commented Feb 28, 2023

In order to implement this feature we need to adjust the current API fro resumable function calls as it won't make sense to query the host function that caused the call to be interrupted in the first place. Also for efficiency reasons we might choose to diverge. What is the information that users of resumable function calls actually require? Is function type of the failed function enough? (Minimal information.)

Well, in smoldot I also need the parameters that were passed to the host function. Again, the idea is simply to implement the body of the host function "externally" (which makes it possible for example to perform asynchronous operations) and not within a callback.

I suppose resumable calls should still return error for normal Wasm traps and this issue is basically about implementing a special rule for OutOfFuel traps, right?

That's your choice. I suppose that every error that can be resumed from could lead to Ok(ResumableCall::Resumable) instead of Err. Of course, it would then need to be explained which ones are and which ones aren't, or you could change Trap to be enum Trap { Resumable(ResumableTrap), NonResumable(NonResumableTrap) }.

At the moment fuel is somewhat arbitrary. However, there are plans to adjust our fuel costs to a similar policy as to what Wasmtime does in where basically one instruction roughly costs 1 fuel to execute. Currently we are trying to do a more precise job but since this can only be a heuristic at best it is better to make this even simpler just like Wasmtime did.

Note that this is not really related to my explanation. "One instruction == one fuel" is just as arbitrary, because I assume that nobody is going to calculate the amount of fuel to pass to wasmi by precisely counting how many instructions are necessary (it would be a relatively bad idea since that number depends on the goodwill of the compiler). Instead, you just pass an arbitrary fuel value and increase it if it's not enough.

@Robbepop
Copy link
Member

Robbepop commented Mar 1, 2023

Well, in smoldot I also need the parameters that were passed to the host function.

Technically you could embed those parameters into the host error that caused the interruption of the resumable call, right?
The current API does not offer this functionality (yet) and I am not sure it will because it would introduce serious overhead for resumable call handles. It might still be a good idea to implement this if the majority of use cases require this information.

or you could change Trap to be enum Trap { Resumable(ResumableTrap), NonResumable(NonResumableTrap) }.

That's partially what already happens in the Engine not visible to users. However, to me it is important that the resumable call API does not crawl on top of every other abstraction in wasmi but is kept isolated.

I will think about a possible API to extend resumable calls with OutOfFuel traps and post it here once I am done. However, there is other work to be done beforehand so it will probably take a little while. Until then you are free to be posting your idea of what you'd expect of a resumable call API that would satisfy all your needs as of now. :)

@Robbepop Robbepop added the enhancement New feature or request label Mar 1, 2023
@Robbepop
Copy link
Member

Robbepop commented Mar 11, 2023

@tomaka How pressing is fixing this issue for smoldot?

@tomaka
Copy link
Contributor Author

tomaka commented Mar 11, 2023

It's not pressing.

The objective is simply for smoldot to be able to resist to malicious chain. At the moment, if you connect smoldot to a malicious chain, this chain can freeze smoldot.
However, I don't see a legitimate use case where you want to connect to a chain without knowing whether it's malicious. Typically, you connect a light client to a chain that you know, not an unknown chain.
Even if it exists, all the malicious chain can do is a be a small nuisance by freezing smoldot, and it's doubtful why someone would bother with that kind of attack in the first place.

@Robbepop
Copy link
Member

Robbepop commented Mar 11, 2023

It's not pressing.

The objective is simply for smoldot to be able to resist to malicious chain. At the moment, if you connect smoldot to a malicious chain, this chain can freeze smoldot. However, I don't see a legitimate use case where you want to connect to a chain without knowing whether it's malicious. Typically, you connect a light client to a chain that you know, not an unknown chain. Even if it exists, all the malicious chain can do is a be a small nuisance by freezing smoldot, and it's doubtful why someone would bother with that kind of attack in the first place.

Okay thanks a lot for this info!
I am going to fix this eventually and properly but it is good to know that you are not blocked by this issue for now.

@Robbepop Robbepop changed the title Make it possible to recover after running out of fuel Allow resumable calls to recover after running out of fuel Mar 13, 2023
@AbleTheAbove
Copy link

Hello. Not to necro this thread but I have a viable use case for this in my operating system AbleOS. I'd like to have this functionality.

@Robbepop
Copy link
Member

Robbepop commented Apr 5, 2023

This is currently blocked mostly because of two things:

  1. I am not yet super confident that I have a decent enough understanding of the technical requirements of users (you). What I'd need here is the exact semantics and what behaviors you'd expect for what you need and in the best case even an example code for your personal use case.
  2. I don't yet know how this can be implemented as zero-cost abstraction or with minimal overhead in general. I am sure it can be done though.

@AbleTheAbove
Copy link

The goal: I'd like for the wasm engine to be pauseable so I may do task switching.

I have two main ways that I think my goal could be accomplished.

I think there are two ways this can be done:

  • Fuel metering with resuming
  • A tick based system

And if both are implemented I think there is enough variation that all use cases can be covered.

My usecase would require a pause button that halts the engine at the next instruction, stores state and can later be unpaused.

Thank you for the quick response!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants