New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111
Conversation
I didn't benchmark, but it seems to me preferable to stay with simple Maybe having 2 data fields, one for WASI/else and one for the end-user for managing the stuff he needes will be a good compromise. |
Two fields are not enough as every library might need its own context. I don't see why WASI is special to have its own field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great start -- thanks for working on this!
I'd love to see something like this land in Wasmer! But I have a few concerns about some of the implementation details in addition to the inline comments:
-
The position/existence of the lock. Assuming sound APIs (which due to some issues I'm aware of is not a strictly safe assumption in our codebase right now, but is something we're working toward), there's no need to have a
RwLock
on the top of the map. If we have&mut Ctx
, we should be the only&mut Ctx
and can safely mutate its members. Given that, we can by extension, safely mutate items in the map as well (if the function signature takes a&mut Ctx
). If it takes a&Ctx
, then it may make sense to have a lock around each item in the map and have the methods return a type which uses RAII to ensure that it stays locked (we may even want to offer both). -
The use of
String
as the key. 2 points here, does it need to be dynamic at all? All of the code here is such that we could use a&'static str
(there may be uses for dynamic values but they're not immediately obvious to me). If it does need to be dynamic, we could also have a registration system with handles rather than using strings and use either a Vec or a generational arena. The reason I suggest Vec is that the ability to remove items from the context seems strange to me -- I also can't think of any use cases where that's a major concern. It's a small point and Rust's hashmaps are fast, but it's important to keep things core to the system as efficient as we reasonably can. -
Can we do anything else? We definitely should make a change to support more data here because composition of ABIs is a major feature. But while we're changing it, it's a good time to consider our options more widely. I'm a little sleep-deprived now so not at my most creative, but something that just came to mind is using higher order functions or closures here to reduce complexity. This may not make any sense actually, I'll revisit this after I get some good sleep. (I just remembered that we can use closures as host functions now, so it's actually possible to have arbitrary data associated with things in a composable way already (as I understand it), the big difference is that in the closure model the data is private to the creator, whereas in the shared hashmap model it's open to anyone that knows the key)
This is a change that will need some design work as its impacting a core part of the API, feel free to push back on any of the points I've raised -- I'm interested in having some discussions to find the best solution we can here.
@@ -20,6 +20,7 @@ libc = "0.2.60" | |||
hex = "0.3" | |||
smallvec = "0.6" | |||
bincode = "1.1" | |||
maplit = "1.0.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just depend on version "1", because this is used as a library, we want to use whatever version is in the application's lockfile unless we need specific bug fixes to work correctly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The justification is that:
- duplicated dependencies can cause issues in some cases, for example if semantic versioning is violated and a bug is introduced
- duplicated dependencies can cause increased binary size and slower compile times
/// | ||
/// This function must be called with the same type, `T`, that the `data` | ||
/// was initialized with. | ||
pub fn data_mut<T>(&self, data_key: &str) -> &mut T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be marked unsafe
if it can lead to memory issues through safe Rust, otherwise the API is unsound!
Additionally, it seems like there's another implicit guarantee here that needs to be documented (or the API changed), the returned value does not include the lock that's preventing the hash map from being accessed. This means that because we're passing in a &self
, we can, in safe Rust, get 2 &mut
to the same entry in the hashmap (this is also a problem from multiple threads)
) -> (&Memory, &mut T) { | ||
( | ||
self.memory(mem_index), | ||
&mut *(self.data.read().unwrap().get(data_key).unwrap().0 as *mut T), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue with not keeping the map locked here!
Thanks for considering. This is my first attempt ever writing Rust code so any feedback is really welcome.
I see. It makes a lot of sense once I realize the guarantees the Rust provides. I'm going to fix this.
Pretty reasonable, let me think how the registration system would look like.
Would you explain why imports are configured per instance, not per module? My mental model is that a module is loaded once, configured with all the imports and then Instances that reuse it are created unlimited number of times. If imports belong to a module, not an instance then the closures are not going to work. Looking a the code I see that an But I might be missing something that requires imports to be instance specific.
I really appreciate this. |
Thanks for the proposal! |
That's a good point 🤔 ; I'll talk to @Hywan about this
Yeah, good point, we need to make sure we make changes that are going to work with the C-API, though we may need a breaking change of some kind. That said, we can probably do this in a way that doesn't break existing code, too. |
Hmm, I don't know exactly off the top of my head, but I can guess that an Yeah, so right now the data flow looks like You're correct that closures in the There's also a question of how we'll eventually handle Wasm threads where we'll definitely want some kind of shared data between the |
I'm not insisting on any specific solution. But the current state of the world when it is not possible to have more than one ABI per instance is broken. IMHO at this point wasm and wasmer are moving so fast and are essentially prototypes that braking compatibility is still acceptable. |
So the more I think about it, the more I like the idea of having a So the API would look like:
We can keep the old field and methods around, too, if we want to avoid breaking things but ideally we could deprecate the other functions and remove it sometime in the fairly far future if we want to be super sensitive about cache-locality, we can use a |
After some iteration we got into a very nice API for resolving this, which allows any function to have custom data attached. We will create examples once the refactor lands in master. |
Thanks for the work on the PR @mfateev. We have been working on a refactor of Wasmer in the last months that allows having a different state for each function. So you can mix WASI functions with custom ones, for example. We will add examples for this soon. (adding @MarkMcCaskey to the thread). Closing the PR for now! |
Description
Currently the state object is passed as part of
Ctx.data
field. This approach is limiting as each integration uses that field exclusively. For example if my own library needs to pass state in data field it cannot use WASI integration which requiresWasiState
object assigned toCtx.data
.This PR converts Ctx.data to a HashMap allowing multiple state objects sharing it.
Review
Ctx.data
converted to the HashMap to support attaching multiple independent State objects.