Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

Closed
wants to merge 11 commits into from

Conversation

mfateev
Copy link

@mfateev mfateev commented Dec 27, 2019

Description

Currently the state object is passed as part of Ctx.data field. This approach is limiting as each integration uses that field exclusively. For example if my own library needs to pass state in data field it cannot use WASI integration which requires WasiState object assigned to Ctx.data.

This PR converts Ctx.data to a HashMap allowing multiple state objects sharing it.

Review

  • Ctx.data converted to the HashMap to support attaching multiple independent State objects.

@YaronWittenstein
Copy link
Contributor

@mfateev

I didn't benchmark, but it seems to me preferable to stay with simple *mut c_void data
instead of HashMap.

Maybe having 2 data fields, one for WASI/else and one for the end-user for managing the stuff he needes will be a good compromise.

@mfateev
Copy link
Author

mfateev commented Jan 2, 2020

Two fields are not enough as every library might need its own context. I don't see why WASI is special to have its own field.

Copy link
Contributor

@MarkMcCaskey MarkMcCaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start -- thanks for working on this!

I'd love to see something like this land in Wasmer! But I have a few concerns about some of the implementation details in addition to the inline comments:

  • The position/existence of the lock. Assuming sound APIs (which due to some issues I'm aware of is not a strictly safe assumption in our codebase right now, but is something we're working toward), there's no need to have a RwLock on the top of the map. If we have &mut Ctx, we should be the only &mut Ctx and can safely mutate its members. Given that, we can by extension, safely mutate items in the map as well (if the function signature takes a &mut Ctx). If it takes a &Ctx, then it may make sense to have a lock around each item in the map and have the methods return a type which uses RAII to ensure that it stays locked (we may even want to offer both).

  • The use of String as the key. 2 points here, does it need to be dynamic at all? All of the code here is such that we could use a &'static str (there may be uses for dynamic values but they're not immediately obvious to me). If it does need to be dynamic, we could also have a registration system with handles rather than using strings and use either a Vec or a generational arena. The reason I suggest Vec is that the ability to remove items from the context seems strange to me -- I also can't think of any use cases where that's a major concern. It's a small point and Rust's hashmaps are fast, but it's important to keep things core to the system as efficient as we reasonably can.

  • Can we do anything else? We definitely should make a change to support more data here because composition of ABIs is a major feature. But while we're changing it, it's a good time to consider our options more widely. I'm a little sleep-deprived now so not at my most creative, but something that just came to mind is using higher order functions or closures here to reduce complexity. This may not make any sense actually, I'll revisit this after I get some good sleep. (I just remembered that we can use closures as host functions now, so it's actually possible to have arbitrary data associated with things in a composable way already (as I understand it), the big difference is that in the closure model the data is private to the creator, whereas in the shared hashmap model it's open to anyone that knows the key)

This is a change that will need some design work as its impacting a core part of the API, feel free to push back on any of the points I've raised -- I'm interested in having some discussions to find the best solution we can here.

@@ -20,6 +20,7 @@ libc = "0.2.60"
hex = "0.3"
smallvec = "0.6"
bincode = "1.1"
maplit = "1.0.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just depend on version "1", because this is used as a library, we want to use whatever version is in the application's lockfile unless we need specific bug fixes to work correctly!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The justification is that:

  1. duplicated dependencies can cause issues in some cases, for example if semantic versioning is violated and a bug is introduced
  2. duplicated dependencies can cause increased binary size and slower compile times

///
/// This function must be called with the same type, `T`, that the `data`
/// was initialized with.
pub fn data_mut<T>(&self, data_key: &str) -> &mut T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be marked unsafe if it can lead to memory issues through safe Rust, otherwise the API is unsound!

Additionally, it seems like there's another implicit guarantee here that needs to be documented (or the API changed), the returned value does not include the lock that's preventing the hash map from being accessed. This means that because we're passing in a &self, we can, in safe Rust, get 2 &mut to the same entry in the hashmap (this is also a problem from multiple threads)

) -> (&Memory, &mut T) {
(
self.memory(mem_index),
&mut *(self.data.read().unwrap().get(data_key).unwrap().0 as *mut T),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue with not keeping the map locked here!

@mfateev
Copy link
Author

mfateev commented Jan 10, 2020

Thanks for considering. This is my first attempt ever writing Rust code so any feedback is really welcome.

  • The position/existence of the lock. Assuming sound APIs (which due to some issues I'm aware of is not a strictly safe assumption in our codebase right now, but is something we're working toward), there's no need to have a RwLock on the top of the map. If we have &mut Ctx, we should be the only &mut Ctx and can safely mutate its members. Given that, we can by extension, safely mutate items in the map as well (if the function signature takes a &mut Ctx). If it takes a &Ctx, then it may make sense to have a lock around each item in the map and have the methods return a type which uses RAII to ensure that it stays locked (we may even want to offer both).

I see. It makes a lot of sense once I realize the guarantees the Rust provides. I'm going to fix this.

  • The use of String as the key. 2 points here, does it need to be dynamic at all? All of the code here is such that we could use a &'static str (there may be uses for dynamic values but they're not immediately obvious to me). If it does need to be dynamic, we could also have a registration system with handles rather than using strings and use either a Vec or a generational arena. The reason I suggest Vec is that the ability to remove items from the context seems strange to me -- I also can't think of any use cases where that's a major concern. It's a small point and Rust's hashmaps are fast, but it's important to keep things core to the system as efficient as we reasonably can.

Pretty reasonable, let me think how the registration system would look like.

  • Can we do anything else? We definitely should make a change to support more data here because composition of ABIs is a major feature. But while we're changing it, it's a good time to consider our options more widely. I'm a little sleep-deprived now so not at my most creative, but something that just came to mind is using higher order functions or closures here to reduce complexity. This may not make any sense actually, I'll revisit this after I get some good sleep. (I just remembered that we can use closures as host functions now, so it's actually possible to have arbitrary data associated with things in a composable way already (as I understand it), the big difference is that in the closure model the data is private to the creator, whereas in the shared hashmap model it's open to anyone that knows the key)

Would you explain why imports are configured per instance, not per module? My mental model is that a module is loaded once, configured with all the imports and then Instances that reuse it are created unlimited number of times. If imports belong to a module, not an instance then the closures are not going to work.

Looking a the code I see that an ImportObject instance can be reused by multiple instances. But when reused closures are not going to work as well. And it cannot really be reused if different context data is needed per instance.

But I might be missing something that requires imports to be instance specific.

This is a change that will need some design work as its impacting a core part of the API, feel free to push back on any of the points I've raised -- I'm interested in having some discussions to find the best solution we can here.

I really appreciate this.

@Hywan
Copy link
Contributor

Hywan commented Jan 10, 2020

Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

@MarkMcCaskey
Copy link
Contributor

MarkMcCaskey commented Jan 10, 2020

Looking a the code I see that an ImportObject instance can be reused by multiple instances. But when reused closures are not going to work as well. And it cannot really be reused if different context data is needed per instance.

That's a good point 🤔 ; I'll talk to @Hywan about this


Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

Yeah, good point, we need to make sure we make changes that are going to work with the C-API, though we may need a breaking change of some kind. That said, we can probably do this in a way that doesn't break existing code, too.

@MarkMcCaskey
Copy link
Contributor

Would you explain why imports are configured per instance, not per module? My mental model is that a module is loaded once, configured with all the imports and then Instances that reuse it are created unlimited number of times. If imports belong to a module, not an instance then the closures are not going to work.

Hmm, I don't know exactly off the top of my head, but I can guess that an Instance is a runnable thing and combining imports with a module makes it runnable. See the section on instantiation in the Wasm spec.

Yeah, so right now the data flow looks like &Module + &ImportObject -> Instance. ImportObject has a data creator on it which is used when making the Instance (I don't know if we expose this through the C API yet, but I don't think we do). So every Instance will have its own copy of user-defined data.

You're correct that closures in the ImportObject don't share that property, though assuming we're not breaking Rust ownership rules it's purely an issue of identity, not of shared mutability.

There's also a question of how we'll eventually handle Wasm threads where we'll definitely want some kind of shared data between the Ctx(s?).

@mfateev
Copy link
Author

mfateev commented Jan 10, 2020

Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

I'm not insisting on any specific solution. But the current state of the world when it is not possible to have more than one ABI per instance is broken. IMHO at this point wasm and wasmer are moving so fast and are essentially prototypes that braking compatibility is still acceptable.

@MarkMcCaskey
Copy link
Contributor

MarkMcCaskey commented Jan 18, 2020

So the more I think about it, the more I like the idea of having a Vec<void*>. @Hywan what do you think about that? If reuse is an issue we can use a slab or a generational arena, but I can't think of a use case where reuse is very important right now.

So the API would look like:

  • a way to get a TypedIndex of some kind which would use or mutex to get the next index
  • a way to get data with this index

We can keep the old field and methods around, too, if we want to avoid breaking things but ideally we could deprecate the other functions and remove it sometime in the fairly far future

if we want to be super sensitive about cache-locality, we can use a SmallVec here, too.

@syrusakbary
Copy link
Member

After some iteration we got into a very nice API for resolving this, which allows any function to have custom data attached.
This removes the need of the instance data hashmap, making the implementation much simpler.

We will create examples once the refactor lands in master.

@syrusakbary
Copy link
Member

Thanks for the work on the PR @mfateev.

We have been working on a refactor of Wasmer in the last months that allows having a different state for each function. So you can mix WASI functions with custom ones, for example.

We will add examples for this soon. (adding @MarkMcCaskey to the thread).

Closing the PR for now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants