Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

mfateev · 2019-12-27T23:24:11Z

Description

Currently the state object is passed as part of Ctx.data field. This approach is limiting as each integration uses that field exclusively. For example if my own library needs to pass state in data field it cannot use WASI integration which requires WasiState object assigned to Ctx.data.

This PR converts Ctx.data to a HashMap allowing multiple state objects sharing it.

Review

Ctx.data converted to the HashMap to support attaching multiple independent State objects.

YaronWittenstein · 2020-01-01T20:07:33Z

@mfateev

I didn't benchmark, but it seems to me preferable to stay with simple *mut c_void data
instead of HashMap.

Maybe having 2 data fields, one for WASI/else and one for the end-user for managing the stuff he needes will be a good compromise.

mfateev · 2020-01-02T18:08:28Z

Two fields are not enough as every library might need its own context. I don't see why WASI is special to have its own field.

MarkMcCaskey

Looks like a great start -- thanks for working on this!

I'd love to see something like this land in Wasmer! But I have a few concerns about some of the implementation details in addition to the inline comments:

The position/existence of the lock. Assuming sound APIs (which due to some issues I'm aware of is not a strictly safe assumption in our codebase right now, but is something we're working toward), there's no need to have a RwLock on the top of the map. If we have &mut Ctx, we should be the only &mut Ctx and can safely mutate its members. Given that, we can by extension, safely mutate items in the map as well (if the function signature takes a &mut Ctx). If it takes a &Ctx, then it may make sense to have a lock around each item in the map and have the methods return a type which uses RAII to ensure that it stays locked (we may even want to offer both).
The use of String as the key. 2 points here, does it need to be dynamic at all? All of the code here is such that we could use a &'static str (there may be uses for dynamic values but they're not immediately obvious to me). If it does need to be dynamic, we could also have a registration system with handles rather than using strings and use either a Vec or a generational arena. The reason I suggest Vec is that the ability to remove items from the context seems strange to me -- I also can't think of any use cases where that's a major concern. It's a small point and Rust's hashmaps are fast, but it's important to keep things core to the system as efficient as we reasonably can.
Can we do anything else? We definitely should make a change to support more data here because composition of ABIs is a major feature. But while we're changing it, it's a good time to consider our options more widely. I'm a little sleep-deprived now so not at my most creative, but something that just came to mind is using higher order functions or closures here to reduce complexity. This may not make any sense actually, I'll revisit this after I get some good sleep. (I just remembered that we can use closures as host functions now, so it's actually possible to have arbitrary data associated with things in a composable way already (as I understand it), the big difference is that in the closure model the data is private to the creator, whereas in the shared hashmap model it's open to anyone that knows the key)

This is a change that will need some design work as its impacting a core part of the API, feel free to push back on any of the points I've raised -- I'm interested in having some discussions to find the best solution we can here.

MarkMcCaskey · 2020-01-09T21:06:00Z

lib/runtime-core/Cargo.toml

@@ -20,6 +20,7 @@ libc = "0.2.60"
 hex = "0.3"
 smallvec = "0.6"
 bincode = "1.1"
+maplit = "1.0.2"


We should just depend on version "1", because this is used as a library, we want to use whatever version is in the application's lockfile unless we need specific bug fixes to work correctly!

The justification is that:

duplicated dependencies can cause issues in some cases, for example if semantic versioning is violated and a bug is introduced

duplicated dependencies can cause increased binary size and slower compile times

MarkMcCaskey · 2020-01-09T21:13:01Z

lib/runtime-core/src/vm.rs

+    ///
+    /// This function must be called with the same type, `T`, that the `data`
+    /// was initialized with.
+    pub fn data_mut<T>(&self, data_key: &str) -> &mut T {


This function should be marked unsafe if it can lead to memory issues through safe Rust, otherwise the API is unsound!

Additionally, it seems like there's another implicit guarantee here that needs to be documented (or the API changed), the returned value does not include the lock that's preventing the hash map from being accessed. This means that because we're passing in a &self, we can, in safe Rust, get 2 &mut to the same entry in the hashmap (this is also a problem from multiple threads)

MarkMcCaskey · 2020-01-09T21:13:27Z

lib/runtime-core/src/vm.rs

+    ) -> (&Memory, &mut T) {
+        (
+            self.memory(mem_index),
+            &mut *(self.data.read().unwrap().get(data_key).unwrap().0 as *mut T),


Same issue with not keeping the map locked here!

mfateev · 2020-01-10T01:22:12Z

Thanks for considering. This is my first attempt ever writing Rust code so any feedback is really welcome.

The position/existence of the lock. Assuming sound APIs (which due to some issues I'm aware of is not a strictly safe assumption in our codebase right now, but is something we're working toward), there's no need to have a RwLock on the top of the map. If we have &mut Ctx, we should be the only &mut Ctx and can safely mutate its members. Given that, we can by extension, safely mutate items in the map as well (if the function signature takes a &mut Ctx). If it takes a &Ctx, then it may make sense to have a lock around each item in the map and have the methods return a type which uses RAII to ensure that it stays locked (we may even want to offer both).

I see. It makes a lot of sense once I realize the guarantees the Rust provides. I'm going to fix this.

The use of String as the key. 2 points here, does it need to be dynamic at all? All of the code here is such that we could use a &'static str (there may be uses for dynamic values but they're not immediately obvious to me). If it does need to be dynamic, we could also have a registration system with handles rather than using strings and use either a Vec or a generational arena. The reason I suggest Vec is that the ability to remove items from the context seems strange to me -- I also can't think of any use cases where that's a major concern. It's a small point and Rust's hashmaps are fast, but it's important to keep things core to the system as efficient as we reasonably can.

Pretty reasonable, let me think how the registration system would look like.

Can we do anything else? We definitely should make a change to support more data here because composition of ABIs is a major feature. But while we're changing it, it's a good time to consider our options more widely. I'm a little sleep-deprived now so not at my most creative, but something that just came to mind is using higher order functions or closures here to reduce complexity. This may not make any sense actually, I'll revisit this after I get some good sleep. (I just remembered that we can use closures as host functions now, so it's actually possible to have arbitrary data associated with things in a composable way already (as I understand it), the big difference is that in the closure model the data is private to the creator, whereas in the shared hashmap model it's open to anyone that knows the key)

Would you explain why imports are configured per instance, not per module? My mental model is that a module is loaded once, configured with all the imports and then Instances that reuse it are created unlimited number of times. If imports belong to a module, not an instance then the closures are not going to work.

Looking a the code I see that an ImportObject instance can be reused by multiple instances. But when reused closures are not going to work as well. And it cannot really be reused if different context data is needed per instance.

But I might be missing something that requires imports to be instance specific.

This is a change that will need some design work as its impacting a core part of the API, feel free to push back on any of the points I've raised -- I'm interested in having some discussions to find the best solution we can here.

I really appreciate this.

Hywan · 2020-01-10T10:15:38Z

Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

MarkMcCaskey · 2020-01-10T15:44:32Z

Looking a the code I see that an ImportObject instance can be reused by multiple instances. But when reused closures are not going to work as well. And it cannot really be reused if different context data is needed per instance.

That's a good point 🤔 ; I'll talk to @Hywan about this

Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

Yeah, good point, we need to make sure we make changes that are going to work with the C-API, though we may need a breaking change of some kind. That said, we can probably do this in a way that doesn't break existing code, too.

MarkMcCaskey · 2020-01-10T16:08:11Z

Would you explain why imports are configured per instance, not per module? My mental model is that a module is loaded once, configured with all the imports and then Instances that reuse it are created unlimited number of times. If imports belong to a module, not an instance then the closures are not going to work.

Hmm, I don't know exactly off the top of my head, but I can guess that an Instance is a runnable thing and combining imports with a module makes it runnable. See the section on instantiation in the Wasm spec.

Yeah, so right now the data flow looks like &Module + &ImportObject -> Instance. ImportObject has a data creator on it which is used when making the Instance (I don't know if we expose this through the C API yet, but I don't think we do). So every Instance will have its own copy of user-defined data.

You're correct that closures in the ImportObject don't share that property, though assuming we're not breaking Rust ownership rules it's purely an issue of identity, not of shared mutability.

There's also a question of how we'll eventually handle Wasm threads where we'll definitely want some kind of shared data between the Ctx(s?).

mfateev · 2020-01-10T16:57:17Z

Thanks for the proposal! Ctx.data is also used by the C API, and I'm afraid it'll break existing code. I would highly recommend to keep Ctx.data's type as *mut c_void.

I'm not insisting on any specific solution. But the current state of the world when it is not possible to have more than one ABI per instance is broken. IMHO at this point wasm and wasmer are moving so fast and are essentially prototypes that braking compatibility is still acceptable.

MarkMcCaskey · 2020-01-18T00:36:35Z

So the more I think about it, the more I like the idea of having a Vec<void*>. @Hywan what do you think about that? If reuse is an issue we can use a slab or a generational arena, but I can't think of a use case where reuse is very important right now.

So the API would look like:

a way to get a TypedIndex of some kind which would use or mutex to get the next index
a way to get data with this index

We can keep the old field and methods around, too, if we want to avoid breaking things but ideally we could deprecate the other functions and remove it sometime in the fairly far future

if we want to be super sensitive about cache-locality, we can use a SmallVec here, too.

syrusakbary · 2020-07-08T06:26:27Z

After some iteration we got into a very nice API for resolving this, which allows any function to have custom data attached.
This removes the need of the instance data hashmap, making the implementation much simpler.

We will create examples once the refactor lands in master.

syrusakbary · 2020-08-24T16:11:44Z

Thanks for the work on the PR @mfateev.

We have been working on a refactor of Wasmer in the last months that allows having a different state for each function. So you can mix WASI functions with custom ones, for example.

We will add examples for this soon. (adding @MarkMcCaskey to the thread).

Closing the PR for now!

mfateev added 6 commits December 19, 2019 16:59

Added *.iml to .gitignore

045c6de

Merge branch 'master' of github.com:wasmerio/wasmer

151f858

Changed Ctx.data to HashMap

7a98037

emscripten changes related to Ctx.data refactoring

68c3072

Fixed broken comment code

52a95ac

Formatting and comments

17b0e70

mfateev requested review from bjfish, Hywan, MarkMcCaskey and syrusakbary as code owners December 27, 2019 23:24

cargo fmt

513aab9

mfateev added 2 commits January 3, 2020 16:58

Merge branch 'master' of github.com:wasmerio/wasmer into ctx_data

92bbb99

Merge branch 'master' of github.com:wasmerio/wasmer into ctx_data

3e283b0

MarkMcCaskey suggested changes Jan 9, 2020

View reviewed changes

mfateev added 2 commits January 9, 2020 16:43

Merge branch 'master' of github.com:wasmerio/wasmer into ctx_data

a635284

merge from masster

7496cb8

syrusakbary added the resolved in wasmer refactor label Jul 8, 2020

syrusakbary closed this Aug 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

mfateev commented Dec 27, 2019

YaronWittenstein commented Jan 1, 2020

mfateev commented Jan 2, 2020

MarkMcCaskey left a comment

MarkMcCaskey Jan 9, 2020

MarkMcCaskey Jan 9, 2020

MarkMcCaskey Jan 9, 2020

MarkMcCaskey Jan 9, 2020

mfateev commented Jan 10, 2020

Hywan commented Jan 10, 2020

MarkMcCaskey commented Jan 10, 2020 •

edited

MarkMcCaskey commented Jan 10, 2020

mfateev commented Jan 10, 2020

MarkMcCaskey commented Jan 18, 2020 •

edited

syrusakbary commented Jul 8, 2020

syrusakbary commented Aug 24, 2020

Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

Converted Ctx.data to a HashMap to support multiple stateful import function libraries. #1111

Conversation

mfateev commented Dec 27, 2019

Description

Review

YaronWittenstein commented Jan 1, 2020

mfateev commented Jan 2, 2020

MarkMcCaskey left a comment

Choose a reason for hiding this comment

MarkMcCaskey Jan 9, 2020

Choose a reason for hiding this comment

MarkMcCaskey Jan 9, 2020

Choose a reason for hiding this comment

MarkMcCaskey Jan 9, 2020

Choose a reason for hiding this comment

MarkMcCaskey Jan 9, 2020

Choose a reason for hiding this comment

mfateev commented Jan 10, 2020

Hywan commented Jan 10, 2020

MarkMcCaskey commented Jan 10, 2020 • edited

MarkMcCaskey commented Jan 10, 2020

mfateev commented Jan 10, 2020

MarkMcCaskey commented Jan 18, 2020 • edited

syrusakbary commented Jul 8, 2020

syrusakbary commented Aug 24, 2020

MarkMcCaskey commented Jan 10, 2020 •

edited

MarkMcCaskey commented Jan 18, 2020 •

edited