wasi: Overhaul the WasiFs API #1219

Closed
brain0 opened this issue Feb 14, 2020 · 16 comments
Labels
🎉 enhancement New feature! 📦 lib-wasi About wasmer-wasi priority-medium Medium priority issue 🏚 stale Inactive issues or PR

Comments

@brain0

brain0 commented Feb 14, 2020

Motivation

The WasiFs API is ... chaotic. It does not have a stable and well-documented interface and does not fit all use cases. I'd like to start a discussion of what a good API could look like.

The current API

The current API exposes all the internal fields of WasiFs, but warns not to use them. It contains functions to access stdin, stdout and stderr and to swap their implementations (as demonstrated in one example). It also allows mapping directories, although there is no documentation of what exactly this means. Then there are some functions whose purpose is unclear to me:

open_dir_all

This function seems to open a directory, but it is not clear what to do with it. It is also marked unsafe with no explanation of the contract one must fulfill to call it safely.

open_file_at

This seems to be the only way to create a file descriptor to some kind of "virtual" file that the host controls. However, it seems that to do so, a directory entry is created. Also it is undocumented how the rights and flags arguments should be filled.

create_fd

This function seems to create a new WASI file descriptor, but it requires an Inode, and there is no public API to create one.

Proposed solution

Separation of responsibilities

I have to admit that I have not studied the WASI specification in detail yet. However, looking at https://github.com/WebAssembly/WASI/blob/master/phases/snapshot/docs.md, most of the file related APIs seem very Unix-like (and in fact, the documentation refers to POSIX functions a few times). In particular, file descriptors and the filesystem are two separate concepts. The current WasiFs API mixes those concepts, which leads to all the confusion I described above. The same lack of separation is also visible in the implementation.

There are two kinds of file related APIs in WASI:

path_*

These APIs manipulate the file system, or return open file descriptors from files in the file system.

fd_*

These APIs manipulate file descriptors.

I propose to refactor the wasmer-wasi WasiFs implementation and API with this distinction in mind.
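A minimal sketch of the proposed separation, assuming hypothetical names throughout (Filesystem, FdTable, Handle, MemFs and path_open are illustrations, not wasmer-wasi APIs). The filesystem side resolves paths and hands out open handles, as the path_* calls do; the descriptor table only maps numbers to handles, so an fd_* call such as a read never has to look at a path.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Read};

/// Hypothetical: anything a descriptor can point at (file, pipe, socket, ...).
pub trait Handle: Read {}
impl<T: Read> Handle for T {}

/// Hypothetical: the path_* side. It resolves paths and produces open handles.
pub trait Filesystem {
    fn open(&self, path: &str) -> io::Result<Box<dyn Handle>>;
}

/// Hypothetical: the fd_* side. It only maps descriptor numbers to handles.
#[derive(Default)]
pub struct FdTable {
    next: u32,
    fds: HashMap<u32, Box<dyn Handle>>,
}

impl FdTable {
    pub fn insert(&mut self, handle: Box<dyn Handle>) -> u32 {
        let fd = self.next;
        self.next += 1;
        self.fds.insert(fd, handle);
        fd
    }

    /// fd_read needs only the table, never the filesystem.
    pub fn read(&mut self, fd: u32, buf: &mut [u8]) -> io::Result<usize> {
        match self.fds.get_mut(&fd) {
            Some(handle) => handle.read(buf),
            None => Err(io::Error::new(io::ErrorKind::NotFound, "bad fd")),
        }
    }
}

/// A toy in-memory filesystem: path -> file contents.
pub struct MemFs(pub HashMap<String, Vec<u8>>);

impl Filesystem for MemFs {
    fn open(&self, path: &str) -> io::Result<Box<dyn Handle>> {
        let data = self.0.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, "no such file")
        })?;
        Ok(Box::new(Cursor::new(data)))
    }
}

/// path_open glues the two together: resolve via the fs, register in the table.
pub fn path_open(fs: &dyn Filesystem, table: &mut FdTable, path: &str) -> io::Result<u32> {
    let handle = fs.open(path)?;
    Ok(table.insert(handle))
}
```

Descriptor numbers start at 0 here purely for simplicity; a real table would reserve the low numbers for stdio and preopened directories.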

Use cases

We want to control the file descriptors available to the WASI module, and control what the file system looks like from inside the module.

File descriptors

I can think of two types of file descriptors:

  • File descriptors that map to a real file descriptor on the host system, be it an open file, a directory or a special descriptor (like a socket, pipe or the host's stdout).
  • File descriptors whose operations are controlled by the host program.

The host application must be able to create such file descriptors and pass them to the WASI module. In particular, as in POSIX, a file descriptor need not relate to any actual file in the directory structure.

Filesystem

The filesystem is similar; we need two kinds of directory entries:

  • Directory entries that map to paths on the host.
  • Directory entries that map to virtual directories or files that are controlled by the host application.

Opening a directory entry of the first type results in a file descriptor of the first type, the same goes for the second type.
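The two kinds of entries and descriptors described above could be modelled as a pair of enums. This is a sketch under assumptions: DirEntry and Fd are hypothetical types, and a virtual file is reduced to a closure that produces its contents. Opening a Host entry goes through std::fs and yields a host descriptor, while opening a Virtual entry yields a descriptor whose bytes come from the embedding application.

```rust
use std::fs::File;
use std::io::{self, Cursor, Read};
use std::path::PathBuf;

/// Hypothetical directory entry: backed either by the host OS or by the host program.
pub enum DirEntry {
    /// Maps to a real path on the host filesystem.
    Host(PathBuf),
    /// Contents are produced on demand by the embedding application.
    Virtual(Box<dyn Fn() -> Vec<u8>>),
}

/// Hypothetical descriptor kinds matching the two entry kinds.
pub enum Fd {
    Host(File),
    Virtual(Cursor<Vec<u8>>),
}

impl DirEntry {
    /// A host entry opens to a host fd; a virtual entry opens to a virtual fd.
    pub fn open(&self) -> io::Result<Fd> {
        match self {
            DirEntry::Host(path) => Ok(Fd::Host(File::open(path)?)),
            DirEntry::Virtual(produce) => Ok(Fd::Virtual(Cursor::new(produce()))),
        }
    }
}

impl Read for Fd {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            Fd::Host(file) => file.read(buf),
            Fd::Virtual(cursor) => cursor.read(buf),
        }
    }
}
```

Because Fd is an ordinary value, the host can also construct one directly and hand it to the guest without any directory entry, which matches the POSIX-like point about file descriptors.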

Next steps

I am not ready to propose any API yet, I'll need to do some experimentation and prototyping. The reason for opening this issue is to start a discussion:

  • Is the project open to such a comprehensive overhaul of the WasiFs API?
  • Do my thoughts make sense so far, is this the right direction?

Alternatives

We could conclude that the current approach is sufficient to satisfy all use cases and try to extend and improve the current API.

@brain0 brain0 added the 🎉 enhancement New feature! label Feb 14, 2020
@MarkMcCaskey
Contributor

MarkMcCaskey commented Feb 15, 2020

Thanks for filing the issue and being comprehensive in covering the different pieces! Good feedback is very valuable!

I agree very strongly with the premise that we should improve the WASI FS API; much of the existing API is incidental (public fields, methods that are public that probably should not be, and unsafe hacky stop-gap functions). This is also the reason we haven't exposed this API through our C-API or any other language integrations -- it's definitely an experimental API that needs to be shaped based on how people want to use it.


I'll respond to pieces of the issue now so we can go deeper on the design constraints of certain pieces.

it also allows mapping directories, although there is no documentation of what exactly this means

We tried to clarify this a bit with some of the newer APIs, for example WasiStateBuilder::map_dir. I agree that there's a lot of room to improve the documentation and presenting some of the core concepts up-front.

It is also marked unsafe with no explanation of the contract one must fulfill to call it safely.

Unfortunately this is a wider issue in our codebase at the moment. There's a clippy lint for this and we've discussed that we should turn it on. The action item here would be to roll out the lint on wasi as it has very few unsafe functions and handle the rest later.
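For reference, this is roughly what the lint enforces, using the lint name from the follow-up PR (clippy::missing_safety_doc). The raw::read_byte function is made up for illustration. With the lint denied, Clippy reports any public unsafe fn whose doc comment lacks a # Safety section, which forces the contract to be written down next to the function. Plain rustc accepts the attribute; only cargo clippy acts on it.

```rust
#[deny(clippy::missing_safety_doc)]
pub mod raw {
    /// Reads the byte at `offset` from `base` without bounds checks.
    ///
    /// # Safety
    ///
    /// `base.add(offset)` must point into a live allocation that is valid
    /// for reads, and no other thread may write that byte concurrently.
    pub unsafe fn read_byte(base: *const u8, offset: usize) -> u8 {
        // SAFETY: the caller upholds the contract documented above.
        unsafe { *base.add(offset) }
    }
}
```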

This function seems to create a new WASI file descriptor, but one has to pass an Inode, but there is no public API to create one.

Yeah, this relies on using the public fields and also generally requires knowledge of irrelevant implementation details. open_file_at and the public fields are sufficient for this but it's not obvious how to use it.


In response to the proposed solution:

In particular, file descriptors and the filesystem are two separate concepts. The current WasiFs API mixes those concepts, which leads to all the confusion I described above

Much of that was intentional, with the goal being that guest programs can be controlled by the host without any knowledge of the host. WASI doesn't provide APIs (other than preopening directories) for passing file descriptors to a WASI guest; doing so would require extensions to WASI and guest knowledge of the host. You said it better than I can: "We want to control the file descriptors available to the WASI module, and control what the file system looks like from inside the module."

Though I agree that we should separate them more!

So it turns out emulating a filesystem portably is really hard (something I get into more of with the next point). For example, Wasm threads are something that we'll eventually have to deal with so the distinction between a shared virtual filesystem with synchronization and Fds will become increasingly important.

Filesystem

The filesystem is similar, we need two kinds of directory entries:

Directory entries that map to paths on the host.
Directory entries that map to virtual directories or files that are controlled by the host application.

Opening a directory entry of the first type results in a file descriptor of the first type, the same goes for the second type.

I agree with the distinction, but there are some complexities here regarding files on the host filesystem vs special files. For example, it is not obvious what the behavior should be when the same file is opened for reading and writing through different file descriptors. There are also complexities with programs outside of the host's control modifying the host filesystem during operation. We also support pause and resume, so the WasiFs must be fully serializable and deserializable; we currently don't store the contents of files in memory explicitly (the OS may), so programs may not be resumable because of some of the previous complexities (network sockets will complicate this further). A decent amount of this needs to be discussed at the WASI meetings to figure out what the correct behavior is, especially given that the Windows filesystem can behave drastically differently from typical Unix-like filesystems and that we want everything to work the same on all systems.

I think it's probably best that we keep the distinction as opaque as possible to most code so that things work the same regardless of implementation details. There's been discussion about removing socket-specific WASI syscalls in favor of using fd_read and fd_write and I think that's the way the design will generally be going.


Is the project open to such a comprehensive overhaul of the WasiFs API?

Absolutely! A big reason development has stalled a bit is the lack of recent discussion; it's still experimental and we'd love to improve our APIs. We will have to be careful about breaking existing users, though. Effectively, that means marking existing functions as deprecated, having them print warnings that point to the new API for some number of releases, and maintaining both the old and new APIs for a period of time.
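One way to implement that transition in Rust is the built-in #[deprecated] attribute; note that it produces compile-time warnings at every call site rather than printed run-time warnings. In this sketch both function names, the version string and the placeholder descriptor are hypothetical: the old entry point becomes a thin wrapper that forwards to the new one, so both APIs can be maintained side by side.

```rust
/// New API (hypothetical name and signature).
pub fn open_virtual_file(name: &str) -> Result<u32, String> {
    if name.is_empty() {
        return Err("empty name".to_string());
    }
    Ok(3) // placeholder descriptor, only for this sketch
}

/// Old API (hypothetical), kept as a forwarding wrapper during the transition.
#[deprecated(since = "0.0.0-placeholder", note = "use `open_virtual_file` instead")]
pub fn open_file_legacy(name: &str) -> Result<u32, String> {
    open_virtual_file(name)
}
```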

Do my thoughts make sense so far, is this the right direction?

I think so! It seems like you have a pretty good grasp of the big picture. There are a couple of hard problems, but we can do our best to solve them and iterate from there; I think there's a very small chance we'll get it right on the first try in any case.

TODO:

  • Add improved introduction to WASI core concepts in doc comments in wasmer_wasi::state or wasmer_wasi.
    • Explain preopened directories (and how they can be aliased with map_dir)
    • Explain rights (TODO: figure out a decent rights API)
    • Explain the idea behind the virtual filesystem
  • Update the plugin example to use WasiStateBuilder
  • Enable clippy lint to deny unsafe functions without a # Safety section on wasmer_wasi and file issue to enable it for the rest of Wasmer
  • Revisit public fields and public functions
  • Fully define deprecation/upgrade path as it makes sense to do so

@brain0
Author

brain0 commented Feb 15, 2020

Thanks for your response. I may reply in more detail later, these are just some additional thoughts after finally understanding how the pre-opened directories work.

WASI doesn't provide APIs (other than preopening directories) for passing file descriptors to a WASI guest [...]

Personally, I would always pass a pre-opened virtual root directory (and no other pre-opened directories) to the guest and handle everything else on the host.

Much of that was intentional with the goal being that guest programs can be controlled by the host without any knowledge of the host.

We're thinking about different use cases here. I'd like to use WASM/WASI for loading "plugins" from a host application. In that case, I can provide specific APIs that pass arbitrary file descriptors to the guest.

I know that currently the pre-opened directories won't be initialized by libc in that case, but we should be able to work around that by calling __wasilibc_register_preopened_fd manually after loading the module.

bors bot added a commit that referenced this issue Feb 18, 2020
1229: Add clippy::missing_safety_doc lint to wasi, misc clean up r=MarkMcCaskey a=MarkMcCaskey

Part of #1219 

# Review

- [ ] Add a short description of the change to the CHANGELOG.md file


Co-authored-by: Mark McCaskey <mark@wasmer.io>
@brain0
Author

brain0 commented Mar 2, 2020

I've been looking into this by starting to prototype a bit. I first created a proc macro that parses the witx and creates "native" representations of the WASI types. It then creates more "rusty" types and (fallible) conversions to/from those. For example, in the native types an enum is simply represented as an integer, while the rusty types use a proper Rust enum.

After that, it represents the WASI interface as a Rust trait using the native types. On top of that, I am manually building a trait to implement the WASI interface in a way that feels more natural for Rust. This is about building proper abstractions for then implementing the actual WASI functionality in the host.
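A hand-written sketch of what the generated native and rusty types might look like for one WASI enum, whence. The type names are hypothetical, and the tag values 0, 1 and 2 for set, cur and end are assumed from the snapshot_preview1 witx. The native type accepts any u8, so it can always be read from guest memory, while the conversion to the Rust enum is fallible.

```rust
use std::convert::TryFrom;

/// "Native" form: exactly what crosses the Wasm boundary. Any u8 is valid,
/// so reading it from guest memory can never produce an invalid value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct RawWhence(pub u8);

/// "Rusty" form: only the defined variants exist.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Whence {
    Set,
    Cur,
    End,
}

/// Fallible conversion: out-of-range tags are rejected, never transmuted.
impl TryFrom<RawWhence> for Whence {
    type Error = RawWhence;
    fn try_from(raw: RawWhence) -> Result<Self, Self::Error> {
        match raw.0 {
            0 => Ok(Whence::Set),
            1 => Ok(Whence::Cur),
            2 => Ok(Whence::End),
            _ => Err(raw),
        }
    }
}

/// The other direction is infallible.
impl From<Whence> for RawWhence {
    fn from(w: Whence) -> Self {
        RawWhence(match w {
            Whence::Set => 0,
            Whence::Cur => 1,
            Whence::End => 2,
        })
    }
}
```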

Anyway, I am running into a difficult question:

How can I safely access the WASM memory? The API of MemoryView allows me to get a slice of cells of bytes. Using that to access the memory is probably incredibly inefficient, since I have to read everything byte by byte. However, handing (mutable) slices of the WASM memory to the implementor of methods like fd_read and fd_write seems incredibly unsafe. So, how do you reason about data races here?

@MarkMcCaskey
Contributor

MarkMcCaskey commented Mar 3, 2020

I've been looking into this by starting to prototype a bit. I first created a proc macro that parses the witx and creates "native" representations of the WASI types. It then creates more "rusty" types and (fallible) conversions to/from those. For example, in the native types, the enum is simply represented as an integer, the rusty types use a proper Rust enum

That sounds great! The one issue with this is that we need types that are always valid when they come from Wasm. So all enums coming from the Wasm must be integers and not Rust enums. Having a Rust enum out of bounds is undefined behavior so in general at the FFI boundary we have to deal with low level types.

There are cases where it'd be nice to return WASI errors to callers in Rust, though, and that's not really handled by the current API: in most cases we work around it, in some cases we have a wrapper type that captures the important cases, and in others we return the number directly.

After that, it represents the WASI interface as a Rust trait using the native types. On top of that, I am manually building a trait to implement the WASI interface in a way that feels more natural for Rust. This is about building proper abstractions for then implementing the actual WASI functionality in the host.

Thanks! I was just getting frustrated with our WASI FS code this morning and decided to take a break and come back to it.

I think we want to be quite generic with what we support which means that there are some potentially difficult constraints in the FS design.

By the way, if you want to invite me to a repo to get specific about ideas, or to talk in a faster medium than GitHub issues, let me know! We just created a Slack for the community at https://slack.wasmer.io/ . Feel free to join and ping me there, or send me an email with another preferred method of communication at mark@wasmer.io if that's something you're interested in. I'm excited about this proposal and I'd like to make sure I can unblock you as quickly as possible if you run into issues!

How can I safely access the WASM memory? The API of MemoryView allows me to get a slice of cells of bytes. Using that to access the memory is probably incredibly inefficient, since I have to read everything byte by byte. However, handing (mutable) slices of the WASM memory to the implementor of methods like fd_read and fd_write seems incredibly unsafe. So, how do you reason about data races here?

The answer is not simple. We discussed this a bit in #1249.

The answer is: in general, Wasm linear memory is super unsafe and there's nothing safe you can do with it. Luckily, we don't have to solve the general case right now and we can work with a more tractable subset of the problem.

My preferred method of using Wasm memory in Wasmer is with the WasmPtr abstraction (note to self, improve documentation here).

The way WasmPtr works is that anything that implements ValueType can be accessed directly. WasmPtr is a zero-cost wrapper around a u32 with safe methods for accessing the memory it points to. You have probably noticed that our system calls take WasmPtr<T> as arguments; this is because WasmPtr is transparent over u32 and behaves as a u32 at FFI boundaries.

The usage is roughly:

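```rust
// Typed pointer wrapping a raw u32 guest offset.
let w: WasmPtr<MyType> = WasmPtr::new(my_u32);
// Bounds-checked deref; wasi_try! returns an errno from the syscall on failure.
let item = wasi_try!(w.deref(memory));
item.get().field // read a copy of the value out of the Cell
```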
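To illustrate the idea without depending on wasmer itself, here is a simplified, hypothetical stand-in: this WasmPtr and ValueType are not wasmer's types, guest memory is modelled as a &[Cell<u8>] slice, and only reads of u32 are implemented. It shows the two properties described above: the pointer is a #[repr(transparent)] wrapper around a u32 offset, and every access is bounds-checked against the memory, returning None instead of reading out of range.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Hypothetical stand-in for `ValueType`: plain data where every byte pattern is valid.
pub trait ValueType: Copy {
    const SIZE: usize;
    fn from_le(bytes: &[u8]) -> Self;
}

impl ValueType for u32 {
    const SIZE: usize = 4;
    fn from_le(b: &[u8]) -> Self {
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }
}

/// Simplified, hypothetical `WasmPtr`: a guest offset plus a type tag.
/// `#[repr(transparent)]` keeps it layout-identical to a bare u32.
#[repr(transparent)]
pub struct WasmPtr<T> {
    offset: u32,
    _ty: PhantomData<T>,
}

impl<T: ValueType> WasmPtr<T> {
    pub fn new(offset: u32) -> Self {
        WasmPtr { offset, _ty: PhantomData }
    }

    /// Bounds-checked read: None if [offset, offset + SIZE) is outside memory.
    pub fn read(&self, memory: &[Cell<u8>]) -> Option<T> {
        let start = self.offset as usize;
        let end = start.checked_add(T::SIZE)?;
        let cells = memory.get(start..end)?;
        let bytes: Vec<u8> = cells.iter().map(Cell::get).collect();
        Some(T::from_le(&bytes))
    }
}
```

Wasm linear memory is little-endian, which is why the bytes are decoded with from_le_bytes.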

Hmm, actually, looking at the API again, deref_mut definitely isn't safe! It still allows &mut T to the underlying data, which is not guaranteed to be safe. Edit: never mind, deref_mut is already marked as unsafe! I'm glad I didn't miss something so obvious.

A good Rust API for Wasm linear memory is something I've been thinking about and having discussions about for quite a while now. The constraints are that we want to preserve a simple API but also have guarantees at compile time about the type of memory (unshared static memory is the safest and also what essentially everything is right now). I haven't come up with anything I really like yet but my work on it has been mostly just thinking about it and not really focusing full time on it yet.

@brain0
Author

brain0 commented Mar 4, 2020

That sounds great! The one issue with this is that we need types that are always valid when they come from Wasm. So all enums coming from the Wasm must be integers and not Rust enums. Having a Rust enum out of bounds is undefined behavior so in general at the FFI boundary we have to deal with low level types.

My code deals with that: The conversion from the native WASI type to the Rust type is fallible. If conversion fails, the (currently manually built) translation layer returns errno_inval.

The "upper" layer then only receives defined values, so the implementor of the high-level logic need not deal with invalid values.
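The layering described above could look roughly like this sketch, where FdSeek, fd_seek_raw and Toy are hypothetical names and the errno values 0 (success) and 28 (inval) are assumed from snapshot_preview1. The raw entry point takes integers exactly as they arrive from the guest, rejects an undefined whence with errno_inval, and only then calls the high-level trait, whose implementors never see invalid values.

```rust
/// Assumed snapshot_preview1 errno values.
pub const ERRNO_SUCCESS: u16 = 0;
pub const ERRNO_INVAL: u16 = 28;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Whence { Set, Cur, End }

/// High-level trait: implementors only ever receive valid values.
pub trait FdSeek {
    fn fd_seek(&mut self, fd: u32, offset: i64, whence: Whence) -> Result<u64, u16>;
}

/// Low-level entry point shaped like the raw syscall: integers in, errno out.
pub fn fd_seek_raw<T: FdSeek>(imp: &mut T, fd: u32, offset: i64, whence: u8, new_offset: &mut u64) -> u16 {
    let whence = match whence {
        0 => Whence::Set,
        1 => Whence::Cur,
        2 => Whence::End,
        _ => return ERRNO_INVAL, // rejected before the high-level code runs
    };
    match imp.fd_seek(fd, offset, whence) {
        Ok(pos) => {
            *new_offset = pos;
            ERRNO_SUCCESS
        }
        Err(errno) => errno,
    }
}

/// Toy implementor: one cursor over a file of fixed length.
pub struct Toy { pub pos: u64, pub len: u64 }

impl FdSeek for Toy {
    fn fd_seek(&mut self, _fd: u32, offset: i64, whence: Whence) -> Result<u64, u16> {
        let base = match whence {
            Whence::Set => 0,
            Whence::Cur => self.pos as i64,
            Whence::End => self.len as i64,
        };
        // Reject overflow and negative positions.
        let new = base.checked_add(offset).filter(|p| *p >= 0).ok_or(ERRNO_INVAL)?;
        self.pos = new as u64;
        Ok(self.pos)
    }
}
```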

Thanks I was just getting frustrated with our WASI FS code

It's reassuring that I am not the only one.

I think we want to be quite generic with what we support which means that there are some potentially difficult constraints in the FS design.

That would be the next step. I'd look into having WasiFd, WasiFile and WasiDirectory traits (probably) which encode all constraints in the type system:

  • Thread safety.
  • Ability to "dup" a file descriptor.
  • Ability to have several file descriptors to the same data (that cannot really be encoded in the type system, but needs to be handled by each implementation).
  • Save/resume (which means serializing and deserializing state, although I assume that most of the time, this will lead to invalid file descriptors after resume - similar to suspending a laptop with active network connections).
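A rough sketch of how some of those constraints could be encoded, assuming hypothetical names (WasiFd, SharedBuf): the Send + Sync supertraits make thread safety part of the trait, dup returns a second descriptor onto the same data, and the sharing that the type system cannot express is handled inside the implementation with an Arc<Mutex<...>>. Save/resume is left out of this sketch.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical descriptor trait; `Send + Sync` bakes in thread safety.
pub trait WasiFd: Send + Sync {
    fn write(&self, data: &[u8]) -> usize;
    /// "dup": a second descriptor onto the same underlying data.
    fn dup(&self) -> Box<dyn WasiFd>;
}

/// Shared in-memory backing store. Several descriptors may point at it at
/// once; the Mutex is what makes that sharing sound.
#[derive(Clone)]
pub struct SharedBuf(pub Arc<Mutex<Vec<u8>>>);

impl WasiFd for SharedBuf {
    fn write(&self, data: &[u8]) -> usize {
        self.0.lock().unwrap().extend_from_slice(data);
        data.len()
    }

    fn dup(&self) -> Box<dyn WasiFd> {
        Box::new(self.clone()) // clones the Arc, not the data
    }
}
```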

Then all the different use cases should be implemented on top of those traits. I'm not quite there yet, as I am still trying to wrap my head (and code) around the basics.

By the way if you want to invite me to a repo to get specific about ideas or talk in a faster medium than Github issues let me know! We just created a Slack for the community at https://slack.wasmer.io/ .

I have never used slack, so I'll need to look into that.

I have however decided to share what I have so far. There are some edges that I need to look into and the code needs serious cleanup in many places, but it already shows a rough idea of where I want to go. I guess I'll just put a big fat warning into the README. I'll post a link here later.

However, the code does not yet include any of the actual WasiFs APIs that I want to discuss, this is all just trying to create an abstraction of WASI in the Rust/wasmer host that I can actually reason about.

The answer is not simple. We discussed this a bit in #1249.

I have been asking the same question on the #wg-wasm Rust discord channel. It seems that this is a real problem that nobody solved yet, but this is what I gathered:

  • We can never pass the buffers from WASM memory directly to the implementation of fd_read, fd_write and friends as references. To have a chance of safety, we always need to perform the copy from/to WASM memory ourselves, then pass references to buffers that we own.

About actually performing those copies:

  • Only passing around the WASI memory as *mut u8 does not help. One could imagine that using only std::ptr::read, std::ptr::write and friends would make things safe, but it's not that simple. While it is okay to have multiple *mut u8 pointers to the same region, actually accessing this memory via std::ptr::read and std::ptr::write must follow the same aliasing rules as creating a reference in order to be defined. In a multi-threaded situation, we cannot know whether another WASM thread modifies regions of memory we are reading, so we can make no guarantees here.
  • Byte-by-byte (Cell<u8> or AtomicU8) accesses should be safe, and this is what I currently implement. But when you think about reading several megabytes (or more) from a WASI file, that sounds horribly inefficient. (The way I currently do it also has tons of unnecessary bounds checks, because every byte read/written does a separate slice access into the [Cell<u8>].)
  • One could ensure to "pause" all WASM threads while manipulating WASM memory. This would allow actually getting a reference to a byte slice inside and manipulate it. This would require support in the wasmer runtime.
  • Another idea would be the ability to "lock" regions of WASM memory. This way, the regions that contain the buffers passed to us could be locked so we could safely manipulate them, and all other WASM threads would block until those operations complete. This would require support in the wasmer runtime.
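The copy-into-owned-buffers approach can at least avoid a separate slice access per byte: check the whole range once with get, then loop over the cells. This is a sketch with hypothetical helper names (copy_out, copy_in), and it only covers unshared memory, since Cell<u8> is not Sync and so cannot be used on memory that other threads access.

```rust
use std::cell::Cell;

/// Copy `len` bytes at guest offset `ptr` into a host-owned buffer.
/// One bounds check for the whole range, then a plain loop over the cells.
pub fn copy_out(memory: &[Cell<u8>], ptr: u32, len: u32) -> Option<Vec<u8>> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    let cells = memory.get(start..end)?;
    Some(cells.iter().map(Cell::get).collect())
}

/// Copy a host-owned buffer into guest memory at `ptr`.
pub fn copy_in(memory: &[Cell<u8>], ptr: u32, data: &[u8]) -> Option<()> {
    let start = ptr as usize;
    let end = start.checked_add(data.len())?;
    let cells = memory.get(start..end)?;
    for (cell, &byte) in cells.iter().zip(data) {
        cell.set(byte);
    }
    Some(())
}
```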

My preferred method of using Wasm memory in Wasmer is with the WasmPtr abstraction.

For my code, I actually included my own modified version of WasmPtr that has APIs for writing all WASI types directly (it uses generated code that knows the offsets inside the struct and such).

A good Rust API for Wasm linear memory is something I've been thinking about and having discussions about for quite a while now.

Again, it is very reassuring that I am not the only one who thinks this is a problem.

I'll get back to this later today and post the code and documentation online.

@brain0
Author

brain0 commented Mar 4, 2020

@MarkMcCaskey
Contributor

Sounds good!

That would be the next step. I'd look into having WasiFd, WasiFile and WasiDirectory traits (probably) which encode all constraints in the type system:

The original design of the Wasi filesystem had more traits but I removed them because, at least as the system was at the time, it didn't make sense for them to be separate. This will probably be easier to do properly with a strong distinction between Fds and the files themselves.

Save/resume (which means serializing and deserializing state, although I assume that most of the time, this will lead to invalid file descriptors after resume - similar to suspending a laptop with active network connections).

A while ago I discussed this with @nlewycky, along with some other edge cases: external programs mutating the files the Wasm program is accessing at the same time, and Windows support (Windows locks files on read access, not just write access). The solution we came up with was that we pretty much have to do what real filesystems do and store the file in memory. There seems to be no way around having to mmap files to support save/resume and to portably support some of the more complex file access patterns. I'm actually not super familiar with how these things are implemented in typical filesystems, but we'll probably have to do whatever they're doing.

Regardless, it's easy to get overwhelmed by the number of different things to consider here. My strategy so far has been to think ahead as far as I'm able, go with what works now, and iterate as needed.
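To illustrate why keeping file contents in host memory helps with save/resume, here is a hypothetical MemFile whose whole state (contents plus cursor) can be captured and restored. The snapshot format (an 8-byte little-endian cursor followed by the raw bytes) is an arbitrary choice for this sketch, not anything Wasmer uses.

```rust
/// Hypothetical in-memory file: all state lives in host memory.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MemFile {
    pub data: Vec<u8>,
    pub pos: u64,
}

impl MemFile {
    /// Write at the cursor, growing the file if needed.
    pub fn write(&mut self, bytes: &[u8]) {
        let start = self.pos as usize;
        let end = start + bytes.len();
        if self.data.len() < end {
            self.data.resize(end, 0);
        }
        self.data[start..end].copy_from_slice(bytes);
        self.pos = end as u64;
    }

    /// Serialize as: 8-byte little-endian cursor, then the raw contents.
    pub fn snapshot(&self) -> Vec<u8> {
        let mut out = self.pos.to_le_bytes().to_vec();
        out.extend_from_slice(&self.data);
        out
    }

    /// Inverse of `snapshot`; None if the input is too short.
    pub fn restore(bytes: &[u8]) -> Option<MemFile> {
        if bytes.len() < 8 {
            return None;
        }
        let (head, data) = bytes.split_at(8);
        let mut pos = [0u8; 8];
        pos.copy_from_slice(head);
        Some(MemFile { data: data.to_vec(), pos: u64::from_le_bytes(pos) })
    }
}
```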

Only passing around the WASI memory as *mut u8 does not help. One could imagine that using only std::ptr::read, std::ptr::write and friends would make things safe, but it's not that simple. While it is okay to have multiple *mut u8 pointers to the same region, actually accessing this memory via std::ptr::read and std::ptr::write must follow the same aliasing rules as creating a reference in order to be defined. In a multi-threaded situation, we cannot know whether another WASM thread modifies regions of memory we are reading, so we can make no guarantees here.

Hmm, I'm not sure. I learned in a discussion in #1249 that MIRI at least does not complain about mutable aliasing when dealing with pointers. See https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bdd96fbd2d4b21e9312cd1d4b62317a9 for example (you can run it with MIRI using the Tools menu in the top right). My understanding is that MIRI detects UB. I believe that pointers are not borrow checked at all, even when accessed mutably. As long as the memory they point to was known to Rust as mutable, it shouldn't be an issue. We may need to use volatile reads when generated code mutates that memory (Rust may optimize away pointer reads if we don't), but I believe that should work!

Another idea would be the ability to "lock" regions of WASM memory. This way, the regions that contain the buffers passed to us could be locked so we could safely manipulate them, and all other WASM threads would block until those operations complete. This would require support in the wasmer runtime.

My main concern there is that it seems that memory region locking would impose overhead on all memory accesses, but I'm not super familiar with how things like this are typically done, sometimes there's hardware support for doing useful things like this.

I saw some of your posts on Discord, I believe it's always safe for the Host to non-atomically memcpy to and from guest memory. As long as the host is just getting a bunch of bytes out, then it's fine. If the guest is using atomic instructions, then we should probably ensure that the writes have actually settled before doing it, but we don't need to worry about mutation happening while we're memcopying. Worst case scenario the host gets some scrambled bytes and returns an error to the guest or kills the guest. If the host can ensure that the guest that called the host-function has had its writes reflected, then it's up to the guest to ensure it doesn't clobber its own memory.

The host should never write values to Wasm memory and then expect to be able to read them back. Additionally the host should never get anything other than always-valid bytes from Wasm memory.

Again, it is very reassuring that I am not the only one who thinks this is a problem.

Yeah! I'm happy to see interest here too!

Thanks for posting the links to code and docs!

@brain0
Author

brain0 commented Mar 4, 2020

Hmm, I'm not sure, I learned in a discussion in #1249 that MIRI at least does not complain about mutable aliasing when dealing with pointers. See https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bdd96fbd2d4b21e9312cd1d4b62317a9 for example (you can run it with MIRI using the tools section in the top right). MIRI detects UB is my understanding. I believe that pointers are not borrow checked at all, even mutably accessing them. As long as the memory they point to was known to Rust as mutable, then it shouldn't be an issue.

MIRI is by no means complete; it is known not to detect all cases of UB. The documentation of std::ptr refers to the Behavior considered undefined chapter of the Nomicon, which states that all data races are UB. The Data Races and Race Conditions chapter defines a data race as follows:

  • two or more threads concurrently accessing a location of memory
  • one of them is a write
  • one of them is unsynchronized

With std::ptr operations, you can produce data races that are impossible for MIRI to track.

I am worried about the guest invoking UB in the host somehow.

Thanks for posting the links to code and docs!

I am currently trying to run the simplest of programs (a Rust program with an empty main function) and I am running into some issues which I am looking into. I will update the code and docs once I fixed this.

@MarkMcCaskey
Copy link
Contributor

Hmm, so it seems that atomics and non-atomics interact poorly in that there are pretty much no guarantees at all. But we can use a relaxed atomic memcpy from the host side and that'll always be safe, we may need to figure out some kind of memory fence situation when calling host-functions.

The trouble here is that WebAssembly itself allows Wasm modules to access shared memory non-atomically. We can always just compile all non-atomic accesses of shared memory into atomic accesses and make that a non-issue though.
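A sketch of the relaxed atomic memcpy idea, assuming shared guest memory can be viewed as &[AtomicU8] (a hypothetical view; the function names are also made up). Because every host access is an atomic load or store, a concurrent guest write cannot create a data race in the Rust sense; at worst the host sees a mix of old and new bytes. Whatever fence is needed around host-function calls is deliberately left out, since that is exactly the open question above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Copy `len` bytes out of shared guest memory with relaxed atomic loads.
/// A concurrent guest write can make the copy inconsistent, but not racy.
pub fn atomic_copy_out(memory: &[AtomicU8], ptr: usize, len: usize) -> Option<Vec<u8>> {
    let cells = memory.get(ptr..ptr.checked_add(len)?)?;
    Some(cells.iter().map(|b| b.load(Ordering::Relaxed)).collect())
}

/// Copy host bytes into shared guest memory with relaxed atomic stores.
pub fn atomic_copy_in(memory: &[AtomicU8], ptr: usize, data: &[u8]) -> Option<()> {
    let cells = memory.get(ptr..ptr.checked_add(data.len())?)?;
    for (cell, &byte) in cells.iter().zip(data) {
        cell.store(byte, Ordering::Relaxed);
    }
    Some(())
}
```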

With std::ptr operations, you can produce data races that are impossible for MIRI to track.

That's true, but raw pointers can't be shared across threads in Rust (they are neither Send nor Sync), so there's no safe way to get aliasing mutable pointers in different threads; you have to have used unsafe to get into that situation.

It seems to me that as long as the host treats Wasm memory as a black box that we can atomically memcpy into and out of, any undefined behavior that happens isn't really relevant. What I'm thinking is that the guest is untrusted in general and is responsible for making sure it accesses memory properly. If it doesn't, and host writes into Wasm memory get lost or host reads from Wasm memory are corrupt, then the host can just say, "I can't parse these bytes" and be in a perfectly safe and defined state. Perhaps I'm missing a way in which data races can cause undefined behavior that can't be validated this way.

Put more concisely, it seems that a proper API makes this entirely safe for the host and if the guest behaves incorrectly then the guest may have host calls fail or have its own data corrupted, but both of those situations seem fine to me (because it's the guest's fault).
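The "parse it or reject it" stance might look like the following sketch, using a WASI-style iovec (a buffer pointer and a length, two little-endian u32 values) as the example. HostError and parse_iovec are hypothetical names. The host copies the bytes out, decodes them, and checks every invariant, so garbage from a misbehaving guest turns into an error rather than undefined behavior in the host.

```rust
/// Hypothetical host-side errors for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum HostError {
    Truncated,
    OutOfBounds,
}

/// A WASI-style iovec: guest buffer pointer and length.
#[derive(Debug, PartialEq, Eq)]
pub struct IoVec {
    pub buf: u32,
    pub buf_len: u32,
}

/// Treat bytes copied out of guest memory as untrusted input:
/// decode them, then validate the buffer range against the memory size.
pub fn parse_iovec(bytes: &[u8], memory_len: usize) -> Result<IoVec, HostError> {
    if bytes.len() < 8 {
        return Err(HostError::Truncated);
    }
    let buf = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let buf_len = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let end = (buf as usize)
        .checked_add(buf_len as usize)
        .ok_or(HostError::OutOfBounds)?;
    if end > memory_len {
        return Err(HostError::OutOfBounds);
    }
    Ok(IoVec { buf, buf_len })
}
```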

@aidanhs

aidanhs commented Jan 13, 2021

Was wondering if there had been any progress on this? I've been directed here while reading the documentation for wasmer-wasi (https://docs.rs/wasmer-wasi/1.0.0/wasmer_wasi/struct.WasiFs.html#safety)

(I'm looking to implement a purely-virtual FS and it looks like I'm going to have to fiddle with the internals of WasiFs in order to create new inodes - the helpful looking create_inode_with_stat is pub(crate))

@MarkMcCaskey
Copy link
Contributor

@aidanhs Thanks for the interest! Yeah, our system is set up to be pretty flexible; unfortunately, there's a lot of inherent complexity in the WasiFs, so simplifying what we have might take a lot of effort. Most of the pub/pub(crate)/private distinction isn't very intentional, and it's probably reasonable to expose the function you mentioned. The one issue is that the code around WasiFs likely isn't as modular as it should be in a few places, because it was created iteratively to meet the needs of our WASI implementation and hasn't gone through much refactoring.

It probably makes sense to consider the current API as "deprecated" even though it's what we use internally and there's nothing to replace it yet. But I think coming to the problem of "extensible WASI system API" with fresh eyes would be really helpful in making a nice API here without the footguns which I'm sure exist there. That said, if you can get what you want done with these functions then I think that's a reasonable place to start.

Perhaps we can also build a new, nice API iteratively: as far as I can remember there are a few ways to create Inodes, so a builder pattern would probably work well here. If you're interested in trying that out, I'd be happy to review and give guidance as necessary.
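A builder for inodes could start as small as this sketch. InodeBuilder, Inode and InodeKind are hypothetical and much simpler than wasmer-wasi's real inode data: required data goes into the constructor (file or dir), optional settings are chained, and build produces the value.

```rust
/// Hypothetical inode kinds (not wasmer-wasi's types).
#[derive(Debug, PartialEq, Eq)]
pub enum InodeKind {
    File { contents: Vec<u8> },
    Dir,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Inode {
    pub name: String,
    pub kind: InodeKind,
    pub readonly: bool,
}

/// Builder: required data up front, optional settings via chained calls.
pub struct InodeBuilder {
    name: String,
    kind: InodeKind,
    readonly: bool,
}

impl InodeBuilder {
    pub fn file(name: &str, contents: &[u8]) -> Self {
        let kind = InodeKind::File { contents: contents.to_vec() };
        InodeBuilder { name: name.to_string(), kind, readonly: false }
    }

    pub fn dir(name: &str) -> Self {
        InodeBuilder { name: name.to_string(), kind: InodeKind::Dir, readonly: false }
    }

    pub fn readonly(mut self, yes: bool) -> Self {
        self.readonly = yes;
        self
    }

    pub fn build(self) -> Inode {
        Inode { name: self.name, kind: self.kind, readonly: self.readonly }
    }
}
```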

The WASI API fell through the cracks a bit over the past year as we reworked wasmer's internals but I'll bring it up again with the team!

@brain0
Author

brain0 commented Jan 14, 2021

As you can see above, I prototyped a different approach for WASI last year. It included (incomplete) code generation from the witx file and a different approach to file system APIs.

Sadly, I didn't have the time to finish it, and I guess the WASI spec has evolved since then.

Here is what I remember:

  • The WASI filesystem specification is incredibly complex, in my opinion needlessly so. It is similar to POSIX, but then it isn't, since there is no "root" directory. The guest systems emulate a file system through conventions between the hosts and wasilibc, which are not documented anywhere. In the end, I failed to structure my code properly to support all this complexity.
  • More generally, it is unclear to me how to access WASM memory from the host without unsoundness. The only way that was probably sound was through the &[Cell<u8>] API. Being unable to (mutably) borrow a slice of WASM memory is bad for performance. I don't know if wasmer 1.0 has improved the memory API, since I haven't looked at it for a while.
  • WASI is designed to work as an executable, not a library, at least when used with wasilibc.
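The &[Cell<u8>] approach from the second point can be sketched with nothing but std. This assumes guest memory is already available to the host as a `&[Cell<u8>]` (how a given wasmer version exposes that is not shown here); `read_guest` and `write_guest` are hypothetical helpers. Every byte goes through `Cell::get`/`Cell::set`, so the host never creates a `&[u8]` or `&mut [u8]` that could alias guest-visible memory, at the cost of the per-byte copy that makes this slow.

```rust
use std::cell::Cell;

/// Copies `len` bytes starting at `offset` out of guest memory.
/// Returns `None` if the range is out of bounds or overflows.
pub fn read_guest(mem: &[Cell<u8>], offset: usize, len: usize) -> Option<Vec<u8>> {
    let end = offset.checked_add(len)?;
    let window = mem.get(offset..end)?;
    // No `&[u8]` is ever formed; each byte is read through its Cell.
    Some(window.iter().map(Cell::get).collect())
}

/// Writes `data` into guest memory at `offset`, one byte at a time.
pub fn write_guest(mem: &[Cell<u8>], offset: usize, data: &[u8]) -> Option<()> {
    let end = offset.checked_add(data.len())?;
    let window = mem.get(offset..end)?;
    for (cell, &byte) in window.iter().zip(data) {
        cell.set(byte);
    }
    Some(())
}
```

In a test harness, such a view can be made from an ordinary buffer with `Cell::from_mut(&mut buf[..]).as_slice_of_cells()`.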

Overall, these issues made me lose interest in WASI. If the goal is to enable Rust's std::{fs,net} in a WASM plugin (which was my goal at the time), then a much simpler specification than WASI would work, IMO.

@aidanhs

aidanhs commented Jan 16, 2021

Thanks for the comment @brain0. Having done a bit more reading on WASI itself, I agree that the level it exposes to users is a little odd: it specifies directory-based capabilities, but does so in a way that is kind of unhelpful to POSIX applications (related: WebAssembly/WASI#122) unless you augment it a lot. Because of this, wasi-libc has to step up to the plate in an attempt to paper over some of the cracks (e.g. the 'mounting' of capabilities as root-level directories; see also the discussion and linked PR for getcwd in https://github.com/WebAssembly/WASI/issues/303).
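A rough sketch of that 'mounting': the host hands the guest directory fds along with names, and libc maps an absolute path to the preopened directory with the longest matching prefix. wasi-libc does something along these lines, but its real logic also handles relative paths and an emulated cwd, so treat this as a simplification. `Preopen` and `resolve` are hypothetical names. Matching is done on path-component boundaries so that `/database` is not treated as living inside `/data`.

```rust
/// A preopened directory: an fd granted by the host plus the name
/// under which the guest's libc "mounts" it.
pub struct Preopen {
    pub fd: u32,
    pub prefix: String,
}

/// Finds the preopen whose prefix is the longest component-wise match for
/// `path` and returns its fd together with the path relative to it.
pub fn resolve<'a>(preopens: &[Preopen], path: &'a str) -> Option<(u32, &'a str)> {
    let mut best: Option<(u32, usize)> = None;
    for p in preopens {
        // "/" becomes "", which matches any absolute path.
        let prefix = p.prefix.trim_end_matches('/');
        let on_boundary = path == prefix
            || (path.starts_with(prefix) && path[prefix.len()..].starts_with('/'));
        if on_boundary && best.map_or(true, |(_, len)| prefix.len() > len) {
            best = Some((p.fd, prefix.len()));
        }
    }
    best.map(|(fd, len)| {
        let rest = path[len..].trim_start_matches('/');
        (fd, if rest.is_empty() { "." } else { rest })
    })
}
```

With preopens `/` (fd 3) and `/data` (fd 4), `/data/a.txt` resolves to fd 4 and `a.txt`, while `/database` falls back to fd 3.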

It occurs to me that the file API exposed by WASI is most similar to the FUSE low-level API (which is inode-based: https://libfuse.github.io/doxygen/structfuse__lowlevel__ops.html), but harder. Specifically, when using FUSE, the kernel side does a lot of work behind the scenes: it provides a single authority for path resolution that handles things like "how does traversing mountpoints work" (allowing mountpoints to be composable), as well as file descriptors. Unfortunately, the WASI spec prevents a WASI backend from implementing a model like a POSIX filesystem (as you pointed out, the 'root' directory doesn't exist), so backend implementors need to come up with something new. That is going to cause pain either in user code (via breakage) or in backend code (because the model presented to user code doesn't match the 'true' model). At the moment we have pain in both places 😞, and I'm not sure there's anything specific wasmer can do to offer a better API.
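To illustrate the inode-based style, here is a toy sketch in which the backend only answers `lookup(parent, name)`, and a separate function plays the role of the single path-resolution authority that the kernel provides under FUSE. `LowLevelFs`, `MemFs` and `resolve_path` are hypothetical. `..`, symlinks and mountpoints are deliberately left out, since those are exactly the parts a WASI backend has to invent for itself.

```rust
use std::collections::HashMap;

pub type Ino = u64;

/// A tiny slice of a FUSE-lowlevel-style interface: everything is keyed by
/// inode numbers, and names only ever appear one component at a time.
pub trait LowLevelFs {
    fn lookup(&self, parent: Ino, name: &str) -> Option<Ino>;
}

/// Toy in-memory backend: (parent inode, name) -> child inode.
pub struct MemFs {
    entries: HashMap<(Ino, String), Ino>,
    next: Ino,
}

impl MemFs {
    pub const ROOT: Ino = 1;

    pub fn new() -> Self {
        MemFs { entries: HashMap::new(), next: 2 }
    }

    /// Creates a new entry under `parent` and returns its inode number.
    pub fn mknod(&mut self, parent: Ino, name: &str) -> Ino {
        let ino = self.next;
        self.next += 1;
        self.entries.insert((parent, name.to_string()), ino);
        ino
    }
}

impl LowLevelFs for MemFs {
    fn lookup(&self, parent: Ino, name: &str) -> Option<Ino> {
        self.entries.get(&(parent, name.to_string())).copied()
    }
}

/// The path-resolution authority: walks `path` from `start` one `lookup`
/// at a time, stopping at the first missing component.
pub fn resolve_path(fs: &dyn LowLevelFs, start: Ino, path: &str) -> Option<Ino> {
    path.split('/')
        .filter(|c| !c.is_empty())
        .try_fold(start, |ino, comp| fs.lookup(ino, comp))
}
```

Keeping resolution outside the backend is what would let several backends be composed as mountpoints; here there is only one.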

(Tangentially, here's a link to an in-memory FS in WASM: https://github.com/binji/wasm-clang?files=1)


@MarkMcCaskey
Contributor

WASI is definitely its own thing, and the libc does emulate a subset of the things you might expect to work on a filesystem, but yeah, it's explicitly not POSIX. WASI is becoming modular, though, so it's possible that more POSIX-like interfaces could be added optionally. One issue with that is that fds, and file handling in general, are fairly central to the design of WASI.
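A minimal sketch of why fds are so central: the guest only ever holds small integers, and each call starts by looking one up in a per-instance table and checking its rights before any inode is touched. This also keeps the fd table separate from the filesystem it points into. The rights constants are hypothetical simplifications of WASI's `rights` flags, and while `badf` and `notcapable` are WASI errno names, `FdTable` and `Errno` here are illustrative only. Numbering starts at 3 on the assumption that 0-2 are stdio.

```rust
use std::collections::HashMap;

/// Hypothetical rights bits, loosely modeled on WASI's `rights` flags.
pub const RIGHT_FD_READ: u64 = 1 << 0;
pub const RIGHT_PATH_OPEN: u64 = 1 << 1;

/// What an fd points at: an inode plus the rights granted on it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FdEntry {
    pub inode: u64,
    pub rights: u64,
}

/// Illustrative subset of WASI-style error codes.
#[derive(Debug, PartialEq)]
pub enum Errno {
    BadF,
    NotCapable,
}

/// Per-instance fd table, kept separate from the filesystem itself.
pub struct FdTable {
    fds: HashMap<u32, FdEntry>,
    next: u32,
}

impl FdTable {
    pub fn new() -> Self {
        // Assumes 0, 1 and 2 are reserved for stdin/stdout/stderr.
        FdTable { fds: HashMap::new(), next: 3 }
    }

    pub fn insert(&mut self, entry: FdEntry) -> u32 {
        let fd = self.next;
        self.next += 1;
        self.fds.insert(fd, entry);
        fd
    }

    /// Every operation starts here: the fd must exist and carry all `needed` rights.
    pub fn check(&self, fd: u32, needed: u64) -> Result<&FdEntry, Errno> {
        let entry = self.fds.get(&fd).ok_or(Errno::BadF)?;
        if (entry.rights & needed) == needed {
            Ok(entry)
        } else {
            Err(Errno::NotCapable)
        }
    }
}
```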

https://docs.rs/wasi-common/0.22.0/wasi_common/fs/index.html - "Since all functions which expose raw file descriptors are unsafe, I/O handles in this API are unforgeable (unsafe code notwithstanding)." - seems like not quite what unsafe is intended for?

I don't have full context on what exactly this means, but my take on what unsafe means is:

unsafe indicates that the code has invariants required to keep the program sound and free of undefined behavior (or memory accesses which break Rust's model) that cannot be guaranteed by the Rust compiler. unsafe functions are essentially functions with checklists that the caller must verify manually, to ensure the stated invariants are upheld.
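As a small illustration of the "checklist" reading, the hypothetical function below states its contract in a `# Safety` section, and a safe wrapper discharges that contract itself so that callers never need `unsafe`. An unsafe function with no written contract gives the caller nothing to check.

```rust
/// Returns the byte at `idx` without a bounds check.
///
/// # Safety
///
/// The caller must guarantee `idx < buf.len()`. Violating this is
/// undefined behavior that the compiler cannot detect.
pub unsafe fn byte_at_unchecked(buf: &[u8], idx: usize) -> u8 {
    // SAFETY: upheld by the caller per the contract above.
    unsafe { *buf.get_unchecked(idx) }
}

/// Safe wrapper: it performs the check itself, so it can be a normal `fn`.
pub fn byte_at(buf: &[u8], idx: usize) -> Option<u8> {
    if idx < buf.len() {
        // SAFETY: we just checked `idx < buf.len()`.
        Some(unsafe { byte_at_unchecked(buf, idx) })
    } else {
        None
    }
}
```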

unsafe is sometimes used in a "softer" sense, where misuse won't cause soundness issues but the API may simply do the wrong thing. I have mixed feelings about this. I've seen discussions about it in the broader Rust community as well, so I think it's an ongoing point of contention.
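The "softer" usage described in the wasi-common quote can be sketched as a handle type whose field is private, so safe code can only obtain one through a constructor like `open`. Turning an arbitrary integer into a handle requires an `unsafe fn`, even though nothing in it can cause undefined behavior; `unsafe` only makes forging explicit. `Handle`, `open` and `from_raw` are hypothetical. (std's `FromRawFd::from_raw_fd` is likewise an `unsafe fn`.)

```rust
mod handles {
    /// The field is private, so code outside this module cannot write
    /// `Handle { raw: 7 }`; it has to go through `open` or `from_raw`.
    #[derive(Debug, PartialEq)]
    pub struct Handle {
        raw: u32,
    }

    /// Hypothetical allocator standing in for a real "open" call.
    pub fn open(next_fd: &mut u32) -> Handle {
        let h = Handle { raw: *next_fd };
        *next_fd += 1;
        h
    }

    impl Handle {
        /// # Safety
        ///
        /// "Soft" contract: the caller vouches that `raw` refers to a handle
        /// it is entitled to use. Breaking this cannot cause undefined
        /// behavior here; `unsafe` only marks the forging as deliberate.
        pub unsafe fn from_raw(raw: u32) -> Handle {
            Handle { raw }
        }

        pub fn raw(&self) -> u32 {
            self.raw
        }
    }
}
```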

unsafe definitely means nothing once you're dealing with Wasm or FFI, though: unsafe is purely a compile-time property, specific to the Rust compiler's ability to check invariants.


I don't see a reason why things other than WASI couldn't also exist to solve some of these problems. The limiting factor is just getting people to work on them, but if you wanted to take on a big project, you could certainly start your own ABI. I believe WASI started as a fork of CloudABI; perhaps that would make sense as a starting point for a new one too, or you could start from a fork of WASI.

The WASI group seems reasonable and open to new ideas too, so if you're able to demonstrate ways to do things better, I'm sure they'd be willing to consider them or adapt them to WASI (though some other good ideas may be fundamentally at odds with the design principles of WASI, so those are probably best done in a separate ABI).

I don't think fragmentation is worth worrying about much yet. As a community, we'd be better off focusing our efforts on a limited number of ABIs instead of everyone making their own, but it's still early days for Wasm and there are lots of interesting ideas out there left to try.


Thanks for your work on that @brain0, by the way!

@Hywan Hywan added the 📦 lib-wasi About wasmer-wasi label Jul 15, 2021
@Amanieu Amanieu added priority-low Low priority issue priority-medium Medium priority issue and removed priority-low Low priority issue labels Oct 20, 2021
@stale

stale bot commented Oct 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the 🏚 stale Inactive issues or PR label Oct 20, 2022
@stale

stale bot commented Nov 19, 2022

Feel free to reopen the issue if it has been closed by mistake.

@stale stale bot closed this as completed Nov 19, 2022