Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for multiple readers via access handles #34

Open
alshamma opened this issue Jun 21, 2022 · 36 comments
Open

Request for multiple readers via access handles #34

alshamma opened this issue Jun 21, 2022 · 36 comments

Comments

@alshamma
Copy link

Our web app downloads large application specific static data files. They range in the 10kb to several hundred kb size. We use a variety of approaches to cache this data. We would prefer to store them in OPFS, and, to be able to share them between multiple tabs of our app. For example, if the user opens two tabs to our app, we want to have a single copy of the data. We would like to download the data once, and then be able to read using the OPFS access handles from multiple tabs. We prefer access handles for their good performance.

@jesup
Copy link
Contributor

jesup commented Jun 22, 2022

(I believe I know the answer to this): Do you need the ability to seek within these files? I presume performance reading (from WASM) is important here.

Adding an 'accessMode: read' to createSyncAccessHandle() would seem straightforward, and can leverage existing accessmode logic used currently for SyncAccessHandle and getWritableFileStream

@alshamma
Copy link
Author

Yes, we do need the ability to seek within these files. Your assumption about the importance of the performance of the read operation is correct. :-)

@a-sully
Copy link
Collaborator

a-sully commented Jun 22, 2022

Is a read-only handle sufficient, or do you need a handle you can write to and read from (from multiple places) at the same time? To me, this becomes a more broad question around how we should handle file locking (#18).

#19 lays out a more general proposal that gives developers more control over locking behavior, among other things.

Taking this proposal a bit further makes me wonder whether we should allow things like multiple writers to the same file. An argument could be made that since this is intended to be a high-performance API, the browser should get out of the way and let the developer and underlying OS handle keeping file data consistent.

@alshamma
Copy link
Author

We don't need to do read/write simultaneously. We typically write once and read many.

@tlively
Copy link

tlively commented Aug 12, 2022

Taking this proposal a bit further makes me wonder whether we should allow things like multiple writers to the same file. An argument could be made that since this is intended to be a high-performance API, the browser should get out of the way and let the developer and underlying OS handle keeping file data consistent.

From Emscripten's point of view, allowing arbitrarily many readers and writers would be ideal. We aim to emulate a normal POSIX environment to make bringing applications to the Web as easy as possible, and any differences imposed by Web APIs become problems our users (or we) have to work around. Our user's applications already handle the challenges of simultaneous file access on native platforms, so the Web platform trying to protect them from those challenges causes more problems than it solves.

@szewai
Copy link

szewai commented Sep 12, 2022

Taking this proposal a bit further makes me wonder whether we should allow things like multiple writers to the same file. An argument could be made that since this is intended to be a high-performance API, the browser should get out of the way and let the developer and underlying OS handle keeping file data consistent.

From Emscripten's point of view, allowing arbitrarily many readers and writers would be ideal. We aim to emulate a normal POSIX environment to make bringing applications to the Web as easy as possible, and any differences imposed by Web APIs become problems our users (or we) have to work around. Our user's applications already handle the challenges of simultaneous file access on native platforms, so the Web platform trying to protect them from those challenges causes more problems than it solves.

The web API needs to be interoperable, having the same behavior across different file systems or OS, so we probably cannot make this API the mirror of POSIX file system API, or any file system API. (We could provide interface that client needs, with clear definition of expected behavior.)
Based on that, I think allowing multiple writers would be a more difficult question than allowing multiple readers.

@tlively
Copy link

tlively commented Sep 12, 2022

Even if the browser had to do something like serialize all accesses to a file to ensure that no inconsistent state (i.e. partially applied write) could be observed, that would still be much better than the status quo of allowing only a single writer.

@a-sully
Copy link
Collaborator

a-sully commented Sep 12, 2022

I think allowing multiple writers would be a more difficult question than allowing multiple readers.

Yes, I agree. We can probably get away with just relaxing the exclusive locking requirement (i.e. making it possible to create a SyncAccessHandle with a read-only lock) to support multiple readers, but supporting multiple writers without a way to coordinate writes could get hairy.

As for whether we should support multiple writers in the first place... The current restriction of having one writer is not a problem for applications using this API, since single-paged applications can route all write requests through the same dedicated worker. But this restriction imposes architectural constraints that are significant hurdles to building libraries on top of this API, since all writes have to be proxied through the same worker, even when there are multiple tabs open for the same origin.

Adding support for byte-range file locking would solve this problem, albeit while introducing quite a bit of complexity. It's unclear to me at this point whether this is something we want to pursue.

@tlively
Copy link

tlively commented Sep 12, 2022

Is it possible for multiple instances of the same application open in multiple tabs to route accesses through a shared worker? If so, that's news to me and would make things significantly simpler. If not, the single writer limit is still painful at the application level, since having multiple tabs open to the same application is extremely common.

@a-sully
Copy link
Collaborator

a-sully commented Sep 13, 2022

Is it possible for multiple instances of the same application open in multiple tabs to route accesses through a shared worker?

Shared workers exist, but my understanding is they're not a context where we'd want to call sync APIs. You can postMessage between tabs of the same origin (a FileSystemHandle is transferrable in this way, but a SyncAccessHandle is not).

If not, the single writer limit is still painful at the application level, since having multiple tabs open to the same application is extremely common.

Sorry, I should have been more clear that I was referring to just a single-page application. You are correct that this is an issue for any use case with multiple tabs

@szewai
Copy link

szewai commented Sep 13, 2022

I think allowing multiple writers would be a more difficult question than allowing multiple readers.

Yes, I agree. We can probably get away with just relaxing the exclusive locking requirement (i.e. making it possible to create a SyncAccessHandle with a read-only lock) to support multiple readers, but supporting multiple writers without a way to coordinate writes could get hairy.

Right, we need to figure out how to coordinate the sync writes from multiple writers first, which could be in different threads or different processes.

As for whether we should support multiple writers in the first place... The current restriction of having one writer is not a problem for applications using this API, since single-paged applications can route all write requests through the same dedicated worker. But this restriction imposes architectural constraints that are significant hurdles to building libraries on top of this API, since all writes have to be proxied through the same worker, even when there are multiple tabs open for the same origin.

Adding support for byte-range file locking would solve this problem, albeit while introducing quite a bit of complexity. It's unclear to me at this point whether this is something we want to pursue.

I'd prefer making analysis and adding support based on realworld use cases, like the Adobe example for multiple readers.
Otherwise we might create something that's theoretically good but not performant enough (due to complicated locking), not user-friendly, or not meets developer's need.

@a-sully
Copy link
Collaborator

a-sully commented Sep 13, 2022

I'd prefer making analysis and adding support based on realworld use cases, like the Adobe example for multiple readers. Otherwise we might create something that's theoretically good but not performant enough (due to complicated locking), not user-friendly, or not meets developer's need.

I'm glad you asked! :)

Developers want relational databases. A goal of the Access Handles project is to allow developers to bring their own database to the web. As mentioned in this recent article about WebSQL deprecation by @tomayac, Chrome is working with the SQLite team to bring SQLite to the web over WASM. In the shorter term this will allow Chromium-based browsers to finally remove WebSQL, but the vision (or my vision, at least) is that Access Handles are the primitive enabling an ecosystem of WASM-based database libraries to become the way sites store data on the web. Fulfilling this vision requires those libraries to support reading and writing data from multiple tabs.

This API becomes significantly less useful if it's slowed down by bloat. Something like byte-range file locking is not worthwhile if it would degrade performance.

@alshamma
Copy link
Author

I totally agree with:

Something like byte-range file locking is not worthwhile if it would degrade performance.

From the Adobe perspective, the multiple-readers is the priority as it would allow us to share data and caches between tabs. In particular, this allows us to avoid downloads of large data files like documents and fonts when shared between tabs.

While we see the value in multiple-writers for libs like SQLite, it is not a priority.

I prefer we focus on performant APIs.

@jesup
Copy link
Contributor

jesup commented Sep 15, 2022

The discussion at the Breakout session on OPFS seemed to end with consensus on allowing an optional lock-type parameter, which defaults to exclusive, but allows taking a shared read lock, or no lock at all. This is in reference to SyncAccessHandles.
Should we also apply this to WritableFileStream (with a default of Shared)?

@a-sully
Copy link
Collaborator

a-sully commented Sep 15, 2022

The discussion at the Breakout session on OPFS seemed to end with consensus on allowing an optional lock-type parameter, which defaults to exclusive, but allows taking a shared read lock, or no lock at all. This is in reference to SyncAccessHandles.

These three categories of locks SGTM (as discussed in the meeting). Now to bikeshed the API surface :)

I think we need to be clear that the "no lock" mode is AT YOUR OWN RISK. I'd prefer something like unsafe or use-at-your-own-risk which conveys the risks more explicitly.

enum FileSystem{SyncAccessHandle?}LockMode {"exclusive",  "shared", "unsafe"};

Should we also apply this to WritableFileStream (with a default of Shared)?

I'm not sure whether the locking mechanisms for WritableFileStream should be the same as for SyncAccessHandles. For example, a shared read lock (and any read lock) doesn't make sense in reference to WritableFileStream. And perhaps more importantly, it's not possible for the browser to enforce an exclusive locking requirement for files outside of the OPFS, which are accessible by other applications on the machine (or even from other machines, if the file lives on a shared file system. See WICG/file-system-access#286).

Another question here: I think there's an argument to be made that locking should be exclusive across concepts. For example, if a site has a number of SyncAccessHandles in the no-restrictions ("unsafe") mode, should we still block a move() operation? Or be able to open a new WritableFileStream? It doesn't seem great to be able to overwrite a file while it has open file descriptors.

@jesup
Copy link
Contributor

jesup commented Sep 15, 2022

The spec currently requires a shared lock for WritableFileStream (which blocks a SyncAccessHandle while we have a WritableFileStream)

I'm not going to address interactions with the non-OPFS File System API.

Being able to overwrite a file that has open AcessHandles would in fact be the point of "unsafe" access; the requirement to deal with any collisions/races falls upon the application, as we discussed yesterday.

@alshamma
Copy link
Author

As an application developer, I prefer protective behaviors from the file system. I want to have the operation rejected and to receive an error code if I am performing something that will result in data loss or corruption. Maybe I'd like an override mechanism for some scenario, but nothing is coming to mind as to a scenario.

I think a file should not be movable if it is open for read or write. This is based on my experience over the decades of writing code for Windows, Mac and Linux.

@a-sully
Copy link
Collaborator

a-sully commented Sep 15, 2022

I think there's an argument to be made that locking should be exclusive across concepts.

The spec currently requires a shared lock for WritableFileStream (which blocks a SyncAccessHandle while we have a WritableFileStream)

To explain this another way, I'm wondering whether "shared" and "exclusive" locks is the wrong model here, but if all locks should be exclusive with respect to other concepts (or "operations"). So opening any SyncAccessHandle would exclude performing any other sort of operation that could modify the file (such as opening a WritableFileStream or calling move()). But you could open more SyncAccessHandles, assuming you opened it using the "shared" or "unsafe" option.

Realistically, any operation which modifies the file (remove(), move(), removeEntry()) should grab an exclusive lock. I see the fact that removeEntry() grabs a shared lock as a bug; unfortunate behavior from before locking was a concept in the API. And I don't think there's ever a case where you'd want a WritableFileStream and SyncAccessHandle to the same file (which is not currently allowed anyways).

This also makes specifying locking literally exponentially easier, since we won't to specify the expected behavior between every possible locking state for every possible concept; only within concepts (i.e. when there are multiple WritableFileStreams or SyncAccessHandles)

Thoughts?

@szewai
Copy link

szewai commented Sep 16, 2022

I think there's an argument to be made that locking should be exclusive across concepts.

The spec currently requires a shared lock for WritableFileStream (which blocks a SyncAccessHandle while we have a WritableFileStream)

To explain this another way, I'm wondering whether "shared" and "exclusive" locks is the wrong model here, but if all locks should be exclusive with respect to other concepts (or "operations"). So opening any SyncAccessHandle would exclude performing any other sort of operation that could modify the file (such as opening a WritableFileStream or calling move()). But you could open more SyncAccessHandles, assuming you opened it using the "shared" or "unsafe" option.

Realistically, any operation which modifies the file (remove(), move(), removeEntry()) should grab an exclusive lock. I see the fact that removeEntry() grabs a shared lock as a bug; unfortunate behavior from before locking was a concept in the API. And I don't think there's ever a case where you'd want a WritableFileStream and SyncAccessHandle to the same file (which is not currently allowed anyways).

This also makes specifying locking literally exponentially easier, since we won't to specify the expected behavior between every possible locking state for every possible concept; only within concepts (i.e. when there are multiple WritableFileStreams or SyncAccessHandles)

Thoughts?

I like the idea of defining the behavior for each operation and not using lock in interface. If we are not going to offer lock API (e.g. let client acquire or release lock explicitly or check locking state), I think "readonly" and "readwrite" might be more straightforward than "exclusive" and "shared". (I am not quite sure about "unsafe" due to the completely undefined behavior and uncertain benefits).

To describe the concurrency model which allows multiple readers in OPFS, I think it might be something like:

  1. all operations are classified as read or write
  2. when there is no operation, read or write operation can start
  3. when there is an ongoing read operation, read operation can start but write operation should wait (or return error)
  4. when there is an ongoing write operation, all operations need to wait (or return error)
  5. an open readonly SyncAccessHandle is viewed as ongoing read operation, an open readwrite SyncAccessHandle is viewed as ongoing write operation

@jesup
Copy link
Contributor

jesup commented Sep 20, 2022

WritableFileStream right now takes a shared lock (which also allows for all sorts of oddities depending on who closes it last). It could also take a lock-type option for an exclusive lock (and even an unsafe mode open).
(What is the reason for the shared-mode access for WritableFileStream? Was it simply a by-product of the initial implementation - no one blocked it?)

@a-sully
Copy link
Collaborator

a-sully commented Oct 11, 2022

WritableFileStream right now takes a shared lock (which also allows for all sorts of oddities depending on who closes it last). It could also take a lock-type option for an exclusive lock (and even an unsafe mode open). (What is the reason for the shared-mode access for WritableFileStream? Was it simply a by-product of the initial implementation - no one blocked it?)

Before SyncAccessHandles, the API had no concept of locking. WPTs asserted that you could create multiple WritableFileStreams, and this seemed reasonable given the triviality of creating multiple swap files. Arguably this makes less sense now, especially with the proposed autoClose flag - which writer wins if that flag is specified on multiple writers? "shared" and "exclusive" locks came about because we needed to integrate SyncAccessHandles into the existing API, but (at the time) we didn't want to hand out multiple SyncAccessHandles to the same file and we didn't want a WritableFileStream to be able to overwrite an open SyncAccessHandle... since you could create multiple WritableFileStreams, we gave them shared locks, which meant you couldn't create a SyncAccessHandle if a WritableFileStream was open to the file.

This decision has had some strange side effects. For one, it makes supporting multiple read-only SyncAccessHandles tricky. But also, what about modifying operations, such as move(), remove(), and removeEntry()? Well, old WPTs had asserted that we could call removeEntry() even with an open WFS. So... I guess that needs a shared lock, too? It's an ugly situation that doesn't scale well, hence why I'm proposing a change to how we do locking

To describe the concurrency model which allows multiple readers in OPFS, I think it might be something like:

  1. all operations are classified as read or write
  2. when there is no operation, read or write operation can start
  3. when there is an ongoing read operation, read operation can start but write operation should wait (or return error)
  4. when there is an ongoing write operation, all operations need to wait (or return error)
  5. an open readonly SyncAccessHandle is viewed as ongoing read operation, an open readwrite SyncAccessHandle is viewed as ongoing write operation

Thank you for the proposal! I agree that this pattern is more straightforward and maybe easier for users to wrap their heads around, but I'm skeptical for a few reasons...

  • This will break a lot of existing code
    • WPTs assert that createWritable() can be called multiple times and will create multiple swap files, accordingly. While I'm fully supportive of enabling developers to create exclusive WritableFileSteams (and we may want to require a stream to be created exclusively if autoClose is to be specified, for example), we don't gain much by requiring all streams be created exclusively
    • In this new paradigm,getFile() would presumably require a "readonly" lock. Right now, this method can be called without any restrictions. My understanding is that this is currently used by Emscripten as a workaround for our lack of multiple-reader support for SyncAccessHandles (@tlively may be able to confirm). I expect this is not the only usage that would break if we add locking to this method
  • Unlike SyncAccessHandles, WritableFileStreams, or file-modifying methods such as move(), getFile does not have a clear open and close lifetime
    • getFile() returns a readable File
    • When should the "readonly" lock be released? We'd have to track when the File goes out of scope
    • Holding the lock only for the duration of the method (as we'll presumably do for e.g. move()) would leave a readable object hanging around while we perform writes
    • We could consider not requiring a lock for this method... at which point we've scrambled whatever mental model the user had for "readonly" and "readwrite" locks
  • This closes the door to ever supporting multiple writers for SyncAccessHandles

I personally prefer the model where locks are exclusive across primitives that could modify the file (I wish I'd used the word "primitives" instead of "concepts" above). @jesup do you have any opinions?

@jesup
Copy link
Contributor

jesup commented Oct 13, 2022

We've updated (locally at least) the WPTs to not allow remove/etc if there's a WFS - we require an exclusive lock to remove or move a file.

Do current applications depend on multiple WFSs to the same file?

Note that if we allow the "Unsafe" locks for SyncAccessHandles (or WFSs), as discussed at TPAC, we'll need to still deal with how that interacts with move/removeEntry... We could still take an "Unsafe" lock, and specify that move/removeEntry require that there are no locks on the file (as opposed to stating that they take an exclusive lock). We also would have to specify the interaction between Unsafe locks and Exclusive/Shared locks - can you take an Unsafe lock if there are existing Exclusive/shared locks? Can you take an Exclusive/Shared lock if there are Unsafe locks? The discussion at TPAC implied that Unsafe was the equivalent to not locking, which would mean that Exclusive/Shared and Unsafe wouldn't notice each other. (and that move/RemoveEntry would happily take an exclusive lock and move/remove).

I agree getFile() can cause interesting issues if the file is modified/moved/deleted while the blob exists. We can't reasonably lock, due to it being a GC'd object with no close(), unless we wrap it with an API with close().

@szewai
Copy link

szewai commented Oct 14, 2022

For getFile, maybe we can release read lock after getting the snapshot for the underlying file https://w3c.github.io/FileAPI/#file-section.

For multiple writer support, or if developers need finer-grained control in the future (e.g. implementing byte-range lock), it might be easier to introduce an "unsafe" mode on the file, not on the handle, where no lock is required for operations (including move). Then lock will only be a concept in "safe" mode (and might not be exposed as it is acquired/released as part of the operation).

@alshamma
Copy link
Author

I guess I'm puzzled about the use cases for multiple writers. The only case I can think of is if there is range locking, so that one context to lock a range and modify it (or append or truncate) while another context has locked a non-overlapping range.

@sgbeal
Copy link

sgbeal commented Oct 14, 2022

I guess I'm puzzled about the use cases for multiple writers. The only case I can think of is if there is range locking, so that one context to lock a range and modify it (or append or truncate) while another context has locked a non-overlapping range.

In sqlite3, for example, any number of clients may open a given db file in read/write mode, but that mode doesn't mean much until sqlite3 actually has to write to the file. When sqlite3 needs to lock, it locks the whole file, but only for the time required for the given operation (which may be just a few small I/O ops or "forever," depending on what the client is doing). Its public API which clients use to plug in new storage backends (as we've just done with OPFS) does not model byte-range locking.

In the case of sqlite3's OPFS support, we open the FileSystemFileHandle at the earliest opportunity, use createSyncAccessHandle() when the lower-level sqlite3 internals tells us that it needs a lock, and close() that sync handle when the internals tell us to unlock. For the code-curious, search this file for "createSyncAccessHandle" and "close".

@jesup
Copy link
Contributor

jesup commented Oct 15, 2022

The argument for multiple writers would be that it allows a simpler (and perhaps more efficient) emulation of POSIX file operations by e.g. emscripten/etc - POSIX doesn't disallow multiple opens. If emscripten emulates POSIX file locking, then operations like @sgbeal describes would simply work. Also any application which wants to implement it's own locking/range-locks could do so.

The argument against it would be that multiple writers is generally error-prone, and racy if done from multiple tabs (though with care that would be ok, much as it is in POSIX applications), and that most applications don't try to write from multiple handles.

Unlike native apps, however, it's easier in a browser for multiple instances of an app to be "running" at once (in separate tabs), which increases the changes of apps hitting open() (createSyncAccessHandle) failures, which they may not deal with well. If we have shared read-only mode (not just a shared lock, but a true read-only mode), that may reduce the likelihood of this sort of failure, though apps may open files in read/write mode by default in case they may need to write to it.

Note that POSIX/linux by default has no locking on file operations, and applications deal with that, and either use the advisory locking primitives or their own synchronization methods.

The primary argument comes down to flexibility & ease of posix emulation compared to blocking potentially error-prone usages. There are arguments for both.

@dslee414
Copy link
Collaborator

dslee414 commented Dec 2, 2022

Thanks for summarizing the pros and cons of multiple writers, @jesup. All valid points, and given that it is OPFS, an app/site should be able to manage error-prone scenarios pretty well. I can see that the flexibility here for POSIX-like file operations can be quite beneficial here. For example, in SQLite, being able to keep multiple files open via multiple writers could help the overhead caused by async open, once it bears the one-time cost of opening all SAHs in the beginning. (The recent changes for all sync methods within SyncAccessHandle has helped SQLite performance but opening SAH is still asynchronous.)

We also would have to specify the interaction between Unsafe locks and Exclusive/Shared locks - can you take an Unsafe lock if there are existing Exclusive/shared locks? Can you take an Exclusive/Shared lock if there are Unsafe locks?

These are great questions. I think having an unsafe sync access handle take “no lock” could cause unexpected data read by other sync access handles in safe mode. So, my inclination here to your questions above are, no to both.

Unfortunately, the current lock types {“exclusive”, “shared”} won’t be able to support some scenarios. For example, Writable with a shared lock and read-only sync access handle with a shared lock can coexist, which we probably don’t want. Another example is if we do not want read-only sync access handle and unsafe sync access handle to coexist, using shared locks for both won’t work; lock will need to know about the “mode”.

As @a-sully mentioned above, there are two dimensions here:

  1. Across different types of primitives (writable, sync access handle) and modifying operations (remove, move, etc)
    • Can a primitive A be opened while there is a lock on primitive B (different type)?
      • I think, no, regardless of primitive mode (i.e. unsafe, readonly…)
    • Are modifying operations allowed while there is an open writable or SAH?
      • no
  2. Within the same primitive
    • Can a primitive A be opened while there is another primitive of the same type A?
      • Only if it’s the same (shared) mode. There can be multiple unsafe modes, OR readonly modes, but NOT unsafe and readonly modes together.

Perhaps, we will need something like lock type and lock mode to handle (1) and (2). What about something like this?

// Before, as per https://fs.spec.whatwg.org/#file-entry-lock
lock.value {“open”, “taken-exclusive”, “taken-shared”}
lock.count (int)

// After
lock.type {“sync-access-handle”, “writable-file-stream”, “modifying-operation”, “ancestor”}
lock.mode {“exclusive”, “shared-read-only”, “shared-write”}
lock.count (int)

To acquire a lock of type A and mode of B, the file’s associated lock must:

  • have a count of 0 OR
  • have type A and mode B, where B is “shared-read-only” or “shared-write”

I propose we should also lock ancestor handles, in order to avoid removing a directory while there is an open writable or SAH. This lock needs to be a “shared-read-only” mode so that more than one child handle can be granted locks at a time.

So, putting it all together... Not all type/mode combinations would be valid. The valid ones could be:

{type = “sync-access-handle”, mode = “exclusive”}
{type = “sync-access-handle”, mode = “shared-read-only”}
{type = “sync-access-handle”, mode = “shared-write”}
{type = “writable-file-stream”, mode = “exclusive”} // when supported
{type = “writable-file-stream”, mode = “shared-write”}
{type = “modifying-operation”, mode = “exclusive”}
{type = “ancestor”, mode = “shared-read-only”}

On the user-facing side, the new SyncAccessHandleLockMode (and possibly WritableFileStreamLockMode) can map to the lock modes in the spec. That way, users do not need to worry about the “lock” concept; they will just need to specify what behavior they want.

// New IDLs
SyncAccessHandleLockMode { 
“readwrite”,  // Maps to “exclusive” lock mode
“read-only”,  // Maps to “shared-read-only” lock mode
“readwrite-unsafe”,  // Maps to “shared-write” lock mode
}
// If we decide to support mode for WritableFileStream…
WritableFileStreamLockMode { 
“exclusive”,  // Maps to “exclusive” lock mode
“multiple”,  // Maps to “shared-write” lock mode
}

In summary:

  • readwrite:
    • the default value (current behavior)
    • Cannot be opened when there is another Sync Access Handle in any mode
    • Cannot be opened while there is modifying operation in progress (and vice versa)
    • all operations (read(), write(), etc) on access handle are allowed
    • takes a lock of type “sync-access-handle” and mode “exclusive”
  • read-only:
    • Can be opened with other read-only; cannot with readwrite and readwrite-unsafe
    • Cannot be opened while there is modifying operation in progress (and vice versa)
    • Supports multiple readers
    • write(), truncate() and flush() throws NoModificationAllowedError
    • takes a lock of type “sync-access-handle” and mode “shared-read-only”
  • readwrite-unsafe (name TBD):
    • Can be opened with other readwrite-unsafe; cannot with readwrite and read-only
    • Cannot be opened while there is modifying operation in progress (and vice versa)
    • Supports multiple writers
    • all operations (read(), write(), etc) on access handle are allowed
    • takes a lock of type “sync-access-handle” and mode “shared-write”

Thoughts?

@tlively
Copy link

tlively commented Dec 2, 2022

Exclusively locking to a single type of file access primitive then having various locking schemes available within each primitive makes a lot of sense to me.

I propose we should also lock ancestor handles, in order to avoid removing a directory while there is an open writable or SAH. This lock needs to be a “shared-read-only” mode so that more than one child handle can be granted locks at a time.

I'm not sure I understood this part correctly, but I think you mean that the ancestor directories up to the root above any file with an open SAH should be locked for shared reading. I don't think that would be a good idea, since it would unnecessarily inhibit directory operations like creating a new sibling or deleting a sibling of an open file. Preventing open files from being deleted seems reasonable, but ideally that would be implemented as a check in the file deleting procedure so it doesn't affect any other operations.

Note that POSIX allows open files to be unlinked and further allows reading and writing to unlinked files, so in a perfect world it would be great if the file system APIs allowed that as well, but I understand that this is probably infeasible. FWIW, a very large user ran into a bug related to this difference in behavior, but I think they were able to work around it.

@a-sully
Copy link
Collaborator

a-sully commented Dec 2, 2022

I don't think that would be a good idea, since it would unnecessarily inhibit directory operations like creating a new sibling or deleting a sibling of an open file

Hmm yeah I can see how that that would confusing. API locks are specified to correspond to exactly one file entry. The intention here is that the "shared-read-only" lock on the ancestors would only prevent the directory entry itself from being modified (so no renaming/moving/deleting the directory), but modifications within the directory, such as creating a new sibling or deleting a sibling of an open file, would be fine since those operations will take a lock corresponding to a different entry.

Preventing open files from being deleted seems reasonable, but ideally that would be implemented as a check in the file deleting procedure so it doesn't affect any other operations

It also seems reasonable to specify this as part of the deletion (and possibly move - see below) procedure. There are a few reasons why an explicit ancestor lock sounds appealing to me:

  1. I agree that directories which have an open WritableFileStream or SyncAccessHandle should not be removable. This operation is not supported on all operating systems (Windows, for example) and seems to degrade the meaning of a "locked" file in the first place (i.e. an unrelated operation can still modify the file)

  2. I'm on the fence as to whether we should allow moving parent directories which have an open WritableFileStream or SyncAccessHandle. This gets back to What is a FileSystemHandle? #59, of course, and I understand that I'm saying this as an engineer who works on Chromium's path-based (for now - we're exploring our options!) implementation. My concern is that allowing these moves is fundamentally reliant on implementation details. Specifically, that a file can continue to be read/written as its path changes. This implies that:

  • the browser's implementation of the OPFS maps an entry to a reference that is abstracted from its path (i.e. the "file system" is implemented as a mapping in some database, but the files themselves are in a flat structure on disk) OR
  • an open file descriptor remains valid even if its path has changed

Outside of the OPFS, we do not have the luxury of mapping files ourselves. Currently we don't support SyncAccessHandles outside of the OPFS either, but presumably this could be an issue for a future async alternative (#41). Our ability to support move() outside of the OPFS would be reliant on the latter point being true, which simply may not be the case on all file systems. All that aside, it seems brittle for a cross-platform web API to rely on an implementation quirk like this?

  1. If these checks apply for more than just removal (i.e. if the directory entry is never allowed to change while there's a lock in a contained entry) then the directory's entry is effectively locked and I figure we might as well make that explicit.

Curious to hear other thoughts here!

@dslee414
Copy link
Collaborator

Any further thoughts, @jesup @szewai @alshamma?

@alshamma
Copy link
Author

I have not been able to follow the full conversation. Our codebase runs on Windows, Mac, Linux, iOS as well as WebAssembly. The file system and file operation abstraction layers in our code attempt to smooth over the differences. Most of our code uses the posix apis. We frequently use the boost::filesystem apis. We avoid multiple writers in our codebase. Given our large code base, we prefer that the default behavior be mapped as close as possible to the posix apis and the underlying operating system.

@jesup
Copy link
Contributor

jesup commented Jan 17, 2023

I think we generally should pick a target - IMO posix APIs. Implementing posix on top of anything is pretty well defined.
However, we need to choose a clean, clear target and semantics; right now it's really messy and unclearly defined. This might mean specifying a lowest-common-denominator or most-restrictive-denominator to guarantee implementability, if there are "foo can't implement "

@nathanmemmott
Copy link
Contributor

There is now a dev trial for new File System Access locking modes. See the announcement here. This is targeted for milestone 122.

@rhashimoto
Copy link

rhashimoto commented Oct 25, 2023

There is now a dev trial for new File System Access locking modes.

I have been using this for three weeks, I believe since the day it reached Chrome Canary. I'm only using the "readwrite-unsafe" mode; it's exactly what I wanted (along with JSPI) for building SQLite virtual file systems. Supporting access handles that can always both read and write is a critical feature for performant SQLite with multiple connections.

Using the new mode I have built a prototype SQLite VFS that supports write-ahead logging, which should provide better performance and concurrency than currently possible with OPFS. It is described here and an online demo is here.

I have written more than 10 SQLite VFS implementations for the web platform, using both OPFS and IndexedDB. FWIW I am highly in favor of this proposal, particularly "readwrite-unsafe". I appreciate the work that has gone into getting it this far!

@tlively
Copy link

tlively commented Oct 25, 2023

This is great news! What is the best way to feature-detect whether these options are supported? It would be great to experiment with the new modes in Emscripten, but we'll need to be able to gracefully fall back to the current behavior if the new modes are not supported.

@nathanmemmott
Copy link
Contributor

Using the new mode I have built a prototype SQLite VFS that supports write-ahead logging, which should provide better performance and concurrency than currently possible with OPFS. It is described rhashimoto/wa-sqlite#116 (comment) and an online demo is here.

This looks great! Keep us up to date with any feedback or questions you have.

This is great news! What is the best way to feature-detect whether these options are supported? It would be great to experiment with the new modes in Emscripten, but we'll need to be able to gracefully fall back to the current behavior if the new modes are not supported.

The best way would be to check for the mode property that has been added to FileSystemSyncAccessHandle and WFSLockingSchemeEnabled:

const FSALockingModesEnabled =
  FileSystemSyncAccessHandle.prototype.hasOwnProperty('mode');
// OR
const FSALockingModesEnabled =
  FileSystemWritableFileStream.prototype.hasOwnProperty('mode');

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

9 participants