Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give FileSystemHandles an associated path #96

Merged
merged 12 commits into from
Mar 22, 2023
Merged

Conversation

a-sully
Copy link
Collaborator

@a-sully a-sully commented Feb 24, 2023

Fixes #59

A much more thorough (than #30) attempt at expressing that FileSystemHandles map to file paths. At a high level, it proposes that:

  • A FileSystemHandle has an associated "file system locator"
  • A "file system locator" contains, among other things, a "file system path"
  • A "file system entry" can be located given a "file system locator"
  • You can get a "file system locator" from a "file system entry"

See #59 for the implications of this in practice.

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

@a-sully
Copy link
Collaborator Author

a-sully commented Feb 24, 2023

I expect the build to fail for now, since [=FileSystemHandle/entry=] has been un-defined. I plan to update all the uses once we decide on how a FileSystemHandle should relate to an entry

index.bs Outdated Show resolved Hide resolved
@@ -193,9 +204,18 @@ interface FileSystemHandle {
};
</xmp>

A {{FileSystemHandle}} object represents a [=/file system entry=]. Each {{FileSystemHandle}} object is associated
with an <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=]). Multiple separate objects implementing
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un-defining [=FileSystemHandle/entry=] breaks most of this spec (and https://github.com/WICG/file-system-access), and a FileSystemHandle does conceptually have an associated entry... even if getting that entry requires looking it up using path.

Is it possible to have an "associated" member (i.e. [=FileSystemHandle/entry=]) that's retrieved using an algorithm?

It would be something like:

Each {{FileSystemHandle}} object is associated with a
<dfn for=FileSystemHandle export>path</dfn> (a [=/file system path=])
which maps to an associated <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=])
using an [=implementation-defined=] mapping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't part of the problem that it doesn't always map to an entry? I think you'd have to change all call sites anyway to take that into account and given that it seems better to have a more explicit algorithm that takes you from one to the other.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I was thinking. It seems to me like the options are:

  • a path can always map to an entry, which may "exist" or not (as currently sketched out in the getFile() method)
  • the locate method returns either a valid entry, or nothing. And then perhaps we can add a [=FileSystemHandle/exists=] algorithm for simplicity. The latter seems more intuitive to me. I can test this out in the next patch...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • the locate method returns either a valid entry, or nothing

This is sketched out in the latest patch!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment above on tying access checks to paths, the current chrome implementation (at least last I looked at it, which admittedly is a while ago, so maybe that has been changed already?) does not guarantee that two handles representing the same path behave the same in all aspects (since for handles acquired via DirectoryHandle.getFileHandle/getDirectoryHandle the access checks are done with the path of the original directory, rather than the path of the current handle).

I'm not saying that shouldn't be changed, but it's certainly something to keep in mind. Perhaps tying access checks to the FileSystemHandle rather than the entry would be a way around that, if we want to keep the same behavior? Although you'd have to figure out how to serialize that properly as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my confusion, but why would we need to serialize access checks if we were to move them from the "entry" to the FileSystemHandle?

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated
@@ -59,6 +60,8 @@ different storage mechanism with a different API for such files. The entry point

A <dfn export id="entry">file system entry</dfn> is either a [=file entry=] or a [=directory entry=].

<!-- TODO: Consider tying access checks to a path, to support handles which don't map to an entry -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, at least originally we intentionally did not tie access checks to paths, to provide more flexibility for implementations (and I don't think even today our implementation actually ties access checks to path either?). One case where this makes a difference is:

  • get a read-only directory handle via a directory picker
  • get a writable file handle, that happens to be in the above directory
  • now get a file handle to the same file by calling getFileHandle on the directory handle

I'm pretty sure that in at least our implementation the latter file handle will still be read-only. (i.e. while we sort of tie access checks to paths, they're not tied to the path of the entry a handle refers to, rather they are tied to the path of the file or directory that a user at some point picked).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that, at least in Chrome's current implementation, the file handle you get from calling getFileHandle() on the directory handle is read-only, though it's debatable whether that's a feature or a bug. I can see this leading to confusion on the part of the website author and user. Regardless, that's a Chrome implementation detail...

The reason I mention this is because if we assert that an entry must exist on disk, then we can't use it to lock/check access to entries that don't exist, such as when locking the target file when renaming a file.

... thinking about this a bit more, perhaps the move() case isn't actually an issue if we specify that all file system operations to the same queue (#95), since that would guarantee (at least within the OPFS) that the target file isn't created and locked while the move() operation is in progress

Not sure what to do about access checks, though, if we assert that an "entry" exists on disk

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@@ -193,9 +204,18 @@ interface FileSystemHandle {
};
</xmp>

A {{FileSystemHandle}} object represents a [=/file system entry=]. Each {{FileSystemHandle}} object is associated
with an <dfn for=FileSystemHandle export>entry</dfn> (a [=/file system entry=]). Multiple separate objects implementing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment above on tying access checks to paths, the current chrome implementation (at least last I looked at it, which admittedly is a while ago, so maybe that has been changed already?) does not guarantee that two handles representing the same path behave the same in all aspects (since for handles acquired via DirectoryHandle.getFileHandle/getDirectoryHandle the access checks are done with the path of the original directory, rather than the path of the current handle).

I'm not saying that shouldn't be changed, but it's certainly something to keep in mind. Perhaps tying access checks to the FileSystemHandle rather than the entry would be a way around that, if we want to keep the same behavior? Although you'd have to figure out how to serialize that properly as well.

index.bs Outdated Show resolved Hide resolved
@a-sully
Copy link
Collaborator Author

a-sully commented Mar 2, 2023

@annevk do you have any idea why the build is failing? Does it not like "getting the entry"?

@annevk
Copy link
Member

annevk commented Mar 2, 2023

I suspect it's failing because of "the the". Search for "typos" in https://resources.whatwg.org/build/deploy.sh.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@a-sully
Copy link
Collaborator Author

a-sully commented Mar 3, 2023

I suspect it's failing because of "the the". Search for "typos" in https://resources.whatwg.org/build/deploy.sh.

Ah, good catch. Thanks!

index.bs Outdated
exists at the path `data/drafts/example.txt` relative to the root directory of
the [=origin private file system=],
|locator|'s [=file system locator/kind=] has to be {{FileSystemHandleKind/"file"}},
|locator|'s [=file system locator/path=] might be « "`data`", "`drafts`", "`example.txt`" », and
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say "might" here because it's still "implementation defined" how a "locator" maps to an "entry", and most of the methods (with isSameEntry() and resolve() being the exceptions) require the "entry" to exist to do anything useful. For example, take the getFileHandle(name) method. It's basically:

  1. Given a "locator", locate entry (a dir entry)
  2. Given entry, find the child whose name is name
  3. Create a FileSystemFileHandle given child, which involves "getting the locator" of child

So nowhere in this flow do we ever specify appending name to this's locator's path to get child's locator's path

Relatedly, creating the root of an OPFS doesn't say anything about its locator's path

If we want to assert that the path above "has to be" /data/drafts/example.txt (in list form) then we could:

  • ensure that handles acquired from various API surfaces are given a defined path
    • set the path of the root of an OPFS to a list containing one empty string
    • outside of the OPFS (e.g. for a file picked from a file picker), let the path be "implementation defined"
  • create handles from existing handles by manipulating the existing handle's locator's path
    • add new versions of the "create a new FileSystem{File|Directory}Handle" algorithms that append to the path of the existing handle

I'm happy to make this change (either here or in a follow-up), but this is where getFileHandle() and getDirectoryHandle() get thorny, as @szewai pointed out #59 (comment)). Arguably, these methods could ignore the state on disk and just always return a handle whose locator has had its path appended to, without bothering to check what exists on disk....

... but this would be a breaking change, and a pretty substantial one where { create: true } is currently passed

For { create: false } with dir that exists at path "/testing":

await dir.remove();
// Before: throws a NotFoundError
// After: succeeds - handle's path is "/testing/example.txt"
let handle = await dir.getFileHandle('example.txt', { create: false });

This is a web-exposed change, but arguably not that bad, especially if we add something like #80. Something that threw an exception before would longer throw.

For { create: true } with dir that exists at path "/testing" if we deprecate the "create" option:

// Before: creates a new file at "/testing/example.txt"
// After: handle's path is "/testing/example.txt" - but no file is created
let handle = await dir.getFileHandle('example.txt', { create: true });

// At this point, any method which cares about what's on disk now fails

// Before: creates a SyncAccessHandle
// After: throws a NotFoundError
let accessHandle = await handle.createSyncAccessHandle();

// Before: gets the file's contents
// After: throws a NotFoundError
let file = await handle.getFile();

I worry that deprecating this option would break a lot of existing code. Something that works now would consistently throw

My tentative thoughts are that we should just leave this as "might" in this PR, then make the changes above to turn this into a "has to be", then look into making getFileHandle() and getDirectoryHandle() (perhaps by adding an additional flag? We can hash that out later :) )

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a step back here to ensure I understand.

  • getFileHandle() is exposed on FileSystemDirectoryHandle.
  • A FileSystemDirectoryHandle is represented by a locator and as such when a method is invoked upon it that might fail if the locator no longer has a corresponding entry.

Why could we not append the given name to the locator's path and then see if an entry exists there? I think we can keep the existing behavior here, even if it's not necessarily the most natural given the "new" underpinnings.


I think for the OPFS root the path would indeed have to be the empty list. I'm not sure about making it contain an empty string. That only seems sensible if we did directories that way and you decided we wouldn't do that.


For file pickers you'd want the root to be a unique identifier and the path to be a list containing the file name, I think. At least by default <input type=file> shouldn't start exposing more about the user's machine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a step back here to ensure I understand.

  • getFileHandle() is exposed on FileSystemDirectoryHandle.
  • A FileSystemDirectoryHandle is represented by a locator and as such when a method is invoked upon it that might fail if the locator no longer has a corresponding entry.

Correct 👍

Why could we not append the given name to the locator's path and then see if an entry exists there? I think we can keep the existing behavior here, even if it's not necessarily the most natural given the "new" underpinnings.

That's basically what I was proposing on the other thread, so yeah that SGTM. I just wanted to explain what I understood of the other perspectives that were expressed (@szewai please correct me if I've misunderstood anything!)

The only nitpick here is that if { create: true } currently fails rather than recursively create an entry. So we'd still need to check this's locator first before appending the given name, or change the behavior

I think for the OPFS root the path would indeed have to be the empty list. I'm not sure about making it contain an empty string. That only seems sensible if we did directories that way and you decided we wouldn't do that.

Yeah I agree that having a list with an empty string isn't great, but the empty string is the name of the directory. We'd have to carve out an exception for OPFS roots in this (which isn't that big of a deal):

If [=locating an entry=] given [=/file system locator=] |locator| returns a
[=/file system entry=] |entry|, then |entry|'s [=file system entry/name=] must
be the same [=string=] as the last [=list/item=] of
|locator|'s [=file system locator/path=].

For file pickers you'd want the root to be a unique identifier and the path to be a list containing the file name, I think. At least by default <input type=file> shouldn't start exposing more about the user's machine.

I agree that this would be ideal, but consider the following sequence:

  • show a file picker and select data/drafts/example.txt
  • show a directory picker and select data
  • resolve data/drafts/example.txt relative to data (this should be [ 'drafts', 'example.txt' ])

This won't resolve correctly if the "file system path" of the first file is just [ 'example.txt' ]

We could specify that the last item of the "file system path" should be the entry's name, but I think the rest of the path probably needs to be implementation-defined? And since the "file system path" isn't expected to be exposed to the end user, I don't think it should really matter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only nitpick here is that if { create: true } currently fails rather than recursively create an entry. So we'd still need to check this's locator first before appending the given name, or change the behavior

Can you elaborate on this, I'm not sure I follow.

but consider the following sequence:

Well, you could resolve with null in this case. We don't have to tell the website more than the user told them. If you want to make it work you'd have to do away with the concept of a root I think (that would have to be part of the path), but I'm not necessarily convinced it needs to work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only nitpick here is that if { create: true } currently fails rather than recursively create an entry. So we'd still need to check this's locator first before appending the given name, or change the behavior

Can you elaborate on this, I'm not sure I follow.

// If dirHandle exists, success.txt is created within its directory
let handle = await dirHandle.getFileHandle('success.txt', { create: true });

await dirHandle.remove({ recursive: true });

// `dirHandle` no longer exists, so fail.txt is not created
let other = await dirHandle.getFileHandle('fail.txt', { create: true });

In this scenario, whether FileSystemDirectoryHandle.getFileHandle() succeeds is dependent on whether this exists. So in the algorithm for the method, we can't just append the given name to the locator's path and then see if an entry exists there. We have to check this's locator first (or support recursive creation)

but consider the following sequence:

Well, you could resolve with null in this case. We don't have to tell the website more than the user told them. If you want to make it work you'd have to do away with the concept of a root I think (that would have to be part of the path), but I'm not necessarily convinced it needs to work.

In this scenario, I don't see much benefit to not allowing the site to know that there's a relation chain between these handles? If the site has access to a parent directory, it has the means to find this out anyways. And this applies to more than just the resolve() method. The isSameEntry() method also just checks the locator (it can't check the entry, since the entry may not exist) and needs to return true in this case, for example

let handleFromDirPicker = await (await dataHandle.getDirectoryHandle('drafts')).getFileHandle('example.txt');`

// ???
await handleFromFilePicker.isSameEntry(handleFromDirPicker);

The more generic version of this question is whether the site should be aware if the same file is selected on separate file picker invocations. For example, if another file picker is shown and data/drafts/example.txt is once again selected. It's critical for our partners that isSameEntry() returns true in this case:

await handleFromFilePicker.isSameEntry(sameHandlePickedAgain);

Taking a step back... it seems like a bad idea to have different locators that point to the same entry anyways?

A nice perk of letting implementations define the "file system path" to contain components that are invisible to the user in the non-OPFS case is that we can operate directly on that path in resolve() and check for equality of "file system locators" by comparing its fields, rather than needing to resort to hand-wavy algorithms. The "implementation-defined" magic is limited to just the prepended path components for files outside of the OPFS

Yeah I agree that having a list with an empty string isn't great, but the empty string is the name of the directory. We'd have to carve out an exception for OPFS roots

By the way, any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back... it seems like a bad idea to have different locators that point to the same entry anyways?

I'm not sure. I'll try to get some clarity on this.

By the way, any thoughts on this?

It seems reasonable. We probably have to clarify somewhere that the name getter can return an invalid file name in this special case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to assert that the path above "has to be" /data/drafts/example.txt (in list form) then we could:

  • ensure that handles acquired from various API surfaces are given a defined path
    • set the path of the root of an OPFS to a list containing one empty string
    • outside of the OPFS (e.g. for a file picked from a file picker), let the path be "implementation defined"
  • create handles from existing handles by manipulating the existing handle's locator's path
    • add new versions of the "create a new FileSystem{File|Directory}Handle" algorithms that append to the path of the existing handle

This is all included in the latest patch, aside from creating handles outside of the OPFS (which will have to be done in a follow-up on https://github.com/WICG/file-system-access, for which I think I've exported all the terms I'll need... 🤞)

Also added a warning that the "file system path" shouldn't be exposed directly to the website process - it can only be implied by resolving the handle to a parent, as in this use case:

  • show a file picker and select data/drafts/example.txt
  • show a directory picker and select data
  • resolve data/drafts/example.txt relative to data (this should be [ 'drafts', 'example.txt' ])

index.bs Outdated Show resolved Hide resolved
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for tackling this! I will attempt to give this a more detailed read through next week. Perhaps meanwhile @jesup or @asutherland can have a look?

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks pretty good to me. I found a couple of remaining nits. Can you please file the follow-ups so we can point to them?

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Mar 16, 2023

Looking at this again I'm not sure I understand the setup around name.

  • Do entries need a name at all?
  • Why does getDirectory() need to set a name? Can it not just set path to "« the empty string »"? That would also allow us to simplify the getter steps as they would not have to special case OPFS anymore.

I pushed a cleanup commit that removes some of the wording around name that seemed to duplicate other requirements. And where it was used incorrectly as a type. I think we can go further, but I'm not entirely sure how to reword the various children iterators.

@a-sully
Copy link
Collaborator Author

a-sully commented Mar 17, 2023

I pushed a cleanup commit that removes some of the wording around name that seemed to duplicate other requirements. And where it was used incorrectly as a type

This looks good, assuming we agree below on how to handle the OPFS root. Thanks!

Looking at this again I'm not sure I understand the setup around name.

  • Do entries need a name at all?

I'm not entirely sure how to reword the various children iterators.

Are we sure we need to reword the various children iterators? It seems fine to use entries (as opposed to locators) for file system operations like reading the contents of an entry (i.e. the bytes of a file, or the children of a directory).

We need some way to identify the children within a directory. Iterating through a directory doesn't make much sense if the files within it don't have a name...

If the intention is that an "entry" represents an OS-level file/directory, then giving each entry a name (and contents, as mentioned above) is probably what we want. Alternatively, we could consider giving each entry a path... but that's not really how most file systems work. You can later get a file's path by walking up the directory chain, but the path isn't inherent to the file itself.

  • Why does getDirectory() need to set a name? Can it not just set path to "« the empty string »"? That would also allow us to simplify the getter steps as they would not have to special case OPFS anymore.

Yeah setting the path to "« the empty string »" is what I'd proposed earlier, so I'm on board with that :) #96 (comment)

The entry itself still needs a name though, as per above. And we need |map|["root"] to point to a directory entry that exists, not a locator that may not. Right?

@annevk
Copy link
Member

annevk commented Mar 17, 2023

Okay, that sounds reasonable. I guess that means we should keep the path[last]/name invariant in some form, probably as new bullet points in the lists I created. If you'll indulge me, what happens when you hold an entry named x in JS and someone does mv x y on the command line?

@a-sully
Copy link
Collaborator Author

a-sully commented Mar 17, 2023

Okay, that sounds reasonable. I guess that means we should keep the path[last]/name invariant in some form, probably as new bullet points in the lists I created.

SGTM. I'll add this in the next patch

If you'll indulge me, what happens when you hold an entry named x in JS and someone does mv x y on the command line?

So, as of this PR, JS (i.e. a FileSystemHandle) can't really "hold an entry", per se. If you have a FileSystemFileHandle whose locator pointed to x, then "locating an entry" with that locator will now return null. The entry itself would be renamed to y.

This isn't a huge deal, especially within the OPFS, since we don't expect users to be mving files within their browser's profile. It's a much bigger deal outside of the OPFS, where the files are visible to the user and to other applications on the machine. This is where WICG/file-system-access#72 comes into play - if a file is moved, the web site might receive a change event containing a FileSystemFileHandle pointing to where the file was moved to. Initially, I was planning to only bother specifying this for non-OPFS files, since that's where the most obvious use cases are... but given the developer demand for things like w3c/IndexedDB#51, I suspect it could provide some value to the OPFS, as well. Anyways, that's a discussion for another day. I'm hoping to put an explainer up soon :)

index.bs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

What is a FileSystemHandle?
3 participants