Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Fix using fs.Dir.Iterator twice #10404

Merged
merged 2 commits into from Jan 30, 2022
Merged

std: Fix using fs.Dir.Iterator twice #10404

merged 2 commits into from Jan 30, 2022

Conversation

ominitay
Copy link
Contributor

@ominitay ominitay commented Dec 24, 2021

This fixes the use of multiple Iterators in a row on a directory. Previously, on many platforms, using an Iterator on an already-iterated directory would give no entries.

Fixing this involved seeking to the beginning of the directory on the first call of next().

cc: @nektro @squeek502
fixes #10317

@kubkon
Copy link
Member

kubkon commented Jan 22, 2022

I take it that #10317 is still not fixed hence why the CI fails with this test added? If that's the case, I see two solutions for this PR to be in mergable state: 1) if you're up for it, you could propose a fix to #10317, or 2) disable this test for the time being an leave a TODO comment referencing the issue. Lemme know what you think!

@nektro
Copy link
Contributor

nektro commented Jan 22, 2022

i think // TODO https://github.com/ziglang/zig/issues/10317 with a skip is a good route, there's precedence

@ominitay
Copy link
Contributor Author

Yep, sorry, my PRs have fallen behind. I'll get this sorted in a moment to be merged.

@ominitay
Copy link
Contributor Author

I'd like to see the Windows CI pass before this is merged, to be sure :)

@ominitay
Copy link
Contributor Author

Ok, ready to merge! @kubkon

@ominitay ominitay changed the title Add test for using fs.Dir.Iterator twice std: Fix using fs.Dir.Iterator twice Jan 27, 2022
lib/std/fs.zig Outdated Show resolved Hide resolved
lib/std/fs.zig Outdated
@@ -369,6 +370,10 @@ pub const Dir = struct {
fn nextSolaris(self: *Self) !?Entry {
start_over: while (true) {
if (self.index >= self.end_index) {
if (self.first) {
std.os.lseek_SET(self.dir.fs, 0) catch unreachable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this cannot fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely can't on Linux, I assumed POSIX compliant systems would be the same, though I just pushed this to see how the tests run.

Copy link
Collaborator

@squeek502 squeek502 Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an explanatory comment on each catch unreachable would be nice, but the possible errors for each platform are (from lseek(2)):

  • EBADF when Dir.fd is invalid or wasn't opened with iteration. This catch unreachable would show up later anyway during the next call (see here and other similar lines), so it being triggered during the lseek_SET just moves where it happens.
  • EINVAL when "whence is not valid or the resulting file offset would be negative, or beyond the end of a seekable device.", but we're always using the known-to-be-valid SEEK_SET and the offset is always 0
  • EOVERFLOW when "The resulting file offset cannot be represented in an off_t", but we know 0 will be the resulting file offset
  • ENXIO when lseek is using SEEK_DATA or SEEK_HOLE (actually, this isn't even a possible error within lseek_SET since we know we're using SEEK_SET)
  • ESPIPE when "fd is associated with a pipe, socket, or FIFO.", but we know fd is a directory

WASI has an additional ENOTCAPABLE that I'm unsure about when/if it can be triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see. Curious to see if it applies to BSD/macOS too. I remember in the past I had some mixed experience with Dir iterators on macOS - the way the dir pointer works was completely different to how it works on Linux. But let's see if the tests pass. If so, then we're golden!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, worth looking further into this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an explanatory comment on each catch unreachable would be nice, but the possible errors for each platform are (from lseek(2)):

  • EBADF when Dir.fd is invalid or wasn't opened with iteration. This catch unreachable would show up later anyway during the next call (see here and other similar lines), so it being triggered during the lseek_SET just moves where it happens.
  • EINVAL when "whence is not valid or the resulting file offset would be negative, or beyond the end of a seekable device.", but we're always using the known-to-be-valid SEEK_SET and the offset is always 0
  • EOVERFLOW when "The resulting file offset cannot be represented in an off_t", but we know 0 will be the resulting file offset
  • ENXIO when lseek is using SEEK_DATA or SEEK_HOLE (actually, this isn't even a possible error within lseek_SET since we know we're using SEEK_SET)
  • ESPIPE when "fd is associated with a pipe, socket, or FIFO.", but we know fd is a directory

WASI has an additional ENOTCAPABLE that I'm unsure about when/if it can be triggered.

ENOTCAPABLE would be triggered when we want iterate a dir tree we weren't given capabilities to do so.

lib/std/fs.zig Outdated Show resolved Hide resolved
lib/std/fs.zig Outdated Show resolved Hide resolved
@ominitay
Copy link
Contributor Author

Made changes, noticed I skipped over WASI and Darwin, so they should be sorted now :)

@ominitay
Copy link
Contributor Author

Am confused what went on with the Linux CI there. Seems to have failed on code which I didn't touch :/
My code seemed to pass fine though.

@ominitay
Copy link
Contributor Author

Will just push this to rerun CI

@nektro
Copy link
Contributor

nektro commented Jan 27, 2022

maybe rebase

@ominitay
Copy link
Contributor Author

maybe rebase

Already did that. This is so odd. Maybe I'll try reverting the changes I did to the WASI impl, see if that does anything...

@ominitay ominitay force-pushed the iterator branch 2 times, most recently from 7674ef6 to e3260c2 Compare January 27, 2022 21:40
@ominitay
Copy link
Contributor Author

Seems to have done it. Somehow using lseek_SET in WASI broke unrelated things ...

@squeek502
Copy link
Collaborator

squeek502 commented Jan 28, 2022

It looks like lseek_SET is unecessary for WASI, and the rewinding is already taken care of by using DIRCOOKIE_START when creating the Iterator here:

.cookie = os.wasi.DIRCOOKIE_START,

which then gets used when calling fd_readdir (note: not getdents like other platforms)

fd_readdir and dircookie docs:

The current implementation with no changes already passes the Dir.Iterator twice test case.

@ominitay
Copy link
Contributor Author

Fantastic, thank you for explaining that to me!

With that, this should be ready to merge.

lib/std/fs.zig Outdated Show resolved Hide resolved
lib/std/fs.zig Outdated Show resolved Hide resolved
This fixes the use of multiple `Iterator`s in a row on a directory.
Previously, on many platforms, using an `Iterator` on an
already-iterated directory would give no entries.

Fixing this involved seeking to the beginning of the directory on the
first call of `next()`.
Copy link
Member

@kubkon kubkon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @ominitay ! Thank you for seeing it through!

@kubkon kubkon merged commit dd7309b into ziglang:master Jan 30, 2022
@ominitay ominitay deleted the iterator branch January 30, 2022 14:13
andrewrk pushed a commit that referenced this pull request Feb 3, 2022
std: Fix using `fs.Dir.Iterator` twice
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std.fs.Dir.walk only works once
5 participants