Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On macOS, under load, files appearing twice or disappearing when running ls, unless readdir is sorted #57

Open
sgielen opened this issue May 24, 2021 · 14 comments

Comments

@sgielen
Copy link

sgielen commented May 24, 2021

I encountered an issue on macOS where under load, a ls on a cgofuse-mounted filesystem would sometimes show files twice, or not show some files at all. My application is confirmed to call fill exactly as many times with the same filenames when the issue shows as when it does not. To confirm this is an issue outside of my application, I created a simple test case: https://gist.github.com/sgielen/b229ded71f5946fc242bfb6e75e08da6

The issue seems to worsen when the Readdir call takes longer, so my testcase Readdir implementation has a sleep between 0-200 ms in it. Furthermore, I've noticed that the lengths of filenames, and perhaps the hierarchy of the filesystem, must be reproduced in order to reproduce the bug.

I can reproduce with golang 1.16.3, cgofuse 1.5.0, macfuse 4.1.2, macOS Big Sur 11.2.3. I can't reproduce on Windows or on Linux.

I'm willing to investigate further, but could use your advice for next steps.

@sgielen
Copy link
Author

sgielen commented May 24, 2021

The issue seems to be resolved when I sort files returned from a readdir (via calls to fill):

diff --git a/client.go b/client.go
index b7cab35..bc220ae 100644
--- a/client.go
+++ b/client.go
@@ -53,6 +53,7 @@ import (
        "math/rand"
        "os"
        ospath "path"
+       "sort"
        "strings"
        "sync/atomic"
        "time"
@@ -198,7 +199,15 @@ func (f *Mount) Readdir(path string, fill func(name string, stat *fuse.Stat_t, o
        }
        time.Sleep(time.Millisecond * time.Duration(rand.Intn(200)))
 
-       for fn, f := range ret.Files {
+       var files []string
+       for fn, _ := range ret.Files {
+               files = append(files, fn)
+       }
+
+       sort.Strings(files)
+
+       for _, fn := range files {
+               f := ret.Files[fn]
                attr := fuse.Stat_t{}
                attr.Size = f.Size
                if f.IsDirectory {

Normally, the issue was visible at least about once every 20-30 runs of ls, depending on Readdir duration and load etc, but now it's run about 500 times without the issue showing... Also, the weird unexpected Stats of "Tsttst/\x98\x03" and similar seem to have stopped with this patch.

Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...

@sgielen sgielen changed the title On macOS, under load, files appearing twice or disappearing when running ls On macOS, under load, files appearing twice or disappearing when running ls, unless readdir is sorted May 24, 2021
@billziss-gh
Copy link
Collaborator

How are you using the ofst argument when calling fill? Do you pass 0 or alternatively do you maintain stable offsets to your directory entries?

@sgielen
Copy link
Author

sgielen commented May 24, 2021

How are you using the ofst argument when calling fill? Do you pass 0 or alternatively do you maintain stable offsets to your directory entries?

Hi @billziss-gh, thanks for getting back to me!

I always pass 0 for ofst when calling fill. You can find the code here: https://gist.github.com/sgielen/b229ded71f5946fc242bfb6e75e08da6#file-client-go-L211 (this is without the above patch applied, so still the ‘broken’ version.)

@billziss-gh
Copy link
Collaborator

Do you ever get a non-0 offset in Readdir? (You should not if you always pass 0 to fill.)

@billziss-gh
Copy link
Collaborator

BTW, one other thing I noticed is that you do not implement Opendir. I am wondering if this somehow causes the problem.

@sgielen
Copy link
Author

sgielen commented May 25, 2021

Do you ever get a non-0 offset in Readdir? (You should not if you always pass 0 to fill.)

I logged this for a while, and saw no non-0 offsets given to Readdir, even as the issue appeared. I will try again, while also adding an implementation of Opendir and see if that makes a difference.

@billziss-gh
Copy link
Collaborator

The non-0 offsets to Readdir indicate that something is wrong. If you only pass 0 to fill You should not see non-0 offsets in Readdir. (This is true under libfuse (including the version in OSXFUSE) and WinFsp. I note that it may not be true in OpenBSD. See #49.)

@sgielen
Copy link
Author

sgielen commented May 25, 2021

I can confirm that even with Opendir implemented, the issue still occurs. Readdir always has ofst=0. I tested with the following patch applied on top of my Gist:

diff --git a/client.go b/client.go
index b7cab35..eaab5d0 100644
--- a/client.go
+++ b/client.go
@@ -184,12 +184,21 @@ func (f *Mount) Getattr(path string, attr *fuse.Stat_t, fh uint64) int {
 	return 0
 }
 
+func (f *Mount) Opendir(path string) (int, uint64) {
+	log.Printf("Opendir %q", path)
+	return 0, atomic.AddUint64(&f.lastHandle, 1)
+}
+
 func (f *Mount) Readdir(path string, fill func(name string, stat *fuse.Stat_t, ofst int64) bool, ofst int64, fh uint64) int {
 	/*start := time.Now()
 	defer func() {
 		log.Printf("Readdir %q took %d ms", path, time.Since(start).Milliseconds())
 	}()*/
 
+	if ofst != 0 {
+		log.Fatalf("ofst in Readdir was not 0, but %d\n", ofst)
+	}
+
 	path = stripPath(path)
 	ret := vfsReaddir(path)
 	if ret == nil {

The log.Fatalf was not triggered. I did see output such as:

21:45:04.596330 client.go:188: Opendir "/Tsttst/EEEEE EEEE/aaaaaa aaaaaa (0000)"
21:45:04.653208 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK"
21:45:04.655606 client.go:188: Opendir "/Tsttst/GGGGGG"
21:45:04.657734 client.go:172: Invalid stat "Tsttst/\x16" (2), returning ENOENT
21:45:04.687663 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK/ccccccccccc.ccccccccc"
21:45:04.687912 client.go:188: Opendir "/Tsttst/KKKKKK KKKKKKKK.KKKKKKKKK"
21:45:04.726393 client.go:188: Opendir "/Tsttst/UUU UUUUU UUUUUUUU"

confirming that the changes were applied. The issue is visible in the output of another terminal running while :; do ls test/Tsttst | wc -l; sleep 1; done:

      25
      25
      16    # <--
      25
      25
      25
[...]
      25
      25
      25
      25
      24   # <--
      20   # <--
      25
      25

@billziss-gh
Copy link
Collaborator

You said:

Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...

Cgofuse is a thin layer over the local platform's FUSE library and therefore does not (should not) make assumptions about directory entry order. There may be a problem in macfuse although I have never observed it (but in general my file systems return directory listings in consistent order).

I note that in your file system every time a process issues opendir and readdir* it will get different results (the notation readdir* is taken to mean issue readdir 0 or more times in a loop). Likewise if it issues opendir, readdir*, rewinddir, readdir*. Perhaps this confuses a particular component.

21:45:04.657734 client.go:172: Invalid stat "Tsttst/\x16" (2), returning ENOENT

Did you really access a file name with an \x16 character in it?

@sgielen
Copy link
Author

sgielen commented May 26, 2021

You said:

Perhaps there's an incorrect assumption in cgofuse or macfuse that the readdir results are sorted, or at least in a consistent order? That didn't hold in my case, since a go Map sorting isn't consistent...

Cgofuse is a thin layer over the local platform's FUSE library and therefore does not (should not) make assumptions about directory entry order. There may be a problem in macfuse although I have never observed it (but in general my file systems return directory listings in consistent order).

Perhaps then a good next step would be to try to convert this testcase to C to talk directly against the macfuse implementation and see if I can reproduce the same issue there. What do you think?

Did you really access a file name with an \x16 character in it?

Definitely not, my testing code just runs ls and du over the filesystem, and if I sort the files in my Readdir impl this doesn't occur, so it's really caused by this issue...

I note that in your file system every time a process issues opendir and readdir* it will get different results (the notation readdir* is taken to mean issue readdir 0 or more times in a loop). Likewise if it issues opendir, readdir*, rewinddir, readdir*. Perhaps this confuses a particular component.

How do you mean this? For the same directory, this testcase always returns the same files (except indeed in a different order). But, different files are visible in ls sometimes, hence the bug.

@billziss-gh
Copy link
Collaborator

Perhaps then a good next step would be to try to convert this testcase to C to talk directly against the macfuse implementation and see if I can reproduce the same issue there. What do you think?

Yes, this would be ideal. Although you would have to make sure to return the file name list in an inconsistent order.

Definitely not, my testing code just runs ls and du over the filesystem, and if I sort the files in my Readdir impl this doesn't occur, so it's really caused by this issue...

This suggests a real problem. Is this reproducible?

How do you mean this? For the same directory, this testcase always returns the same files (except indeed in a different order).

Yes, I meant that the directory listing will be in different order.

@sgielen
Copy link
Author

sgielen commented Jun 3, 2021

Hi @billziss-gh! I've written the testcase against macfuse in C++ and could reproduce the same issue. That's probably good news for you! ;-)

I've reported the issue further upstream at osxfuse/osxfuse#819, along with the test case. Unfortunately, since macfuse is closed-source nowadays that means my debugging stops here. Hopefully the macfuse maintainer can reproduce and can find a fix :-)

Feel free to close this issue, unless you'd like to keep tracking it.

@sgielen
Copy link
Author

sgielen commented Jun 4, 2021

To answer your other question:

This suggests a real problem. Is this reproducible?

Yes, for me the code in the Gist linked above always reproduces the non-printable characters issue. It sometimes takes a couple minutes for the messages to appear.

@billziss-gh
Copy link
Collaborator

@sgielen thanks for the update and for your investigation on this. Let's keep this issue open for now so we can track the progress on the upstream issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants