Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.fs.IterableDir.Walker doesn't ignore macOS "firmlinks" as it does symlinks, resulting in iterating some dirs twice and sometimes infinite loops #13355

Open
hippietrail opened this issue Oct 30, 2022 · 2 comments · May be fixed by #20344
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. os-macos standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@hippietrail
Copy link

hippietrail commented Oct 30, 2022

Zig Version

0.10.0-dev.4324+c23b3e6fd

Steps to Reproduce and Observed Behavior

This is probably not a fully minimal example but I'm not fluent in Zig yet:

const std = @import("std");
const al = std.heap.page_allocator;

pub fn main() !void {
    if (std.os.argv.len == 3) {
        try search(
            std.os.argv[1][0..std.mem.len(std.os.argv[1])],
            std.os.argv[2][0..std.mem.len(std.os.argv[2])]);
        std.debug.print("done\n", .{});
    } else {
        std.debug.print("** first arg is path to walk, second is substring to look for in directory names", .{});
    }
}

// ?!T is easier to work with than !?T for some kinds of iterators
fn flipiter(comptime T: type, inp: anyerror!?T) ?anyerror!T {
    return (try inp) orelse null;
}

fn search(path:[]const u8, string:[]const u8) !void {
    const start_path = if (path[path.len - 1] == '/') path[0..path.len-1] else path[0..];

    var idir = std.fs.cwd().openIterableDir(path, .{}) catch |err| {
        std.debug.print("** err: {s} {any}\n", .{path, err});
    };
    defer idir.close();

    var walker = try idir.walk(al);
    defer walker.deinit();

    while (flipiter(std.fs.IterableDir.Walker.WalkerEntry, walker.next())) |err_or_entry| {
        if (err_or_entry) |entry| {
            if (entry.kind == .Directory) {
                if (std.mem.indexOf(u8, entry.basename, string)) |x| {
                    _ = x;
                    if (entry.dir.realpathAlloc(al, entry.basename)) |realpath| {
                        defer al.free(realpath);

                        std.debug.print("> {s}/{s}\n» {s}\n", .{start_path, entry.path, realpath});
                    } else |err| {
                        std.debug.print("** realpath err {?}\n", .{err});
                    }
                }
            }
        } else |err| {
            if (err != error.AccessDenied) std.debug.print("{!}\t{s}/{s}\n", .{err, start_path, walker.name_buffer.items});
        }
    }
}

Run it with a starting path and a substring to search for. It will walk the directory and print out all paths that contain the substring after a > symbol and then its resolved canonical path after a » symbol on the next line.

I can only see the wrong behaviour when running in the root directory / as macOS has several directories from /System/Volumes/Data/x linked to /x using "firmlinks", which is a feature of the APFS filesystem going back to Sierra. Example firmlinked directories.

  • /Library -> /System/Volumes/Data/Library
  • /Users -> /System/Volumes/Data/Users
  • /Applications -> /System/Volumes/Data/Applications

I can't find Apple official documentation on firmlinks. Apparently they're supposed to be transparent and not affect most code. I also don't know how to get them to show up using ls. The ls commandline I know that will show the most extra info macOS-specific info is ls -alO@ / but that doesn't seem to reveal them unless sunlnk means this and only this:

drwxr-xr-x  20 root  wheel  sunlnk             640 24 Aug 16:59 .
drwxr-xr-x  20 root  wheel  sunlnk             640 24 Aug 16:59 ..
lrwxr-xr-x   1 root  admin  -                   36 24 Aug 16:59 .VolumeIcon.icns -> System/Volumes/Data/.VolumeIcon.icns
----------   1 root  admin  -                    0 24 Aug 16:59 .file
drwxr-xr-x   2 root  wheel  hidden              64 24 Aug 16:59 .vol
drwxrwxr-x  44 root  admin  sunlnk            1408 28 Oct 09:31 Applications
drwxr-xr-x  67 root  wheel  sunlnk            2144 14 Sep 10:42 Library
drwxr-xr-x@  9 root  wheel  restricted         288 24 Aug 16:59 System
	com.apple.rootless	   0 
drwxr-xr-x   5 root  admin  sunlnk             160 24 Aug 16:59 Users
drwxr-xr-x   3 root  wheel  hidden              96 25 Oct 15:39 Volumes
drwxr-xr-x@ 38 root  wheel  restricted,hidden 1216 24 Aug 16:59 bin
	com.apple.rootless	   0 
drwxrwxr-t   2 root  admin  hidden              64  8 Dec  2020 cores
dr-xr-xr-x   4 root  wheel  hidden            4888 25 Oct 15:39 dev
lrwxr-xr-x@  1 root  wheel  restricted,hidden   11 24 Aug 16:59 etc -> private/etc
	com.apple.rootless	   0 
lrwxr-xr-x   1 root  wheel  hidden              25 25 Oct 15:39 home -> /System/Volumes/Data/home
drwxr-xr-x   2 root  wheel  hidden              64  8 Dec  2020 opt
drwxr-xr-x   6 root  wheel  sunlnk,hidden      192 25 Oct 15:39 private
drwxr-xr-x@ 65 root  wheel  restricted,hidden 2080 24 Aug 16:59 sbin
	com.apple.rootless	   0 
lrwxr-xr-x@  1 root  wheel  restricted,hidden   11 24 Aug 16:59 tmp -> private/tmp
	com.apple.rootless	   0 
drwxr-xr-x@ 11 root  wheel  restricted,hidden  352 24 Aug 16:59 usr
	com.apple.rootless	   0 
lrwxr-xr-x@  1 root  wheel  restricted,hidden   11 24 Aug 16:59 var -> private/var
	com.apple.rootless	   0 

Sample commandline:

zig run src/walk.zig 2>&1 -- / LLVM |less -S

Actual output:

> /Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
» /Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
> /System/Volumes/Data/Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
» /Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
> /System/Volumes/Data/Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
» /Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
> /System/Volumes/Data/Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
» /Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
> /System/Volumes/Data/Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
» /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
> /Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
» /Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
> /Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
» /Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
> /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
» /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
done

Expected Behavior

Directories should only be listed at their true locations as is the case with symlinks, not also at the locations they're firmlinked to. I believe that would look like this:

> /Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
» /Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
> /Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
» /Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
> /Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
» /Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
> /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
» /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
done
@hippietrail hippietrail added the bug Observed behavior contradicts documented or intended behavior label Oct 30, 2022
@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. os-macos contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Oct 31, 2022
@andrewrk andrewrk added this to the 0.11.0 milestone Oct 31, 2022
@hippietrail
Copy link
Author

hippietrail commented Oct 31, 2022

A more-or-less equivalent Rust program shares the behaviour of Zig, but Swift does not. This makes me think the dir walker code in each language's libraries does in fact need to handle firmlinks specifically and that Swift must do do. This despite claims they should be transparent to existing code.

Here's my Swift code's output:

dirwalker / LLVM
******** / ********
Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
---- all done ----

And the Swift code itself. You can see there's no special code for firmlinks:

import Foundation
import AppKit

let fileManager = FileManager.default

let resKeys : [URLResourceKey] = [.isDirectoryKey, .fileSizeKey, .isSymbolicLinkKey]

let startURL: URL = URL(string: fileManager.currentDirectoryPath)!

guard CommandLine.arguments.count == 3 else {
    print("** usage: dirwalker path string")
    exit(1);
}

let pathArg = CommandLine.arguments[1]
let matchArg = CommandLine.arguments[2]

if let path = pathArg.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) {
    if let url = URL(string: path) {
        let en = fileManager.enumerator(at: url,
                                        includingPropertiesForKeys: resKeys,
                                        options: [.producesRelativePathURLs],
                                        errorHandler: { (url, error) -> Bool in
            return true }
        )!

        mainloop: for case let fileURL as URL in en {
            do {
                let rv = try fileURL.resourceValues(forKeys: Set(resKeys))
                if let d = rv.isDirectory, d {
                    
                    let filename: String = fileURL.lastPathComponent;

                    if filename.contains(matchArg) {
                        print(fileURL.relativePath)
                    }
                }
            } catch {
                print("** error 2:", error)
            }
        }
    }
}

print("done")

@xEgoist
Copy link
Contributor

xEgoist commented Apr 20, 2023

I did research this issue a while back, but here are my findings/notes in case it might help someone who wants to implement it or comment their opinion on it:

  • firmlinks can only apply on directories.

  • While there is an internal flag for identifying firmlinks, it is not set or used on APFS (see here)

  • What some projects implements like rofl0r/ncdu (with--exclude-firmlinks) is checking the reference attribute to see if it points to the same location (see here).

    • Another way of identifying firmlinks that I found is by using GETPATH_NOFIRMLINK (102) when using fcntl call to show the firmlink reference and compare it.
      This is only documented in XCode man pages.
      This would look similar fs.realpath or os.realpath darwin functions (getFdPath).

      zig/lib/std/os.zig

      Lines 5218 to 5229 in a1aa55e

      .macos, .ios, .watchos, .tvos => {
      // On macOS, we can use F.GETPATH fcntl command to query the OS for
      // the path to the file descriptor.
      @memset(out_buffer, 0, MAX_PATH_BYTES);
      switch (errno(system.fcntl(fd, F.GETPATH, out_buffer))) {
      .SUCCESS => {},
      .BADF => return error.FileNotFound,
      .NOSPC => return error.NameTooLong,
      // TODO man pages for fcntl on macOS don't really tell you what
      // errno values to expect when command is F.GETPATH...
      else => |err| return unexpectedErrno(err),
      }
  • Synthetic firmlinks can be created since MacOS Catalina via synthetic.conf
    (Although i am not sure whether it actually creates a firmlink or an identifiable symlink with flag, needs further checking).

  • For system firmlinks, /usr/share/firmlinks contains the list of firmlinks.

What's interesting is that MacOS's /bin/realpath does not resolve firmlinks.


Here are way that I think can be used to solve this is:

  • when calling nextDarwin and finding a Directory,
    double check the reference/realpath and set entry kind depending on the result.

    os.DT.DIR => Entry.Kind.Directory,

  • Documenting that Walker does not recognize/classify firmlinks and let the user do this check manually and/or maybe add a Firmlink check function?

  • Check with /usr/share/firmlinks and synthetic.conf (this can apply for any of the above).

I am not sure what's the best approach here as there aren't many system firmlinks and it's uncommon to synthesize (if that's even possible).


Note: I could be missing some other info, but this is what I have found.

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Jul 20, 2023
@hippietrail hippietrail changed the title std.fs.IterableDir.Walker doesn't ignore macOS "firmlinks" as it does symlinks, resulting in iterating over some dirs twice std.fs.IterableDir.Walker doesn't ignore macOS "firmlinks" as it does symlinks, resulting in iterating some dirs twice and sometimes infinite loops Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. os-macos standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants