Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INVAL/OBJECT_NAME_INVALID in fs APIs incorrectly marked as unreachable #15607

Open
squeek502 opened this issue May 7, 2023 · 1 comment
Open
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@squeek502
Copy link
Collaborator

squeek502 commented May 7, 2023

Currently, Zig treats passing invalid paths to fs-related APIs as programmer error, meaning that the APIs treat returns like INVAL or OBJECT_NAME_INVALID as unreachable. I've (tentatively) come to the conclusion that this strategy is untenable, and that these errors should be treated as reachable.

From #15382 (comment):

To protect against hitting unreachable on all platforms, we'll likely either need:

  • A robust path validation function that works properly for each platform. This seems possible (?) but it's likely we'll have to end up chasing corner cases for a while (and I'm unsure how different filesystems affect this, are illegal pathnames tied to the OS or to the particular filesystem that may be independent of the OS?).
  • To abandon the 'invalid path => unreachable' strategy throughout fs-related std lib and instead treat invalid paths as a possible error everywhere. This seems more sensible to me personally, but I'm unsure if there are any benefits to the => unreachable strategy.

Here's my attempt at a summary of whether or not such a path validation function is possible per-platform:

  • On Windows, the APIs themselves are the limiting factor in what is and isn't allowed. For example, if the underlying filesystem is NTFS, it can technically support any filename, but the Windows APIs disallow the characters " * / : < > ? \ | among other things. That is, these disallowed filenames can be created on NTFS filesystems if the file is created from Linux or if FILE_FLAG_POSIX_SEMANTICS is used when calling the Windows APIs (see Naming Files, Paths, and Namespaces and CreateFileW).
    • As far as I can tell, this means that it may be possible to ahead-of-time validate a path if we know that the path is going to be used in a Windows API call in the Win32 namespace. There may be counterexamples that make this impossible even for the Windows APIs and the Win32 namespace, though.
    • See Windows: Support UNC, rooted, drive relative, and namespaced/device paths #15768 (especially the note towards the end)
  • On POSIX systems, only / and NUL are disallowed on the common filesystems, but it's the underlying filesystem limitations that ultimately matter. The same call to something like openat may or may not hit INVAL depending on the underlying filesystem (from man openat: EINVAL O_CREAT was specified in flags and the final component ("basename") of the new file's pathname is invalid (e.g., it contains characters not permitted by the underlying filesystem)).
    • This effectively makes an ahead-of-time path validation function impossible on POSIX systems.

Here's an example that demonstrates the problem on Linux:

// createfile.zig
const std = @import("std");

pub fn main() !void {
    // | is an allowed character on ext4 but not on FAT filesystems
    const file = try std.fs.cwd().createFile("hello|world", .{ .truncate = false });
    defer file.close();
}

Works fine when run on an ext4 fs:

~/Programming/zig/tmp$ ./createfile && ls -l hello\|world
-rw-rw-r-- 1 ryan ryan 0 May  6 23:28 'hello|world'

But when run on a vfat fs (that disallows | characters at the filesystem level):

/media/ryan/TestFAT$ ~/Programming/zig/tmp/createfile
thread 10084 panic: reached unreachable code
/home/ryan/Programming/zig/zig/lib/std/os.zig:1729:23: 0x22cec7 in openatZ (createfile)
            .INVAL => unreachable,
                      ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1336:64: 0x20e2e2 in createFileZ (createfile)
            try os.openatZ(self.fd, sub_path_c, os_flags, flags.mode);
                                                               ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1275:32: 0x20b52d in createFile (createfile)
        return self.createFileZ(&path_c, flags);
                               ^
/home/ryan/Programming/zig/tmp/createfile.zig:4:45: 0x20b3c5 in main (createfile)
    const file = try std.fs.cwd().createFile("hello|world", .{ .truncate = false });
                                            ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:609:37: 0x20ae0e in posixCallMainAndExit (createfile)
            const result = root.main() catch |err| {
                                    ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:368:5: 0x20a871 in _start (createfile)
    @call(.never_inline, posixCallMainAndExit, .{});
    ^
Aborted (core dumped)

I'm unsure if there's a way to know in advance what the underlying filesystem is, but I'm assuming there isn't, and that this means that this unreachable is in fact inherently reachable.

Related to:

Further reading:

squeek502 added a commit to squeek502/zig that referenced this issue May 19, 2023
…aths

There are many different types of Windows paths, and there are a few different possible namespaces on top of that. Before this commit, NT namespaced paths were somewhat supported, and for Win32 paths (those without a namespace prefix), only relative and drive absolute paths were supported. After this commit, all of the following are supported:

- Device namespaced paths (`\\.\`)
- Verbatim paths (`\\?\`)
- NT-namespaced paths (`\??\`)
- Relative paths (`foo`)
- Drive-absolute paths (`C:\foo`)
- Rooted paths (`\foo`)
- UNC absolute paths (`\\server\share\foo`)
- Root local device paths (`\\.` or `\\?` exactly)

Plus:

- Any of the path types and namespace types can be mixed and matched together as appropriate.
- All of the `std.os.windows.*ToPrefixedFileW` functions will accept any path type, prefixed or not, and do the appropriate thing to convert them to an NT-prefixed path if necessary.

This is achieved by making the `std.os.windows.*ToPrefixedFileW` functions behave like `ntdll.RtlDosPathNameToNtPathName_U`, but with a few differences:

- Does not allocate on the heap (this is why we can't use `ntdll.RtlDosPathNameToNtPathName_U` directly, it does internal heap allocation).
- Relative paths are kept as relative unless they contain too many .. components, in which case they are treated as rooted and resolved against the current drive (this is how it behaved before this commit as well).
- Special case device names like COM1, NUL, etc are not handled specially (TODO)
- `.` and space are not stripped from the end of relative paths (potential TODO)

Most of the non-trivial conversion of non-relative paths is done via `ntdll.RtlGetFullPathName_U`, which AFAIK is used internally by `ntdll.RtlDosPathNameToNtPathName_U`.

Some relevant reading on Windows paths:

- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html

Closes ziglang#8205
Might close (untested) ziglang#12729

Note:
- This removes checking for illegal characters in `std.os.windows.sliceToPrefixedFileW`, since the previous solution (iterate the whole string and error if any illegal characters were found) was naive and won't work for all path types. This is further complicated by things like file streams (where `:` is used as a delimiter, e.g. `file.ext:stream_name:$DATA`) and things in the device namespace (where a path like `\\.\GLOBALROOT\??\UNC\localhost\C$\foo` is valid despite the `?`s in the path and is effectively equivalent to `C:\foo`). Truly validating paths is complicated and would need to be tailored to each path type. The illegal character checking being removed may open up users to more instances of hitting `OBJECT_NAME_INVALID => unreachable` when using `fs` APIs.
  + This is related to ziglang#15607
squeek502 added a commit to squeek502/zig that referenced this issue May 19, 2023
…aths

There are many different types of Windows paths, and there are a few different possible namespaces on top of that. Before this commit, NT namespaced paths were somewhat supported, and for Win32 paths (those without a namespace prefix), only relative and drive absolute paths were supported. After this commit, all of the following are supported:

- Device namespaced paths (`\\.\`)
- Verbatim paths (`\\?\`)
- NT-namespaced paths (`\??\`)
- Relative paths (`foo`)
- Drive-absolute paths (`C:\foo`)
- Drive-relative paths (`C:foo`)
- Rooted paths (`\foo`)
- UNC absolute paths (`\\server\share\foo`)
- Root local device paths (`\\.` or `\\?` exactly)

Plus:

- Any of the path types and namespace types can be mixed and matched together as appropriate.
- All of the `std.os.windows.*ToPrefixedFileW` functions will accept any path type, prefixed or not, and do the appropriate thing to convert them to an NT-prefixed path if necessary.

This is achieved by making the `std.os.windows.*ToPrefixedFileW` functions behave like `ntdll.RtlDosPathNameToNtPathName_U`, but with a few differences:

- Does not allocate on the heap (this is why we can't use `ntdll.RtlDosPathNameToNtPathName_U` directly, it does internal heap allocation).
- Relative paths are kept as relative unless they contain too many .. components, in which case they are treated as rooted and resolved against the current drive (this is how it behaved before this commit as well).
- Special case device names like COM1, NUL, etc are not handled specially (TODO)
- `.` and space are not stripped from the end of relative paths (potential TODO)

Most of the non-trivial conversion of non-relative paths is done via `ntdll.RtlGetFullPathName_U`, which AFAIK is used internally by `ntdll.RtlDosPathNameToNtPathName_U`.

Some relevant reading on Windows paths:

- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html

Closes ziglang#8205
Might close (untested) ziglang#12729

Note:
- This removes checking for illegal characters in `std.os.windows.sliceToPrefixedFileW`, since the previous solution (iterate the whole string and error if any illegal characters were found) was naive and won't work for all path types. This is further complicated by things like file streams (where `:` is used as a delimiter, e.g. `file.ext:stream_name:$DATA`) and things in the device namespace (where a path like `\\.\GLOBALROOT\??\UNC\localhost\C$\foo` is valid despite the `?`s in the path and is effectively equivalent to `C:\foo`). Truly validating paths is complicated and would need to be tailored to each path type. The illegal character checking being removed may open up users to more instances of hitting `OBJECT_NAME_INVALID => unreachable` when using `fs` APIs.
  + This is related to ziglang#15607
squeek502 added a commit to squeek502/zig that referenced this issue May 19, 2023
…aths

There are many different types of Windows paths, and there are a few different possible namespaces on top of that. Before this commit, NT namespaced paths were somewhat supported, and for Win32 paths (those without a namespace prefix), only relative and drive absolute paths were supported. After this commit, all of the following are supported:

- Device namespaced paths (`\\.\`)
- Verbatim paths (`\\?\`)
- NT-namespaced paths (`\??\`)
- Relative paths (`foo`)
- Drive-absolute paths (`C:\foo`)
- Drive-relative paths (`C:foo`)
- Rooted paths (`\foo`)
- UNC absolute paths (`\\server\share\foo`)
- Root local device paths (`\\.` or `\\?` exactly)

Plus:

- Any of the path types and namespace types can be mixed and matched together as appropriate.
- All of the `std.os.windows.*ToPrefixedFileW` functions will accept any path type, prefixed or not, and do the appropriate thing to convert them to an NT-prefixed path if necessary.

This is achieved by making the `std.os.windows.*ToPrefixedFileW` functions behave like `ntdll.RtlDosPathNameToNtPathName_U`, but with a few differences:

- Does not allocate on the heap (this is why we can't use `ntdll.RtlDosPathNameToNtPathName_U` directly, it does internal heap allocation).
- Relative paths are kept as relative unless they contain too many .. components, in which case they are treated as 'drive relative' and resolved against the CWD (this is how it behaved before this commit as well).
- Special case device names like COM1, NUL, etc are not handled specially (TODO)
- `.` and space are not stripped from the end of relative paths (potential TODO)

Most of the non-trivial conversion of non-relative paths is done via `ntdll.RtlGetFullPathName_U`, which AFAIK is used internally by `ntdll.RtlDosPathNameToNtPathName_U`.

Some relevant reading on Windows paths:

- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html

Closes ziglang#8205
Might close (untested) ziglang#12729

Note:
- This removes checking for illegal characters in `std.os.windows.sliceToPrefixedFileW`, since the previous solution (iterate the whole string and error if any illegal characters were found) was naive and won't work for all path types. This is further complicated by things like file streams (where `:` is used as a delimiter, e.g. `file.ext:stream_name:$DATA`) and things in the device namespace (where a path like `\\.\GLOBALROOT\??\UNC\localhost\C$\foo` is valid despite the `?`s in the path and is effectively equivalent to `C:\foo`). Truly validating paths is complicated and would need to be tailored to each path type. The illegal character checking being removed may open up users to more instances of hitting `OBJECT_NAME_INVALID => unreachable` when using `fs` APIs.
  + This is related to ziglang#15607
Vexu pushed a commit that referenced this issue May 29, 2023
…aths

There are many different types of Windows paths, and there are a few different possible namespaces on top of that. Before this commit, NT namespaced paths were somewhat supported, and for Win32 paths (those without a namespace prefix), only relative and drive absolute paths were supported. After this commit, all of the following are supported:

- Device namespaced paths (`\\.\`)
- Verbatim paths (`\\?\`)
- NT-namespaced paths (`\??\`)
- Relative paths (`foo`)
- Drive-absolute paths (`C:\foo`)
- Drive-relative paths (`C:foo`)
- Rooted paths (`\foo`)
- UNC absolute paths (`\\server\share\foo`)
- Root local device paths (`\\.` or `\\?` exactly)

Plus:

- Any of the path types and namespace types can be mixed and matched together as appropriate.
- All of the `std.os.windows.*ToPrefixedFileW` functions will accept any path type, prefixed or not, and do the appropriate thing to convert them to an NT-prefixed path if necessary.

This is achieved by making the `std.os.windows.*ToPrefixedFileW` functions behave like `ntdll.RtlDosPathNameToNtPathName_U`, but with a few differences:

- Does not allocate on the heap (this is why we can't use `ntdll.RtlDosPathNameToNtPathName_U` directly, it does internal heap allocation).
- Relative paths are kept as relative unless they contain too many .. components, in which case they are treated as 'drive relative' and resolved against the CWD (this is how it behaved before this commit as well).
- Special case device names like COM1, NUL, etc are not handled specially (TODO)
- `.` and space are not stripped from the end of relative paths (potential TODO)

Most of the non-trivial conversion of non-relative paths is done via `ntdll.RtlGetFullPathName_U`, which AFAIK is used internally by `ntdll.RtlDosPathNameToNtPathName_U`.

Some relevant reading on Windows paths:

- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html

Closes #8205
Might close (untested) #12729

Note:
- This removes checking for illegal characters in `std.os.windows.sliceToPrefixedFileW`, since the previous solution (iterate the whole string and error if any illegal characters were found) was naive and won't work for all path types. This is further complicated by things like file streams (where `:` is used as a delimiter, e.g. `file.ext:stream_name:$DATA`) and things in the device namespace (where a path like `\\.\GLOBALROOT\??\UNC\localhost\C$\foo` is valid despite the `?`s in the path and is effectively equivalent to `C:\foo`). Truly validating paths is complicated and would need to be tailored to each path type. The illegal character checking being removed may open up users to more instances of hitting `OBJECT_NAME_INVALID => unreachable` when using `fs` APIs.
  + This is related to #15607
@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Jul 23, 2023
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Jul 23, 2023
@yuyoyuppe
Copy link

yuyoyuppe commented Aug 1, 2023

Coming back to this from #14533, I don't see the benefits the => unreachable strategy might have either.

Given that OS APIs must internally perform validation, and they have no issues recovering from failing to do so, why should Zig force its users to introduce redundant validation logic in the form of a std.fs.isValidPath function? Why not bubble the error up and allow users to choose how to handle it?

The broader design question in this context is: how should we handle a generic os.call(input: garbage_os_resource) -> os.error situation? I'd expect that a standard library would map the os.error to a platform-independent and more meaningful representation, as std.lib.call might have a higher-level context. In our case, merely mapping it to error.BadPathName is perfectly acceptable.

As @squeek502 pointed out, it's impossible to perform path validation in advance on POSIX systems, and it's easy to find many similar instances of garbage_os_resource of different types, such as various HANDLEs, that are also impossible to validate without the OS's help. Why should we try to duplicate the OS's job?

@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Oct 17, 2023
@andrewrk andrewrk changed the title Questioning the unreachability of INVAL/OBJECT_NAME_INVALID in fs APIs INVAL/OBJECT_NAME_INVALID in fs APIs incorrectly marked as unreachable Oct 17, 2023
tealsnow added a commit to tealsnow/zig that referenced this issue May 1, 2024
Contributes to ziglang#15607

Although the case is not handled in `openatWasi` (as I could not get a
working wasi environment to test the change) I have added a FIXME
addressing it and linking to the issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

3 participants