@squeek502
Member

Previously, fs.path handled a few of the Windows path types, but not all of them, and only a few of them correctly/consistently. This PR aims to make std.fs.path correct and consistent in handling all possible Win32 path types.

This PR also slightly nudges the codebase towards a separation of Win32 paths and NT paths, as NT paths are not actually distinguishable from Win32 paths from looking at their contents alone (i.e. \Device\Foo could be an NT path or a Win32 rooted path, no way to tell without external context). This commit formalizes std.fs.path being fully concerned with Win32 paths, and having no special detection/handling of NT paths.

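Resources on Windows path types, and Win32 vs NT paths:

  • https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
  • https://chrisdenton.github.io/omnipath/Overview.html
  • https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file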

API additions/changes/deprecations

  • std.os.windows.getWin32PathType was added (it is analogous to RtlDetermineDosPathNameType_U), while std.os.windows.getNamespacePrefix and std.os.windows.getUnprefixedPathType were deleted. getWin32PathType forms the basis on which the updated std.fs.path functions operate.
  • std.fs.path.parsePath, std.fs.path.parsePathPosix, and std.fs.path.parsePathWindows were added, while std.fs.path.windowsParsePath was deprecated. The new parsePath functions provide the "root" and the "kind" of a path, which is platform-specific. The now-deprecated windowsParsePath did not handle all possible path types, while the new parsePathWindows does.
  • std.fs.path.diskDesignator has been deprecated in favor of std.fs.path.parsePath, and likewise diskDesignatorWindows has been deprecated in favor of parsePathWindows
  • relativeWindows is now a compile error when not targeting Windows, while relativePosix is now a compile error when targeting Windows. This is because those functions read/use the CWD path which will behave improperly when used from a system with different path semantics (e.g. calling relativePosix from a Windows system with a CWD like C:\foo\bar will give you a bogus result since that'd be treated as a single relative component when using POSIX semantics). This also allows relativeWindows to use Windows-specific APIs for getting the CWD and environment variables to cut down on allocations.
  • componentIterator/ComponentIterator.init have been made infallible. These functions used to be able to error on UNC paths with an empty server component, and on paths that were assumed to be NT paths, but now:
    • We follow the lead of RtlDetermineDosPathNameType_U/RtlGetFullPathName_U in how it treats a UNC path with an empty server name (e.g. \\\share) and allow it, even if it'll be invalid at the time of usage
    • Now that std.fs.path assumes paths are Win32 paths and not NT paths, we don't have to worry about NT paths

Behavior changes

  • std.fs.path generally: any combinations of mixed path separators for UNC paths are universally supported, e.g. \/server/share, /\server\share, /\server/\\//share are all seen as equivalent UNC paths
  • resolveWindows handles all path types more appropriately/consistently.
  • dirnameWindows now treats the drive-relative root as the dirname of a drive-relative path with a component, e.g. dirname("C:foo") is now C:, whereas before it would return null. dirnameWindows also handles local device paths appropriately now.
  • basenameWindows now handles all path types more appropriately. The most notable change here is //a being treated as a partial UNC path now and therefore basename will return "" for it, whereas before it would return "a"
  • relativeWindows will now do its best to resolve against the most appropriate CWD for each path, e.g. relative for D:foo will look at the CWD to check if the drive letter matches, and if not, look at the special environment variable =D: to get the shell-defined CWD for that drive, and if that doesn't exist, then it'll resolve against D:\.

Implementation details

  • resolveWindows previously looped through the paths twice to build up the relevant info before doing the actual resolution. Now, resolveWindows iterates backwards once and keeps track of which paths are actually relevant using a bit set, which also allows it to break from the loop when it's no longer possible for earlier paths to matter.
  • A standalone test was added to test parts of relativeWindows since the CWD resolution logic depends on CWD information from the PEB and environment variables

Edge cases worth noting

  • A strange piece of trivia that I found out while working on this is that it's technically possible to have a drive letter that is outside the intended A-Z range, or even outside the ASCII range entirely. Since we deal with both WTF-8 and WTF-16 paths, path[0]/path[1]/path[2] will not always refer to the same bits of information, so to get consistent behavior, some decision about how to deal with this edge case had to be made. I've made the choice to conform with how RtlDetermineDosPathNameType_U works, i.e. treat the first WTF-16 code unit as the drive letter. This means that when working with WTF-8, checking for drive-relative/drive-absolute paths is a bit more complicated. For more details, see the lengthy comment in std.os.windows.getWin32PathType
  • relativeWindows will now almost always be able to return either a fully-qualified absolute path or a relative path, but there's one scenario where it may return a rooted path: when the CWD gotten from the PEB is not a drive-absolute or UNC path (if that's actually feasible/possible?). An alternative approach to this scenario would be to resolve against the HOMEDRIVE env var if available, and/or default to C:\ as a last resort in order to guarantee the result of relative is never a rooted path.
  • Partial UNC paths (e.g. \\server instead of \\server\share) are a bit awkward to handle, generally. Not entirely sure how best to handle them, so there may need to be another pass in the future to iron out any issues that arise. As of now the behavior is:
    • For relative, any part of a UNC disk designator is treated as the "root" and therefore isn't applicable for relative paths, e.g. calling relative with \\server and \\server\share will result in \\server\share rather than just share and if relative is called with \\server\foo and \\server\bar the result will be \\server\bar rather than ..\bar
    • For resolve, any part of a UNC disk designator is also treated as the "root", but relative and rooted paths are still eligible for filling in missing portions of the disk designator, e.g. resolve with \\server and foo or \foo will result in \\server\foo

Fixes #25703
Closes #25702

Comment on lines +839 to +842
    if (hasCommonNtPrefix(u16, target_path)) {
        // Already an NT path, no need to do anything to it
        break :target_path target_path;
    } else {
Member Author

@squeek502 squeek502 Nov 21, 2025

@mlugg with regards to what we talked about, to remove the "wToPrefixedFileW checks for \??\ and treats it as an NT path" behavior in the future, all that would be necessary after this PR is to remove this if condition. This PR doesn't actually change the behavior, but it makes it much easier to pull the trigger on that in the future.

@daurnimator
Contributor

Will/should this respect devices set up with subst?

@squeek502
Member Author

squeek502 commented Nov 21, 2025

@daurnimator if I understand you correctly, that's not relevant for std.fs.path. No syscalls are made at all in std.fs.path for Windows; there's only some read-only PEB accesses (to get the CWD path and the drive CWD from environment variables).

In other words, std.fs.path will treat a path like Z:\foo exactly the same regardless of what it's mapped to.

See also #13613

@daurnimator
Contributor

(to get the CWD path and the drive CWD from environment variables).

It's been a good 25 years or so, but I vaguely recall that the registry entries created by subst had to be consulted before the env vars?
I don't have any Windows machines these days, but it might at least be worth adding a test for?

@squeek502
Member Author

squeek502 commented Nov 21, 2025

Could you clarify what you think that test might look like?

probably irrelevant in-the-weeds details

As far as I can tell, subst Q: C:\foo creates a symlink at \DosDevices\Q: pointing to \??\C:\foo. This is roughly how all mounted drive letters work, i.e. C:\ is a symlink at \GLOBAL??\C: pointing to \Device\HarddiskVolume2 or whatever, a networked drive of mine is a symlink at \DosDevices\V: pointing to something like \Device\LanmanRedirector\;V:0000000000012345\server\share, etc, so I'm unsure why subst would need to be a special case.

(for extra context, \?? is a special virtual folder that combines \DosDevices and \GLOBAL??)

EDIT: I think you might be misunderstanding the API surface of std.fs.path. It only operates on path strings, with no relationship to the actual filesystem. That is, there's nothing like realpath in std.fs.path.

@daurnimator
Contributor

Ah, never mind then. It just brought up a very old memory where subst might take priority over the env var of =G and hence if you were looking at the env var you would have to look at \DosDevices first? But that doesn't make sense. Might be some weird DOS emulation mode, or a fake memory.

@squeek502
Member Author

Drive-specific CWDs are purely a shell concept, =X: is essentially just a convention used by cmd.exe. There's an area of the PEB that seems relevant but is fully unused.

@squeek502 squeek502 force-pushed the windows-paths branch 2 times, most recently from 2b4af85 to b65f391 (November 21, 2025 07:40)
Previously, fs.path handled a few of the Windows path types, but not all of them, and only a few of them correctly/consistently. This commit aims to make `std.fs.path` correct and consistent in handling all possible Win32 path types.

This commit also slightly nudges the codebase towards a separation of Win32 paths and NT paths, as NT paths are not actually distinguishable from Win32 paths from looking at their contents alone (i.e. `\Device\Foo` could be an NT path or a Win32 rooted path, no way to tell without external context). This commit formalizes `std.fs.path` being fully concerned with Win32 paths, and having no special detection/handling of NT paths.

Resources on Windows path types, and Win32 vs NT paths:

- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html
- https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

API additions/changes/deprecations

- `std.os.windows.getWin32PathType` was added (it is analogous to `RtlDetermineDosPathNameType_U`), while `std.os.windows.getNamespacePrefix` and `std.os.windows.getUnprefixedPathType` were deleted. `getWin32PathType` forms the basis on which the updated `std.fs.path` functions operate.
- `std.fs.path.parsePath`, `std.fs.path.parsePathPosix`, and `std.fs.path.parsePathWindows` were added, while `std.fs.path.windowsParsePath` was deprecated. The new `parsePath` functions provide the "root" and the "kind" of a path, which is platform-specific. The now-deprecated `windowsParsePath` did not handle all possible path types, while the new `parsePathWindows` does.
- `std.fs.path.diskDesignator` has been deprecated in favor of `std.fs.path.parsePath`, and likewise `diskDesignatorWindows` has been deprecated in favor of `parsePathWindows`
- `relativeWindows` is now a compile error when *not* targeting Windows, while `relativePosix` is now a compile error when targeting Windows. This is because those functions read/use the CWD path which will behave improperly when used from a system with different path semantics (e.g. calling `relativePosix` from a Windows system with a CWD like `C:\foo\bar` will give you a bogus result since that'd be treated as a single relative component when using POSIX semantics). This also allows `relativeWindows` to use Windows-specific APIs for getting the CWD and environment variables to cut down on allocations.
- `componentIterator`/`ComponentIterator.init` have been made infallible. These functions used to be able to error on UNC paths with an empty server component, and on paths that were assumed to be NT paths, but now:
  + We follow the lead of `RtlDetermineDosPathNameType_U`/`RtlGetFullPathName_U` in how it treats a UNC path with an empty server name (e.g. `\\\share`) and allow it, even if it'll be invalid at the time of usage
  + Now that `std.fs.path` assumes paths are Win32 paths and not NT paths, we don't have to worry about NT paths
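
To make the path-type terminology above concrete, here is a minimal sketch of a Win32 path classifier that follows the decision order commonly documented for `RtlDetermineDosPathNameType_U`. The names `Win32PathKind`, `classify`, `isSep`, and `byteAt` are hypothetical and are not the actual `std.os.windows.getWin32PathType` API. The sketch works on plain bytes and deliberately ignores the non-ASCII drive letter edge case.

```zig
const std = @import("std");

// Hypothetical enum for illustration; not the real std.os.windows type.
const Win32PathKind = enum {
    relative,
    rooted,
    drive_relative,
    drive_absolute,
    unc_absolute,
    local_device,
    root_local_device,
};

fn isSep(c: u8) bool {
    return c == '\\' or c == '/';
}

// Returns the byte at `i`, or 0 when `i` is past the end
// (this mimics reading a NUL terminator).
fn byteAt(path: []const u8, i: usize) u8 {
    return if (i < path.len) path[i] else 0;
}

fn classify(path: []const u8) Win32PathKind {
    if (isSep(byteAt(path, 0))) {
        if (!isSep(byteAt(path, 1))) return .rooted; // \foo
        const c2 = byteAt(path, 2);
        if (c2 != '.' and c2 != '?') return .unc_absolute; // \\server\share
        if (isSep(byteAt(path, 3))) return .local_device; // \\.\pipe or \\?\C:\foo
        if (byteAt(path, 3) != 0) return .unc_absolute; // \\.server
        return .root_local_device; // exactly \\. or \\?
    }
    if (byteAt(path, 0) == 0 or byteAt(path, 1) != ':') return .relative; // foo
    if (isSep(byteAt(path, 2))) return .drive_absolute; // C:\foo
    return .drive_relative; // C:foo
}

test "classify sketch" {
    const expectEqual = std.testing.expectEqual;
    try expectEqual(Win32PathKind.drive_absolute, classify("C:\\foo"));
    try expectEqual(Win32PathKind.drive_relative, classify("C:foo"));
    // Without external context, this is just a rooted Win32 path.
    try expectEqual(Win32PathKind.rooted, classify("\\Device\\Foo"));
    // Mixed separators still form a UNC path.
    try expectEqual(Win32PathKind.unc_absolute, classify("\\/server/share"));
    try expectEqual(Win32PathKind.local_device, classify("\\\\.\\pipe"));
}
```

Note how `\Device\Foo` comes out as a rooted path: nothing in the string itself marks it as an NT path, which is the reason for leaving NT path detection out of `std.fs.path`.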

Behavior changes

- `std.fs.path` generally: any combinations of mixed path separators for UNC paths are universally supported, e.g. `\/server/share`, `/\server\share`, `/\server/\\//share` are all seen as equivalent UNC paths
- `resolveWindows` handles all path types more appropriately/consistently.
  + `//` and `//foo` used to be treated as a relative path, but are now seen as UNC paths
  + If a rooted/drive-relative path cannot be resolved against anything more definite, the result will remain a rooted/drive-relative path.
  + I've created [a script to generate the results of a huge number of permutations of different path types](https://gist.github.com/squeek502/9eba7f19cad0d0d970ccafbc30f463bf) (the result of running the script is also included for anyone that'd like to vet the behavior).
- `dirnameWindows` now treats the drive-relative root as the dirname of a drive-relative path with a component, e.g. `dirname("C:foo")` is now `C:`, whereas before it would return null. `dirnameWindows` also handles local device paths appropriately now.
- `basenameWindows` now handles all path types more appropriately. The most notable change here is `//a` being treated as a partial UNC path now and therefore `basename` will return `""` for it, whereas before it would return `"a"`
- `relativeWindows` will now do its best to resolve against the most appropriate CWD for each path, e.g. relative for `D:foo` will look at the CWD to check if the drive letter matches, and if not, look at the special environment variable `=D:` to get the shell-defined CWD for that drive, and if that doesn't exist, then it'll resolve against `D:\`.
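
The CWD fallback order described for drive-relative paths can be sketched roughly as follows. This is only an illustration: `baseForDrive` and its parameters are hypothetical, the caller is assumed to have already looked up the `=X:` environment variable, and the case-insensitive drive letter comparison is an assumption rather than something stated above.

```zig
const std = @import("std");

// Hypothetical helper: picks the directory that a drive-relative path such
// as `D:foo` is resolved against.
// - `cwd` is assumed to be a drive-absolute path like `C:\work`.
// - `drive_env` is the value of the `=X:` variable for `drive`, if set.
// - `fallback_buf` provides storage for the `X:\` fallback.
fn baseForDrive(
    drive: u8,
    cwd: []const u8,
    drive_env: ?[]const u8,
    fallback_buf: *[3]u8,
) []const u8 {
    // 1. The process CWD wins if it is on the same drive
    //    (assumption: drive letters compare case-insensitively).
    if (cwd.len >= 2 and cwd[1] == ':' and
        std.ascii.toUpper(cwd[0]) == std.ascii.toUpper(drive))
    {
        return cwd;
    }
    // 2. Otherwise use the shell-defined per-drive CWD, if present.
    if (drive_env) |dir| return dir;
    // 3. Otherwise fall back to the root of the drive.
    fallback_buf.* = [3]u8{ drive, ':', '\\' };
    return fallback_buf;
}

test "baseForDrive sketch" {
    var buf: [3]u8 = undefined;
    // CWD is on D:, so it is used directly.
    try std.testing.expectEqualStrings("D:\\work", baseForDrive('D', "D:\\work", null, &buf));
    // CWD is on C:, so the `=D:` value is used.
    try std.testing.expectEqualStrings("D:\\shell", baseForDrive('D', "C:\\work", "D:\\shell", &buf));
    // Neither applies, so the drive root is used.
    try std.testing.expectEqualStrings("D:\\", baseForDrive('D', "C:\\work", null, &buf));
}
```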

Implementation details

- `resolveWindows` previously looped through the paths twice to build up the relevant info before doing the actual resolution. Now, `resolveWindows` iterates backwards once and keeps track of which paths are actually relevant using a bit set, which also allows it to break from the loop when it's no longer possible for earlier paths to matter.
- A standalone test was added to test parts of `relativeWindows` since the CWD resolution logic depends on CWD information from the PEB and environment variables
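
The single backwards pass can be illustrated with a heavily simplified sketch. Everything here is hypothetical (`relevantPaths`, `isFullyQualified`, the 64-path cap), and it only treats drive-absolute paths as "nothing earlier can matter", whereas the real `resolveWindows` has to consider every path type.

```zig
const std = @import("std");

const max_paths = 64; // arbitrary cap for this sketch

fn isSep(c: u8) bool {
    return c == '\\' or c == '/';
}

// Simplified stand-in for "this path cannot be affected by anything before
// it". Only drive-absolute paths (e.g. `C:\foo`) are considered here.
fn isFullyQualified(path: []const u8) bool {
    return path.len >= 3 and path[1] == ':' and isSep(path[2]);
}

// Walks `paths` from last to first, marking the ones that can still
// contribute, and stops as soon as earlier paths can no longer matter.
fn relevantPaths(paths: []const []const u8) std.StaticBitSet(max_paths) {
    std.debug.assert(paths.len <= max_paths);
    var relevant = std.StaticBitSet(max_paths).initEmpty();
    var i = paths.len;
    while (i > 0) {
        i -= 1;
        if (paths[i].len == 0) continue; // empty paths contribute nothing
        relevant.set(i);
        if (isFullyQualified(paths[i])) break;
    }
    return relevant;
}

test "relevantPaths sketch" {
    const paths = [_][]const u8{ "a", "C:\\x", "", "b" };
    const relevant = relevantPaths(&paths);
    try std.testing.expect(!relevant.isSet(0)); // shadowed by C:\x
    try std.testing.expect(relevant.isSet(1));
    try std.testing.expect(!relevant.isSet(2)); // empty
    try std.testing.expect(relevant.isSet(3));
}
```

The bit set means the later resolution step can skip irrelevant paths without a second scan to rediscover them.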

Edge cases worth noting

- A strange piece of trivia that I found out while working on this is that it's technically possible to have a drive letter that is outside the intended A-Z range, or even outside the ASCII range entirely. Since we deal with both WTF-8 and WTF-16 paths, `path[0]`/`path[1]`/`path[2]` will not always refer to the same bits of information, so to get consistent behavior, some decision about how to deal with this edge case had to be made. I've made the choice to conform with how `RtlDetermineDosPathNameType_U` works, i.e. treat the first WTF-16 code unit as the drive letter. This means that when working with WTF-8, checking for drive-relative/drive-absolute paths is a bit more complicated. For more details, see the lengthy comment in `std.os.windows.getWin32PathType`
- `relativeWindows` will now almost always be able to return either a fully-qualified absolute path or a relative path, but there's one scenario where it may return a rooted path: when the CWD gotten from the PEB is not a drive-absolute or UNC path (if that's actually feasible/possible?). An alternative approach to this scenario might be to resolve against the `HOMEDRIVE` env var if available, and/or default to `C:\` as a last resort in order to guarantee the result of `relative` is never a rooted path.
- Partial UNC paths (e.g. `\\server` instead of `\\server\share`) are a bit awkward to handle, generally. Not entirely sure how best to handle them, so there may need to be another pass in the future to iron out any issues that arise. As of now the behavior is:
  + For `relative`, any part of a UNC disk designator is treated as the "root" and therefore isn't applicable for relative paths, e.g. calling `relative` with `\\server` and `\\server\share` will result in `\\server\share` rather than just `share` and if `relative` is called with `\\server\foo` and `\\server\bar` the result will be `\\server\bar` rather than `..\bar`
  + For `resolve`, any part of a UNC disk designator is also treated as the "root", but relative and rooted paths are still eligible for filling in missing portions of the disk designator, e.g. `resolve` with `\\server` and `foo` or `\foo` will result in `\\server\foo`
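
As an illustration of the drive letter edge case, here is a minimal sketch of finding a "first WTF-16 code unit, then `:`" prefix in a WTF-8 path. The helper names `firstSeqLen` and `drivePrefixLen` are hypothetical, and the sketch does not validate continuation bytes; the comment in `std.os.windows.getWin32PathType` remains the authoritative description.

```zig
const std = @import("std");

// Length of the WTF-8 sequence introduced by lead byte `b`, or null if `b`
// cannot start a sequence.
fn firstSeqLen(b: u8) ?usize {
    return switch (b) {
        0x00...0x7F => 1,
        0xC2...0xDF => 2,
        0xE0...0xEF => 3,
        0xF0...0xF4 => 4,
        else => null,
    };
}

// Returns the byte length of a leading "drive letter + ':'" prefix in a
// WTF-8 path, or null if there is none.
fn drivePrefixLen(wtf8_path: []const u8) ?usize {
    if (wtf8_path.len == 0 or wtf8_path[0] == 0) return null;
    const seq_len = firstSeqLen(wtf8_path[0]) orelse return null;
    // A 4-byte sequence encodes a code point above U+FFFF, which becomes a
    // surrogate pair in WTF-16. The second code unit would then be a low
    // surrogate, never ':'.
    if (seq_len > 3) return null;
    if (wtf8_path.len <= seq_len) return null;
    if (wtf8_path[seq_len] != ':') return null;
    return seq_len + 1;
}

test "drivePrefixLen sketch" {
    const expectEqual = std.testing.expectEqual;
    try expectEqual(@as(?usize, 2), drivePrefixLen("C:foo"));
    // U+20AC is 3 bytes in WTF-8 but a single WTF-16 code unit.
    try expectEqual(@as(?usize, 4), drivePrefixLen("\u{20AC}:\\foo"));
    // U+1F600 needs a surrogate pair in WTF-16, so it is not a drive letter.
    try expectEqual(@as(?usize, null), drivePrefixLen("\u{1F600}:foo"));
}
```

This is why `path[1] == ':'` alone is not a sufficient check for WTF-8 input: the colon's byte index depends on how many bytes the first code point occupies.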

Fixes ziglang#25703
Closes ziglang#25702
…WTF-16 is LE

This commit flips usage of PathType.isSep from requiring the caller to convert to native to assuming the input is LE encoded, which is a breaking change. This makes usage a bit nicer, though, and moves the endian conversion work from runtime to comptime.
@squeek502 squeek502 merged commit 53e615b into ziglang:master Nov 24, 2025
9 checks passed
@squeek502 squeek502 added the following labels Nov 24, 2025: breaking (Implementing this issue could cause existing code to no longer compile or have different behavior.), release notes (This PR should be mentioned in the release notes.), standard library (This issue involves writing Zig code for the standard library.)
2 participants