Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve higher level file system functions to avoid error.NameTooLong when possible #4812

Open
daurnimator opened this issue Mar 25, 2020 · 4 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@daurnimator
Copy link
Contributor

MAX_PATH_BYTES is a fiction (see e.g. https://eklitzke.org/path-max-is-tricky). It would probably make sense to loop until we don't get ENAMETOOLONG?

_Originally posted by @daurnimator in #4811

@daurnimator can you open a separate issue to discuss MAX_PATH_BYTES?

Originally posted by @andrewrk in #4811 (comment)

@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Mar 25, 2020
@andrewrk andrewrk added this to the 0.7.0 milestone Mar 25, 2020
@andrewrk
Copy link
Member

andrewrk commented Mar 25, 2020

I experimented with this and found it to be fact - for example the Linux kernel itself will give ENAMETOOLONG when syscalls are done with paths longer than PATH_MAX.

You can have paths that cannot be accessed with an absolute pathname, because the syscall will reject it, and still access them if you use syscalls based on directory handles. This is one reason zig's std lib has been transitioning to use directory handles for relevant operations.

The article says it could be a

filesystem limitation, or at the very least a kernel parameter

Admittedly I did all my testing on ext4/linux with default kernel parameters, however, I'd like to see evidence / a demonstration of this before proceeding further.

@andrewrk
Copy link
Member

Oops, I misunderstood that the entire article was on this topic and so only read 1 heading. Now I finished reading the article, and it says:

On Linux, system calls like open(2) perform this check via getname_flags(). The check is performed using the 4096 byte PATH_MAX value, which, as I just finished explaining, is not really a file path limit!

This is what I already believed to be true, and it means zig's PATH_MAX (posix) and MAX_PATH_BYTES (posix+windows) are correct.

Now I will say that some std lib abstractions (the ones higher level than std.os) should be improved so that error.NameTooLong is no longer in the error set. That would be a welcome improvement to the std lib.

@andrewrk andrewrk changed the title MAX_PATH_BYTES is fiction improve higher level file system functions to not have error.NameTooLong in the error set Mar 25, 2020
@gereeter
Copy link
Contributor

Now I will say that some std lib abstractions (the ones higher level than std.os) should be improved so that error.NameTooLong is no longer in the error set.

It's not entirely clear to me what change you want here. If something passes a path to an operating system, it is possible to get NameTooLong. There isn't a huge amount you can do about that fact. You can sometimes avoid the error by breaking a path into components, performing multiple *at calls on the components, and looping, but even that should produce NameTooLong if an individual component is larger than _PC_PATH_MAX. If the path components come themselves from the operating system and filesystem, then I suppose it may be reasonable to assume that those path components (but not the paths themselves) can be safely given back to the operating system and so not produce NameTooLong, but that seems very rare and still not necessarily true. And path-reading functions that take a buffer still need to error if the buffer is too small.

Is this just about Alloc versions of functions that return paths like selfExePathAlloc, realpathAlloc, and getCwdAlloc then?


The bigger problem that I saw this issue being about was the fact that the name MAX_PATH_BYTES is misleading. It sounds like it is a bound on the size of a path and is used as such in places in the standard library (e.g. Dir.readLink and process.getCwd take pointers to arrays), but it isn't.

@andrewrk andrewrk changed the title improve higher level file system functions to not have error.NameTooLong in the error set improve higher level file system functions to avoid error.NameTooLong when possible Mar 26, 2020
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Mar 26, 2020
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@matu3ba
Copy link
Contributor

matu3ba commented May 5, 2022

List of TODOs (marked with TODO(#4812) in code):

  • realpath
  • realpathZ (added null-termination)
  • realpathW (windows only)
  • realpathAlloc
  • selfExePathAlloc
  • selfExeDirPathAlloc
  • realpathAlloc

I suspect that the workaround of getCwdAlloc of #4837 is usable, but symlink resolving requires some complexity. And abit more complexity for the fast-path check (absolute path).

test case canonicalize https://gist.github.com/matu3ba/1fd6bc69dc85705ffaed58363bdbb76e

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

4 participants