
Provide .ls endpoint #2

Open
dasJ opened this issue Jan 14, 2020 · 2 comments
Labels
A-enhancement New feature or request E-hard Hard issues P-low Low priority issues
Comments

@dasJ
Contributor

dasJ commented Jan 14, 2020

Proposed solution

The upstream NixOS cache provides an .ls endpoint that serves directory indexes for the NAR files.
Tools like nix-index use it. It would be convenient to have the same functionality in Eris, so that nix-index can be extended to fetch from multiple caches.
Example URL: http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.ls is the listing for http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.narinfo

Alternatives considered

Don't implement it, I guess.

Additional context

According to the nix-index source code, the file is either uncompressed, brotli-compressed, or xz-compressed: https://github.com/bennofs/nix-index/blob/master/src/hydra.rs#L201
The example above is just a JSON document describing the file structure: curl -L -o - http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.ls | brotli -d
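As a rough illustration of the client side, here is a minimal Python sketch that builds the .ls URL next to a .narinfo, decodes the body, and flattens the tree into paths. It assumes the listing has the shape {"version": 1, "root": {...}} with nested "entries" objects, which is my understanding of the upstream format. The compression sniffing (xz magic bytes, else plain JSON if it starts with "{", else brotli) is also an assumption; nix-index's own detection logic may differ. The helper names are hypothetical, and the brotli branch needs the third-party brotli package.

```python
import json
import lzma

# Every .xz stream begins with these six magic bytes.
XZ_MAGIC = b"\xfd7zXZ\x00"


def ls_url(cache_url: str, store_hash: str) -> str:
    """Hypothetical helper: the .ls file sits next to <hash>.narinfo."""
    return f"{cache_url.rstrip('/')}/{store_hash}.ls"


def decode_listing(payload: bytes) -> dict:
    """Decode a .ls body that may be plain JSON, xz, or brotli.

    Assumption: brotli has no magic number, so anything that is neither
    xz nor starts with '{' is treated as brotli.
    """
    if payload.startswith(XZ_MAGIC):
        payload = lzma.decompress(payload)
    elif not payload.lstrip().startswith(b"{"):
        import brotli  # third-party package, not in the standard library
        payload = brotli.decompress(payload)
    return json.loads(payload)


def flatten(node: dict, prefix: str = "") -> list:
    """Turn the nested tree into a flat list of (path, type) pairs."""
    out = [(prefix or "/", node["type"])]
    for name, child in sorted(node.get("entries", {}).items()):
        out.extend(flatten(child, f"{prefix}/{name}"))
    return out
```

A flat path list like this is roughly what a file index needs, and it is why fetching one .ls per store path adds up quickly for a client crawling a whole cache.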

@dasJ dasJ added the A-enhancement New feature or request label Jan 14, 2020
@thoughtpolice
Owner

thoughtpolice commented Jan 20, 2020

Eris is currently stateless, so this requires a (recursive!) filesystem query for every .ls request, stat-ing everything in order to build the JSON representation. (Recursive stat is exactly how Nix itself builds this representation; look up listNar in the Nix source code.)
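To make the per-request cost concrete, here is a minimal Python sketch of the recursive stat walk a stateless server would have to do for each .ls request. It is not Eris or Nix code: list_path and build_ls are hypothetical names, and the node shapes ("regular" with "size"/"executable", "directory" with "entries", "symlink" with "target") are my understanding of what listNar emits. The "narOffset" field Nix can include is omitted, since a plain filesystem walk has no NAR offsets.

```python
import json
import os
import stat


def list_path(path: str) -> dict:
    """Recursively lstat `path` and build a listNar-style tree.

    Each entry costs one lstat(), plus a readlink() for symlinks and a
    directory read for directories, repeated on every request.
    """
    st = os.lstat(path)  # lstat: symlinks are recorded, not followed
    if stat.S_ISDIR(st.st_mode):
        entries = {name: list_path(os.path.join(path, name))
                   for name in sorted(os.listdir(path))}
        return {"type": "directory", "entries": entries}
    if stat.S_ISLNK(st.st_mode):
        return {"type": "symlink", "target": os.readlink(path)}
    if stat.S_ISREG(st.st_mode):
        node = {"type": "regular", "size": st.st_size}
        if st.st_mode & stat.S_IXUSR:  # owner-executable bit
            node["executable"] = True
        return node
    raise ValueError(f"unsupported file type: {path}")


def build_ls(store_path: str) -> str:
    """Serialise the tree as the JSON body of a <hash>.ls response."""
    return json.dumps({"version": 1, "root": list_path(store_path)})
```

The number of lstat() calls equals the number of entries in the store path, so a derivation with one flat directory of many files turns a single HTTP request into that many metadata lookups.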

This approach is "fine" for Hydra because it can generate these indexes at the time it builds the derivation, just before uploading them into the S3 bucket: once the file has been built, the cost to query it is zero. That isn't the case for us. stat is generally going to be very expensive -- mostly because it has to access the filesystem metadata blocks to get at the information, and on most filesystems that isn't going to be cheap (e.g. most metadata isn't going to fit in cache lines, things like that), and in many cases it probably implies locking somewhere. In practice it's going to be very bad on any derivation that has "shallow" directories with many, many files inside, which will be one of the worst cases. I wouldn't be surprised if you could easily DDoS/lock up Eris in such instances with repeated queries, just by consuming all the HTTP worker threads with IO stalls.

This request is quite reasonable on the face of it, but probably quite hard to implement well -- I expect the overhead of each .ls request to be very, very high (meaning nix-index is probably going to thrash the hell out of it), and I don't see a good way to implement it right away.

@thoughtpolice thoughtpolice added E-hard Hard issues P-low Low priority issues labels Jan 20, 2020
@thoughtpolice
Owner

thoughtpolice commented Jan 20, 2020

One way to do this that might not lock up the HTTP worker threads would be to asynchronously launch a task that performs the lookup in the background and passes the data back to (or notifies) the worker thread responsible for the client when it's done, so that thread can return the info. This means the actual worker threads of the Mojo server can handle other requests in the meantime. On the other hand, it doesn't solve the underlying problem, which is that querying this information on demand simply isn't very efficient in the first place, and it has other issues: how many processes should you leave outstanding at once? Should you allow bursting, e.g. with token buckets? What happens if enough processes just eat up all your IOPS anyway and everything grinds to a halt? Etc.
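The two admission questions above (a cap on outstanding work, and bursting via a token bucket) can be sketched in Python as follows. This is only an illustration of the idea, not Eris's Perl/Mojo code. TokenBucket, ListingOffloader and max_outstanding are hypothetical names, and rejecting the request (e.g. with HTTP 503) when submit returns None is an assumed policy.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


class ListingOffloader:
    """Run expensive listing jobs off the request threads, with a hard
    cap on how many may be outstanding at once."""

    def __init__(self, max_outstanding: int, bucket: TokenBucket):
        self.pool = ThreadPoolExecutor(max_workers=max_outstanding)
        self.slots = threading.BoundedSemaphore(max_outstanding)
        self.bucket = bucket

    def submit(self, job, *args):
        """Return a Future, or None if the caller should reject the request."""
        if not self.slots.acquire(blocking=False):
            return None  # too many listings already in flight
        if not self.bucket.try_acquire():
            self.slots.release()
            return None  # burst budget exhausted
        fut = self.pool.submit(job, *args)
        # Free the slot once the job finishes, successfully or not.
        fut.add_done_callback(lambda _: self.slots.release())
        return fut
```

This only bounds how much work is admitted; as noted above, even a handful of admitted listings can still saturate the disk's IOPS.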

@thoughtpolice thoughtpolice added this to the Unknown milestone Feb 12, 2020
Projects
None yet
Development

No branches or pull requests

2 participants