Eris is currently stateless, so this requires a (recursive!) filesystem query for every .ls request, to stat everything in order to build the JSON representation. (Recursive stat is exactly how Nix itself builds this representation; look up listNar in the Nix source code.)
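To make the cost concrete, here is a minimal sketch of that recursive walk. The JSON shape is an assumption modeled on the listing format nix-index consumes (exact field names may differ), and `build_listing` is a made-up name, not anything in Eris or Nix:

```python
# Hypothetical sketch of the recursive listing a .ls endpoint would have
# to build on demand. Every object in the store path costs at least one
# metadata access (lstat), which is the expense discussed below.
import json
import os


def list_entry(path):
    """Recursively stat `path` and build a NAR-listing-style dict."""
    st = os.lstat(path)  # one filesystem metadata access per object
    if os.path.islink(path):
        return {"type": "symlink", "target": os.readlink(path)}
    if os.path.isdir(path):
        return {
            "type": "directory",
            "entries": {
                name: list_entry(os.path.join(path, name))
                for name in sorted(os.listdir(path))
            },
        }
    return {
        "type": "regular",
        "size": st.st_size,
        "executable": bool(st.st_mode & 0o100),
    }


def build_listing(store_path):
    # The version field mirrors the upstream .ls files.
    return json.dumps({"version": 1, "root": list_entry(store_path)})
```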
This approach is "fine" for Hydra because it can generate these indexes at the time it builds the derivation, just before uploading them to the S3 bucket: the cost of answering a query once the index has been built is zero. That isn't the case for us. stat is, generally, going to be very expensive -- mostly because it has to access the filesystem metadata blocks to get at the information, which on most filesystems isn't cheap (most metadata won't fit in cache lines, and so on), and in practice it often implies locking somewhere. It's going to be especially bad on any derivation with "shallow" directories containing many, many files, which is one of the worst cases. I wouldn't be surprised if you could easily DoS/lock up Eris in such instances with repeated queries, just by consuming all the HTTP worker threads with IO stalls.
This request is quite reasonable on the face of it, but probably quite hard to implement well -- I expect very, very high overhead per .ls request (meaning nix-index is probably going to thrash the hell out of it), and I don't see a good way to implement it immediately.
One way to do this without locking up the HTTP worker threads could be to launch a task that performs the lookup asynchronously in the background and passes the data back to (or notifies) the worker thread responsible for the client once it's done, so it can return the info. That way the actual worker threads for the Mojo server can handle other requests in the meantime. On the other hand, it doesn't solve the underlying problem: querying this information on demand simply isn't very efficient in the first place, and it raises other questions (how many tasks should you leave outstanding at once? Should you allow bursting, e.g. with token buckets? What happens if enough tasks eat up all your IOPS anyway and everything grinds to a halt? Etc.)
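The "bounded background task" idea can be sketched as follows. This is a hypothetical illustration, not Eris code (Eris is Perl/Mojolicious; Python asyncio just makes the shape easy to show), and `max_concurrent` is an assumed tuning knob addressing the "how many outstanding?" question:

```python
# Sketch: offload the expensive recursive walk to a thread pool so the
# event loop keeps serving other requests, and cap how many listings may
# be in flight at once so stat storms queue instead of eating all workers.
import asyncio


async def handle_ls_request(store_path, build_listing, slots):
    # Excess requests wait on the semaphore rather than tying up workers.
    async with slots:
        loop = asyncio.get_running_loop()
        # Blocking, stat-heavy work goes to the default thread pool.
        return await loop.run_in_executor(None, build_listing, store_path)


async def serve(paths, build_listing, max_concurrent=4):
    # max_concurrent is an assumed knob; picking it well is the hard part
    # (too high and IO stalls pile up, too low and nix-index crawls).
    slots = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(
        *(handle_ls_request(p, build_listing, slots) for p in paths)
    )
```

Note this only bounds concurrency; it does nothing about the per-request cost itself, which is the real objection above.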
Proposed solution
The upstream NixOS cache provides an .ls endpoint which contains directory indexes for the NAR files.
Tools like nix-index use it. It would be convenient to have the same functionality in Eris so that nix-index can be extended to fetch from multiple caches.
Example URL: http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.ls is a listing of http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.narinfo
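As the example URL shows, the .ls file lives next to the .narinfo, keyed by the same 32-character store hash. A tiny sketch of deriving that URL from a store path (the helper name is made up):

```python
# Hypothetical helper: map /nix/store/<hash>-<name> to <cache>/<hash>.ls,
# mirroring how <cache>/<hash>.narinfo is addressed.
def ls_url(cache, store_path):
    base = store_path.rsplit("/", 1)[-1]
    hash_part = base.split("-", 1)[0]
    assert len(hash_part) == 32, "Nix store hashes are 32 base-32 chars"
    return f"{cache}/{hash_part}.ls"
```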
Alternatives considered
Don't implement it, I guess.
Additional context
Looking at the nix-index source code, the file is either uncompressed, brotli-compressed, or xz-compressed: https://github.com/bennofs/nix-index/blob/master/src/hydra.rs#L201
The example above is just JSON describing the file structure:
curl -L -o - http://cache.nixos.org/wi96xcbm63zccfxi5f648b9pkak9d62k.ls | brotli -d
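A client handling all three encodings nix-index accepts could be sketched like this. The xz format has a well-known magic header; brotli does not, so it ends up as the fallback. The `brotli` module here is the third-party Python package, and `decode_listing` is a made-up name:

```python
# Sketch: decode a .ls response that may be plain JSON, xz-compressed,
# or brotli-compressed (the three cases nix-index's hydra.rs handles).
import json
import lzma


def decode_listing(raw: bytes):
    if raw[:1] in (b"{", b" "):      # already plain JSON
        return json.loads(raw)
    if raw[:6] == b"\xfd7zXZ\x00":   # xz magic bytes
        return json.loads(lzma.decompress(raw))
    import brotli                    # third-party package, assumed installed
    return json.loads(brotli.decompress(raw))
```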