Cache release checksum fetches for older files unlikely to change #75

Closed
alexheretic opened this issue Feb 14, 2024 · 2 comments · Fixed by #76
Comments

@alexheretic
Contributor

When performing something like a cargo update operation, gitlab-cargo-shim will

  1. Fetch all packages via /projects/{}/packages (calling multiple times if there are multiple pages)
  2. Fetch each package release checksum via /projects/{}/packages/{}/package_files
  3. Fetch each release's metadata via /projects/{}/packages/generic/{}/{}/{}

This activity is the source of most of the current latency, as these calls can take a while, particularly on the first operation after startup (a rough sketch of the fetch sequence follows the example logs below):

INFO ssh: gitlab_cargo_shim: Successfully authenticated for GitLab user `alexheretic` by Build or Personal Token
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate releases in 8.8s
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate metadata in 23.1s

Note: Using the latency logs introduced in #74.
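
For reference, here's a minimal sketch of that fetch sequence, assuming reqwest + serde; the endpoint shapes, struct fields and the "metadata.json" file name are illustrative placeholders, not gitlab-cargo-shim's actual code:

use serde::Deserialize;

#[derive(Deserialize)]
struct Package { id: u64, name: String, version: String }

#[derive(Deserialize)]
struct PackageFile { file_name: String, file_sha256: String }

async fn fetch_everything(base: &str, project: u64, token: &str) -> reqwest::Result<()> {
    let client = reqwest::Client::new();

    // 1. List all package releases, following pagination until an empty page.
    let mut packages = Vec::new();
    for page in 1.. {
        let url = format!("{base}/api/v4/projects/{project}/packages?per_page=100&page={page}");
        let batch: Vec<Package> = client.get(url)
            .header("PRIVATE-TOKEN", token)
            .send().await?
            .json().await?;
        if batch.is_empty() {
            break;
        }
        packages.extend(batch);
    }

    for pkg in &packages {
        // 2. Fetch the file list (including checksums) for each release.
        let url = format!("{base}/api/v4/projects/{project}/packages/{}/package_files", pkg.id);
        let _files: Vec<PackageFile> = client.get(url)
            .header("PRIVATE-TOKEN", token)
            .send().await?
            .json().await?;

        // 3. Fetch the release metadata from the generic package registry
        //    ("metadata.json" is a placeholder file name).
        let url = format!(
            "{base}/api/v4/projects/{project}/packages/generic/{}/{}/metadata.json",
            pkg.name, pkg.version
        );
        let _metadata = client.get(url)
            .header("PRIVATE-TOKEN", token)
            .send().await?
            .text().await?;
    }
    Ok(())
}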

However, metadata fetches (3) are cached, so subsequent operations are fast:

INFO ssh: gitlab_cargo_shim: Successfully authenticated for GitLab user `alexheretic` by Build or Personal Token
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate releases in 6.3s
INFO ssh:data:build_packfile: gitlab_cargo_shim: Fetched crate metadata in 3.2ms

This makes me wonder if we can also improve 1 and 2 with caching. For 1 I think the answer is "no": the server needs to return new releases that may have been added since the last call.

But for 2 there is perhaps more that can be done. The checksum for a given release won't generally change. It can change if the release is re-published, overwriting the previous file, which is possible. However, in my case publishing is as immutable as possible, simulating crates-io; the most likely time I might re-publish would be close to the original publish time, to fix some error.

That suggests we could cache checksum fetches for releases older than some configurable period of time, as older releases are much less likely to be modified. E.g. config:

# configuration

## Cache file checksum fetches for all releases older than this value.
## If omitted, no caching will occur.
cache-releases-older-than = "7 days"
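
As a rough sketch of how such an option could gate the caching decision (the cache shape, field names, and the assumption that a publish timestamp is available from the packages listing are all hypothetical, not existing code):

use std::collections::HashMap;
use std::time::{Duration, SystemTime};

struct ChecksumCache {
    /// Parsed from `cache-releases-older-than`; `None` disables caching.
    cache_releases_older_than: Option<Duration>,
    /// (crate name, version) -> sha256 checksum.
    entries: HashMap<(String, String), String>,
}

impl ChecksumCache {
    /// Return a cached checksum only if caching is enabled and the release is
    /// older than the configured threshold; otherwise the caller re-fetches.
    fn get(&self, name: &str, version: &str, published_at: SystemTime) -> Option<&String> {
        let threshold = self.cache_releases_older_than?;
        let age = SystemTime::now().duration_since(published_at).ok()?;
        if age >= threshold {
            self.entries.get(&(name.to_owned(), version.to_owned()))
        } else {
            None
        }
    }
}
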
@w4
Owner

w4 commented Feb 23, 2024

This sounds reasonable to me. I also have some unreleased changes, which need cleaning up, that move the cache to rocksdb to avoid boot latency.

I've also been experimenting with the GraphQL API for some unrelated things w.r.t. group-level registries. We might be able to use GraphQL here too for some cool things with `after`, to avoid recursing the entirety of packages for packages that have already been seen.
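
For illustration, a cursor-paginated packages query against GitLab's GraphQL API might look roughly like this (field names are from memory of the public schema and should be checked against the docs):

query ($fullPath: ID!, $after: String) {
  project(fullPath: $fullPath) {
    packages(first: 100, after: $after) {
      nodes {
        id
        name
        version
        createdAt
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

Persisting the final endCursor between runs (with a stable sort order) could then let the next listing start from packages that haven't been seen yet.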

@alexheretic
Contributor Author

I've rebased #76, so it should be good to go.

> I also have some unreleased changes, which need cleaning up, that move the cache to rocksdb to avoid boot latency.

I've also been thinking about a way to start up with a pre-filled metadata (and now checksum) cache. That would be very helpful for my use case to reduce first-request latency.
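
Purely as a sketch of what that could look like (the file format, path, and cache shape are hypothetical, assuming serde_json): on startup the shim could try to seed its in-memory caches from a file before falling back to fetching:

use std::{collections::HashMap, fs, path::Path};

/// Seed the checksum/metadata caches from a JSON file if one is present;
/// any failure just means starting with an empty cache, as today.
fn load_seed_cache(path: &Path) -> HashMap<String, String> {
    fs::read_to_string(path)
        .ok()
        .and_then(|contents| serde_json::from_str(&contents).ok())
        .unwrap_or_default()
}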
