Description
I have a large GitHub Actions workflow that pushes over 600 images up to the GitHub Container Registry.
This mostly works fine, except that I have to set max-parallel based on how many images I expect to be building at a time, and even then I sometimes hit the APIs too fast or get a rare transient error.
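For context, the relevant part of the workflow looks roughly like this (a trimmed, illustrative sketch only; the real matrix is generated and covers 600+ images, and the max-parallel value and image names are placeholders):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 6                               # tuned by hand to limit concurrent builds
      matrix:
        image: [den-php-fpm, den-php-fpm-debug]     # placeholder entries
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/swiftotter/${{ matrix.image }}:latest
```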
For example:
buildx failed with: ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: unexpected status code https://ghcr.io/v2/swiftotter/den-php-fpm/blobs/sha256:456f646c7993e2a08fbdcbb09c191d153118d0c8675e1a0b29f83895c425105f: 500 Internal Server Error - Server message: unknown
or
buildx failed with: ERROR: failed to solve: failed to compute cache key: failed to copy: read tcp 172.17.0.2:59588->185.199.111.154:443: read: connection timed out
or
buildx failed with: ERROR: failed to solve: failed to do request: Head "https://ghcr.io/v2/swiftotter/den-php-fpm-debug/blobs/sha256:d6b642fadba654351d3fc430b0b9a51f7044351daaed3d27055b19044d29ec66": dial tcp: lookup ghcr.io on 168.63.129.16:53: read udp 172.17.0.2:40862->168.63.129.16:53: i/o timeout
These are all temporary errors that disappear the moment I re-run the job. What I would like instead is some sort of automated backoff-and-retry mechanism for cases like these - timeouts, server errors, or too-many-requests errors - with configurable limits.
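As a rough sketch of the behaviour I'm after, assuming a plain `docker buildx build ... --push` step (the image tag, attempt count, and delays here are arbitrary placeholders), this is the kind of exponential-backoff retry I currently have to hand-roll in shell and would love to see built in:

```yaml
- name: Build and push with manual backoff retry (sketch)
  run: |
    attempt=1
    max_attempts=5
    delay=10
    until docker buildx build --push -t ghcr.io/swiftotter/den-php-fpm:example .; do
      if [ "$attempt" -ge "$max_attempts" ]; then
        echo "Build failed after $max_attempts attempts" >&2
        exit 1
      fi
      echo "Transient failure, retrying in ${delay}s (attempt $attempt/$max_attempts)..."
      sleep "$delay"
      attempt=$((attempt + 1))
      delay=$((delay * 2))    # double the wait between attempts
    done
```

Ideally the retry count, backoff, and which error classes (5xx, timeouts, 429s) trigger a retry would be configurable, rather than wrapping every build step like this.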