Description
I have a large GitHub Actions workflow that pushes over 600 images up to the GitHub Container Registry.
This mostly works fine, except that I have to set max-parallel based on how many images I expect to be building at a time, and even then I sometimes hit the APIs too fast or get a rare transient error.
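For context, the relevant part of the workflow looks roughly like this (a trimmed, illustrative sketch only; the real matrix is generated and covers 600+ images, and the max-parallel value and image names are placeholders):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 6                               # tuned by hand to limit concurrent builds
      matrix:
        image: [den-php-fpm, den-php-fpm-debug]     # placeholder entries
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/swiftotter/${{ matrix.image }}:latest
```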
For example:
buildx failed with: ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: unexpected status code https://ghcr.io/v2/swiftotter/den-php-fpm/blobs/sha256:456f646c7993e2a08fbdcbb09c191d153118d0c8675e1a0b29f83895c425105f: 500 Internal Server Error - Server message: unknown
or
buildx failed with: ERROR: failed to solve: failed to compute cache key: failed to copy: read tcp 172.17.0.2:59588->185.199.111.154:443: read: connection timed out
or
buildx failed with: ERROR: failed to solve: failed to do request: Head "https://ghcr.io/v2/swiftotter/den-php-fpm-debug/blobs/sha256:d6b642fadba654351d3fc430b0b9a51f7044351daaed3d27055b19044d29ec66": dial tcp: lookup ghcr.io on 168.63.129.16:53: read udp 172.17.0.2:40862->168.63.129.16:53: i/o timeout
These are all temporary errors that disappear the moment I re-run the job. What I would like instead is some sort of automated backoff-and-retry mechanism for cases like these - timeouts, server errors, or too-many-requests errors - with configurable limits.
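As a rough sketch of the behaviour I'm after, assuming a plain `docker buildx build ... --push` step (the image tag, attempt count, and delays here are arbitrary placeholders), this is the kind of exponential-backoff retry I currently have to hand-roll in shell and would love to see built in:

```yaml
- name: Build and push with manual backoff retry (sketch)
  run: |
    attempt=1
    max_attempts=5
    delay=10
    until docker buildx build --push -t ghcr.io/swiftotter/den-php-fpm:example .; do
      if [ "$attempt" -ge "$max_attempts" ]; then
        echo "Build failed after $max_attempts attempts" >&2
        exit 1
      fi
      echo "Transient failure, retrying in ${delay}s (attempt $attempt/$max_attempts)..."
      sleep "$delay"
      attempt=$((attempt + 1))
      delay=$((delay * 2))    # double the wait between attempts
    done
```

Ideally the retry count, backoff, and which error classes (5xx, timeouts, 429s) trigger a retry would be configurable, rather than wrapping every build step like this.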