Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retries to iPXE script when fetching files #333

Closed
mmlb opened this issue Mar 25, 2021 · 5 comments · Fixed by #432
Closed

Add retries to iPXE script when fetching files #333

mmlb opened this issue Mar 25, 2021 · 5 comments · Fixed by #432
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.

Comments

@mmlb
Copy link
Contributor

mmlb commented Mar 25, 2021

The iPXE file fetches can run into temporary network issues when downloading the kernel/initramfs files.
We should add some retry logic.

Expected Behaviour

Temporary network failures do not cause the iPXE boot to fail.

Current Behaviour

iPXE boot will fail if there's a network issue.

@mmlb mmlb added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 25, 2021
@tstromberg tstromberg added kind/bug Categorizes issue or PR as related to a bug. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Aug 27, 2021
@tstromberg
Copy link
Contributor

Can someone identify which line in boots is responsible for the fetching that is missing retry?

It would help the person who might want to take this issue on.

@jacobweinstock
Copy link
Member

jacobweinstock commented Sep 1, 2021

If I'm not mistaken, @mmlb is talking about this area in job/ipxe.go, which is in the function serveBootScript.

@mmlb
Copy link
Contributor Author

mmlb commented Sep 2, 2021

Nope, its actually in the installers: https://github.com/tinkerbell/boots/blob/ba3a3fef424ebfd7125b08ae99dcb9631bc911a8/installers/osie/main.go#L54 We'd probably need a bounded loop that breaks on success.
And similar for fetching the initrd and posting events back.

@tstromberg
Copy link
Contributor

If it helps, I've had good luck with using this library for exponential backoff+jitter, and have seen it used elsewhere in Tinkerbell (tink?): https://pkg.go.dev/github.com/cenkalti/backoff/v4

@mmlb
Copy link
Contributor Author

mmlb commented Sep 3, 2021

The http request is not from boot's side. Its ipxe doing the fetches we'd need to do the retries/backoff in the iPXE script.

@mmlb mmlb added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Feb 8, 2022
@jacobweinstock jacobweinstock transferred this issue from tinkerbell/smee Dec 24, 2022
@jacobweinstock jacobweinstock transferred this issue from tinkerbell/ipxedust Dec 24, 2022
@jacobweinstock jacobweinstock transferred this issue from tinkerbell/smee Apr 7, 2023
@jacobweinstock jacobweinstock transferred this issue from tinkerbell/ipxedust Jun 3, 2023
@mergify mergify bot closed this as completed in #432 May 13, 2024
mergify bot added a commit that referenced this issue May 13, 2024
## Description


This will help overcome transient network issues.

## Why is this needed



Fixes: #333

## How Has This Been Tested?





## How are existing users impacted? What migration steps/scripts do we need?





## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants