-
Notifications
You must be signed in to change notification settings - Fork 438
OCPBUGS-57426: OCPBUGS-57506: Boot image controller should correctly handle marketplace boot images #5122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-57426: OCPBUGS-57506: Boot image controller should correctly handle marketplace boot images #5122
Conversation
@djoshy: An error was encountered querying GitHub for users with public email (sregidor@redhat.com) for bug OCPBUGS-57426 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details. Full error message.
Post "http://ghproxy/graphql": dial tcp 172.30.229.2:80: connect: connection refused
Please contact an administrator to resolve this issue, then request a bug refresh with In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ca83084
to
6e1afd9
Compare
/retest-required /payload-job-with-prs openshift/origin#29919 periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-serial |
if newami != currentAMI { | ||
// Validate that we're allowed to update from the current AMI | ||
if !AllowedAMIs.Has(currentAMI) { | ||
return false, nil, fmt.Errorf("current AMI %s is unknown, boot image management is not supported on this cluster. Please disable this feature to remove the degrade", currentAMI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, since we're keeping bootimage opt-out for AWS, this means that a marketplace cluster will automatically opt in once upgraded to 4.19, and then immediately degrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I think our current plan is to handle the upgrade awareness via admin-ack docs(openshift/openshift-docs#94706). I suppose we could skip the MachineSet
and not degrade(maybe degrade in a later y version?)
/retest-required /jira refresh |
@djoshy: This pull request references Jira Issue OCPBUGS-57426, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
6e1afd9
to
79fc2e6
Compare
|
||
// AllowedAMIs contains the set of AMI IDs that are allowed to be updated from | ||
var AllowedAMIs = sets.New( | ||
"ami-000145e5a91e9ac22", "ami-0001c9b25b9ca6731", "ami-000277c401a2db73a", "ami-0002c5529c6fff5a4", "ami-00033b241162023da", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't dug in to see how this is generated, but you might want to wrap at one-per-line, just so new-AMI-insertion generates a single-line diff, and not a many-line-re-wrapping diff.
aa94349
to
abeb602
Compare
abeb602
to
5e332d5
Compare
This is unfortunate but yeah, I think a list like this is probably the safest/most comprehensive approach. Two general comments though:
If not, and this list is meant to be ongoing, my concern is that it'll be extremely easy for this list to become out of date here. I would propose instead having it live in openshift/installer, since that's where bootimages are bumped (and we could even have CI checks that verify the list is updated). Note that bootimages are bumped multiple times for a minor lifecycle. |
Verified using IPI on AWS and IPI on GCP We created a new test case to test this scenario in polarion Manually tested scenarios Positive:
Negative:
We also updated the existing test cases automation so that they can handle this scenario too. All automated e2e test cases passed in both platforms. We can qe-approve this PR. Nevertheless, since it seems that there is still an ongoing conversation about how to create/store/maintain the list of images we will wait until everything is decided, just in case we need extra testing. |
Thanks for testing, @sergiordlr !
So, this list was originally compiled by @sdodson and he mentioned that he'll work on automating it. This was quickly put together targeting a fix for 4.19.2, so I'm happy to consider a different solution going forward in 4.20+
IMO this sounds the best approach to me, I'll be happy to assist as needed. |
I guess one thing that helps here is that (and let me know if this is incorrect) it's fine if e.g. we add an AMI to the installer and don't immediately update the list here. Until the list is updated, the MCO will avoid managing the image, but once we add it, it will. And e.g. 4.19 bootimages will be good for a while so it doesn't have to be managed immediately. And so adding to the list doesn't have to be that closely in sync but could be done e.g. once a release or so. But that said, another way to do this which could be interesting is to have the script run at MCO build time and dynamically generate the list. That way it's self-maintained. Probably saner to have it generate JSON that gets embedded in the image and then the bootimage controller can just read that. But what's nice is that this script then also gives us the opportunity to gather other things that may help with bootimage updates, like the associated OCP minor release (based on the oldest git branch that contains the AMI ID). Anyway, not blocking this to be clear. /approve |
This is correct.
This is an interesting approach, let me see if I can work something up 👍 CI is broken at the moment and we should also block this on openshift/origin#29919 so we don't break component readiness; throwing in a preemptive hold. /hold |
I think long term a hardcoded MCO list is not feasible, but we'd like to have this as a temporary fix to mitigate any 4.19 clusters upgrading. Having something in installer that we can consume should work, I think we floated some other ideas, happy to discuss via an image mode sync or similar. |
/unhold /retest-required /cherrypick release-4.19 |
@djoshy: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test all |
/retest-required |
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/lgtm |
/label qe-approved |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, isabella-janssen, jlebon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
0fd8575
into
openshift:main
@djoshy: Jira Issue OCPBUGS-57426: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-57426 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: new pull request created: #5135 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator |
Closes OCPBUGS-57348 and OCPBUGS-57426.
The MCO will now only update boot images that have been published previously in the installer.
When an unexpected boot image is found, the controller will skip the MachineSet for updates.