Skip to content

MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency #4996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 19, 2025

Conversation

dkhater-redhat
Copy link
Contributor

@dkhater-redhat dkhater-redhat commented Apr 22, 2025

- What I did
Updated the reconciler so that only changes to kernel arguments, OSImageURL, or extension bundles will trigger a new image build and node reboot. All other MachineConfig changes now reuse the existing MOSB and not cause a node reboot.

- How to verify it
Apply an MC that only tweaks a live-patchable field, observe new MOSB creation (with same image) no node drain and reboot.
Apply an MC that changes kernel args, OSImageURL, or adds/removes an extension, observe new MOSB creation (with new image), node drain and reboot.

- Description for the changelog

@dkhater-redhat dkhater-redhat marked this pull request as draft April 22, 2025 15:37
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2025
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2025
@dkhater-redhat dkhater-redhat changed the title Change mcdiffb DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency Apr 23, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 23, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 23, 2025

@dkhater-redhat: This pull request references MCO-1580 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 7 times, most recently from 8452b7a to 32555ad Compare April 28, 2025 17:22
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2025
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 3 times, most recently from ef4dc03 to 621dc6c Compare April 28, 2025 23:16
@dkhater-redhat dkhater-redhat marked this pull request as ready for review April 28, 2025 23:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2025
@dkhater-redhat dkhater-redhat changed the title DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency Apr 28, 2025
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 3 times, most recently from db06f06 to 8313511 Compare April 29, 2025 15:18
@dkhater-redhat dkhater-redhat force-pushed the change-mcdiffb branch 2 times, most recently from 451b199 to 01d220a Compare June 17, 2025 18:38
@dkhater-redhat
Copy link
Contributor Author

dkhater-redhat commented Jun 17, 2025

@cheesesashimi or @umohnani8 is going to give the final LGTM on this. We have agreed to override tests if needed, as the ocl test has passed (many times) locally, and I have gotten consistent e2e-gcp passes in previous iterations of this PR

…g and reusing MOSB's with updated rendered spec
@umohnani8
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2025
@dkhater-redhat
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2025
@yuqi-zhang
Copy link
Contributor

/retest-required

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD b3377c9 and 2 for PR HEAD 97cecae in total

@@ -1561,20 +1419,17 @@ func newMachineConfigDiff(oldConfig, newConfig *mcfgv1.MachineConfig) (*machineC
units: !reflect.DeepEqual(oldIgn.Systemd.Units, newIgn.Systemd.Units),
kernelType: canonicalizeKernelType(oldConfig.Spec.KernelType) != canonicalizeKernelType(newConfig.Spec.KernelType),
extensions: !(extensionsEmpty || reflect.DeepEqual(oldConfig.Spec.Extensions, newConfig.Spec.Extensions)),
}, nil
}
oclEnabled: (oldOCLImage != "" || newOCLImage != "") || (oldOCLImage != "" && newOCLImage != ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first block and second block here seems redundant? i.e. if either of oldOCL or newOCL is not empty we consider it enabled already, what's the second check for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget why that was added. I pulled the code down and ran the tests locally and we can probably do away with the second part of this check (the oldOCLImage != "" && newOCLImage != "" section).

@yuqi-zhang
Copy link
Contributor

/override ci/prow/e2e-gcp-op-single-node
/override ci/prow/e2e-gcp-op

Those tests are known failures with gcp infra, so overriding to reduce retests

/hold

@cheesesashimi please unhold if you're happy with the state of the PR

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2025
Copy link
Contributor

openshift-ci bot commented Jun 18, 2025

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-gcp-op, ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node
/override ci/prow/e2e-gcp-op

Those tests are known failures with gcp infra, so overriding to reduce retests

/hold

@cheesesashimi please unhold if you're happy with the state of the PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

return maybeAddMachineConfigInfo(ccAnnotation, err)
}
// We might be able to drop this method entirely since I'm not sure if it's actually needed.
func (dn *Daemon) triggerUpdateWithMachineConfig(currentConfig, desiredConfig *mcfgv1.MachineConfig, _ bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): Remove the bool parameter since it's not being used.

@@ -1561,20 +1419,17 @@ func newMachineConfigDiff(oldConfig, newConfig *mcfgv1.MachineConfig) (*machineC
units: !reflect.DeepEqual(oldIgn.Systemd.Units, newIgn.Systemd.Units),
kernelType: canonicalizeKernelType(oldConfig.Spec.KernelType) != canonicalizeKernelType(newConfig.Spec.KernelType),
extensions: !(extensionsEmpty || reflect.DeepEqual(oldConfig.Spec.Extensions, newConfig.Spec.Extensions)),
}, nil
}
oclEnabled: (oldOCLImage != "" || newOCLImage != "") || (oldOCLImage != "" && newOCLImage != ""),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget why that was added. I pulled the code down and ran the tests locally and we can probably do away with the second part of this check (the oldOCLImage != "" && newOCLImage != "" section).

return mcDiff, nil
// If OCL is enabled, compute OS update and revertFromOCL separately.
diff.osUpdate = oldOCLImage != newOCLImage || force
diff.revertFromOCL = newOCLImage == ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): We can tighten up this check by doing something like diff.revertFromOCL = newOCLImage == "" && oldOCLImage != "".


// Delete the third MOSB and the image should be deleted also
t.Logf("Deleting MachineOSBuild %q", thirdMOSB.Name)
// waitForMOSCToGetNewPullspec(ctx, t, cs, moscName, secondImagePullspec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (non-blocking): If this isn't being used, it's better to delete the line and let Git worry about what was once here.

@cheesesashimi
Copy link
Member

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2025
@cheesesashimi
Copy link
Member

/lgtm
/approve

Copy link
Contributor

openshift-ci bot commented Jun 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, dkhater-redhat, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cheesesashimi,dkhater-redhat,umohnani8]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD a12b62b and 1 for PR HEAD 97cecae in total

@yuqi-zhang
Copy link
Contributor

/override ci/prow/e2e-gcp-op-single-node
/override ci/prow/e2e-gcp-op

Reapplying

Copy link
Contributor

openshift-ci bot commented Jun 18, 2025

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-gcp-op, ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node
/override ci/prow/e2e-gcp-op

Reapplying

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dkhater-redhat
Copy link
Contributor Author

/test unit

Copy link
Contributor

openshift-ci bot commented Jun 19, 2025

@dkhater-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift-techpreview c827469 link false /test e2e-hypershift-techpreview
ci/prow/e2e-azure-ovn-upgrade-out-of-change 97cecae link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-gcp-op-techpreview 97cecae link false /test e2e-gcp-op-techpreview
ci/prow/e2e-aws-ovn-upgrade-out-of-change 97cecae link false /test e2e-aws-ovn-upgrade-out-of-change
ci/prow/e2e-gcp-op-ocl 97cecae link false /test e2e-gcp-op-ocl
ci/prow/e2e-aws-ovn-windows 97cecae link false /test e2e-aws-ovn-windows

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD a12b62b and 2 for PR HEAD 97cecae in total

@openshift-merge-bot openshift-merge-bot bot merged commit a4ec776 into openshift:main Jun 19, 2025
14 of 19 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.20.0-202506190412.p0.ga4ec776.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants