feat(eks): Nodegroup support nodeRepairConfig #32626

Open: wants to merge 5 commits into main

phuhung273
Contributor

Issue # (if applicable)

Closes #32562

Description of changes

  • Add nodeRepairConfig support to EKS Nodegroup

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team December 21, 2024 17:02
@github-actions github-actions bot added repeat-contributor [Pilot] contributed between 3-5 PRs to the CDK effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2 labels Dec 21, 2024
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@phuhung273 phuhung273 changed the title feat(eks): Nodegroup support nodeRepairConfig feat(eks): Nodegroup support nodeRepairConfig Dec 21, 2024
@aws-cdk-automation aws-cdk-automation dismissed their stale review December 21, 2024 17:07

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

codecov bot commented Dec 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.38%. Comparing base (5687d85) to head (e3ce823).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #32626   +/-   ##
=======================================
  Coverage   82.38%   82.38%           
=======================================
  Files         120      120           
  Lines        6937     6937           
  Branches     1170     1170           
=======================================
  Hits         5715     5715           
  Misses       1119     1119           
  Partials      103      103           
| Flag | Coverage Δ |
|---|---|
| suite.unit | 82.38% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| packages/aws-cdk | ∅ <ø> (∅) |
| packages/aws-cdk-lib/core | 82.38% <ø> (ø) |
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Dec 21, 2024
@aaythapa aaythapa self-assigned this Feb 12, 2025
Comment on lines +33 to +37
new integ.IntegTest(app, 'aws-cdk-eks-nodegroup-repair-config', {
testCases: [stack],
// Test includes assets that are updated weekly. If not disabled, the upgrade PR will fail.
diffAssets: false,
});
Contributor

nit: any assertions we can add?

Contributor Author

@phuhung273 phuhung273 Feb 14, 2025

I tried, but something seems to be wrong with the EKS DescribeNodegroup API. This is the response I get:

{
    "nodegroup": {
        "amiType": "AL2023_x86_64_STANDARD",
        "capacityType": "ON_DEMAND",
        "clusterName": "Cluster9EE0221C-36a4e31332254b13b24d08339cdfa144",
        "createdAt": "2025-02-14T01:27:04.073Z",
        "diskSize": 20,
        "health": {
            "issues": []
        },
        "instanceTypes": [
            "t3.micro"
        ],
        "labels": {},
        "modifiedAt": "2025-02-14T01:28:44.250Z",
        "nodeRole": "arn:aws:iam::<MY_ACCOUNT>:role/aws-cdk-eks-nodegroup-rep-ClusterNodegroupMNGAL2023-tVMCbaTW4nBh",
        "nodegroupArn": "arn:aws:eks:us-east-1:<MY_ACCOUNT>:nodegroup/Cluster9EE0221C-36a4e31332254b13b24d08339cdfa144/ClusterNodegroupMNGAL2023X8-yU4tahg9bhFu/a4ca8108-7f41-618c-4ad4-e50d6b9accf8",
        "nodegroupName": "ClusterNodegroupMNGAL2023X8-yU4tahg9bhFu",
        "releaseVersion": "1.31.4-20250203",
        "resources": {
            "autoScalingGroups": [
                {
                    "name": "eks-ClusterNodegroupMNGAL2023X8-yU4tahg9bhFu-a4ca8108-7f41-618c-4ad4-e50d6b9accf8"
                }
            ]
        },
        "scalingConfig": {
            "minSize": 1,
            "maxSize": 2,
            "desiredSize": 2
        },
        "status": "ACTIVE",
        "subnets": [
            "subnet-097df13eb8cffc6bd",
            "subnet-0578a7502899d27c8"
        ],
        "tags": {
            "aws:cloudformation:stack-id": "arn:aws:cloudformation:us-east-1:<MY_ACCOUNT>:stack/aws-cdk-eks-nodegroup-repair-config-test/e3f1b500-ea70-11ef-9c66-0e98b891b35b",
            "aws:cloudformation:stack-name": "aws-cdk-eks-nodegroup-repair-config-test",
            "aws:cloudformation:logical-id": "ClusterNodegroupMNGAL2023X8664STANDARD8BD0F7AB"
        },
        "updateConfig": {
            "maxUnavailable": 1
        },
        "version": "1.31"
    }
}

The response doesn't include nodeRepairConfig, even though the API reference lists it: https://docs.aws.amazon.com/eks/latest/APIReference/API_DescribeNodegroup.html

Is it OK if we just assert status: ACTIVE?
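
For illustration, a minimal sketch of the status-only check proposed above. The helper and type names (`checkNodegroup`, `NodegroupResponse`, `NodegroupCheck`) are hypothetical and not part of this PR; the sample object is a trimmed copy of the response quoted above. In the integ test itself, the same check would presumably be expressed as an `awsApiCall` assertion matched with `ExpectedResult.objectLike` from `@aws-cdk/integ-tests-alpha`.

```typescript
// Hypothetical helper, not part of the PR: inspects a DescribeNodegroup-style
// response the way a status-only assertion would.
interface NodegroupResponse {
  nodegroup: {
    status: string;
    // Listed in the API reference, but absent from the observed response.
    nodeRepairConfig?: { enabled?: boolean };
    [key: string]: unknown;
  };
}

interface NodegroupCheck {
  active: boolean;
  repairConfigReported: boolean;
}

function checkNodegroup(response: NodegroupResponse): NodegroupCheck {
  return {
    // The only field the proposed assertion would look at.
    active: response.nodegroup.status === 'ACTIVE',
    // Recorded separately so a future API fix can be detected.
    repairConfigReported: response.nodegroup.nodeRepairConfig !== undefined,
  };
}

// Trimmed copy of the response quoted in this comment.
const observed: NodegroupResponse = {
  nodegroup: {
    status: 'ACTIVE',
    version: '1.31',
    updateConfig: { maxUnavailable: 1 },
  },
};

const result = checkNodegroup(observed);
console.log(result);
```

For the quoted response, `active` is true and `repairConfigReported` is false, which is why an assertion on `nodeRepairConfig` cannot pass today while a status-only assertion can.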

/**
* The node auto repair configuration for the node group.
*
* @see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-eks-nodegroup-noderepairconfig.html
Contributor

nit: I think this doc explains node repair config better

* @see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-eks-nodegroup-noderepairconfig.html
* @default - disabled
*/
readonly nodeRepairConfig?: NodegroupRepairConfig;
Contributor

We don't enforce this CDK-wide, but I think in this case the property should be

readonly nodeRepairConfig?: boolean;

The reason is that I don't expect any additional properties to be added to NodegroupRepairConfig: you either enable auto repair or you don't, and there's not much more to configure. So I'd prefer to make the property a boolean and have one less interface.
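
A rough sketch of what the boolean variant could look like, assuming the underlying CloudFormation property is an object with a single `enabled` flag. All names here (`NodegroupOptionsSketch`, `NodeRepairConfigProperty`, `renderNodeRepairConfig`) are illustrative stand-ins, not the PR's actual code or the library's API.

```typescript
// Hypothetical stand-in for the L1 property shape (assumed: one `enabled` flag).
interface NodeRepairConfigProperty {
  readonly enabled: boolean;
}

// Hypothetical stand-in for the relevant slice of NodegroupOptions.
interface NodegroupOptionsSketch {
  /**
   * Whether node auto repair is enabled for the node group.
   * @default - disabled
   */
  readonly nodeRepairConfig?: boolean;
}

// Maps the user-facing boolean onto the nested object; when the option is
// omitted, the property is left out of the template entirely.
function renderNodeRepairConfig(
  options: NodegroupOptionsSketch,
): NodeRepairConfigProperty | undefined {
  if (options.nodeRepairConfig === undefined) {
    return undefined;
  }
  return { enabled: options.nodeRepairConfig };
}

console.log(renderNodeRepairConfig({ nodeRepairConfig: true }));
console.log(renderNodeRepairConfig({}));
```

The trade-off is the usual one: a boolean keeps the public surface smaller, while a dedicated interface leaves room for extra settings without a breaking change if the service ever adds them.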

@@ -333,6 +345,14 @@ export interface NodegroupOptions {
* @default undefined - node groups will update instances one at a time
*/
readonly maxUnavailablePercentage?: number;

Contributor

thanks for adding it to both modules!

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: e3ce823
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Development

Successfully merging this pull request may close these issues.

eks: Add support for Node Health Monitoring and Repair
3 participants