Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for clusters with trn1 instance types #5906

Merged
merged 3 commits into from
Nov 17, 2022
Merged

Conversation

dewjam
Copy link
Contributor

@dewjam dewjam commented Nov 11, 2022

Description

Per #5896, TRN1 instance types are now available in us-west-2 and us-east-1. This PR ensures Trn1 instance types will be launched with the EKS Accelerated AMI by default and also will install the Neuron Device Plugin by default. This also upgrades the Neuron Device Plugin to version 1.9.3.0 2.1.2.0 which is required for Trn1 support.

Example:

./eksctl create cluster \
    --name trainium \
    --region us-west \
    --zones us-west-2b,us-west-2d \
    --nodegroup-name trainium-2 \
    --node-type trn1.2xlarge \
    --node-zones us-west-2d \
    --nodes 1 \
    --nodes-min 1 \
    --nodes-max 4 \
    --ssh-access \
    --with-oidc
% k get pods -A
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   aws-node-zwjbd                         1/1     Running   0          12m
kube-system   coredns-57ff979f67-bc6tl               1/1     Running   0          22m
kube-system   coredns-57ff979f67-xg28h               1/1     Running   0          22m
kube-system   kube-proxy-vdj4w                       1/1     Running   0          12m
kube-system   neuron-device-plugin-daemonset-p872b   1/1     Running   0          10m

Checklist

  • Added tests that cover your change (if possible)
  • [] Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 馃く

  • Backfilled missing tests for code in same general area 馃帀
  • Refactored something and made the world a better place 馃専

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello dewjam 馃憢 Thank you for opening a Pull Request in eksctl project. The team will review the Pull Request and aim to respond within 1-10 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website

@Himangini Himangini added the kind/feature New feature or request label Nov 14, 2022
@Himangini Himangini changed the title add support for launching clusters with trn1 instance types Support for clusters with trn1 instance types Nov 14, 2022
@dewjam
Copy link
Contributor Author

dewjam commented Nov 15, 2022

Hey @Himangini
Tests should be fixed and I also rebased against main. Any chance I can get another review?

Copy link
Collaborator

@cPu1 cPu1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments for improving the validation and the integration tests.

integration/tests/trainium/trainium_test.go Outdated Show resolved Hide resolved
pkg/apis/eksctl.io/v1alpha5/validation.go Outdated Show resolved Hide resolved
@dewjam
Copy link
Contributor Author

dewjam commented Nov 17, 2022

Updated tests. This is ready for review again. Thanks!

@dewjam dewjam requested review from Himangini and removed request for cPu1 November 17, 2022 03:37
Copy link
Collaborator

@cPu1 cPu1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cPu1 cPu1 enabled auto-merge November 17, 2022 09:18
@cPu1 cPu1 merged commit 65e9fcf into eksctl-io:main Nov 17, 2022
@@ -238,20 +234,24 @@ func getAvailabilityZones(ctx context.Context, ec2API awsapi.EC2) (string, strin
Expect(err).NotTo(HaveOccurred())
zones := output.AvailabilityZones

zoneMap := map[string]struct{}{}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, is map[string]struct{}{} a style thing? Or is it more efficient than map[string]bool for this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more efficient, takes fewer bytes, and is more idiomatic when you're using a map as a set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants