[checks] wait for healthiness and termination (#16)
* [checks] add a status OK (healthiness) checker

Also removes the extra minute of waiting between EC2 start and SSM testing

* Use a terminateWaiter

* Log the AMI used for testing

* Log when we start terminating

* Log as we terminate

* Add some padding

* Wait 5m for healthiness (it takes 3m)

* Update docs

* fix syntax
kylos101 authored Feb 28, 2025
1 parent df6132c commit 9ed4045
Showing 5 changed files with 89 additions and 45 deletions.
91 changes: 59 additions & 32 deletions gitpod-network-check/README.md
@@ -55,38 +55,65 @@ A CLI to check if your network setup is suitable for the installation of Gitpod.

```console
./gitpod-network-check diagnose
-INFO[0000] ✅ Main Subnets are valid
-INFO[0000] ✅ Pod Subnets are valid
-INFO[0000] ℹ️ Checking prerequisites
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ec2messages is configured
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssm is configured
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is configured
-INFO[0001] ℹ️ Launching EC2 instance in a Main subnet
-INFO[0007] ℹ️ Launching EC2 instance in a Pod subnet
-INFO[0009] ℹ️ Waiting for EC2 instances to become ready (can take up to 2 minutes)
-INFO[0167] ✅ EC2 Instances are now running successfully
-INFO[0167] ℹ️ Connecting to SSM...
-INFO[0175] ℹ️ Checking if the required AWS Services can be reached from the ec2 instances
-INFO[0178] ✅ Autoscaling is available
-INFO[0179] ✅ CloudFormation is available
-INFO[0179] ✅ CloudWatch is available
-INFO[0180] ✅ EC2 is available
-INFO[0181] ✅ EC2messages is available
-INFO[0182] ✅ ECR is available
-INFO[0183] ✅ ECR Api is available
-INFO[0184] ✅ EKS is available
-INFO[0185] ✅ Elastic LoadBalancing is available
-INFO[0185] ✅ KMS is available
-INFO[0186] ✅ Kinesis Firehose is available
-INFO[0187] ✅ SSM is available
-INFO[0188] ✅ SSMmessages is available
-INFO[0189] ✅ SecretsManager is available
-INFO[0190] ✅ Sts is available
-INFO[0190] ✅ DynamoDB is available
-INFO[0191] ✅ S3 is available
-INFO[0194] ✅ accounts.google.com is available
-INFO[0194] ✅ github.com is available
-INFO[0194] ✅ Instances terminated
+INFO[0000] ℹ️ Running with region `eu-central-1`, main subnet `[subnet-0ed211f14362b224f subnet-041703e62a05d2024]`, pod subnet `[subnet-075c44edead3b062f subnet-06eb311c6b92e0f29]`, hosts `[accounts.google.com https://github.com]`, ami ``, and API endpoint ``
+INFO[0000] ✅ Main Subnets are valid
+INFO[0000] ✅ Pod Subnets are valid
+INFO[0000] ℹ️ Checking prerequisites
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ec2messages is not configured, testing service connectivity...
+INFO[0000] ✅ Service ec2messages.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ssm is not configured, testing service connectivity...
+INFO[0000] ✅ Service ssm.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is not configured, testing service connectivity...
+INFO[0000] ✅ Service ssmmessages.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.execute-api is configured
+INFO[0001] ✅ IAM role created and policy attached
+INFO[0001] ℹ️ Launching EC2 instances in Main subnets
+INFO[0001] ℹ️ Created security group with ID: sg-0784ba9ba1731f522
+INFO[0002] ℹ️ Instance type t2.micro shall be used
+INFO[0009] ℹ️ Created security group with ID: sg-088d7ea455ba271f5
+INFO[0010] ℹ️ Instance type t2.micro shall be used
+INFO[0011] ℹ️ Main EC2 instances: [i-00675f1d3d0162acb i-041d127c852b5c1ab]
+INFO[0011] ℹ️ Launching EC2 instances in a Pod subnets
+INFO[0012] ℹ️ Created security group with ID: sg-03575b98e15e8b184
+INFO[0012] ℹ️ Instance type t2.micro shall be used
+INFO[0014] ℹ️ Created security group with ID: sg-00d4a66a7840ebd67
+INFO[0014] ℹ️ Instance type t2.micro shall be used
+INFO[0016] ℹ️ Pod EC2 instances: [i-00e2b26e784c900c6 i-077cbced73ee64c1d]
+INFO[0016] ℹ️ Waiting for EC2 instances to become Running (times out in 4 minutes)
+INFO[0021] ℹ️ Waiting for EC2 instances to become Healthy (times out in 4 minutes)
+INFO[0199] ✅ EC2 Instances are now running successfully
+INFO[0199] ℹ️ Connecting to SSM...
+INFO[0199] ℹ️ Checking if the required AWS Services can be reached from the ec2 instances in the pod subnet
+INFO[0201] ✅ Autoscaling is available
+INFO[0202] ✅ CloudFormation is available
+INFO[0203] ✅ CloudWatch is available
+INFO[0204] ✅ EC2 is available
+INFO[0205] ✅ EC2messages is available
+INFO[0206] ✅ ECR is available
+INFO[0206] ✅ ECR Api is available
+INFO[0207] ✅ EKS is available
+INFO[0209] ✅ Elastic LoadBalancing is available
+INFO[0210] ✅ KMS is available
+INFO[0211] ✅ Kinesis Firehose is available
+INFO[0212] ✅ SSM is available
+INFO[0212] ✅ SSMmessages is available
+INFO[0214] ✅ SecretsManager is available
+INFO[0215] ✅ Sts is available
+INFO[0215] ℹ️ Checking if certain AWS Services can be reached from ec2 instances in the main subnet
+INFO[0216] ✅ DynamoDB is available
+INFO[0217] ✅ S3 is available
+INFO[0217] ℹ️ Checking if hosts can be reached with HTTPS from ec2 instances in the main subnets
+INFO[0218] ✅ accounts.google.com is available
+INFO[0219] ✅ https://github.com is available
+INFO[0219] ℹ️ Terminating EC2 instances
+INFO[0219] ℹ️ Waiting for EC2 instances to Terminate (times out in 4 minutes)
+INFO[0304] ✅ Instances terminated
+INFO[0305] ✅ Role 'GitpodNetworkCheck' deleted
+INFO[0305] ✅ Instance profile deleted
+INFO[0305] ✅ Security group 'sg-0784ba9ba1731f522' deleted
+INFO[0306] ✅ Security group 'sg-088d7ea455ba271f5' deleted
+INFO[0306] ✅ Security group 'sg-03575b98e15e8b184' deleted
+INFO[0306] ✅ Security group 'sg-00d4a66a7840ebd67' deleted
```

3. Clean up after network diagnosis
19 changes: 13 additions & 6 deletions gitpod-network-check/cmd/checks.go
@@ -82,14 +82,23 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 		log.Infof("ℹ️ Pod EC2 instances: %v", podInstanceIds)
 		InstanceIds = append(InstanceIds, podInstanceIds...)

-		log.Infof("ℹ️ Waiting for EC2 instances to become ready (can take up to 2 minutes)")
-		waiter := ec2.NewInstanceRunningWaiter(ec2Client, func(irwo *ec2.InstanceRunningWaiterOptions) {
+		log.Infof("ℹ️ Waiting for EC2 instances to become Running (times out in 4 minutes)")
+		runningWaiter := ec2.NewInstanceRunningWaiter(ec2Client, func(irwo *ec2.InstanceRunningWaiterOptions) {
 			irwo.MaxDelay = 15 * time.Second
 			irwo.MinDelay = 5 * time.Second
 		})
-		err = waiter.Wait(cmd.Context(), &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
+		err = runningWaiter.Wait(cmd.Context(), &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
 		if err != nil {
-			return fmt.Errorf("❌ Nodes never got ready: %v", err)
+			return fmt.Errorf("❌ Nodes never got Running: %v", err)
 		}
+		log.Infof("ℹ️ Waiting for EC2 instances to become Healthy (times out in 5 minutes)")
+		waitstatusOK := ec2.NewInstanceStatusOkWaiter(ec2Client, func(isow *ec2.InstanceStatusOkWaiterOptions) {
+			isow.MaxDelay = 15 * time.Second
+			isow.MinDelay = 5 * time.Second
+		})
+		err = waitstatusOK.Wait(cmd.Context(), &ec2.DescribeInstanceStatusInput{InstanceIds: InstanceIds}, *aws.Duration(5 * time.Minute))
+		if err != nil {
+			return fmt.Errorf("❌ Nodes never got Healthy: %v", err)
+		}
 		log.Info("✅ EC2 Instances are now running successfully")
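
The waiters in this hunk poll an EC2 API until the instances reach the wanted state, with the delay between polls bounded by `MinDelay` and `MaxDelay` and the whole wait bounded by the duration passed to `Wait`. A minimal stdlib-only sketch of that polling pattern follows. The names `pollOptions`, `waitUntil`, `demoWait`, and `demoTimeout` are hypothetical, and the plain doubling of the delay is an assumption made for illustration; the SDK's own backoff computation is not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollOptions mirrors the MinDelay/MaxDelay knobs set in the diff.
// Hypothetical type, not part of the AWS SDK.
type pollOptions struct {
	MinDelay time.Duration
	MaxDelay time.Duration
}

// waitUntil calls check until it reports done, returns an error, or
// maxWait elapses. The delay between polls starts at MinDelay and doubles
// up to MaxDelay. It returns the number of polls that were made.
func waitUntil(ctx context.Context, opts pollOptions, maxWait time.Duration,
	check func(context.Context) (bool, error)) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()

	delay := opts.MinDelay
	for attempt := 1; ; attempt++ {
		done, err := check(ctx)
		if err != nil {
			return attempt, err
		}
		if done {
			return attempt, nil
		}
		select {
		case <-ctx.Done():
			// Either maxWait elapsed or the caller cancelled.
			return attempt, fmt.Errorf("wait aborted: %w", ctx.Err())
		case <-time.After(delay):
		}
		delay *= 2
		if delay > opts.MaxDelay {
			delay = opts.MaxDelay
		}
	}
}

// demoWait simulates a resource that becomes ready on the third poll.
func demoWait() (int, error) {
	polls := 0
	check := func(context.Context) (bool, error) {
		polls++
		return polls >= 3, nil
	}
	opts := pollOptions{MinDelay: time.Millisecond, MaxDelay: 4 * time.Millisecond}
	return waitUntil(context.Background(), opts, time.Second, check)
}

// demoTimeout simulates a resource that never becomes ready.
func demoTimeout() error {
	never := func(context.Context) (bool, error) { return false, nil }
	opts := pollOptions{MinDelay: time.Millisecond, MaxDelay: 2 * time.Millisecond}
	_, err := waitUntil(context.Background(), opts, 20*time.Millisecond, never)
	return err
}

func main() {
	attempts, err := demoWait()
	fmt.Println("attempts:", attempts, "err:", err)
	fmt.Println("timeout case:", demoTimeout())
}
```

Chaining two such waits, first for "running" and then for "status OK", is what lets the commit drop the fixed one-minute sleep before the SSM checks.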

@@ -99,8 +108,6 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 			return fmt.Errorf("❌ could not connect to SSM: %w", err)
 		}

-		time.Sleep(time.Minute)
-
 		log.Infof("ℹ️ Checking if the required AWS Services can be reached from the ec2 instances in the pod subnet")
 		serviceEndpoints := map[string]string{
 			"SSM": fmt.Sprintf("https://ssm.%s.amazonaws.com", networkConfig.AwsRegion),
18 changes: 14 additions & 4 deletions gitpod-network-check/cmd/common.go
@@ -66,17 +66,27 @@ func cleanup(ctx context.Context, svc *ec2.Client, iamsvc *iam.Client) {
 	}

 	if len(InstanceIds) > 0 {
+		log.Info("ℹ️ Terminating EC2 instances")
 		_, err := svc.TerminateInstances(ctx, &ec2.TerminateInstancesInput{
 			InstanceIds: InstanceIds,
 		})
 		if err != nil {
 			log.WithError(err).WithField("instanceIds", InstanceIds).Warnf("Failed to cleanup instances, please cleanup manually")
 		}

-		log.Info("✅ Instances terminated")
-
-		log.Info("Cleaning up: Waiting for 2 minutes so network interfaces are deleted")
-		time.Sleep(2 * time.Minute)
+		terminateWaiter := ec2.NewInstanceTerminatedWaiter(svc, func(itwo *ec2.InstanceTerminatedWaiterOptions) {
+			itwo.MaxDelay = 15 * time.Second
+			itwo.MinDelay = 5 * time.Second
+		})
+		log.Info("ℹ️ Waiting for EC2 instances to Terminate (times out in 4 minutes)")
+		err = terminateWaiter.Wait(ctx, &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
+		if err != nil {
+			log.WithError(err).Warn("Failed to wait for instances to terminate")
+			log.Warn("Waiting 2 minutes so network interfaces are deleted")
+			time.Sleep(2 * time.Minute)
+		} else {
+			log.Info("✅ Instances terminated")
+		}
 	}

 	if len(Roles) == 0 {
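
The cleanup change in this hunk prefers an explicit wait for termination and only falls back to the old fixed two-minute sleep when that wait fails. A minimal sketch of that "wait, else sleep" shape follows. The names `waitOrSleep`, `demoFallback`, and `demoSuccess` are hypothetical and not taken from the repository, and the sleep function is injected purely so the demo runs without real delays.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitOrSleep calls wait and, only if it fails, sleeps for fallback.
// It reports whether the fallback sleep was used.
// sleep is a parameter so callers (and demos) can substitute a fake.
func waitOrSleep(wait func() error, fallback time.Duration, sleep func(time.Duration)) bool {
	if err := wait(); err != nil {
		fmt.Printf("wait failed (%v); sleeping %s instead\n", err, fallback)
		sleep(fallback)
		return true
	}
	return false
}

// demoFallback simulates a failing waiter and records the requested sleep.
func demoFallback() (usedFallback bool, sleptSeconds int) {
	var slept time.Duration
	fakeSleep := func(d time.Duration) { slept += d }
	failing := func() error { return errors.New("waiter timed out") }
	usedFallback = waitOrSleep(failing, 2*time.Minute, fakeSleep)
	return usedFallback, int(slept / time.Second)
}

// demoSuccess simulates a waiter that succeeds, so no sleep is requested.
func demoSuccess() (usedFallback bool, sleptSeconds int) {
	var slept time.Duration
	fakeSleep := func(d time.Duration) { slept += d }
	succeeding := func() error { return nil }
	usedFallback = waitOrSleep(succeeding, 2*time.Minute, fakeSleep)
	return usedFallback, int(slept / time.Second)
}

func main() {
	fmt.Println(demoFallback())
	fmt.Println(demoSuccess())
}
```

In production code `time.Sleep` would be passed as the `sleep` argument; the fallback keeps the later security-group and role cleanup from racing against network interfaces that are still being detached.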
2 changes: 1 addition & 1 deletion gitpod-network-check/cmd/root.go
@@ -93,7 +93,7 @@ func init() {
 	networkCheckCmd.PersistentFlags().StringVar(&networkConfig.InstanceAMI, "instance-ami", "", "Custom ec2 instance AMI id, if not set will use latest ubuntu")
 	networkCheckCmd.PersistentFlags().StringVar(&networkConfig.ApiEndpoint, "api-endpoint", "", "The Gitpod Enterprise control plane's regional API endpoint subdomain")
 	bindFlags(networkCheckCmd, v)
-	log.Infof("ℹ️ Running with region `%s`, main subnet `%v`, pod subnet `%v`, hosts `%v`, and api endpoint `%v`", networkConfig.AwsRegion, networkConfig.MainSubnets, networkConfig.PodSubnets, networkConfig.HttpsHosts, networkConfig.ApiEndpoint)
+	log.Infof("ℹ️ Running with region `%s`, main subnet `%v`, pod subnet `%v`, hosts `%v`, ami `%v`, and API endpoint `%v`", networkConfig.AwsRegion, networkConfig.MainSubnets, networkConfig.PodSubnets, networkConfig.HttpsHosts, networkConfig.InstanceAMI, networkConfig.ApiEndpoint)
 }

 func readConfigFile() *viper.Viper {
4 changes: 2 additions & 2 deletions gitpod-network-check/gitpod-network-check.yaml
@@ -1,7 +1,7 @@
 log-level: debug # Options: debug, info, warning, error
 region: eu-central-1
-main-subnets: subnet-03ed4c7f3f10ee64a, subnet-03ae0d9e3ad063d83
-pod-subnets: subnet-09704642a44a1ae9b, subnet-0fc43a731956656cd
+main-subnets: subnet-0ed211f14362b224f, subnet-041703e62a05d2024
+pod-subnets: subnet-075c44edead3b062f, subnet-06eb311c6b92e0f29
 https-hosts: accounts.google.com, https://github.com
 # put your custom ami id here if you want to use it, otherwise it will using latest ubuntu AMI from aws
 instance-ami:
