
Tolerate two subnets and Introduce Cleanup command #5

Merged: 14 commits merged into main on Jul 17, 2024


@kylos101 (Contributor) commented Jul 17, 2024

Description

Some customers reuse the same subnets for main and pod subnets. When this happens, the checker fails because the security group already exists.

Also, while troubleshooting, I noticed that we weren't tagging the resources we create, which makes them difficult to find and delete later.
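Tagging is what makes a `clean` command possible: every resource the checker creates carries a marker tag, and cleanup later selects only resources with that tag. A minimal sketch, assuming a single key/value marker; the tag key `gitpod.io/network-check`, the `Resource` type, and the helper names are hypothetical, not the tool's actual code, and the IDs other than `sg-0f28e2365912b13c3` are made up. Against the real API, EC2 Describe* calls such as DescribeSecurityGroups accept `tag:<key>` filters, so the filtering could happen server-side; here it is done in memory to keep the example self-contained.

```go
package main

import "fmt"

// Hypothetical marker tag; the real tool's tag key and value may differ.
const (
	checkTagKey   = "gitpod.io/network-check"
	checkTagValue = "true"
)

// Resource is a simplified stand-in for an AWS resource and its tags.
type Resource struct {
	Kind string // e.g. "instance" or "security-group"
	ID   string
	Tags map[string]string
}

// checkerTags returns the tags to attach when the checker creates a resource.
func checkerTags() map[string]string {
	return map[string]string{checkTagKey: checkTagValue}
}

// findTagged keeps only resources carrying the checker's marker tag, i.e.
// the candidates a clean command would delete. Untagged resources (nil
// Tags) are skipped because a lookup on a nil map yields "".
func findTagged(all []Resource) []Resource {
	var out []Resource
	for _, r := range all {
		if r.Tags[checkTagKey] == checkTagValue {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	all := []Resource{
		{Kind: "security-group", ID: "sg-0f28e2365912b13c3", Tags: checkerTags()},
		{Kind: "security-group", ID: "sg-example-unrelated", Tags: map[string]string{"team": "infra"}},
		{Kind: "instance", ID: "i-example-untagged"},
	}
	for _, r := range findTagged(all) {
		fmt.Printf("would delete %s %s\n", r.Kind, r.ID)
	}
}
```

Only the first security group is selected; resources the customer owns are never touched because they lack the marker.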

Related Issue(s)

Fixes ENT-473

How to test

Set up a YAML file like this:

```yaml
log-level: debug # Options: debug, info, warning, error
region: eu-central-1
main-subnets: subnet-0c2be6925d464ae0e, subnet-0ac7749ca3d2337b2
pod-subnets: subnet-0c2be6925d464ae0e, subnet-0ac7749ca3d2337b2
```

And then:

  1. Run `go run . diagnose` to run the checks.
  2. Check whether any resources need to be cleaned up manually (in the run below, two security groups lingered).
  3. Run `go run . clean` to find and remove lingering test resources.
```
# run diagnose
gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . diagnose
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] ℹ️  Checking prerequisites
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ec2messages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssm is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.execute-api is configured
INFO[0001] ℹ️  Found duplicate subnets. We'll test each subnet only once, starting with main.
INFO[0001] ℹ️  Launching EC2 instances in Main subnets
INFO[0001] ℹ️  Created security group with ID: sg-0f28e2365912b13c3
INFO[0008] ℹ️  Created security group with ID: sg-0a7ce352a5ef827d1
INFO[0010] ℹ️  Launching EC2 instances in a Pod subnets
WARN[0010] Subnet 'subnet-0c2be6925d464ae0e' was already launched, skipping
WARN[0010] Subnet ' subnet-0ac7749ca3d2337b2' was already launched, skipping
INFO[0010] ℹ️  Waiting for EC2 instances to become ready (can take up to 2 minutes)
INFO[0035] ✅ EC2 Instances are now running successfully
INFO[0035] ℹ️  Connecting to SSM...
INFO[0118] ℹ️  Checking if the required AWS Services can be reached from the ec2 instances
INFO[0118] ✅ Autoscaling is available
INFO[0119] ✅ CloudFormation is available
INFO[0119] ✅ CloudWatch is available
INFO[0120] ✅ EC2 is available
INFO[0121] ✅ EC2messages is available
INFO[0121] ✅ ECR is available
INFO[0122] ✅ ECR Api is available
INFO[0123] ✅ EKS is available
INFO[0124] ✅ Elastic LoadBalancing is available
INFO[0124] ✅ KMS is available
INFO[0125] ✅ Kinesis Firehose is available
INFO[0126] ✅ SSM is available
INFO[0126] ✅ SSMmessages is available
INFO[0127] ✅ SecretsManager is available
INFO[0128] ✅ Sts is available
INFO[0128] ✅ DynamoDB is available
INFO[0129] ✅ S3 is available
INFO[0129] Cleaning up: Waiting for 2 minutes so network interfaces are deleted
INFO[0249] ✅ Instances terminated
INFO[0250] ✅ Role 'GitpodNetworkCheck' deleted
INFO[0250] ✅ Instance profile deleted
WARN[0250] Failed to clean up security group, please cleanup manually  error="operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 772c9f27-db47-4966-965f-9d9ed2f70771, api error DependencyViolation: resource sg-0f28e2365912b13c3 has a dependent object" securityGroup=sg-0f28e2365912b13c3
WARN[0251] Failed to clean up security group, please cleanup manually  error="operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 1c2b3d0d-b617-4e5b-bacb-de6958ec7cdf, api error DependencyViolation: resource sg-0a7ce352a5ef827d1 has a dependent object" securityGroup=sg-0a7ce352a5ef827d1

# run clean to find and remove left-behind resources
gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . clean
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] No instances found.
INFO[0000] No roles found.
INFO[0000] ✅ Security group 'sg-0f28e2365912b13c3' deleted
INFO[0001] ✅ Security group 'sg-0a7ce352a5ef827d1' deleted

# run clean again, to assert everything is gone.
gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . clean
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] No instances found.
INFO[0000] No roles found.
INFO[0000] No security groups found.
```

Documentation

/hold

@kylos101 marked this pull request as ready for review July 17, 2024 00:38
@kylos101 requested a review from @nandajavarma July 17, 2024 00:39
@kylos101 (Contributor Author):

@nandajavarma, let me know what you think of the change. I'm happy to add docs, but wanted to get feedback first.

@nandajavarma (Collaborator):

@kylos101 Thanks a lot for making this change, Kyle!! ❤️ The code looks good, and I'm really happy to see the improved usability! ⭐
If you could add a line to the README noting how to clean up, that would be great!

kylos101 added 3 commits July 17, 2024 13:12
This way, when diagnose cleans up, the network interfaces (NICs) are no longer in use by the time we try to delete the security groups.
@kylos101 (Contributor Author):

Clean works as advertised:

```
gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . diagnose
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] ℹ️  Checking prerequisites
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ec2messages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssm is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.execute-api is configured
INFO[0001] ℹ️  Found duplicate subnets. We'll test each subnet '[ subnet-0ac7749ca3d2337b2 subnet-0c2be6925d464ae0e]' only once.
INFO[0001] ℹ️  Launching EC2 instances in Main subnets
INFO[0001] ℹ️  Created security group with ID: sg-07373362953212e54
INFO[0009] ℹ️  Created security group with ID: sg-0a6119dcb6a564fc1
INFO[0011] ℹ️  Launching EC2 instances in a Pod subnets
WARN[0011] An EC2 instance was already created for subnet 'subnet-0c2be6925d464ae0e', skipping
WARN[0011] An EC2 instance was already created for subnet ' subnet-0ac7749ca3d2337b2', skipping
INFO[0011] ℹ️  Waiting for EC2 instances to become ready (can take up to 2 minutes)
INFO[0040] ✅ EC2 Instances are now running successfully
INFO[0040] ℹ️  Connecting to SSM...
^Csignal: interrupt

gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . clean
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] ✅ Instances terminated
INFO[0000] Cleaning up: Waiting for 2 minutes so network interfaces are deleted
INFO[0121] ✅ Role 'GitpodNetworkCheck' deleted
INFO[0121] ✅ Instance profile deleted
INFO[0122] ✅ Security group 'sg-0a6119dcb6a564fc1' deleted
INFO[0122] ✅ Security group 'sg-07373362953212e54' deleted
```

@kylos101 (Contributor Author) commented Jul 17, 2024:

Diagnose cleans up properly at the end again (I needed to move the wait), so `clean` isn't needed unless there's been a failure:

```
gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . diagnose
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] ℹ️  Checking prerequisites
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ec2messages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssm is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is configured
INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.execute-api is configured
INFO[0000] ✅ IAM role created and policy attached
INFO[0001] ℹ️  Found duplicate subnets. We'll test each subnet '[ subnet-0ac7749ca3d2337b2 subnet-0c2be6925d464ae0e]' only once.
INFO[0001] ℹ️  Launching EC2 instances in Main subnets
INFO[0001] ℹ️  Created security group with ID: sg-0be29e7ba0502f569
INFO[0009] ℹ️  Created security group with ID: sg-05056dde52dd682d7
INFO[0010] ℹ️  Launching EC2 instances in a Pod subnets
WARN[0010] An EC2 instance was already created for subnet 'subnet-0c2be6925d464ae0e', skipping
WARN[0010] An EC2 instance was already created for subnet ' subnet-0ac7749ca3d2337b2', skipping
INFO[0010] ℹ️  Waiting for EC2 instances to become ready (can take up to 2 minutes)
INFO[0033] ✅ EC2 Instances are now running successfully
INFO[0033] ℹ️  Connecting to SSM...
INFO[0111] ℹ️  Checking if the required AWS Services can be reached from the ec2 instances
INFO[0112] ✅ Autoscaling is available
INFO[0112] ✅ CloudFormation is available
INFO[0113] ✅ CloudWatch is available
INFO[0114] ✅ EC2 is available
INFO[0115] ✅ EC2messages is available
INFO[0115] ✅ ECR is available
INFO[0116] ✅ ECR Api is available
INFO[0117] ✅ EKS is available
INFO[0118] ✅ Elastic LoadBalancing is available
INFO[0118] ✅ KMS is available
INFO[0119] ✅ Kinesis Firehose is available
INFO[0120] ✅ SSM is available
INFO[0121] ✅ SSMmessages is available
INFO[0122] ✅ SecretsManager is available
INFO[0123] ✅ Sts is available
INFO[0124] ✅ DynamoDB is available
INFO[0125] ✅ S3 is available
INFO[0125] ✅ Instances terminated
INFO[0125] Cleaning up: Waiting for 2 minutes so network interfaces are deleted
INFO[0246] ✅ Role 'GitpodNetworkCheck' deleted
INFO[0246] ✅ Instance profile deleted
INFO[0246] ✅ Security group 'sg-0be29e7ba0502f569' deleted
INFO[0247] ✅ Security group 'sg-05056dde52dd682d7' deleted

gitpod /workspace/enterprise-deployment-toolkit/gitpod-network-check (kylos101/tolerate-duplicate-subnets) $ go run . clean
INFO[0000] ✅ Main Subnets are valid
INFO[0000] ✅ Pod Subnets are valid
INFO[0000] No instances found.
INFO[0000] No roles found.
INFO[0000] No security groups found.
```

@kylos101 (Contributor Author):

All set, @nandajavarma!

@kylos101 merged commit 9106ae5 into main Jul 17, 2024