
Delete `LoadBalancer` Services before the cluster #1010

Merged: 11 commits into master from fons/103-delete-lb-services, Jul 19, 2019

Conversation

@2opremio (Contributor) commented Jul 9, 2019

Description

`type: LoadBalancer` Kubernetes services are backed by ELBs in AWS.

If we simply remove the cluster templates directly when deleting the cluster, these ELBs are left dangling (since they are not part of the templates). This in turn causes the cluster's VPC deletion to fail.

Fixes #103

Checklist

  • Code compiles correctly (i.e. `make build`)
  • [-] Added tests that cover your change (if possible). It's mostly AWS API calls, which makes the logic hard to cover with unit tests. Happy to add an integration test if needed (once I learn how to do it).
  • All unit tests passing (i.e. `make test`)
  • Manually tested. For the record, I created 16 `type: LoadBalancer` services (8 backed by classic ELBs and 8 backed by v2 ELBs) before deleting the cluster, which then happened cleanly. I did the following:
$ for I in `seq 8`; do cat memcache-svc.yaml | I=$I envsubst  | kubectl apply -f - ; done
$ for I in `seq 8`; do cat memcache-svc-v2.yaml | I=$I envsubst  | kubectl apply -f - ; done

Where:

memcache-svc.yaml:

---
apiVersion: v1
kind: Service
metadata:
  name: service$I
spec:
  ports:
    - name: memcached
      port: 11211
  type: LoadBalancer
  selector:
    name: memcached

memcache-svc-v2.yaml:

---
apiVersion: v1
kind: Service
metadata:
  name: servicev2$I
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  ports:
    - name: memcached
      port: 11211
  type: LoadBalancer
  selector:
    name: memcached
2opremio requested review from errordeveloper and martina-if, Jul 9, 2019
@2opremio (Contributor, author) commented Jul 9, 2019

I need to bump the build cache. Will do it later.

pkg/vpc/cleanup.go (outdated review thread, resolved)
A Member left a comment:

LGTM overall, thank you!

Just a few nits, I will review in detail later on :)

@2opremio (Contributor, author) commented Jul 10, 2019

I have just found out that the AWS cloud controller sometimes leaves security groups associated with the load balancers, preventing the VPC from being deleted. I can fix this in a separate PR.

@errordeveloper (Member) commented Jul 10, 2019

> I have just found out that the AWS cloud controller sometimes leaves security groups associated with the load balancers, preventing the VPC from being deleted. I can fix this in a separate PR.

I suppose it leaves them in the hope of re-using them? Is there a GC that is supposed to trigger eventually?

Separate PR sounds good to me.

@errordeveloper (Member) commented Jul 10, 2019

So to summarize for an external observer who may not read the code, what we are doing is this:

  • using the Kubernetes API: find all services that carry a load balancer and delete them
  • using the ELB APIs: wait with a timeout for each load balancer to be deleted; if the timeout occurs, we produce an error

And we don't attempt to force deletion ourselves just yet, correct?

It may be a good idea to document this; opening a follow-up issue would be a good start.

@errordeveloper (Member) commented Jul 10, 2019

Another follow-up issue would be to review what popular ingress controllers bring into the picture; they might also create ELBs. Of course, ingress controllers are not built in, but Ingress objects are part of Kubernetes. The answer may be that we just leave it until we support certain ingress controller add-ons, but let's open an issue to track it.

pkg/elb/cleanup.go (3 outdated review threads, resolved)
if err != nil {
    return err
}
elb.CleanupLoadBalancers(ctl.Provider.ELB(), ctl.Provider.ELBV2(), client)

@errordeveloper (Member) commented on the diff, Jul 10, 2019

We should prepend it to tasks actually, but you will need some of the changes from #847 for that, I can pick those onto another PR if you like, or we refactor later on.

go.mod (review thread resolved)
go.sum (review thread resolved)
A Member left a comment:

The deletion strategy LGTM overall. Just a few more relatively minor comments and suggestions.

@2opremio (Contributor, author) commented Jul 11, 2019

> And we don't attempt forcing deletion ourselves just yet, correct?

Correct.

@2opremio (Contributor, author) commented Jul 11, 2019

> I have just found out that the AWS cloud controller sometimes leaves security groups associated with the load balancers, preventing the VPC from being deleted. I can fix this in a separate PR.

I ended up fixing it in this PR. It was a pain in the ass because I ran into kubernetes/kubernetes#79994, which I worked around by deleting orphaned security groups as a last step.

I did a whole lot of manual testing. I also rebased it on master.

@errordeveloper PTAL
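That last-resort step can be illustrated with the tagging convention the in-tree AWS cloud provider uses for cluster-owned resources (`kubernetes.io/cluster/<name>: owned`). The types and helper below are hypothetical stand-ins for the AWS API calls, not eksctl's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// securityGroup is a minimal stand-in for the AWS EC2 API type.
type securityGroup struct {
	ID   string
	Tags map[string]string
}

// orphanedClusterSGs returns the security groups tagged as owned by the
// cluster that are no longer attached to any network interface. These
// are the groups the cloud controller can leave behind
// (kubernetes/kubernetes#79994); deleting them unblocks VPC deletion.
func orphanedClusterSGs(groups []securityGroup, attached map[string]bool, cluster string) []string {
	tagKey := "kubernetes.io/cluster/" + cluster
	var orphans []string
	for _, sg := range groups {
		if sg.Tags[tagKey] == "owned" && !attached[sg.ID] {
			orphans = append(orphans, sg.ID)
		}
	}
	return orphans
}

func main() {
	groups := []securityGroup{
		{ID: "sg-1", Tags: map[string]string{"kubernetes.io/cluster/demo": "owned"}},
		{ID: "sg-2", Tags: map[string]string{"kubernetes.io/cluster/demo": "owned"}},
		{ID: "sg-3", Tags: map[string]string{}},
	}
	attached := map[string]bool{"sg-1": true} // sg-1 still in use by an ENI
	fmt.Println(strings.Join(orphanedClusterSGs(groups, attached, "demo"), ",")) // prints: sg-2
}
```

Running this as the final cleanup step (after the load balancers are gone) mirrors the workaround described in the comment above.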

A Member left a comment:

Looks good overall. Great that someone is finally fixing this! Can you add tests?

pkg/elb/cleanup.go (4 outdated review threads, resolved)
2opremio force-pushed the fons/103-delete-lb-services branch from 1a1dbe5 to 775f2cc, Jul 19, 2019
@2opremio (Contributor, author) commented Jul 19, 2019

> Can you add tests?

That will be covered by #1031, which I will work on for 0.4.0.

2opremio added 11 commits Jul 9, 2019
`type: LoadBalancer` Kubernetes services are backed by ELBs in AWS.

If we simply try to remove the cluster templates directly when deleting
the cluster, these ELBs are left dangling (since they are not part of
the templates). This in turn causes the cluster's VPC deletion
to fail.
Also:

1. print warnings when unexpected conditions occur
2. delete orphaned security groups due to kubernetes/kubernetes#79994
2opremio force-pushed the fons/103-delete-lb-services branch from beed878 to b7c827a, Jul 19, 2019
@2opremio (Contributor, author) commented Jul 19, 2019

> I'm happy to merge this once we rebase

Done

errordeveloper merged commit c11654f into master, Jul 19, 2019
3 checks passed:
  • WIP: Ready for review
  • ci/circleci: make-eksctl-image (Your tests passed on CircleCI!)
  • netlify/eksctl/deploy-preview (Deploy preview ready!)
errordeveloper deleted the fons/103-delete-lb-services branch, Jul 19, 2019
@2opremio (Contributor, author) commented Jul 23, 2019

D3nn added a commit that referenced this pull request Aug 15, 2019
A previous PR (#1010) made it a prerequisite that an EKS cluster must
be in an ACTIVE state before `eksctl delete cluster` would be able to
delete the cluster.  This was done to find all the ELBs that were
created due to services in the cluster.  However, if a cluster isn't
created, there can be no ELBs created for the cluster, so this
PR removes that restriction and adds some integration tests for
this use case.
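The follow-up logic described in that commit message reads roughly like the guard below. This is an illustrative sketch with assumed names, not the actual code: ELB cleanup needs a reachable Kubernetes API, which only an ACTIVE cluster has, so other states skip straight to stack deletion.

```go
package main

import "fmt"

// cleanupSteps returns the deletion steps to run for a cluster in the
// given status. Only an ACTIVE cluster can have Services, and hence
// ELBs, so ELB cleanup is skipped otherwise. (Illustrative sketch only.)
func cleanupSteps(status string) []string {
	var steps []string
	if status == "ACTIVE" {
		steps = append(steps, "delete LoadBalancer services and wait for ELBs")
	}
	steps = append(steps, "delete cluster stacks")
	return steps
}

func main() {
	fmt.Println(len(cleanupSteps("CREATE_FAILED"))) // prints: 1
	fmt.Println(len(cleanupSteps("ACTIVE")))        // prints: 2
}
```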
D3nn added a commit that referenced this pull request Aug 15, 2019
D3nn added a commit that referenced this pull request Aug 20, 2019
D3nn referenced this pull request Aug 21, 2019
martina-if added a commit to martina-if/eksctl that referenced this pull request Aug 22, 2019
martina-if added a commit to martina-if/eksctl that referenced this pull request Aug 22, 2019