
Memory continuously increasing after 6 workspaces #180

Open · balu-ce opened this issue Jul 17, 2023 · 7 comments
Labels: bug, community, needs:triage

Comments

balu-ce commented Jul 17, 2023

What happened?

We have noticed a constant increase in the memory of the terraform provider pod.
The pod's memory has reached nearly 1.38 GiB.

How can we reproduce it?

We are running the terraform provider to create EKS clusters across multiple accounts, so we currently have 12 workspaces.
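
For context, a minimal sketch of one such Workspace (the name, module source, and ProviderConfig below are illustrative rather than our real values, and the apiVersion may differ between provider versions):

kubectl apply -f - <<'EOF'
apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: eks-account-a                # illustrative name
spec:
  providerConfigRef:
    name: account-a                  # illustrative per-account credentials
  forProvider:
    source: Remote
    module: git::https://example.com/org/eks-module.git   # illustrative module
EOF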

[Screenshot, Jul 17 2023: graph of provider pod memory usage climbing steadily]

What environment did it happen in?

  • Crossplane Version: 10.2
  • Provider Version: v0.7.0
  • Kubernetes Version: 1.25
  • Kubernetes Distribution: EKS
ytsarev (Member) commented Jul 17, 2023

Thanks for your report! What terraform providers are involved in the workspace operations? Is it terraform-provider-aws only? Could you please share the version?

ytsarev (Member) commented Jul 17, 2023

At a high level, without actually reproducing it, hashicorp/terraform-provider-aws#31722 looks related.

bobh66 (Collaborator) commented Jul 18, 2023

We have seen this problem with terraform-provider-aws >= 4.67.0 - we had to pin the provider to < 4.67.0 in order to prevent the memory issues.
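
A sketch of what that pin can look like, assuming the Terraform module run by each Workspace declares its own provider requirements via a standard required_providers block (the file name is illustrative):

# In the module that the Workspace runs, e.g. versions.tf:
cat > versions.tf <<'EOF'
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 4.67.0"  # stay below the releases with the memory regression
    }
  }
}
EOF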

balu-ce (Author) commented Aug 4, 2023

@bobh66 @ytsarev Since the terraform provider is a kube operator, only one pod runs the workspaces through leader election. Can we run the workspaces horizontally across pods by any chance?

bobh66 (Collaborator) commented Aug 4, 2023

Unfortunately no - Kubernetes controllers can only work as single processes; there is no way to share a single Kubernetes resource type across multiple controller instances. Both instances would receive and process events, and there would be no way to ensure that a single resource is handled by only a single controller.
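
For reference, assuming the provider uses the usual controller-runtime leader election over a Lease object (the namespace below is an assumption), you can confirm that only one replica is active:

kubectl get lease -n crossplane-system
# The HOLDER column names the single active replica; any extra replicas sit idle.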

Have you identified what is using the memory? The provider cache will usually solve most memory usage issues, except for the known problem with the latest AWS provider using excessive amounts of memory.
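
For reference, the cache in question is the standard Terraform plugin cache controlled by the TF_PLUGIN_CACHE_DIR environment variable. A sketch of setting it explicitly through a ControllerConfig, in case your install does not already do so (the resource name is hypothetical):

kubectl apply -f - <<'EOF'
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: provider-terraform-config    # hypothetical name
spec:
  env:
    - name: TF_PLUGIN_CACHE_DIR      # standard Terraform env var
      value: /tf/plugin-cache
EOF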

balu-ce (Author) commented Aug 14, 2023

@bobh66 @ytsarev I can see that the Go program which runs the terraform commands is itself taking 1.29 GiB of memory, even though I set the workspaces' reconciliation pause annotation to true.
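
For reference, the annotation I set is the standard crossplane-runtime pause annotation, roughly like this (the workspace name below is illustrative):

kubectl annotate workspace eks-account-a crossplane.io/paused=true --overwrite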

bobh66 (Collaborator) commented Aug 16, 2023

I would exec into the pod and run "du -s /tf/*" to see what is using all of the memory (which, for this provider, equates to disk usage under /tf).

For example:

$ kubectl exec -it -n crossplane-system provider-terraform-official-5f29c294f0da-66c7476ddb-jrn79 -- bash
bash-5.1$ du -s /tf/*
40	/tf/a639f793-e2ca-41cc-993b-628ef529429e
24	/tf/bd05c9b5-0655-4374-8d8b-f987abf15296
24	/tf/c88627d4-48f0-4043-88c0-6ff4774f0f2b
24	/tf/d0428460-eafd-4833-bd08-32a29bc37ab5
24	/tf/d2642baf-c8a7-47d4-8642-a19168aa08ec
24	/tf/e163d21d-3d6f-43ec-8f06-4168b27fcfc0
28	/tf/e4ed18af-729b-488f-b026-b9d22f9e10b6
24	/tf/edfb5678-adb2-40cb-be9e-df724c7a107f
24	/tf/f4c5a091-0729-4586-970f-b3a10e415a4e
24	/tf/ff282ba0-2a33-49a5-826c-1574f5cdc378
2137260	/tf/plugin-cache

shows the bulk of the usage is in the provider cache, which I would expect, and:

bash-5.1$ du -s /tf/plugin-cache/registry.terraform.io/hashicorp/aws/*
326044	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.11.0
359176	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.3.0
360212	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.4.0
354640	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.5.0
355756	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.6.1
355936	/tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.6.2

shows that there are 6 versions of the AWS provider cached, which is what is using all of the space. If we pinned the AWS provider to a specific version it would use a lot less space.
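
Once the module is pinned, reclaiming the space could look roughly like this (using the pod name from the session above; remove only versions that are no longer referenced by any Workspace):

kubectl exec -n crossplane-system provider-terraform-official-5f29c294f0da-66c7476ddb-jrn79 -- \
  rm -rf /tf/plugin-cache/registry.terraform.io/hashicorp/aws/5.3.0
# repeat for each cached version that is no longer in use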
