fix cloud-provider-gcp-tests prow job OOMKilled #34997

Open. YifeiZhuang wants to merge 2 commits into base: master

@YifeiZhuang YifeiZhuang commented Jun 17, 2025

Attempt to fix the presubmit Prow job failure:
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/cloud-provider-gcp/856/cloud-provider-gcp-tests/1932867324889534464

Terminated (OOMKilled) at 2025-06-11 18:40:45 +0000 UTC with exit code 137. Last log output:

Found 121 targets and 27 test targets...
[0 / 57] [Prepa] Writing file external/go_sdk/packages.txt ... (4 actions, 0 running)
[36 / 1,495] GoToolchainBinaryBuild external/go_sdk/builder; 14s processwrapper-sandbox ... (8 actions running)
[37 / 1,496] GoToolchainBinaryBuild external/go_sdk/builder; 23s processwrapper-sandbox ... (8 actions, 7 running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 34s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 56s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 74s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 109s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 133s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 160s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 195s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 235s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 279s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 328s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 384s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 450s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 525s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 611s processwrapper-sandbox ... (8 actions running)
[39 / 1,503] GoToolchainBinaryBuild external/go_sdk/builder; 711s processwrapper-sandbox ... (8 actions running)

Reproduced and tested locally:

./pkg/pj-on-kind.sh cloud-provider-gcp-tests
NAME                                   READY   STATUS     RESTARTS   AGE
2b8df3d8-410c-4101-bcee-7481ae396f8a   0/2     Init:0/3   0          0s
2b8df3d8-410c-4101-bcee-7481ae396f8a   0/2     Init:0/3   0          1s
2b8df3d8-410c-4101-bcee-7481ae396f8a   0/2     Init:1/3   0          21s
2b8df3d8-410c-4101-bcee-7481ae396f8a   0/2     Init:2/3   0          22s
2b8df3d8-410c-4101-bcee-7481ae396f8a   0/2     PodInitializing   0          23s
2b8df3d8-410c-4101-bcee-7481ae396f8a   2/2     Running           0          24s
2b8df3d8-410c-4101-bcee-7481ae396f8a   1/2     OOMKilled         0          118s

After fix:

./pkg/pj-on-kind.sh cloud-provider-gcp-tests
NAME                                   READY   STATUS     RESTARTS   AGE
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Init:0/3   0          0s
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Init:0/3   0          1s
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Init:1/3   0          20s
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Init:2/3   0          21s
55ca818f-978c-452f-8b0c-7e638f940757   0/2     PodInitializing   0          22s
55ca818f-978c-452f-8b0c-7e638f940757   2/2     Running           0          23s
55ca818f-978c-452f-8b0c-7e638f940757   1/2     NotReady          0          20m
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Completed         0          20m
55ca818f-978c-452f-8b0c-7e638f940757   0/2     Completed         0          20m

kubernetes/cloud-provider-gcp#856

@k8s-ci-robot k8s-ci-robot added labels on Jun 17, 2025: cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.); area/config (Issues or PRs related to code in /config); size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.); area/jobs
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: YifeiZhuang
Once this PR has been reviewed and has the lgtm label, please assign jpbetz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added labels on Jun 17, 2025: area/provider/gcp (Issues or PRs related to gcp provider); sig/cloud-provider (Categorizes an issue or PR as relevant to SIG Cloud Provider.); sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
@YifeiZhuang
Author

/assign @gauravkghildiyal

cc. @aojea

@gauravkghildiyal
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) on Jun 19, 2025
@YifeiZhuang
Author

/assign @aojea

Antonio, only you have the permission to approve.

This job only runs the Bazel build and the unit tests. I did many local runs: 10G is not enough, and neither is 14G. It is only stable at 16G.
Judging by where it gets stuck, the problem is not the code base itself. It consistently stalls or gets OOM-killed while loading vendored code. The actual build and unit tests are quick. Maybe we should trim dependencies to make compilation more lightweight. Let's mitigate it for now.
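
For readers unfamiliar with Prow job configs, the mitigation described above can be sketched roughly as follows. This is an illustrative fragment, not the actual diff from this PR: the job and repository names come from this thread, while the exact field placement, the use of `Gi` units, and setting the request equal to the limit are assumptions.

```yaml
# Hypothetical excerpt of a Prow presubmit definition (not the real diff).
presubmits:
  kubernetes/cloud-provider-gcp:
  - name: cloud-provider-gcp-tests
    spec:
      containers:
      - resources:
          # Assumption: 16Gi mirrors the "stable at 16G" observation above.
          requests:
            memory: 16Gi
          limits:
            memory: 16Gi
```

Setting the request equal to the limit is one common way to keep the pod from being scheduled onto a node that cannot actually provide the memory; whether this job does so is not stated in the thread.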
