
Load the existing pods when initializing kubernetes client to cleanup terminated app pods #7101


Closed
wants to merge 2 commits into from


turboFei (Member) commented Jun 17, 2025

Why are the changes needed?

To prevent terminated app pods from leaking when their pod events are missed during a Kyuubi server restart.
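The idea behind the change is a startup reconciliation pass: when the Kubernetes client is initialized, look at the pods that already exist and mark those whose app is already in a terminal state as terminated, so the normal cleanup path can remove them even though their termination events were missed. Below is a minimal sketch under simplifying assumptions. `PodInfo`, the `AppState` values, `reconcileExistingPods` and `markTerminated` are hypothetical names, not Kyuubi's actual API; the real code in KubernetesApplicationOperation works with fabric8 `Pod` objects.

```scala
object StartupPodReconciliation {
  // Hypothetical application states, loosely modeled on the state seen in the log (FINISHED).
  sealed trait AppState
  case object PENDING extends AppState
  case object RUNNING extends AppState
  case object FINISHED extends AppState
  case object FAILED extends AppState
  case object KILLED extends AppState

  def isTerminated(state: AppState): Boolean = state match {
    case FINISHED | FAILED | KILLED => true
    case PENDING | RUNNING          => false
  }

  // Simplified stand-in for a pod: its name, the kyuubi-unique-tag label value,
  // and the app state derived from the pod.
  final case class PodInfo(name: String, uniqueTag: String, appState: AppState)

  /**
   * Scans the pods that already exist when the client is initialized and passes every
   * pod whose app is in a terminal state to `markTerminated`, so it can be cleaned up
   * even though its termination event was never received. Returns the number marked.
   */
  def reconcileExistingPods(existing: Seq[PodInfo])(markTerminated: PodInfo => Unit): Int = {
    val terminated = existing.filter(p => isTerminated(p.appState))
    terminated.foreach { p =>
      println(s"Found existing pod ${p.name} with label: kyuubi-unique-tag=${p.uniqueTag} " +
        s"in app state ${p.appState}, marking it as terminated")
      markTerminated(p)
    }
    terminated.size
  }
}
```

Pods whose apps are still pending or running are left alone; they will be handled by the regular event flow once the client starts receiving events.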

How was this patch tested?

Manual test.

```
2025-06-17 17:50:37.275 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423211008-grectg-stm-17da59fe-caf4-41e4-a12f-6c1ed9a293f9-driver with label: kyuubi-unique-tag=17da59fe-caf4-41e4-a12f-6c1ed9a293f9 in app state FINISHED, marking it as terminated
2025-06-17 17:50:37.278 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423212011-gpdtsi-stm-6a23000f-10be-4a42-ae62-4fa2da8fac07-driver with label: kyuubi-unique-tag=6a23000f-10be-4a42-ae62-4fa2da8fac07 in app state FINISHED, marking it as terminated
```

The pods are cleaned up eventually.

Was this patch authored or co-authored using generative AI tooling?

No.

turboFei changed the title from "cleanup" to "Load the existing pods when initializing kubernetes client to cleanup terminated app pods" Jun 17, 2025
turboFei marked this pull request as draft June 17, 2025 01:33
codecov-commenter commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 0% with 30 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (7fbeea6) to head (7f76cf5).
Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...kyuubi/engine/KubernetesApplicationOperation.scala | 0.00% | 30 Missing ⚠️ |
Additional details and impacted files
```diff
@@          Coverage Diff           @@
##           master   #7101   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         697     697           
  Lines       43214   43243   +29     
  Branches     5855    5859    +4     
======================================
- Misses      43214   43243   +29     
```


turboFei marked this pull request as ready for review June 18, 2025 00:58
turboFei self-assigned this Jun 18, 2025
turboFei (Member Author)

cc @pan3793

turboFei requested a review from pan3793 June 18, 2025 01:00
turboFei (Member Author)

How about now? @pan3793

Since terminated pods are now always cleaned up, I don't think it is necessary to introduce a new config for cleaning them up on Kubernetes client initialization.

pan3793 (Member) commented Jun 20, 2025

My major concern is that something might go wrong during cleanTerminatedAppPodsOnKubernetesClientInitialize, but now that it runs asynchronously, a failure won't block the original procedure, so a config is not required.
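The review point above rests on running the init-time cleanup asynchronously so that any failure is contained. A minimal sketch of that pattern with `scala.concurrent.Future` follows. `AsyncStartupCleanup`, `startCleanup` and `initializeClient` are hypothetical names, and the global execution context is an assumption; the thread pool and logging used by the merged patch are not shown on this page.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object AsyncStartupCleanup {
  // Assumption: the global pool stands in for whatever executor the real server uses.
  private implicit val ec: ExecutionContext = ExecutionContext.global

  /**
   * Runs `cleanup` (e.g. a startup reconciliation pass returning the number of pods
   * marked) on a background thread. NonFatal exceptions thrown by `cleanup` are
   * captured in the returned Future and only logged; they never reach the caller.
   */
  def startCleanup(cleanup: () => Int): Future[Int] = {
    val result = Future(cleanup())
    result.onComplete {
      case Success(n) => println(s"Marked $n existing pods as terminated on client init")
      case Failure(e) => println(s"Failed to clean up terminated app pods on client init: ${e.getMessage}")
    }
    result
  }

  /** Hypothetical client initialization: starts the cleanup and returns without waiting. */
  def initializeClient(cleanup: () => Int): String = {
    startCleanup(cleanup) // fire-and-forget; the outcome is only logged
    "client-ready"
  }
}
```

Because initialization never awaits the Future, a slow or failing cleanup delays only the reclamation of leaked pods, not client startup, which is why a separate on/off config becomes less important.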

turboFei closed this in 302b5fa Jun 23, 2025
turboFei added a commit that referenced this pull request Jun 23, 2025
…ient to cleanup terminated app pods

### Why are the changes needed?

To prevent terminated app pods from leaking when their pod events are missed during a Kyuubi server restart.

### How was this patch tested?

Manual test.

```
2025-06-17 17:50:37.275 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423211008-grectg-stm-17da59fe-caf4-41e4-a12f-6c1ed9a293f9-driver with label: kyuubi-unique-tag=17da59fe-caf4-41e4-a12f-6c1ed9a293f9 in app state FINISHED, marking it as terminated
2025-06-17 17:50:37.278 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423212011-gpdtsi-stm-6a23000f-10be-4a42-ae62-4fa2da8fac07-driver with label: kyuubi-unique-tag=6a23000f-10be-4a42-ae62-4fa2da8fac07 in app state FINISHED, marking it as terminated
```
The pods are cleaned up eventually.
<img width="664" alt="image" src="https://github.com/user-attachments/assets/8cf58f61-065f-4fb0-9718-2e3c00e8d2e0" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7101 from turboFei/pod_cleanup.

Closes #7101

7f76cf5 [Wang, Fei] async
11c9db2 [Wang, Fei] cleanup

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
(cherry picked from commit 302b5fa)
Signed-off-by: Wang, Fei <fwang12@ebay.com>
turboFei (Member Author)

Thanks, merged to main and branch-1.10.

turboFei deleted the pod_cleanup branch June 23, 2025 05:35
turboFei added this to the v1.10.3 milestone Jun 23, 2025
pan3793 (Member) commented Jun 26, 2025

@turboFei branch-1.10 is broken after merging this PR:

```
Error: ] /home/runner/work/kyuubi/kyuubi/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala:110: too many arguments (3) for method markApplicationTerminated: (pod: io.fabric8.kubernetes.api.model.Pod, eventType: org.apache.kyuubi.engine.KubernetesResourceEventTypes.KubernetesResourceEventType)Unit
```

turboFei added a commit to turboFei/kyuubi that referenced this pull request Jun 26, 2025
pan3793 pushed a commit that referenced this pull request Jul 2, 2025
### Why are the changes needed?

Address comment: #7101 (comment)

### How was this patch tested?

GA (GitHub Actions).

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7116 from turboFei/fix_conflicts.

Closes #7101

f25f487 [Wang, Fei] [KYUUBI #7101][1.10][FOLLOWUP] Fix code broken

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>