Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add timeout handling for subprocess wait #12806

Merged
merged 8 commits into from
Mar 26, 2025

Conversation

quinna-h
Copy link
Contributor

@quinna-h quinna-h commented Mar 19, 2025

After subprocess is opened, wait and kill if Timeout is exceeded to ensure zombie processes are killed.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

https://datadoghq.atlassian.net/browse/APMLP-313

@quinna-h quinna-h requested a review from a team as a code owner March 19, 2025 20:13
@quinna-h quinna-h requested a review from emmettbutler March 19, 2025 20:13
Copy link
Contributor

github-actions bot commented Mar 19, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/fix-lib-injection-zombie-dc16ca6c8bf50edb.yaml       @DataDog/apm-python
lib-injection/sources/sitecustomize.py                                  @DataDog/apm-core-python

@quinna-h quinna-h changed the title ensure zombie process is killed on telemetry fix: add timeout handling for subprocess wait Mar 19, 2025
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 3.67%. Comparing base (274e6f2) to head (838c532).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12806      +/-   ##
==========================================
- Coverage    3.67%    3.67%   -0.01%     
==========================================
  Files        1378     1378              
  Lines      135899   135899              
==========================================
- Hits         4992     4990       -2     
- Misses     130907   130909       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

github-actions bot commented Mar 19, 2025

Bootstrap import analysis

Comparison of import times between this PR and main.

Summary

The average import time in this PR is: 234 ± 4 ms.

The average import time in main is: 234 ± 3 ms.

The import time difference between this PR and main is: -0.0 ± 0.2 ms.

The difference is not statistically significant (z = -0.22).

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.856 ms (0.79%)
ddtrace.bootstrap.sitecustomize 1.256 ms (0.54%)
ddtrace.bootstrap.preload 1.256 ms (0.54%)
ddtrace.internal.products 1.256 ms (0.54%)
ddtrace.internal.remoteconfig.client 0.640 ms (0.27%)
ddtrace 0.600 ms (0.26%)

@pr-commenter
Copy link

pr-commenter bot commented Mar 19, 2025

Benchmarks

Benchmark execution time: 2025-03-26 14:32:25

Comparing candidate commit af150ac in PR branch quinna.halim/kill-zombie-telemetry-process with baseline commit efa4325 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 498 metrics, 2 unstable metrics.

@brettlangdon brettlangdon requested a review from Copilot March 26, 2025 13:21
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces timeout handling for subprocess waits to ensure that any spawned subprocesses are terminated when a timeout is exceeded and to avoid zombie processes. Key changes include renaming the internal telemetry sending function to _send_telemetry, adding a try/finally block to properly close subprocess streams, and implementing a timeout check using p.wait(1) for Python versions 3.3 and above.

@brettlangdon brettlangdon requested a review from a team as a code owner March 26, 2025 13:22
@brettlangdon brettlangdon requested a review from taegyunkim March 26, 2025 13:22
@brettlangdon brettlangdon enabled auto-merge (squash) March 26, 2025 13:25
@DataDog DataDog deleted a comment from Copilot bot Mar 26, 2025
@pawelchcki pawelchcki self-requested a review March 26, 2025 13:29
Copy link

@pawelchcki pawelchcki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment about older Pythons Timeout.

Since its a small subset of places where we run - and this should also be handled by some of the injector guardrails.

I'll mark this PR as approved. But it would still be good to fix this timeout.

Copy link

@natitsechanski natitsechanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brettlangdon brettlangdon merged commit b518cfd into main Mar 26, 2025
341 checks passed
@brettlangdon brettlangdon deleted the quinna.halim/kill-zombie-telemetry-process branch March 26, 2025 14:34
github-actions bot pushed a commit that referenced this pull request Mar 26, 2025
After subprocess is opened, wait and kill if Timeout is exceeded to
ensure zombie processes are killed.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

https://datadoghq.atlassian.net/browse/APMLP-313

---------

Co-authored-by: brettlangdon <brett.langdon@datadoghq.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit b518cfd)
github-actions bot pushed a commit that referenced this pull request Mar 26, 2025
After subprocess is opened, wait and kill if Timeout is exceeded to
ensure zombie processes are killed.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

https://datadoghq.atlassian.net/browse/APMLP-313

---------

Co-authored-by: brettlangdon <brett.langdon@datadoghq.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit b518cfd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants