Add process minimum age to filter out short lived processes if needed #194

Open
wants to merge 4 commits into main

Conversation

corneliu-calciu

Added a minimum process age requirement for a process to be considered for discovery.

The new process detection logic ignores a new process until its age reaches a minimum value; for example, we may set a 3-second threshold as part of the monitoring configuration.
A process is considered suitable for monitoring once it passes that configurable limit.
This filters out the noise of short-lived processes.

Configuration example:
discovery:
  exclude_services:
    - k8s_namespace: (istio-system|kube-node-lease)
  services:
    - k8s_namespace: (acmefitness)
      min_process_age: 5s
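
For illustration, here is a minimal sketch (not the code from this PR) of how such an age gate could work in Go; procCandidate, oldEnough, and the 5-second value mirror the configuration above, and all names are hypothetical:

package main

import (
	"fmt"
	"time"
)

// procCandidate is a hypothetical stand-in for a process reported by the watcher.
type procCandidate struct {
	pid       int32
	startTime time.Time
}

// oldEnough reports whether the process has been alive for at least the
// configured minimum age (e.g. min_process_age: 5s) and is therefore
// eligible for instrumentation.
func oldEnough(p procCandidate, minProcessAge time.Duration) bool {
	return time.Since(p.startTime) >= minProcessAge
}

func main() {
	minAge := 5 * time.Second
	p := procCandidate{pid: 1234, startTime: time.Now().Add(-2 * time.Second)}
	if oldEnough(p, minAge) {
		fmt.Println("instrument process", p.pid)
	} else {
		fmt.Println("process too young, will re-check later:", p.pid)
	}
}

In this sketch, a process that fails the check is only reported as too young; it is not discarded permanently and could be re-evaluated on a later scan.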

Moved the PR from grafana/beyla#2066

@mariomac
Contributor

Thanks for your contribution @corneliu-calciu !

I have one question: what happens if a long-lived process starts while Beyla is running? Beyla will get notified of that new process but detect it as too young, so will it discard it?

@corneliu-calciu
Author

Hi Mario

This is exactly the scenario I want to address. The Beyla pods are already running, and a new process is created on a node.
Now we can specify the minimum life span a process needs before we accept and instrument it.
A large number of processes created and destroyed within a 1-2 second interval is just noise, especially in a Kubernetes environment where we expect long-lived micro-services.
This capability can also mitigate another problem; please see bug grafana/beyla#2046.
When a new replica is created for a Kubernetes deployment, the process filtering logic sometimes fails to recognize the inner processes of the container, because the container is still in its initialization phase, where some new processes are being created and some initialization processes will exit soon.

Best Regards
Corneliu

@grcevski
Contributor

left a comment:


This is looking good already; I left a few comments that I think we should address. Thanks for working on this!!!

corneliu-calciu requested a review from grcevski on June 20, 2025 13:14

codecov bot commented Jun 20, 2025

Codecov Report

Attention: Patch coverage is 77.77778% with 6 lines in your changes missing coverage. Please review.

Project coverage is 63.37%. Comparing base (445d756) to head (8f6f86b).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
pkg/components/discover/watcher_proc.go 77.77% 3 Missing and 3 partials ⚠️

❗ There is a different number of reports uploaded between BASE (445d756) and HEAD (8f6f86b).

HEAD has 1 upload less than BASE
Flag BASE (445d756) HEAD (8f6f86b)
integration-test 1 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
- Coverage   70.45%   63.37%   -7.08%     
==========================================
  Files         178      177       -1     
  Lines       19172    19172              
==========================================
- Hits        13507    12151    -1356     
- Misses       4909     6338    +1429     
+ Partials      756      683      -73     
Flag Coverage Δ
integration-test ?
integration-test-arm 35.89% <66.66%> (-0.06%) ⬇️
oats-test 36.43% <55.55%> (+0.23%) ⬆️
unittests 46.15% <77.77%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


@grcevski
Contributor

I looked at the test failures; I think we need to extend our default timeout interval, since it now takes a bit longer for the services to be instrumented.

These functions:

func waitForTestComponentsSub(t *testing.T, url, subpath string) {
	waitForTestComponentsSubWithTime(t, url, subpath, 1)
}

func waitForTestComponentsSubStatus(t *testing.T, url, subpath string, status int) {
	waitForTestComponentsSubWithTimeAndCode(t, url, subpath, status, 1)
}

in test_utils.go (integration tests). Right now we wait one minute; I think we should make that a constant and change it to 2 minutes.
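
As a sketch of that change (assuming the existing ...WithTime helpers take the timeout in minutes as their last argument; the constant name below is just an example):

// testWaitTimeoutMinutes is an illustrative name for the proposed constant.
const testWaitTimeoutMinutes = 2

func waitForTestComponentsSub(t *testing.T, url, subpath string) {
	waitForTestComponentsSubWithTime(t, url, subpath, testWaitTimeoutMinutes)
}

func waitForTestComponentsSubStatus(t *testing.T, url, subpath string, status int) {
	waitForTestComponentsSubWithTimeAndCode(t, url, subpath, status, testWaitTimeoutMinutes)
}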

@grcevski
Contributor

Hm, it looks like the problem is somewhere else. I'm going to take a look myself, but I can't before Tuesday next week.
