ghalistener high cardinality metrics #3670

Closed
4 tasks done
christophermichaeljohnston opened this issue Jul 19, 2024 · 6 comments · Fixed by #3671
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Comments

@christophermichaeljohnston
Contributor

Checks

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Every action scheduled by ghalistener runs on a new runner, which creates a new metric series for every single action. This is because the metrics include runner_id and runner_name, which are distinct for every run. For example:

gha_completed_jobs_total{<snip>,runner_id="71363",runner_name="self-hosted-linux-x64-zfhfn-runner-k752n"} 1
gha_completed_jobs_total{<snip>,runner_id="71369",runner_name="self-hosted-linux-x64-zfhfn-runner-pr56c"} 1
gha_completed_jobs_total{<snip>,runner_id="71376",runner_name="self-hosted-linux-x64-zfhfn-runner-qns9x"} 1

The <snip> labels above are identical for the same workflow, but there is a new metric for each action due to runner_id and runner_name being unique.
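To make the effect concrete, here is a minimal Python sketch of a labelled counter in which each distinct label set becomes its own series. The `LabelledCounter` class and the `job_name` label are illustrative assumptions, not the listener's actual code; the runner IDs and names are taken from the example above.

```python
from collections import defaultdict


class LabelledCounter:
    """Toy stand-in for a labelled counter: one series per distinct label set."""

    def __init__(self):
        self.series = defaultdict(int)

    def inc(self, **labels):
        # The sorted label pairs form the identity of the series.
        key = tuple(sorted(labels.items()))
        self.series[key] += 1


completed = LabelledCounter()
runs = [
    ("71363", "self-hosted-linux-x64-zfhfn-runner-k752n"),
    ("71369", "self-hosted-linux-x64-zfhfn-runner-pr56c"),
    ("71376", "self-hosted-linux-x64-zfhfn-runner-qns9x"),
]
for runner_id, runner_name in runs:
    # job_name is a hypothetical placeholder for the shared <snip> labels.
    completed.inc(job_name="build", runner_id=runner_id, runner_name=runner_name)

print(len(completed.series))           # number of distinct series
print(set(completed.series.values()))  # values held by those series
```

Three jobs from the same workflow produce three separate series, and none of them ever counts past 1.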

This also causes memory and CPU usage to creep up continually, because the listener must keep tracking all of these series even though, given the unique labels, it will never update them.
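One way to confirm the growth is to count series per metric name in the listener's scraped metrics output. The sketch below assumes the output is in the Prometheus text exposition format, as in the example lines in this issue. The helper name `series_per_metric` is hypothetical, and the sample omits the shared `<snip>` labels.

```python
from collections import Counter


def series_per_metric(exposition_text):
    """Count how many series each metric name has in Prometheus text output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Sample lines from this issue, with the shared <snip> labels left out.
sample = """\
# TYPE gha_completed_jobs_total counter
gha_completed_jobs_total{runner_id="71363",runner_name="self-hosted-linux-x64-zfhfn-runner-k752n"} 1
gha_completed_jobs_total{runner_id="71369",runner_name="self-hosted-linux-x64-zfhfn-runner-pr56c"} 1
gha_completed_jobs_total{runner_id="71376",runner_name="self-hosted-linux-x64-zfhfn-runner-qns9x"} 1
"""

counts = series_per_metric(sample)
for name, n in counts.most_common():
    print(name, n)
```

Running this against successive scrapes would show the series count for the affected metrics rising with every completed job.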

Describe the bug

^ see above

This was fixed in githubrunnerscalesetlistener in #3003 and the fix needs to be included in ghalistener.

Describe the expected behavior

Metrics should not include labels that are unique per run, as this causes high cardinality and renders the counters, which will only ever have a value of 1, unusable.
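As a rough illustration of the expected behavior, the Python sketch below drops the per-run labels before choosing which series to increment. It is not the change made in the linked PR. The `PER_RUN_LABELS` set, the helper name, and the `job_name` label are assumptions for illustration only.

```python
from collections import defaultdict

# Labels named in this issue as unique per run.
PER_RUN_LABELS = {"runner_id", "runner_name"}


def inc_without_per_run_labels(series, labels):
    """Increment the series identified by `labels` minus the per-run labels."""
    key = tuple(
        sorted((k, v) for k, v in labels.items() if k not in PER_RUN_LABELS)
    )
    series[key] += 1


series = defaultdict(int)
runs = [
    ("71363", "self-hosted-linux-x64-zfhfn-runner-k752n"),
    ("71369", "self-hosted-linux-x64-zfhfn-runner-pr56c"),
    ("71376", "self-hosted-linux-x64-zfhfn-runner-qns9x"),
]
for runner_id, runner_name in runs:
    # job_name is a hypothetical placeholder for the shared <snip> labels.
    inc_without_per_run_labels(
        series,
        {"job_name": "build", "runner_id": runner_id, "runner_name": runner_name},
    )

print(dict(series))
```

With the per-run labels removed, the three jobs land in a single series whose counter actually accumulates, and the number of series stays bounded by the number of distinct workflows rather than the number of runs.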

Additional Context

n/a

Controller Logs

n/a

Runner Pod Logs

n/a
@christophermichaeljohnston christophermichaeljohnston added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Jul 19, 2024

@christophermichaeljohnston
Contributor Author

[Image: Screenshot 2024-07-19 at 9 14 02 AM]

^^ gha listener memory and CPU usage increasing as a result of tracking high-cardinality metrics

@iwaffles

Related to #3153

@christophermichaeljohnston
Contributor Author

The attached PR also removes job_workflow_ref, which likewise causes horribly high cardinality.

@ahatzz11

ahatzz11 commented Jan 9, 2025

We also see this, and it's quite frustrating having our metrics randomly die until we kick our listener:

[Image: CleanShot-001148 2025-01-09 at 12 49 21@2x]

@rekha-prakash-maersk

Also having issues where the listener had to be force-restarted by k8s due to high resource utilisation caused by the metrics.
