Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.10.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Start workflows
2. The first two jobs will get a runner very quickly
3. The third one will sometimes stay pending for 30 to 40 minutes before getting a runner
Describe the bug
Let's say that I have a workflow with 3 jobs running in parallel.
Sometimes jobs 1 and 2 get a runner right away, but the third one has to wait 30 minutes to an hour before getting one.
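For illustration, a minimal sketch of the kind of workflow described above: three jobs with no dependencies between them, so they are queued in parallel. The workflow name, job names, and the `runs-on` value `my-runner-scale-set` are placeholders, not the real ones.

```yaml
# Hypothetical workflow: three independent jobs that are queued at the same time.
name: parallel-jobs-example
on: workflow_dispatch

jobs:
  job-1:
    runs-on: my-runner-scale-set  # placeholder for the runner scale set name
    steps:
      - run: echo "job 1"
  job-2:
    runs-on: my-runner-scale-set
    steps:
      - run: echo "job 2"
  job-3:
    runs-on: my-runner-scale-set  # this is the job that sometimes stays pending
    steps:
      - run: echo "job 3"
```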
Describe the expected behavior
All the jobs should start right away.
Note that I have two runner-scale-sets with the same runnerScaleSetName. I don't know whether that's bad practice, but it's working fine 🤷‍♂️
I did that to ease the upgrade process: when a new chart is available, I update the gha-runner-scale-sets one by one to avoid service interruptions.
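A rough sketch of that setup, assuming two separate Helm releases of the gha-runner-scale-set chart. The release names and the value `my-runner-scale-set` are hypothetical and not taken from the actual deployment.

```yaml
# Values for a first release (hypothetical release name: runners-a)
runnerScaleSetName: my-runner-scale-set
githubConfigUrl: https://github.com/company
---
# Values for a second release (hypothetical release name: runners-b)
# Same runnerScaleSetName, so workflows target both with a single runs-on value.
runnerScaleSetName: my-runner-scale-set
githubConfigUrl: https://github.com/company
```

With this layout, each release can be upgraded on its own while the other keeps serving jobs.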
Thanks
Additional Context
gha-runner-scale-set-controller:
  enabled: true
  flags:
    logLevel: "warn"
  podLabels:
    finops.company.net/cloud_provider: gcp
    finops.company.net/cost_center: compute
    finops.company.net/product: tools
    finops.company.net/service: actions-runner-controller
    finops.company.net/region: europe-west1
  replicaCount: 3
  podAnnotations:
    ad.datadoghq.com/manager.checks: |
      {
        "openmetrics": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics",
              "histogram_buckets_as_distributions": true,
              "namespace": "actions-runner-system",
              "metrics": [".*"]
            }
          ]
        }
      }
  metrics:
    controllerManagerAddr: ":8080"
    listenerAddr: ":8080"
    listenerEndpoint: "/metrics"
gha-runner-scale-set:
  enabled: true
  githubConfigUrl: https://github.com/company
  githubConfigSecret:
    github_token: <path:secret/github_token/actions_runner_controller#token>
  maxRunners: 100
  minRunners: 1
  containerMode:
    type: "dind"  ## type can be set to dind or kubernetes
  listenerTemplate:
    metadata:
      labels:
        finops.company.net/cloud_provider: gcp
        finops.company.net/cost_center: compute
        finops.company.net/product: tools
        finops.company.net/service: actions-runner-controller
        finops.company.net/region: europe-west1
      annotations:
        ad.datadoghq.com/listener.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "histogram_buckets_as_distributions": true,
                  "namespace": "actions-runner-system",
                  "max_returned_metrics": 6000,
                  "metrics": [".*"],
                  "exclude_metrics": [
                    "gha_job_startup_duration_seconds",
                    "gha_job_execution_duration_seconds"
                  ],
                  "exclude_labels": [
                    "enterprise",
                    "event_name",
                    "job_name",
                    "job_result",
                    "job_workflow_ref",
                    "organization",
                    "repository",
                    "runner_name"
                  ]
                }
              ]
            }
          }
    spec:
      containers:
        - name: listener
          securityContext:
            runAsUser: 1000
  template:
    metadata:
      labels:
        finops.company.net/cloud_provider: gcp
        finops.company.net/cost_center: compute
        finops.company.net/product: tools
        finops.company.net/service: actions-runner-controller
        finops.company.net/region: europe-west1
    spec:
      restartPolicy: OnFailure
      imagePullSecrets:
        - name: company-prod-registry
      containers:
        - name: runner
          image: eu.gcr.io/company-production/devex/gha-runners:v1.0.0-snapshot5
          command: ["/home/runner/run.sh"]
  controllerServiceAccount:
    namespace: actions-runner-system
    name: actions-runner-controller-gha-rs-controller
Controller Logs
https://gist.github.com/julien-michaud/dce55b9320fb236b622cbb00919277ce
Runner Pod Logs
/