Closed
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.9.3
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Create a nodepool with ARM64 architecture.
2. Schedule your runners on that nodepool with min_runners > 0.
3. Observe that the runner cannot be created due to `Warning FailedBinding 2s ephemeral_volume ephemeral volume work: PVC github-arc/gm.small.arm-lmjwg-runner-7hkh6-work was not created for pod github-arc/gm.small.arm-lmjwg-runner-7hkh6 (pod is not owner)`
Describe the bug
We have a couple of self-hosted runners configured on AWS using an on-prem/spot mix. So far all of them ran on x86_64; now we also wanted to test ARM64 (using Graviton instances). The listener pod starts without problems, but the runners can't start due to errors mounting the ephemeral volume.
As a side note, exactly the same config is used to create these runners and the others that work. The only difference is the nodepool used. The nodepool itself is also the same, except for the architecture.
Describe the expected behavior
Pods start normally and handle jobs.
Additional Context
krzyzakp@X1Carbon:/home/krzyzakp $ k describe -n github-arc autoscalingrunnersets.actions.github.com gm.small.arm
Name: gm.small.arm
Namespace: github-arc
Labels: actions.github.com/organization=XXXX
        actions.github.com/scale-set-name=gm.small.arm
        actions.github.com/scale-set-namespace=github-arc
        app.kubernetes.io/component=autoscaling-runner-set
        app.kubernetes.io/instance=gm.small.arm
        app.kubernetes.io/managed-by=Helm
        app.kubernetes.io/name=gm.small.arm
        app.kubernetes.io/part-of=gha-rs
        app.kubernetes.io/version=0.9.3
        helm.sh/chart=gha-rs-0.9.3
Annotations: actions.github.com/cleanup-github-secret-name: gm.small.arm-gha-rs-github-secret
             actions.github.com/cleanup-kubernetes-mode-role-binding-name: gm.small.arm-gha-rs-kube-mode
             actions.github.com/cleanup-kubernetes-mode-role-name: gm.small.arm-gha-rs-kube-mode
             actions.github.com/cleanup-kubernetes-mode-service-account-name: gm.small.arm-gha-rs-kube-mode
             actions.github.com/cleanup-manager-role-binding: gm.small.arm-gha-rs-manager
             actions.github.com/cleanup-manager-role-name: gm.small.arm-gha-rs-manager
             actions.github.com/runner-group-name: Default
             actions.github.com/runner-scale-set-name: gm.small.arm
             actions.github.com/values-hash: 485413e5bcb9f4c34b35c4cc53edb1a2443d7055f548c766c0829726fd52282
             meta.helm.sh/release-name: gm.small.arm
             meta.helm.sh/release-namespace: github-arc
             runner-scale-set-id: 20
API Version: actions.github.com/v1alpha1
Kind: AutoscalingRunnerSet
Metadata:
  Creation Timestamp: 2025-02-13T15:42:54Z
  Finalizers:
    autoscalingrunnerset.actions.github.com/finalizer
  Generation: 1
  Resource Version: 468838671
  UID: 763b02d6-c216-41c5-b245-ae83e4180ec7
Spec:
  Github Config Secret: gm.small.arm-gha-rs-github-secret
  Github Config URL: https://github.com/XXXXX
  Listener Template:
    Metadata:
      Annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: 8080
        prometheus.io/scrape: true
    Spec:
      Containers:
        Image: XXXXX.dkr.ecr.eu-central-1.amazonaws.com/github/actions/gha-runner-scale-set-controller:0.9.3
        Name: listener
        Resources:
          Limits:
            Memory: 64Mi
          Requests:
            Cpu: 100m
            Memory: 64Mi
        Security Context:
          Run As User: 1000
      Node Selector:
        karpenter.sh/nodepool: runner-arm
      Tolerations:
        Effect: NoSchedule
        Key: karpenter.sh/nodepool
        Operator: Equal
        Value: runner-arm
  Min Runners: 1
  Template:
    Metadata:
      Annotations:
        karpenter.sh/do-not-disrupt: true
        prometheus.io/path: /metrics
        prometheus.io/port: 8080
        prometheus.io/scrape: true
    Spec:
      Containers:
        Command:
          /home/runner/run.sh
        Env:
          Name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          Value: false
          Name: ACTIONS_RUNNER_CONTAINER_HOOKS
          Value: /home/runner/k8s/index.js
          Name: ACTIONS_RUNNER_POD_NAME
          Value From:
            Field Ref:
              Field Path: metadata.name
        Image: XXXX.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest
        Name: runner
        Resources:
          Limits:
            Cpu: 500m
            Memory: 1Gi
          Requests:
            Cpu: 500m
            Memory: 1Gi
        Volume Mounts:
          Mount Path: /home/runner/_work
          Name: work
      Node Selector:
        karpenter.sh/nodepool: runner-arm
      Restart Policy: Never
      Security Context:
        Fs Group: 1001
      Service Account: gm.small
      Service Account Name: gm.small.arm-gha-rs-kube-mode
      Tolerations:
        Effect: NoSchedule
        Key: karpenter.sh/nodepool
        Operator: Equal
        Value: runner-arm
      Volumes:
        Ephemeral:
          Volume Claim Template:
            Spec:
              Access Modes:
                ReadWriteOnce
              Resources:
                Requests:
                  Storage: 10Gi
              Storage Class Name: github-arc
        Name: work
Status:
  Current Runners: 1
  Pending Ephemeral Runners: 1
Events: <none>
Controller Logs
https://gist.github.com/krzyzakp/44b0c49aaf49b618d6053cd81286cb03
Runner Pod Logs
Events during startup, which give hope that it will work:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 32s default-scheduler 0/8 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "gm.small.arm-lmjwg-runner-7hkh6-work". preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
Normal Scheduled 28s default-scheduler Successfully assigned github-arc/gm.small.arm-lmjwg-runner-7hkh6 to ip-10-150-112-203.eu-central-1.compute.internal
Normal Nominated 32s karpenter Pod should schedule on: nodeclaim/runner-arm-xlgz6, node/ip-10-150-112-203.eu-central-1.compute.internal
Normal SuccessfulAttachVolume 26s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6232a62b-6f7f-4710-9b53-e3bdb43d5f22"
Normal Pulling 23s kubelet Pulling image "668273420038.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest"
Normal Pulled 2s kubelet Successfully pulled image "668273420038.dkr.ecr.eu-central-1.amazonaws.com/github-arc-runner:latest" in 20.443s (20.443s including waiting). Image size: 637597729 bytes.
Normal Created 2s kubelet Created container runner
Normal Started 2s kubelet Started container runner
After some time it fails with the following Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2s default-scheduler 0/8 nodes are available: persistentvolumeclaim "gm.small.arm-lmjwg-runner-7hkh6-work" is being deleted. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
Warning FailedBinding 2s ephemeral_volume ephemeral volume work: PVC github-arc/gm.small.arm-lmjwg-runner-7hkh6-work was not created for pod github-arc/gm.small.arm-lmjwg-runner-7hkh6 (pod is not owner)
Normal Nominated 1s karpenter Pod should schedule on: nodeclaim/runner-arm-xlgz6, node/ip-10-150-112-203.eu-central-1.compute.internal
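For anyone triaging this: the `(pod is not owner)` message indicates that a PVC with the expected name exists, but is not controlled by the current pod. A minimal sketch of that check follows, as a diagnostic aid only. It assumes the PVC for a generic ephemeral volume is named `<pod name>-<volume name>` and that ownership is decided by a controller `ownerReferences` entry whose UID matches the pod's UID. The helper names (`expected_pvc_name`, `pod_owns_pvc`, `kubectl_get`) are hypothetical and not part of ARC or Kubernetes.

```python
import json
import subprocess


def expected_pvc_name(pod_name: str, volume_name: str) -> str:
    """Assumed naming for generic ephemeral volumes: '<pod name>-<volume name>'."""
    return f"{pod_name}-{volume_name}"


def pod_owns_pvc(pod: dict, pvc: dict) -> bool:
    """True if the PVC has a controller ownerReference with this pod's UID.

    This approximates the ownership check behind the
    'was not created for pod ... (pod is not owner)' event.
    """
    pod_uid = pod["metadata"]["uid"]
    owners = pvc["metadata"].get("ownerReferences") or []
    return any(o.get("controller") and o.get("uid") == pod_uid for o in owners)


def kubectl_get(kind: str, name: str, namespace: str) -> dict:
    """Fetch one object as JSON via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", kind, name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example usage against a live cluster (names taken from the events above):
#   ns, pod_name = "github-arc", "gm.small.arm-lmjwg-runner-7hkh6"
#   pod = kubectl_get("pod", pod_name, ns)
#   pvc = kubectl_get("pvc", expected_pvc_name(pod_name, "work"), ns)
#   print("pod owns PVC:", pod_owns_pvc(pod, pvc))
```

If this reports `False` while the PVC is in the "is being deleted" state, the PVC may have been created for an earlier pod with the same name; that is a hypothesis to verify, not a confirmed cause.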