
K8s mode without PV #160


Open · DenisPalnitsky wants to merge 10 commits into main from feat.k8s-mode-witout-pv

Conversation

DenisPalnitsky

The goal of this PR is to remove Kubernetes mode's dependency on a PersistentVolume (PV).

The approach

Currently a PV is used to share files between the Runner and Workflow pods. Those files include scripts that should be executed on the runner and required software such as Node.

My assumption is that the communication is one-way, from Runner to Workflow: the Runner pod "sends" files to the Workflow pod, and the Workflow pod does not need to share any files back with the Runner.

```mermaid
flowchart LR
    A[Runner Pod] -->|Sends files to| B[Workflow Pod]
    A -. Pull logs from .-> B
```

Based on that assumption, we can copy the required files directly to the Workflow pod without provisioning shared storage. We can use the cpToPod function for the direct file copy (see the sketch below).
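
A minimal sketch of that direct copy, assuming the hook uses @kubernetes/client-node; the namespace, pod name, container name, and paths below are illustrative, not the hook's actual values:

```typescript
// Illustrative only: copy the runner's work directory straight into the
// workflow pod instead of mounting a shared PersistentVolume.
import * as k8s from '@kubernetes/client-node'

async function copyRunnerFiles(): Promise<void> {
  const kc = new k8s.KubeConfig()
  kc.loadFromDefault()
  const cp = new k8s.Cp(kc)

  await cp.cpToPod(
    'runners',            // namespace (assumed)
    'workflow-pod',       // target pod (assumed)
    'job',                // target container (assumed)
    '/home/runner/_work', // source path on the runner pod
    '/__w'                // destination inside the workflow container
  )
}
```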

This will address the issue described here.

Feedback Request

This PR is a POC that was tested on workflows that run in a container, and it worked fine. I wonder if there are any flaws that would make this approach unviable.

Implementation Notes

I had to rework the cpToPod function to wait for the copy operation to finish.
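
For illustration, here is a minimal sketch of one way to make such a copy awaitable, assuming @kubernetes/client-node's Exec class and a streamed tar archive; this is not necessarily the exact change in this PR:

```typescript
// Illustrative only: stream a tar of srcDir into `tar xf -` running in the
// target container, and resolve only when the remote command's V1Status
// callback reports completion (i.e. the copy has actually finished).
import * as k8s from '@kubernetes/client-node'
import * as stream from 'stream'
import * as tar from 'tar'

export async function cpToPodAndWait(
  kc: k8s.KubeConfig,
  namespace: string,
  podName: string,
  containerName: string,
  srcDir: string,
  targetDir: string
): Promise<void> {
  const exec = new k8s.Exec(kc)
  // Build the archive as a stream so the directory is never fully buffered.
  const tarStream = tar.c({ cwd: srcDir }, ['.'])
  const errStream = new stream.PassThrough()

  await new Promise<void>((resolve, reject) => {
    exec
      .exec(
        namespace,
        podName,
        containerName,
        ['tar', 'xf', '-', '-C', targetDir],
        null,                                     // stdout not needed
        errStream,                                // remote tar errors
        tarStream as unknown as stream.Readable,  // stdin: the archive
        false,                                    // no TTY
        status => {
          // Fires when the remote command finishes; this is the "wait".
          status.status === 'Success'
            ? resolve()
            : reject(new Error(`copy failed: ${JSON.stringify(status)}`))
        }
      )
      .catch(reject)
  })
}
```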

@anlesk

anlesk commented Aug 12, 2024

Is there any work happening on this one? @DenisPalnitsky @nikola-jokic
This got mentioned in a couple of issues as a potential way of resolving them, but it is still in draft without any information about its future.

@DenisPalnitsky
Author

@anlesk I tested it for my use case and it worked. Let me rebase this PR, give it another test, and mark it as ready for review if all is good.

DenisPalnitsky force-pushed the feat.k8s-mode-witout-pv branch from 6426853 to 1a2ba2f on August 17, 2024 12:16
DenisPalnitsky force-pushed the feat.k8s-mode-witout-pv branch from 1a2ba2f to f6d616b on August 17, 2024 12:21
@DenisPalnitsky
Author

DenisPalnitsky commented Aug 18, 2024

I ran some tests and overall the approach works. However, when I try to run a fairly heavy workflow with 60 jobs and 6 containers in each job, I get the error below in ~50% of cases.

##[debug]Archiving _work to /tmp/3b04107c-1297-42dd-9b15-e50a7db6af96
##[debug]Transferring: 486,784,512 Bytes
##[debug]Exec cpToPod
(node:72) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Error: Process completed with exit code 137.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
##[debug]   at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
##[debug]   at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Initialize containers

The memory consumption is ~1 GB for both successful and failed runners, and it's less than the pod limit.

I could not figure out the root cause. I'm not a JS expert, so maybe I'm missing something obvious; I would appreciate a review and help from the community.
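
A hedged observation from outside the thread: exit code 137 is 128 + SIGKILL (9), which inside containers is most often a cgroup OOM kill, and a transient spike while archiving or copying ~486 MB can exceed the limit even if steady-state usage sits around 1 GB. The Buffer deprecation warning suggests the archive may pass through in-memory Buffers; if that turns out to be the cause, streaming instead of buffering keeps peak memory flat. A generic Node illustration of the difference (paths are placeholders):

```typescript
// Illustrative contrast between a buffered and a streamed copy.
import * as fs from 'fs'

// Buffered: the whole file is held in memory at once, so peak memory
// grows by roughly the archive size for the duration of the copy.
function copyBuffered(src: string, dst: string): void {
  const data = fs.readFileSync(src)
  fs.writeFileSync(dst, data)
}

// Streamed: only small chunks (64 KiB here) are in memory at any time.
function copyStreamed(src: string, dst: string): Promise<void> {
  return new Promise((resolve, reject) => {
    fs.createReadStream(src, { highWaterMark: 64 * 1024 })
      .on('error', reject)
      .pipe(fs.createWriteStream(dst))
      .on('error', reject)
      .on('finish', resolve)
  })
}
```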

DenisPalnitsky changed the title from "K8s mode without PV" to "[DRAFT] K8s mode without PV" on Aug 18, 2024
DenisPalnitsky marked this pull request as ready for review on August 18, 2024 16:59
DenisPalnitsky requested review from a team as code owners on August 18, 2024 16:59
@aavileli

aavileli commented Apr 7, 2025

Would any GitHub Action runner maintainers be able to share an estimated timeline for addressing this issue?

@zarko-a

zarko-a commented Apr 8, 2025

We are also interested in getting away from a shared volume between the runner and workflow pods. It would be great if this PR got reviewed and merged!

@nikola-jokic any chance this can move forward?

@nikola-jokic
Collaborator

Hey everyone, this is definitely one of the things we will prioritize soon. This approach is interesting, but we want to try a few more approaches before we land on the final solution. I'll keep you all updated.

@aavileli

@nikola-jokic Thanks for the update — appreciate you keeping us informed. Just wanted to mention that this is currently impacting our evaluation of GitHub Actions, so we're looking forward to seeing how the final direction shapes up.

@quoctruong

@nikola-jokic Is there any update on this? Is this approach valid? I was trying to do something similar, but I wasn't sure whether the volume needs to be kept in sync or whether the initial copy during the prepare step is enough.

@nikola-jokic
Collaborator

Hey @quoctruong,

We will work on this soon, but there are a few things we need to worry about:

  1. The filesystem and assets should be provided to the hook. I would like to experiment with adding a sidecar container that would provide the filesystem to the workflow container.
  2. The files generated inside the workflow container should be reflected back to the runner. I'm not 100% sure this must be done, but I need to test and ensure that writing env vars to $GITHUB_ENV works properly.

Using a sidecar would eliminate the copy, which would be great. But we need to test this approach and make sure we don't break anything.
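
For readers following along, a minimal sketch of what such a sidecar-style setup could look like; the image names, paths, and container names are assumptions for illustration, not a decided design:

```typescript
// Illustrative only: an init container copies the runner assets into an
// emptyDir volume that the job container mounts, so no PersistentVolume
// (and no cross-pod copy) is needed.
import * as k8s from '@kubernetes/client-node'

export const workflowPod: k8s.V1Pod = {
  metadata: { name: 'workflow-pod' },
  spec: {
    volumes: [{ name: 'externals', emptyDir: {} }],
    initContainers: [
      {
        name: 'provide-externals',
        image: 'ghcr.io/actions/actions-runner:latest',          // assumed image
        command: ['cp', '-r', '/home/runner/externals/.', '/__e'], // assumed path
        volumeMounts: [{ name: 'externals', mountPath: '/__e' }]
      }
    ],
    containers: [
      {
        name: 'job',
        image: 'node:20',                      // the workflow's job container
        command: ['tail', '-f', '/dev/null'],
        volumeMounts: [{ name: 'externals', mountPath: '/__e' }]
      }
    ]
  }
}
```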

@zarko-a

zarko-a commented Apr 25, 2025

Thanks for the update @nikola-jokic. We are extremely interested in getting away from shared volumes. We've tried multiple RWX storage options and none perform well with large repos.

@quoctruong

Thanks @nikola-jokic. Out of curiosity, how do you plan to solve the issue where the runner container is started before it even knows about the workflow container? Since it's not possible to add a container to a running pod, does this mean that even with a sidecar, all the containers have to be in the pod spec before the pod is started? So most likely this has to be done in the https://github.com/actions/runner code base and not in the hook?
