
K8s mode without PV #160


Open · DenisPalnitsky wants to merge 10 commits into main from feat.k8s-mode-witout-pv

Conversation

DenisPalnitsky

The goal of this PR is to remove Kubernetes mode's dependency on a PersistentVolume (PV).

The approach

Currently a PV is used to share files between the Runner and Workflow pods. Those files include scripts that should be executed on the runner and required software such as Node.

My assumption is that the communication is one-way, from Runner to Workflow: the Runner pod "sends" files to the Workflow pod, and the Workflow pod does not need to share any files back with the Runner.

```mermaid
flowchart LR
    A[Runner Pod] -->|Sends files to| B[Workflow Pod]
    A -. Pull logs from .-> B
```

Based on that assumption, we can copy the required files directly to the Workflow pod without provisioning shared storage. We can use the cpToPod function for the direct file copy (see the sketch below).
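
A minimal sketch of that direct copy, assuming the hook uses @kubernetes/client-node; the namespace, pod name, container name, and paths below are illustrative, not the hook's actual values:

```typescript
// Illustrative only: copy the runner's work directory straight into the
// workflow pod instead of mounting a shared PersistentVolume.
import * as k8s from '@kubernetes/client-node'

async function copyRunnerFiles(): Promise<void> {
  const kc = new k8s.KubeConfig()
  kc.loadFromDefault()
  const cp = new k8s.Cp(kc)

  await cp.cpToPod(
    'runners',            // namespace (assumed)
    'workflow-pod',       // target pod (assumed)
    'job',                // target container (assumed)
    '/home/runner/_work', // source path on the runner pod
    '/__w'                // destination inside the workflow container
  )
}
```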

This will address the issue described here.

Feedback Request

This PR is a POC that was tested on workflows that run in a container, and it worked fine. I wonder if there are any flaws that would make this approach unviable.

Implementation Notes

I had to rework the cpToPod function to wait for the copy operation to finish.
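
For illustration, here is a minimal sketch of one way to make such a copy awaitable, assuming @kubernetes/client-node's Exec class and a streamed tar archive; this is not necessarily the exact change in this PR:

```typescript
// Illustrative only: stream a tar of srcDir into `tar xf -` running in the
// target container, and resolve only when the remote command's V1Status
// callback reports completion (i.e. the copy has actually finished).
import * as k8s from '@kubernetes/client-node'
import * as stream from 'stream'
import * as tar from 'tar'

export async function cpToPodAndWait(
  kc: k8s.KubeConfig,
  namespace: string,
  podName: string,
  containerName: string,
  srcDir: string,
  targetDir: string
): Promise<void> {
  const exec = new k8s.Exec(kc)
  // Build the archive as a stream so the directory is never fully buffered.
  const tarStream = tar.c({ cwd: srcDir }, ['.'])
  const errStream = new stream.PassThrough()

  await new Promise<void>((resolve, reject) => {
    exec
      .exec(
        namespace,
        podName,
        containerName,
        ['tar', 'xf', '-', '-C', targetDir],
        null,                                     // stdout not needed
        errStream,                                // remote tar errors
        tarStream as unknown as stream.Readable,  // stdin: the archive
        false,                                    // no TTY
        status => {
          // Fires when the remote command finishes; this is the "wait".
          status.status === 'Success'
            ? resolve()
            : reject(new Error(`copy failed: ${JSON.stringify(status)}`))
        }
      )
      .catch(reject)
  })
}
```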

@anlesk

anlesk commented Aug 12, 2024

Is there any work happening on this one? @DenisPalnitsky @nikola-jokic
This got mentioned in a couple of issues as a potential way of resolving them, but it is still in draft without any information about its future.

@DenisPalnitsky
Author

@anlesk I tested it for my use case and it worked. Let me rebase this PR, give it another test, and mark it as ready for review if all is good.

DenisPalnitsky force-pushed the feat.k8s-mode-witout-pv branch from 6426853 to 1a2ba2f on August 17, 2024 12:16
DenisPalnitsky force-pushed the feat.k8s-mode-witout-pv branch from 1a2ba2f to f6d616b on August 17, 2024 12:21
@DenisPalnitsky
Author

DenisPalnitsky commented Aug 18, 2024

I ran some tests and overall the approach works. However, when I try to run a fairly heavy workflow with 60 jobs and 6 containers in each job, I get the error below in ~50% of cases.

##[debug]Archiving _work to /tmp/3b04107c-1297-42dd-9b15-e50a7db6af96
##[debug]Transferring: 486,784,512 Bytes
##[debug]Exec cpToPod
(node:72) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Error: Process completed with exit code 137.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
##[debug]   at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
##[debug]   at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Initialize containers

The memory consumption is ~1 GB for both successful and failed runners, and it's less than the pod limit.

I could not figure out the root cause. I'm not a JS expert, so maybe I'm missing something obvious; I would appreciate a review and help from the community.
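
A hedged observation from outside the thread: exit code 137 is 128 + SIGKILL (9), which inside containers is most often a cgroup OOM kill, and a transient spike while archiving or copying ~486 MB can exceed the limit even if steady-state usage sits around 1 GB. The Buffer deprecation warning suggests the archive may pass through in-memory Buffers; if that turns out to be the cause, streaming instead of buffering keeps peak memory flat. A generic Node illustration of the difference (paths are placeholders):

```typescript
// Illustrative contrast between a buffered and a streamed copy.
import * as fs from 'fs'

// Buffered: the whole file is held in memory at once, so peak memory
// grows by roughly the archive size for the duration of the copy.
function copyBuffered(src: string, dst: string): void {
  const data = fs.readFileSync(src)
  fs.writeFileSync(dst, data)
}

// Streamed: only small chunks (64 KiB here) are in memory at any time.
function copyStreamed(src: string, dst: string): Promise<void> {
  return new Promise((resolve, reject) => {
    fs.createReadStream(src, { highWaterMark: 64 * 1024 })
      .on('error', reject)
      .pipe(fs.createWriteStream(dst))
      .on('error', reject)
      .on('finish', resolve)
  })
}
```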

DenisPalnitsky changed the title from "K8s mode without PV" to "[DRAFT] K8s mode without PV" on Aug 18, 2024
DenisPalnitsky marked this pull request as ready for review on August 18, 2024 16:59
DenisPalnitsky requested review from a team as code owners on August 18, 2024 16:59
@aavileli

aavileli commented Apr 7, 2025

Would any GitHub Action runner maintainers be able to share an estimated timeline for addressing this issue?

@zarko-a

zarko-a commented Apr 8, 2025

We are also interested in getting away from a shared volume between the runner and workflow pods. It would be great if this PR got reviewed and merged!

@nikola-jokic any chance this can move forward?

@nikola-jokic
Collaborator

Hey everyone, this is definitely one of the things we will prioritize soon. This approach is interesting, but we want to try a few more approaches before we land on the final solution. I'll keep you all updated.

@aavileli

@nikola-jokic Thanks for the update — appreciate you keeping us informed. Just wanted to mention that this is currently impacting our evaluation of GitHub Actions, so we're looking forward to seeing how the final direction shapes up.

@quoctruong

@nikola-jokic Is there any update on this? Is this approach valid? I was trying to do something similar, but I wasn't sure whether the volume needs to be kept in sync or whether the initial copy during the prepare step is enough.

@nikola-jokic
Collaborator

Hey @quoctruong,

We will work on this soon, but there are a few things we need to worry about:

  1. The filesystem and assets should be provided to the hook. I would like to experiment with adding a sidecar container that would provide the filesystem to the workflow container.
  2. The files generated inside the workflow container should be reflected back to the runner. I'm not 100% sure this must be done, but I need to test and ensure that writing env vars to $GITHUB_ENV works properly.

Using a sidecar would eliminate the copy, which would be great. But we need to test this approach and make sure we don't break anything.
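
For readers following along, a minimal sketch of what such a sidecar-style setup could look like; the image names, paths, and container names are assumptions for illustration, not a decided design:

```typescript
// Illustrative only: an init container copies the runner assets into an
// emptyDir volume that the job container mounts, so no PersistentVolume
// (and no cross-pod copy) is needed.
import * as k8s from '@kubernetes/client-node'

export const workflowPod: k8s.V1Pod = {
  metadata: { name: 'workflow-pod' },
  spec: {
    volumes: [{ name: 'externals', emptyDir: {} }],
    initContainers: [
      {
        name: 'provide-externals',
        image: 'ghcr.io/actions/actions-runner:latest',          // assumed image
        command: ['cp', '-r', '/home/runner/externals/.', '/__e'], // assumed path
        volumeMounts: [{ name: 'externals', mountPath: '/__e' }]
      }
    ],
    containers: [
      {
        name: 'job',
        image: 'node:20',                      // the workflow's job container
        command: ['tail', '-f', '/dev/null'],
        volumeMounts: [{ name: 'externals', mountPath: '/__e' }]
      }
    ]
  }
}
```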

@zarko-a

zarko-a commented Apr 25, 2025

Thanks for the update @nikola-jokic. We are extremely interested in getting away from shared volumes. We've tried multiple RWX storage options and none perform well with large repos.

@quoctruong

Thanks @nikola-jokic. Out of curiosity, how do you plan to solve the issue where the runner container is started before it even knows about the workflow container? Since it's not possible to add a container to a running pod, does this mean that even with a sidecar, all the containers have to be in the pod spec before the pod is started? So most likely this has to be done in the https://github.com/actions/runner code base and not in the hook?
