Skip to content

ARC Runner Job step getting stuck until timeout or prematurely starting next Job step while the previous step process is still in progress #4020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ahamednijamudeen opened this issue Apr 4, 2025 · 1 comment

Comments

@ahamednijamudeen
Copy link

We are running Github ARC controller and Runner Helm chart version 0.10.1. in Azure Kubernetes v1.31.2. Running the ARC in Kubernetes Kind.

In our workflow, we have Job step with long running Postgres pg_dump process which can take couple of hours to complete due to size of the database. What I have been seeing when the Job is triggered is that workflow step stop printing any logs after few minutes even though the pg_dump process seems to still running in the workflow pod, sometime the step just hangs without any progress and times out after 6 hours even though the process within the workflow pod seems to have finished after some time.

in some cases, the JOB proceeds to the next step of clean up and pg_restore while the previous step process pg_dump is still in progress within the workflow pod.

Below process output from workflow POD when the pg_dump was running which show the process executed via shell script sh

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 19:00 ?        00:00:00 tail -f /dev/null
root          53       0  0 19:00 ?        00:00:00 sh -e /__w/_temp/1c5ed1b2-0120-4a68-8f24-21d4e6f64b98.sh
root          58      53  0 19:00 ?        00:00:00 sh ./.github/data_refresh/pg_dump_source_script.sh
root          59      58 97 19:00 ?        00:06:12 /usr/lib/postgresql/14/bin/pg_dump -Fc -v -v -- .........

After couple of minute, the Job proceed to the next step of clean up and eventually to pg_restore while the previous process pg_dump hasn't completed.. below is the process listing which shows both are active and somehow the Job Step determined pg_dump step is completed

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 19:00 ?        00:00:00 tail -f /dev/null
root          53       0  0 19:00 ?        00:00:00 sh -e /__w/_temp/1c5ed1b2-0120-4a68-8f24-21d4e6f64b98.sh
root          58      53  0 19:00 ?        00:00:00 sh ./.github/data_refresh/pg_dump_source_script.sh
root          59      58 98 19:00 ?        00:08:06 /usr/lib/postgresql/14/bin/pg_dump -Fc -v -v --........
root          88       0  0 19:08 ?        00:00:00 sh -e /__w/_temp/15b907eb-f6fe-4598-a025-960e39561b92.sh
root          94      88  4 19:08 ?        00:00:00 /usr/lib/postgresql/14/bin/pg_restore -Fc -v -v -- .....

Here's the TOP command from workflow pod that shows both process from different step are in progress . I see the CPU utilization of pg_dump is close to ~100% always

pg_dump process running close to 19 minute and pg_restore from different step running in parallel for ~2 minute.

top - 19:20:44 up 1 day, 14:26,  0 user,  load average: 2.33, 1.94, 1.70
Tasks:  10 total,   2 running,   8 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.9 us,  7.1 sy,  0.0 ni, 70.7 id,  0.0 wa,  0.0 hi,  2.2 si,  0.0 st 
MiB Mem :  32094.0 total,   4859.7 free,   4886.4 used,  22901.7 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  27207.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                               
     59 root      20   0   36340  29020  10284 R  99.0   0.1  19:16.22 pg_dump                                                                                                                                                                               
     94 root      20   0   22840  15752  10008 S  16.9   0.0   1:57.51 pg_restore                                                                                                                                                                            
      1 root      20   0    2728   1104   1012 S   0.0   0.0   0:00.07 tail                                                                                                                                                                                  

Below is the Runner POD Log during the step movement between pg_dump, clean_up and pg_restore

[RUNNER 2025-04-04 19:07:59Z INFO JobDispatcher] Successfully renew job request 5071, job is valid till 04/04/2025 19:17:59
[WORKER 2025-04-04 19:08:03Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:03Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:03Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:13Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:13Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:13Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:23Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:23Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:23Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] Finished process 105 with exit code 0, and elapsed time 00:08:09.3950883.
[WORKER 2025-04-04 19:08:30Z INFO ContainerHookManager] Response file for the hook script at '/home/runner/k8s/index.js' running command 'RunScriptStep' was processed successfully
[WORKER 2025-04-04 19:08:30Z INFO ContainerHookManager] Response file for the hook script at '/home/runner/k8s/index.js' running command 'RunScriptStep' was deleted successfully
[WORKER 2025-04-04 19:08:30Z INFO CreateStepSummaryCommand] Step Summary file (/home/runner/_work/_temp/_runner_file_commands/step_summary_74804f45-798a-46eb-8697-7cd9566b072f) is empty; skipping attachment upload
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Step result: 
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext] Publish step telemetry for current step {
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "action": "Pre Job Hook",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "type": "runner",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "stage": "Main",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "stepId": "7beb627d-e27d-5af3-c5ac-34ac86e346de",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "stepContextName": "pg_dump",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "result": "succeeded",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "errorMessages": [],
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "executionTimeInSeconds": 490,
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "startTime": "2025-04-04T19:00:21.4828153Z",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "finishTime": "2025-04-04T19:08:30.9252574Z",
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext]   "containerHookData": "{\"hookScriptPath\":\"/home/runner/k8s/index.js\"}"
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext] }.
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] No need for updating job result with current step result 'Succeeded'.
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Current state: job state = ''
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Processing step: DisplayName='clean up target schema '
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Evaluating: success()
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Result: true
[WORKER 2025-04-04 19:08:30Z INFO StepsRunner] Starting the step.
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:30Z INFO ExecutionContext] Write event payload to /home/runner/_work/_temp/_github_workflow/event.json
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:30Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IFileCommandExtension'
[WORKER 2025-04-04 19:08:30Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IActionCommandExtension'
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command internal-set-repo-path
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command set-env
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command set-output
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command save-state
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-path
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-mask
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-matcher
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command remove-matcher
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command warning
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command error
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command notice
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command debug
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command group
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command endgroup
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command echo
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Which2: 'sh'
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Location: '/usr/bin/sh'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Externals': '/home/runner/externals'
[WORKER 2025-04-04 19:08:30Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IActionCommandExtension'
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command internal-set-repo-path
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command set-env
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command set-output
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command save-state
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-path
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-mask
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command add-matcher
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command remove-matcher
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command warning
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command error
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command notice
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command debug
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command group
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command endgroup
[WORKER 2025-04-04 19:08:30Z INFO ActionCommandManager] Register action command extension for command echo
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Which2: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Fully qualified path: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:30Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Which2: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:30Z INFO ScriptHandler] Fully qualified path: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] Starting process:
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   File name: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Arguments: '/home/runner/k8s/index.js'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Working directory: '/home/runner/_work/....-refresh/...-refresh'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'True'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 183.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/183/oom_score_adj' is denied.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]  ---> System.IO.IOException: Permission denied
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]    --- End of inner exception stack trace ---
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]    at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]    at System.IO.File.WriteToFile(String path, FileMode mode, String contents, Encoding encoding)
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper]    at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] Process started with process id 183, waiting for process exit.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] Close STDIN after the first redirect finished.
[WORKER 2025-04-04 19:08:30Z INFO ProcessInvokerWrapper] STDIN stream write finished.
[WORKER 2025-04-04 19:08:31Z INFO JobServerQueue] Try to append 1 batches web console lines for record '7beb627d-e27d-5af3-c5ac-34ac86e346de', success rate: 1/1.
[WORKER 2025-04-04 19:08:31Z INFO JobServerQueue] Try to append 2 batches web console lines for record 'b6cefd9c-56c9-5379-a681-47c38ba35eb9', success rate: 2/2.
[WORKER 2025-04-04 19:08:31Z INFO JobServerQueue] Got a step log file to send to results service.
[WORKER 2025-04-04 19:08:31Z INFO JobServerQueue] Starting upload of step log file to results service ResultsLog, /home/runner/_diag/blocks/1aaba5ef-4c41-4b70-bfaa-72402f48c908_7beb627d-e27d-5af3-c5ac-34ac86e346de.1
[WORKER 2025-04-04 19:08:32Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'b6cefd9c-56c9-5379-a681-47c38ba35eb9', success rate: 1/1.
[WORKER 2025-04-04 19:08:32Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[WORKER 2025-04-04 19:08:32Z INFO JobServerQueue] Try to append 2 batches web console lines for record 'b6cefd9c-56c9-5379-a681-47c38ba35eb9', success rate: 2/2.
[WORKER 2025-04-04 19:08:32Z INFO JobServerQueue] Tried to upload 1 file(s) to results, success rate: 1/1.
[WORKER 2025-04-04 19:08:33Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:33Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:33Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] Finished process 183 with exit code 0, and elapsed time 00:00:03.1618170.
[WORKER 2025-04-04 19:08:34Z INFO ContainerHookManager] Response file for the hook script at '/home/runner/k8s/index.js' running command 'RunScriptStep' was processed successfully
[WORKER 2025-04-04 19:08:34Z INFO ContainerHookManager] Response file for the hook script at '/home/runner/k8s/index.js' running command 'RunScriptStep' was deleted successfully
[WORKER 2025-04-04 19:08:34Z INFO CreateStepSummaryCommand] Step Summary file (/home/runner/_work/_temp/_runner_file_commands/step_summary_36d18cce-1e29-401f-b9f1-c75f71749395) is empty; skipping attachment upload
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Step result: 
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext] Publish step telemetry for current step {
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "action": "Pre Job Hook",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "type": "runner",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "stage": "Main",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "stepId": "b6cefd9c-56c9-5379-a681-47c38ba35eb9",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "stepContextName": "clean_up_schema",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "result": "succeeded",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "errorMessages": [],
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "executionTimeInSeconds": 4,
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "startTime": "2025-04-04T19:08:30.9272869Z",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "finishTime": "2025-04-04T19:08:34.1268141Z",
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext]   "containerHookData": "{\"hookScriptPath\":\"/home/runner/k8s/index.js\"}"
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext] }.
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] No need for updating job result with current step result 'Succeeded'.
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Current state: job state = ''
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Processing step: DisplayName='PG restore schema into Obfuscation Database'
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Evaluating: success()
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Result: true
[WORKER 2025-04-04 19:08:34Z INFO StepsRunner] Starting the step.
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:34Z INFO ExecutionContext] Write event payload to /home/runner/_work/_temp/_github_workflow/event.json
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:34Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IFileCommandExtension'
[WORKER 2025-04-04 19:08:34Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IActionCommandExtension'
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command internal-set-repo-path
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command set-env
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command set-output
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command save-state
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-path
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-mask
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-matcher
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command remove-matcher
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command warning
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command error
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command notice
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command debug
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command group
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command endgroup
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command echo
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Which2: 'sh'
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Location: '/usr/bin/sh'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Externals': '/home/runner/externals'
[WORKER 2025-04-04 19:08:34Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IActionCommandExtension'
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command internal-set-repo-path
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command set-env
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command set-output
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command save-state
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-path
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-mask
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command add-matcher
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command remove-matcher
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command warning
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command error
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command notice
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command debug
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command group
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command endgroup
[WORKER 2025-04-04 19:08:34Z INFO ActionCommandManager] Register action command extension for command echo
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Which2: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Fully qualified path: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:34Z INFO HostContext] Well known directory 'Temp': '/home/runner/_work/_temp'
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Which2: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:34Z INFO ScriptHandler] Fully qualified path: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] Starting process:
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   File name: '/home/runner/externals/node20/bin/node'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Arguments: '/home/runner/k8s/index.js'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Working directory: '/home/runner/_work/....-refresh/....-refresh'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'True'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 194.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/194/oom_score_adj' is denied.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]  ---> System.IO.IOException: Permission denied
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]    --- End of inner exception stack trace ---
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]    at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]    at System.IO.File.WriteToFile(String path, FileMode mode, String contents, Encoding encoding)
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper]    at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] Close STDIN after the first redirect finished.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] STDIN stream write finished.
[WORKER 2025-04-04 19:08:34Z INFO ProcessInvokerWrapper] Process started with process id 194, waiting for process exit.
[WORKER 2025-04-04 19:08:34Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[WORKER 2025-04-04 19:08:34Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'b6cefd9c-56c9-5379-a681-47c38ba35eb9', success rate: 1/1.
[WORKER 2025-04-04 19:08:34Z INFO JobServerQueue] Try to append 2 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 2/2.
[WORKER 2025-04-04 19:08:34Z INFO JobServerQueue] Got a step log file to send to results service.
[WORKER 2025-04-04 19:08:34Z INFO JobServerQueue] Starting upload of step log file to results service ResultsLog, /home/runner/_diag/blocks/1aaba5ef-4c41-4b70-bfaa-72402f48c908_b6cefd9c-56c9-5379-a681-47c38ba35eb9.1
[WORKER 2025-04-04 19:08:35Z INFO JobServerQueue] Tried to upload 1 file(s) to results, success rate: 1/1.
[WORKER 2025-04-04 19:08:35Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:35Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 6/6.
[WORKER 2025-04-04 19:08:36Z INFO JobServerQueue] Try to append 6 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 6/6.
[WORKER 2025-04-04 19:08:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:37Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:37Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:39Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:40Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:41Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:42Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:42Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:43Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:43Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:43Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:43Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:43Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:08:44Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:44Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:45Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:45Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:46Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:47Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:48Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:48Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:49Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:50Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:52Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:53Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:08:53Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:08:53Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:08:53Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[RUNNER 2025-04-04 19:08:59Z INFO JobDispatcher] Successfully renew job request 5071, job is valid till 04/04/2025 19:18:59
[WORKER 2025-04-04 19:09:03Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:03Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:03Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:09:13Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:13Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:13Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:09:23Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:23Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:23Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:09:33Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:33Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:33Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:09:36Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:09:37Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:09:37Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:09:38Z INFO JobServerQueue] Try to append 1 batches web console lines for record 'f9fe1c6e-4a89-502a-435b-d7f76b442914', success rate: 1/1.
[WORKER 2025-04-04 19:09:43Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:43Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:43Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[WORKER 2025-04-04 19:09:53Z INFO HostContext] Well known directory 'Bin': '/home/runner/bin'
[WORKER 2025-04-04 19:09:53Z INFO HostContext] Well known directory 'Root': '/home/runner'
[WORKER 2025-04-04 19:09:53Z INFO HostContext] Well known directory 'Work': '/home/runner/_work'
[RUNNER 2025-04-04 19:09:59Z INFO JobDispatcher] Successfully renew job request 5071, job is valid till 04/04/2025 19:19:59
....

Below is our Runner POD spec. we are using Flux CD to sync in our environment

  values:
    ...
    minRunners: 0
    maxRunners: 6
    .....
    containerMode:
      type: kubernetes
      kubernetesModeWorkVolumeClaim:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "gha-runner-storage"
        resources:
         requests:
           storage: 50Gi
    template:
      spec:
        securityContext:
          runAsUser: 1001
          runAsGroup: 123
          fsGroup: 123
        containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:latest
          command: [ "/home/runner/run.sh" ]
          env:
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
              value: /home/runner/pod-template/content
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: pod-template
              mountPath: /home/runner/pod-template
              readOnly: true
        volumes:
          - name: pod-template
            configMap:
              name: cpu-hook-extension

Originally posted by @ahamednijamudeen in #4019

Copy link
Contributor

github-actions bot commented Apr 4, 2025

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@ahamednijamudeen ahamednijamudeen changed the title We are running Github ARC controller and Runner Helm chart version 0.10.1. in Azure Kubernetes v1.31.2. Running the ARC in Kubernetes Kind. ARC Runner Job step getting stuck until timeout or prematurely starting next Job step while the previous step process is still in progress Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant