Description
Describe the bug
safe_sleep.sh is implemented as a while loop, with the aim of being safer than relying on the sleep or ping binaries. The way the loop is implemented leads to 100% CPU usage, and it does not try any alternative first (e.g. testing whether sleep is available).
To Reproduce
Steps to reproduce the behavior:
- Start runner
- Wait
Expected behavior
Lower CPU usage from the runner while idle.
Actual behavior
Runner Version and Platform
Linux 2.300.2 running on Docker.
Activity
fhammerl commentedon Feb 1, 2023
Thanks @leleobhz, labeled the issue.
good-first-issue
keep
This barebones implementation for safe_sleep.sh was chosen for its broad availability on most platforms. A better 'safe sleep' must not depend on utilities that may be missing (sleep or ping).
A good start would be to check for the existence of such utils before resorting to the above CPU-intensive counting.
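The check suggested above could look roughly like the sketch below. It is not the runner's actual script: the function name safe_sleep and the SAFE_SLEEP_METHOD variable are hypothetical, the ping branch assumes a Linux-style ping that accepts -c and sends one packet per second, and the last branch assumes bash, whose SECONDS variable counts elapsed seconds.

```shell
#!/bin/bash
# Hypothetical sketch: prefer real utilities, busy-wait only as a last resort.
safe_sleep() {
    seconds="$1"
    if command -v sleep >/dev/null 2>&1; then
        # Preferred path: the sleep binary blocks without using CPU.
        sleep "$seconds"
        SAFE_SLEEP_METHOD="sleep"
    elif command -v ping >/dev/null 2>&1; then
        # Assumption: Linux-style ping, one packet per second, so N+1
        # packets to localhost take roughly N seconds.
        ping -c "$((seconds + 1))" 127.0.0.1 >/dev/null 2>&1
        SAFE_SLEEP_METHOD="ping"
    else
        # Last resort: CPU-intensive counting on bash's SECONDS variable.
        SECONDS=0
        while [ "$SECONDS" -lt "$seconds" ]; do
            :
        done
        SAFE_SLEEP_METHOD="busy-loop"
    fi
}
```

With this ordering, the busy loop only runs on systems where neither sleep nor ping can be found, which is the case the current script was written for.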
LiranBarton commentedon May 7, 2023
Hey @fhammerl, was this bug fixed?
We have runners in an ASG that scales to max because of this script.
Any suggestions? (I did not fully understand why we need this script. Can we remove it on boot?)
leleobhz commentedon May 8, 2023
@LiranBarton hello!
As far as I understand, this script aims to provide a minimal sleep() implementation in pure shell for environments that don't have one. The main problem is that this implementation leads to high CPU usage, so it should be used only as a last fallback, when sleep is not present on the system.
I suggest handling sleep() this way if it's really needed. (Personally, I can't see why sleep needs a pure-shell implementation here, since both bash and busybox implement this function in some way. Embedding busybox may also be a better solution than a custom implementation.)
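On the pure-shell point: where bash is available, a wait that does not spin the CPU can be built from the read builtin's timeout. The sketch below is a known bash idiom, not something the runner ships. The function name read_sleep is hypothetical, and it assumes a Linux-like system where process substitution is backed by /dev/fd.

```shell
#!/bin/bash
# Hypothetical pure-bash sleep: block in read until the timeout expires.
read_sleep() {
    # "<> <(:)" opens an empty pipe for both reading and writing, so read
    # never sees data or EOF and simply waits for the -t timeout.
    # read returns non-zero on timeout, hence the "|| :".
    read -rt "$1" <> <(:) || :
}
```

Calling read_sleep 5 blocks for about five seconds while the shell is idle in the kernel, instead of looping.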
tlhakhan commentedon Jun 25, 2023
I also noticed high CPU utilization on my nodes, and the cause turned out to be the safe_sleep.sh script. See below for a snippet of my top output.
I modified the safe_sleep.sh script to use sleep instead. After the small change, I see lower CPU utilization on my nodes.
I overwrote safe_sleep.sh in my actions-runner image:
Edit 2023-06-25:

Below is the difference in my hypervisor CPU utilization after pushing the safe_sleep.sh override to all my worker nodes.
I haven't noticed any issues with my runners after these changes (🤞).

tlhakhan commentedon Jul 3, 2023
Just a note: the above fix wasn't permanent, because the actions runner seems to update itself automatically over time and pulls in the original safe_sleep.sh implementation.
A snippet of the runner logs, where it performs an update from 2.299.1 to 2.305.0:
My top output:
Edit:
I found the following blog post on how to stop the self-update on the runners.
https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/.
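As far as I know, the changelog post linked above describes a --disableupdate flag that is passed to config.sh when registering the runner. A minimal sketch of using it follows; the wrapper function name is hypothetical, and the URL and token arguments are placeholders to be supplied by the caller.

```shell
#!/bin/bash
# Hypothetical wrapper: register a self-hosted runner with automatic
# updates disabled. Run from the runner's install directory.
configure_runner_without_updates() {
    runner_url="$1"    # placeholder: repository or organization URL
    runner_token="$2"  # placeholder: registration token
    ./config.sh --url "$runner_url" --token "$runner_token" --disableupdate
}
```

With updates disabled, an overridden safe_sleep.sh should no longer be replaced by a self-update, but the runner then has to be kept up to date manually.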
taliastocks commentedon Mar 21, 2024
There are already a bunch of binaries bundled with the runner, e.g. bin/Runner.Listener. If this is really a huge cross-platform concern, can't it be solved by distributing something like bin/Sleep to implement sleep, rather than a bash busy loop?