Description
Something went wrong, and all of our self-hosted runners either checked out bad .git folders or somehow corrupted them. It happened on about 13 of our runners at the same time. I think it was a random occurrence, because once I manually logged in and deleted the repository folder on each runner, everything was fine again.
Here are our logs:
2023-01-30T02:56:34.9249114Z Waiting for a runner to pick up this job...
2023-01-30T04:54:24.3969588Z Job is about to start running on the runner: XXXXXXXXXXXXXXXXXXXXXXXX (organization)
2023-01-30T04:54:29.3070556Z Current runner version: '2.301.1'
2023-01-30T04:54:29.3077744Z Runner name: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
2023-01-30T04:54:29.3078128Z Runner group name: 'Default'
2023-01-30T04:54:29.3078642Z Machine name: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
2023-01-30T04:54:29.3080746Z ##[group]GITHUB_TOKEN Permissions
2023-01-30T04:54:29.3081343Z Actions: write
2023-01-30T04:54:29.3081520Z Checks: write
2023-01-30T04:54:29.3081693Z Contents: write
2023-01-30T04:54:29.3081906Z Deployments: write
2023-01-30T04:54:29.3082186Z Discussions: write
2023-01-30T04:54:29.3082429Z Issues: write
2023-01-30T04:54:29.3082608Z Metadata: read
2023-01-30T04:54:29.3082779Z Packages: write
2023-01-30T04:54:29.3082958Z Pages: write
2023-01-30T04:54:29.3083147Z PullRequests: write
2023-01-30T04:54:29.3083476Z RepositoryProjects: write
2023-01-30T04:54:29.3083696Z SecurityEvents: write
2023-01-30T04:54:29.3083888Z Statuses: write
2023-01-30T04:54:29.3084056Z ##[endgroup]
2023-01-30T04:54:29.3087171Z Secret source: Actions
2023-01-30T04:54:29.3087569Z Prepare workflow directory
2023-01-30T04:54:29.4388409Z Prepare all required actions
2023-01-30T04:54:29.4550014Z Getting action download info
2023-01-30T04:54:29.8524043Z Download action repository 'actions/checkout@v3' (SHA:ac593985615ec2ede58e132d2e21d2b1cbd6127c)
2023-01-30T04:54:30.9083915Z Complete job name: XXXXXXXXXXXXXXXXXXXXXXXX
2023-01-30T04:54:31.0985565Z ##[group]Run actions/checkout@v3
2023-01-30T04:54:31.0985877Z with:
2023-01-30T04:54:31.0986059Z repository: XXXXXXXX/XXXXXXXX
2023-01-30T04:54:31.0986462Z token: ***
2023-01-30T04:54:31.0986609Z ssh-strict: true
2023-01-30T04:54:31.0986786Z persist-credentials: true
2023-01-30T04:54:31.0986951Z clean: true
2023-01-30T04:54:31.0987092Z fetch-depth: 1
2023-01-30T04:54:31.0987234Z lfs: false
2023-01-30T04:54:31.0987377Z submodules: false
2023-01-30T04:54:31.0987547Z set-safe-directory: true
2023-01-30T04:54:31.0987702Z env:
2023-01-30T04:54:31.0987887Z TMP: C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX/.temp
2023-01-30T04:54:31.0988151Z TEMP: C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX/.temp
2023-01-30T04:54:31.0988398Z TMPDIR: C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX/.temp
2023-01-30T04:54:31.0988665Z MATLAB_PREFDIR: C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX/.preferences
2023-01-30T04:54:31.0988870Z ##[endgroup]
2023-01-30T04:54:34.6968863Z Syncing repository: XXXXXXXX/XXXXXXXX
2023-01-30T04:54:34.6970512Z ##[group]Getting Git version info
2023-01-30T04:54:34.6970936Z Working directory is 'C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX'
2023-01-30T04:54:34.6971402Z [command]"C:\Program Files\Git\cmd\git.exe" version
2023-01-30T04:54:34.7493487Z git version 2.36.1.windows.1
2023-01-30T04:54:34.7592122Z ##[endgroup]
2023-01-30T04:54:34.7607048Z Temporarily overriding HOME='C:\runner\e595c9b9\_work\_temp\bcafa367-f8cb-4d31-84b1-63d10aaaabed' before making global git config changes
2023-01-30T04:54:34.7607516Z Adding repository directory to the temporary git global config as a safe directory
2023-01-30T04:54:34.7608114Z [command]"C:\Program Files\Git\cmd\git.exe" config --global --add safe.directory C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX
2023-01-30T04:54:34.8483251Z [command]"C:\Program Files\Git\cmd\git.exe" config --local --get remote.origin.url
2023-01-30T04:54:34.8992096Z ##[error]fatal: --local can only be used inside a git repository
2023-01-30T04:54:34.9013542Z Deleting the contents of 'C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX'
2023-01-30T04:54:35.0573716Z ##[error]EPERM: operation not permitted, unlink 'C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX\.git'
2023-01-30T04:54:35.4710729Z Post job cleanup.
2023-01-30T04:54:38.8875206Z Cleaning up orphan processes
In this case, checkout seems to bail out fatally: after the error "fatal: --local can only be used inside a git repository", the Actions run ends immediately with a failure and does not try to continue.
This effectively bricked the runner, because any job the bad runner picked up failed instantly. Worse, the bad runner would take every job in the queue and fail each one almost immediately, which made quite a mess of our job history.
Since the fix was simply to log in and delete the offending folder, it would be nice if checkout automatically deleted the folder and retried once.
It seems like it tried this:
2023-01-30T04:54:34.9013542Z Deleting the contents of 'C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX'
2023-01-30T04:54:35.0573716Z ##[error]EPERM: operation not permitted, unlink 'C:\runner\e595c9b9\_work\XXXXXXXX\XXXXXXXX\.git'
I am not sure why that didn't work, since I was able to log in and rm the folder without any trouble as the same user. In any case, all 13 runners failed to delete the folder automatically.
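To make the "delete the folder and retry once" request concrete, here is a rough sketch in Python. It is not how actions/checkout is implemented (that is a TypeScript action); the names force_remove and checkout_with_one_retry are made up for illustration, and clearing the read-only attribute is only a guess at one common cause of EPERM on Windows, not a confirmed diagnosis of what happened on our runners.

```python
import os
import shutil
import stat
import time


def _clear_readonly_and_retry(func, path, _exc_info):
    """rmtree error hook: drop the read-only bit, then retry the failed call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def force_remove(path, attempts=3, delay_seconds=1.0):
    """Delete a file or directory tree, retrying a few times on PermissionError.

    Hypothetical helper. The retry delay assumes the failure may be transient
    (for example another process briefly holding a handle), which is unverified.
    """
    for attempt in range(1, attempts + 1):
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                # onerror is deprecated in favour of onexc on Python 3.12+,
                # but is still accepted there.
                shutil.rmtree(path, onerror=_clear_readonly_and_retry)
            elif os.path.lexists(path):
                os.remove(path)
            return
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def checkout_with_one_retry(checkout, workspace):
    """Run checkout(workspace); if it raises, wipe the workspace and retry once."""
    try:
        return checkout(workspace)
    except Exception:
        force_remove(workspace)
        os.makedirs(workspace, exist_ok=True)
        # A second failure propagates to the caller unchanged.
        return checkout(workspace)
```

The point is only the control flow: a failed checkout triggers one forced cleanup of the workspace and exactly one more attempt, so a persistently broken runner still fails the job instead of looping.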
To reproduce, I would suggest:
- Install a self-hosted runner on Windows Server 2022, running as a service under a non-admin service user (e.g. Bob)
- Set up an action that checks out the repository
- Manually corrupt the .git folder by adding extra random files to it (?)
- Ensure git config --local --get remote.origin.url fails
- Observe that subsequent jobs acquired by this runner fail instantly and that the runner fails to recover
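For the "ensure git config fails" step, a small check like the following could confirm whether a runner's workspace is in the bad state before queuing jobs on it. This is only a sketch: has_usable_git_dir is a hypothetical name, the run parameter exists purely so the command can be stubbed out, and it assumes git is on PATH.

```python
import subprocess


def has_usable_git_dir(workspace, run=subprocess.run):
    """Return True if git can read remote.origin.url from the local config.

    Runs the same command that fails in the log above. A non-zero exit code
    or an empty value is treated as "workspace is not a usable repository".
    """
    result = run(
        ["git", "config", "--local", "--get", "remote.origin.url"],
        cwd=workspace,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and bool(result.stdout.strip())
```

Calling has_usable_git_dir with the runner's work folder (the C:\runner\...\_work\... path from the log) should return False on a runner that is in the broken state described here.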