Description
What happened?
Hello,
We are experiencing inconsistent, occasional crashes in our pipelines. The error always says:
An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
We have YAML pipelines with a few stages. These pipelines are mostly shared (through git) and run hundreds of times daily across multiple projects and repos. Most of the time they work correctly, but sometimes they crash, roughly once in 10 to 100 runs; it is hard to give an exact number.
I am not sure whether this is a problem with a task or with the agent. However, the same issue occurs in different tasks, and it reports a missing Node.js binary in tasks where we are not using Node.js, in a version we are not using anywhere, so I am leaning toward it being a problem with the agent or some other part of the system.
When it happens, it appears to always happen in the first task in a stage (not counting standard checkout tasks). We see it happening in:
- script: |
    echo $PATH
    node --version
  displayName: 'Node.js info'
  continueOnError: true
jobs:
  - job: CheckPullRequest
    displayName: 'Check pull request'
    pool:
      name: pool-name-foo
      vmImage: ubuntu-latest
    workspace:
      clean: all
    steps:
      - download: none
      - checkout: self
        path: sources
        fetchDepth: 0
        clean: true
        persistCredentials: true
      - checkout: SharedPipelinesRepository
        path: shared-pipelines
        fetchDepth: 1
      - script: |
          echo $PATH
          node --version
        displayName: 'Node.js info'
        continueOnError: true
      - task: UseNode@1
        displayName: 'Install Node.js'
        inputs:
          version: '20.18.0'
      - task: AzureKeyVault@2
        inputs:
          connectedServiceName: 'foo'
          keyVaultName: 'bar'
          runAsPreJob: true
          secretsFilter: 'secret'
We even tried creating a separate agent pool and moving a single pipeline to it (to rule out a different pipeline breaking the environment), but it happened anyway. That pipeline ran 3 parallel stages: the first one had node, the second didn't, and the third had it available again. That is wild.
Those stages all had task: UseNode@1 as the first task (except checkout tasks). Here are the timestamps from those UseNode tasks:
Stage A, Job A
2025-03-05T06:28:21.9568752Z Found tool in cache: node 20.18.0 x64
Stage B, Job B
2025-03-05T06:28:42.8821219Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
Stage C, Job AC
2025-03-05T06:29:06.4968152Z Found tool in cache: node 20.18.0 x64
We are trying to create a repro, but so far no luck. This is not happening on MS hosted agents.
Could you give us a hint about what the root cause could be? Why is nodejs 20.1 needed to run - script: echo $PATH? Is it used by the system itself? What could cause it to sometimes not be present?
I can add more info if needed.
Thanks for any help!
Versions
Current agent version: '4.251.0'
Agent running as: 'AzDevOps'
Image: ubuntu-latest
Environment type (Please select at least one environment where you face this issue)
- Self-Hosted
- Microsoft Hosted
- VMSS Pool
- Container
Azure DevOps Server type
dev.azure.com (formerly visualstudio.com)
Azure DevOps Server Version (if applicable)
No response
Operating system
No response
Version control system
Git
Relevant log output
2025-03-11T00:16:33.8605603Z ##[section]Starting: Node.js info
2025-03-11T00:16:33.8621208Z ==============================================================================
2025-03-11T00:16:33.8621756Z Task : Command line
2025-03-11T00:16:33.8622069Z Description : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2025-03-11T00:16:33.8622609Z Version : 2.250.1
2025-03-11T00:16:33.8622928Z Author : Microsoft Corporation
2025-03-11T00:16:33.8623299Z Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2025-03-11T00:16:33.8623801Z ==============================================================================
2025-03-11T00:16:34.0871007Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
2025-03-11T00:16:34.0890327Z ##[section]Finishing: Node.js info
2025-03-11T00:16:34.0936432Z ##[section]Starting: Install Node.js
2025-03-11T00:16:34.0945645Z ==============================================================================
2025-03-11T00:16:34.0945964Z Task : Use Node.js ecosystem
2025-03-11T00:16:34.0946154Z Description : Set up a Node.js environment and add it to the PATH, additionally providing proxy support
2025-03-11T00:16:34.0946482Z Version : 1.248.1
2025-03-11T00:16:34.0946659Z Author : Microsoft Corporation
2025-03-11T00:16:34.0946875Z Help : https://docs.microsoft.com/azure/devops/pipelines/tasks
2025-03-11T00:16:34.0947120Z ==============================================================================
2025-03-11T00:16:34.2777399Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
2025-03-11T00:16:34.2787536Z ##[section]Finishing: Install Node.js
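Since the failure rate is hard to pin down by eye, the log format above can be scanned mechanically. The following is a minimal sketch in Python, assuming the logs have been downloaded as plain text; the function name find_start_failures is hypothetical and the patterns are based only on the lines quoted in this issue, so other agent versions may word the error differently.

```python
import re

# Patterns modeled on the log lines quoted in this issue (an assumption:
# other agent versions may format these lines differently).
START_ERROR = re.compile(
    r"##\[error\]An error occurred trying to start process '(?P<path>[^']+)' "
    r"with working directory '(?P<cwd>[^']+)'\. (?P<reason>.+)$"
)
SECTION_START = re.compile(r"##\[section\]Starting: (?P<task>.+)$")


def find_start_failures(log_text):
    """Return (task_name, binary_path, reason) for each start-process error."""
    failures = []
    current_task = None
    for line in log_text.splitlines():
        section = SECTION_START.search(line)
        if section:
            # Remember which task the following lines belong to.
            current_task = section.group("task").strip()
            continue
        error = START_ERROR.search(line)
        if error:
            failures.append(
                (current_task, error.group("path"), error.group("reason").strip())
            )
    return failures
```

Running this over the logs of many pipeline runs and counting the non-empty results would give a rough failure rate per task name.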
Activity
ecl1ps commented on Mar 27, 2025
Hi, I have an update.
This issue has been occurring since January, at least. I did another pass at analyzing these crashes. The last occurrence of
An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
was on the 3rd of March. But we have started to see a different error text. It happens in exactly the same places and breaks pipelines the same way (described in the post above). The new error message is
##[error]StandardOut has not been redirected or the process hasn't started yet.
Not sure if that gives you a clue about what is wrong.
Our build agents are running on an image built on 17.2.2025. There hasn't been any change to them since. The version of all "problematic" pipeline tasks is the same as well.
Do you have any guidance?
christhebatchelor commented on Mar 27, 2025
I noticed this issue on a self-hosted agent today. On a whim, I simply copied the existing node20 folder to create a node20_1 folder. The pipeline then ran without issue. I don't know that I necessarily endorse this approach for a production system, but if you're self-hosted, and just need to move forward maybe give that a try as a workaround.
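The copy workaround described above could be scripted roughly as follows. This is only a sketch in Python, not an endorsed fix: the function name clone_node_folder is hypothetical, and the externals location (for example /agent/externals, taken from the error message in this issue) is an assumption that depends on where the agent is installed.

```python
import shutil
from pathlib import Path


def clone_node_folder(externals_dir, source="node20", target="node20_1"):
    """Copy externals/<source> to externals/<target> if the target is missing.

    Returns True if a copy was made, False if the target already existed.
    """
    externals = Path(externals_dir)
    src = externals / source
    dst = externals / target
    if dst.exists():
        # Never overwrite a folder the agent shipped with.
        return False
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    # symlinks=True keeps symbolic links as links instead of copying targets.
    shutil.copytree(src, dst, symlinks=True)
    return True
```

The function deliberately does nothing when the target folder is already present, so it cannot mask the case where node20_1 exists but still fails to start.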
ecl1ps commented on Mar 28, 2025
Here is the debug log of the failed script task:
ecl1ps commented on Apr 9, 2025
@tarunramsinghani Is there something we could try to narrow this down?
tarunramsinghani commented on Apr 10, 2025
@ecl1ps, can you please tell us which agent version you are using, and the OS and configuration (RAM, storage, etc.) of the agent machine? It seems like something on the OS is stopping the node process from starting. I don't believe the cause is that the file is missing, as mentioned in the initial comment, because the agent ships with node20_1.
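To tell "the file is missing" apart from "the file is there but cannot be started", a check like the one below could be run on an affected agent. It is a sketch in Python with a hypothetical function name (diagnose_node); the path to pass in, such as /agent/externals/node20_1/bin/node, comes from the error message in this issue. On Linux, "No such file or directory" can also be reported for an existing binary whose loader is missing; whether that applies here has not been established in this thread.

```python
import os
import subprocess


def diagnose_node(node_path):
    """Classify the state of a binary: missing, unusable, or able to start."""
    if not os.path.lexists(node_path):
        return "missing"
    if not os.path.isfile(node_path):
        # A directory, or a symlink whose target no longer exists.
        return "not-a-regular-file"
    if not os.access(node_path, os.X_OK):
        return "not-executable"
    try:
        result = subprocess.run(
            [node_path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except OSError as exc:
        # The file exists and is executable, yet the OS refused to start it.
        return f"failed-to-start: {exc.strerror or exc}"
    except subprocess.TimeoutExpired:
        return "timed-out"
    if result.returncode != 0:
        return f"exited-with-code: {result.returncode}"
    return f"ok: {result.stdout.strip()}"
```

A result of "failed-to-start" for a file that exists would point toward the OS or file system rather than a missing node20_1 folder.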
jiribaloncz commented on Apr 15, 2025
Thanks for looking into this.
Allow me to answer your questions and also provide some additional details that might help with the investigation.
We’re using an Azure DevOps Agent Pool of type Azure Virtual Machine Scale Set.
Agent version is 4.252.0 (latest at the time of writing).
The VMSS behind the agent pool is configured as follows:
VM size: D2as_v4
RAM: 8 GB
vCPUs: 2
Local temp storage: 16 GB (SCSI)
OS disk: 128 GB Premium SSD (LRS), 500 IOPS
OS: Ubuntu 20.04 (sysprepped custom image from Azure Compute Gallery)
The issue appears intermittently. When I run a minimal YAML pipeline repeatedly on the same agent pool and image version, the problem does not reproduce. However, in actual production pipelines, the error does occur.
Given that the agent ships with node20_1, I also don’t believe the root cause is a missing file.
Could it be possible that something is modifying or misconfiguring the environment at runtime, causing the Node process to fail to start?
Any guidance or insight would be appreciated — let me know if further logs or debug output would help.
Thank you
tarunramsinghani commented on Apr 16, 2025
Hmm, the disk space seems okay in this case. Would it be possible to collect and share diagnostic logs when this reproduces again? To enable and collect diagnostic logs for the agent, please see: Configure verbose logs
ecl1ps commented on Apr 17, 2025
Here are the logs of yesterday's failed run featuring "StandardOut has not been redirected or the process hasn't started yet.":
logs_1589538.zip
tarunramsinghani commented on Apr 18, 2025
I see that the Agent Diagnostic logs are missing. Can you please enable the option below to make sure the agent diagnostic logs are collected?
Also, it seems the error "StandardOut has not been redirected or the process hasn't started yet." is masking the actual issue somewhere else. I still believe the original issue is the initial error you mentioned. Is it possible to reproduce this with the 4.251.0 agent, with all diagnostic logs enabled and system.debug set to true as well? That would help with further debugging. Also, can you please check whether this reproduces on any specific agent in the pool? I suspect there is some issue with the file system that is causing the node command to fail. If that is the case, maybe delete that specific VM?
ecl1ps commented on Apr 30, 2025
The logs I posted above are everything that was generated when I enabled the system diagnostic by using
The Agent Diagnostics logs are just not there. I don't see a way to enable them other than system.debug. Or am I reading it wrong? https://learn.microsoft.com/en-us/azure/devops/pipelines/troubleshooting/review-logs?view=azure-devops&tabs=windows-agent#agent-diagnostic-logs
We are not using VMs but Scale Sets; I don't think the same applies to them.
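Regarding the earlier question of whether failures cluster on one agent instance: if the agent name and outcome of each job are recorded, a per-agent tally is easy to compute. This is a small sketch in Python; the function name failures_per_agent and the (agent_name, failed) input shape are hypothetical, and how the agent names are collected from run logs is left open.

```python
from collections import Counter


def failures_per_agent(runs):
    """Summarize failures per agent.

    runs: iterable of (agent_name, failed) pairs, one per job run.
    Returns {agent_name: (failures, total_runs, failure_rate)}.
    """
    totals = Counter()
    failed = Counter()
    for agent, did_fail in runs:
        totals[agent] += 1
        if did_fail:
            failed[agent] += 1
    return {
        agent: (failed[agent], totals[agent], failed[agent] / totals[agent])
        for agent in totals
    }
```

If one scale set instance shows a much higher rate than the rest, that would support the file system suspicion; an even spread would point elsewhere.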
k290 commented on May 22, 2025
For what it's worth, OP's issue isn't isolated. We have had the same error intermittently on agent version 4.252. The OS is Windows for us.
Jezour1sw commented on May 27, 2025
We're experiencing the same issue in our environment. We're also using Ubuntu-based agents running in a VM Scale Set. The error occurs consistently when the same Key Vault library is referenced in multiple jobs within the same pipeline.
Let me know if there's any recommended workaround or fix being worked on.
Jezour1sw commented on May 27, 2025
FIX: Tasks on MacOS agents not completing but marked as succeeded. (#…
German-Alcaor commented on Jun 17, 2025
Although the Microsoft documentation doesn't list Node.js as a requirement, I found that installing it (Node version 24.2.0) on my self-hosted agent resolved the issue for me. Sharing here in case it helps others.