microsoft/azure-pipelines-agent

[BUG]: An error occurred trying to start process '/agent/externals/node20_1/bin/node' #5151

Open
@ecl1ps

What happened?

Hello,

We are experiencing inconsistent, occasional crashes in our pipelines. The error always says:

```
An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
```

We have YAML pipelines with a few stages. These pipelines are mostly shared (through Git) and run hundreds of times daily across multiple projects and repos. Most of the time they work correctly, but sometimes they crash: roughly once in every 10 to 100 runs, though it is hard to give an exact number.

I am not sure whether this is a problem with a task or with the agent. However, the same issue occurs in different tasks, and on top of that it reports a missing Node.js (which we are not using in that task) in a version we are not using anywhere, so I am leaning toward a problem with the agent or some other part of the system.

When it happens, it always appears to be in the first task of a stage (not counting the standard checkout tasks). We see it happening in:

```yaml
- script: |
    echo $PATH
    node --version
  displayName: 'Node.js info'
  continueOnError: true
```


```yaml
jobs:
  - job: CheckPullRequest
    displayName: 'Check pull request'
    pool:
      name: pool-name-foo
      vmImage: ubuntu-latest
    # workspace is a job-level property, not a pool property
    workspace:
      clean: all
    steps:
      - download: none

      - checkout: self
        path: sources
        fetchDepth: 0
        clean: true
        persistCredentials: true

      - checkout: SharedPipelinesRepository
        path: shared-pipelines
        fetchDepth: 1

      - script: |
          echo $PATH
          node --version
        displayName: 'Node.js info'
        continueOnError: true

      - task: UseNode@1
        displayName: 'Install Node.js'
        inputs:
          version: '20.18.0'
```


```yaml
- task: AzureKeyVault@2
  inputs:
    connectedServiceName: 'foo'
    keyVaultName: 'bar'
    runAsPreJob: true
    secretsFilter: 'secret'
```


We even created a separate agent pool and moved a single pipeline to it (to rule out a different pipeline breaking the environment), but it happened anyway. That pipeline ran 3 parallel stages: the first one had Node, the second didn't, and the third had it available again. That is wild.

All of those stages had `UseNode@1` as their first task (excluding checkout tasks). Here are the timestamped log lines from those UseNode tasks:

```
Stage A, Job A
2025-03-05T06:28:21.9568752Z Found tool in cache: node 20.18.0 x64

Stage B, Job B
2025-03-05T06:28:42.8821219Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory

Stage C, Job AC
2025-03-05T06:29:06.4968152Z Found tool in cache: node 20.18.0 x64
```

We are trying to create a repro, but have had no luck so far. This does not happen on Microsoft-hosted agents.

Could you give us a hint about what the root cause could be? Why is Node.js 20.1 needed to run `- script: echo $PATH`? Is it used by the system itself? What could cause it to sometimes be missing?
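The debug log posted later in this thread (`Using node path: /agent/externals/node20_1/bin/node`) suggests the agent launches even the Command line task through its bundled Node binary, which would explain why a plain `script` step depends on it. The error itself names two paths: that binary and the working directory. With .NET on Linux, `No such file or directory` in this message may come from either one being missing, so checking both on an affected machine could help narrow things down. This is a minimal host-side sketch, assuming the `/agent` layout from the error message; `check_node_start` is a hypothetical name, not part of the agent.

```sh
#!/bin/sh
# Hypothetical host-side check. Defaults are the two paths named in the error.
check_node_start() {
  local node_bin="${1:-/agent/externals/node20_1/bin/node}"
  local work_dir="${2:-/agent/_work/1/s}"
  local status=0
  local version

  # 1. Does the handler binary exist and is it executable?
  if [ ! -e "$node_bin" ]; then
    echo "FAIL: $node_bin does not exist"
    status=1
  elif [ ! -x "$node_bin" ]; then
    echo "FAIL: $node_bin exists but is not executable"
    status=1
  # 2. Can the binary actually be started?
  elif version=$("$node_bin" --version 2>&1); then
    echo "OK: $node_bin --version -> $version"
  else
    echo "FAIL: $node_bin did not run: $version"
    status=1
  fi

  # 3. Does the working directory from the error message exist?
  if [ -d "$work_dir" ]; then
    echo "OK: working directory $work_dir exists"
  else
    echo "FAIL: working directory $work_dir does not exist"
    status=1
  fi

  return "$status"
}
```

Calling `check_node_start` with no arguments checks the paths from the error message; a non-zero return code means at least one check failed. Because the failure is intermittent, a passing check only describes the machine at that moment.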

I can add more info if needed.

Thanks for any help!

Versions

Current agent version: '4.251.0'
Agent running as: 'AzDevOps'
Image: ubuntu-latest

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
    Microsoft Hosted
    VMSS Pool
    Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operation system

No response

Version control system

Git

Relevant log output

```
2025-03-11T00:16:33.8605603Z ##[section]Starting: Node.js info
2025-03-11T00:16:33.8621208Z ==============================================================================
2025-03-11T00:16:33.8621756Z Task         : Command line
2025-03-11T00:16:33.8622069Z Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2025-03-11T00:16:33.8622609Z Version      : 2.250.1
2025-03-11T00:16:33.8622928Z Author       : Microsoft Corporation
2025-03-11T00:16:33.8623299Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2025-03-11T00:16:33.8623801Z ==============================================================================
2025-03-11T00:16:34.0871007Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
2025-03-11T00:16:34.0890327Z ##[section]Finishing: Node.js info

2025-03-11T00:16:34.0936432Z ##[section]Starting: Install Node.js
2025-03-11T00:16:34.0945645Z ==============================================================================
2025-03-11T00:16:34.0945964Z Task         : Use Node.js ecosystem
2025-03-11T00:16:34.0946154Z Description  : Set up a Node.js environment and add it to the PATH, additionally providing proxy support
2025-03-11T00:16:34.0946482Z Version      : 1.248.1
2025-03-11T00:16:34.0946659Z Author       : Microsoft Corporation
2025-03-11T00:16:34.0946875Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks
2025-03-11T00:16:34.0947120Z ==============================================================================
2025-03-11T00:16:34.2777399Z ##[error]An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory
2025-03-11T00:16:34.2787536Z ##[section]Finishing: Install Node.js
```

ecl1ps (Author) commented on Mar 27, 2025

Hi, I have an update.

This issue has been occurring since at least January. I did another pass at analyzing these crashes. The last occurrence of `An error occurred trying to start process '/agent/externals/node20_1/bin/node' with working directory '/agent/_work/1/s'. No such file or directory` I saw was on the 3rd of March.

However, we started to see a different error message for the same issue. It happens in exactly the same places and breaks pipelines the same way (as described in the post above). The new error message is `##[error]StandardOut has not been redirected or the process hasn't started yet.` Not sure whether that gives you a clue about what is wrong.

Our build agents run on an image built on 17.2.2025, and the image has not changed since. The versions of all the "problematic" pipeline tasks are the same as well.

Do you have any guidance?


christhebatchelor commented on Mar 27, 2025

I hit this issue on a self-hosted agent today. On a whim, I copied the existing `node20` folder to create a `node20_1` folder, and the pipeline then ran without issue. I wouldn't necessarily endorse this approach for a production system, but if you're self-hosted and just need to move forward, it may be worth trying as a workaround.
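The workaround above can be scripted so that it is idempotent and leaves an existing `node20_1` alone. This is only a sketch, not an official fix. It assumes the default `/agent/externals` layout from the error message and a `node20` folder like the one on the commenter's agent. `ensure_node20_1` is a hypothetical name, and the helper is best run while the agent is idle.

```sh
#!/bin/sh
# Hypothetical helper for the unofficial node20 -> node20_1 copy workaround.
ensure_node20_1() {
  local externals="${1:-/agent/externals}"

  # Nothing to do if the handler binary is already there and executable.
  if [ -x "$externals/node20_1/bin/node" ]; then
    echo "node20_1 already present in $externals; nothing to do"
    return 0
  fi

  # The workaround needs a usable node20 folder to copy from.
  if [ ! -x "$externals/node20/bin/node" ]; then
    echo "no usable node20 in $externals; cannot apply the workaround" >&2
    return 1
  fi

  # Copy the folder's contents (not the folder itself) so this also works
  # when a partial node20_1 directory already exists.
  mkdir -p "$externals/node20_1" &&
    cp -a "$externals/node20/." "$externals/node20_1/" &&
    echo "copied $externals/node20 to $externals/node20_1"
}
```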

ecl1ps (Author) commented on Mar 28, 2025

Here is the debug log of the failed script task:

```
2025-03-28T00:21:13.7232748Z ==============================================================================
2025-03-28T00:21:13.7232841Z Task         : Command line
2025-03-28T00:21:13.7232893Z Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2025-03-28T00:21:13.7233100Z Version      : 2.250.1
2025-03-28T00:21:13.7233159Z Author       : Microsoft Corporation
2025-03-28T00:21:13.7233224Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2025-03-28T00:21:13.7233307Z ==============================================================================
2025-03-28T00:21:13.8372243Z ##[debug]Using node path: /agent/externals/node20_1/bin/node
2025-03-28T00:21:13.8532074Z ##[error]StandardOut has not been redirected or the process hasn't started yet.
2025-03-28T00:21:13.8540765Z ##[debug]System.InvalidOperationException: StandardOut has not been redirected or the process hasn't started yet.
   at System.Diagnostics.Process.get_StandardOutput()
   at Microsoft.VisualStudio.Services.Agent.Util.ProcessInvoker.Dispose(Boolean disposing) in /mnt/vss/_work/1/s/src/Agent.Sdk/ProcessInvoker.cs:line 408
   at Microsoft.VisualStudio.Services.Agent.Util.ProcessInvoker.Dispose() in /mnt/vss/_work/1/s/src/Agent.Sdk/ProcessInvoker.cs:line 397
   at Microsoft.VisualStudio.Services.Agent.ProcessInvokerWrapper.Dispose(Boolean disposing) in /mnt/vss/_work/1/s/src/Microsoft.VisualStudio.Services.Agent/ProcessInvoker.cs:line 357
   at Microsoft.VisualStudio.Services.Agent.ProcessInvokerWrapper.Dispose() in /mnt/vss/_work/1/s/src/Microsoft.VisualStudio.Services.Agent/ProcessInvoker.cs:line 347
   at Microsoft.VisualStudio.Services.Agent.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, Boolean continueAfterCancelProcessTreeKillAttempt, TimeSpan sigintTimeout, TimeSpan sigtermTimeout, Boolean useGracefulShutdown, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/src/Agent.Worker/Handlers/StepHost.cs:line 85
   at Microsoft.VisualStudio.Services.Agent.Worker.Handlers.NodeHandler.RunAsync() in /mnt/vss/_work/1/s/src/Agent.Worker/Handlers/NodeHandler.cs:line 267
   at Microsoft.VisualStudio.Services.Agent.Worker.TaskRunner.RunAsyncInternal() in /mnt/vss/_work/1/s/src/Agent.Worker/TaskRunner.cs:line 447
   at Microsoft.VisualStudio.Services.Agent.Worker.TaskRunner.RunAsync() in /mnt/vss/_work/1/s/src/Agent.Worker/TaskRunner.cs:line 76
   at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken) in /mnt/vss/_work/1/s/src/Agent.Worker/StepsRunner.cs:line 264
2025-03-28T00:21:13.8543273Z ##[section]Finishing: Node.js info
```

ecl1ps (Author) commented on Apr 9, 2025

@tarunramsinghani Is there something we could try to narrow this down?

tarunramsinghani (Contributor) commented on Apr 10, 2025

@ecl1ps, could you please tell us which agent version you are using, as well as the OS and configuration (RAM, storage, etc.) of the agent machine? It seems like something in the OS is preventing the node process from starting. I don't believe the cause is a missing file, as the initial comment suggests, because the agent ships with node20_1.
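The machine details requested above can be gathered with standard tools. This is a minimal sketch, assuming a Linux host with the agent under `/agent`. `collect_agent_host_info` is a hypothetical name. The agent version is easiest to take from the job log (`Current agent version`), as in the Versions section of the original post.

```sh
#!/bin/sh
# Hypothetical helper: prints kernel, OS, CPU count, RAM and disk usage.
collect_agent_host_info() {
  local agent_root="${1:-/agent}"

  echo "Kernel: $(uname -sr)"

  # PRETTY_NAME in /etc/os-release is the human-readable distribution name.
  if [ -r /etc/os-release ]; then
    echo "OS: $(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")"
  fi

  echo "vCPUs: $(getconf _NPROCESSORS_ONLN)"

  # MemTotal is reported by the kernel in kB.
  if [ -r /proc/meminfo ]; then
    echo "RAM: $(awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo)"
  fi

  echo "Disk usage for $agent_root:"
  df -h "$agent_root" 2>/dev/null || echo "  $agent_root not found"
}
```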

jiribaloncz commented on Apr 15, 2025

Thanks for looking into this.

Allow me to answer your questions and also provide some additional details that might help with the investigation.

We’re using an Azure DevOps Agent Pool of type Azure Virtual Machine Scale Set.
Agent version is 4.252.0 (latest at the time of writing).

The VMSS behind the agent pool is configured as follows:

- VM size: D2as_v4
- RAM: 8 GB
- vCPUs: 2
- Local temp storage: 16 GB (SCSI)
- OS disk: 128 GB Premium SSD (LRS), 500 IOPS
- OS: Ubuntu 20.04 (sysprepped custom image from Azure Compute Gallery)

The issue appears intermittently. When I run a minimal YAML pipeline repeatedly on the same agent pool and image version, the problem does not reproduce. However, in actual production pipelines, the error does occur.

Given that the agent ships with node20_1, I also don’t believe the root cause is a missing file.
Could something be modifying or misconfiguring the environment at runtime and causing the Node process to fail to start?

Any guidance or insight would be appreciated; let me know if further logs or debug output would help.

Thank you

tarunramsinghani (Contributor) commented on Apr 16, 2025

Hmm, the disk space seems okay in this case. Would it be possible to collect and share diagnostic logs when this reproduces again? To enable and collect diagnostic logs for the agent, please see Configure verbose logs.

ecl1ps (Author) commented on Apr 17, 2025

Here are the logs from yesterday's failed run, featuring `StandardOut has not been redirected or the process hasn't started yet.`:
logs_1589538.zip

tarunramsinghani (Contributor) commented on Apr 18, 2025

I see that the agent diagnostic logs are missing. Could you please enable the option below so that agent diagnostic logs are collected?

[Screenshot]

It also seems like the error "StandardOut has not been redirected or the process hasn't started yet." is masking the actual issue somewhere else. I still believe the underlying problem is the initial error you mentioned. Could you please try to reproduce this with the 4.251.0 agent, with all diagnostic logs enabled and `system.debug` set to `true`? That would help with further debugging. Also, could you check whether this reproduces only on a specific agent in the pool? I suspect some filesystem issue is causing the node command to fail. If that is the case, maybe delete that specific VM?
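One way to answer whether the failures cluster on specific agents is to tally failed runs per machine. This sketch rests on two assumptions to verify against your own logs: each run's downloaded log archive (like the zip attached earlier) has been extracted into its own sub-folder, and the "Initialize job" log records the machine as `Agent machine name: '...'`. `failures_by_machine` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: one sub-folder of $1 per extracted run log archive.
failures_by_machine() {
  local logs_root="$1" run_dir machine
  for run_dir in "$logs_root"/*/; do
    [ -d "$run_dir" ] || continue
    # Either error text reported in this issue counts as a failed run.
    if grep -rqE "An error occurred trying to start process|StandardOut has not been redirected" "$run_dir"; then
      # Take the first "Agent machine name: '...'" line in the run's logs.
      machine=$(grep -rhoE "Agent machine name: '[^']*'" "$run_dir" | head -n 1 |
        sed -E "s/.*: '([^']*)'/\1/")
      echo "${machine:-unknown}"
    fi
  done | sort | uniq -c | sort -rn   # failure count per machine, highest first
}
```

If most failures land on one or two machine names, that would support the per-machine filesystem theory; an even spread across machines would argue against it.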

ecl1ps (Author) commented on Apr 30, 2025

The logs I posted above are everything that was generated when I enabled system diagnostics using:

[Screenshot]

The agent diagnostic logs are simply not there. I don't see a way to enable them other than `system.debug`. Or am I reading this wrong? https://learn.microsoft.com/en-us/azure/devops/pipelines/troubleshooting/review-logs?view=azure-devops&tabs=windows-agent#agent-diagnostic-logs

We are not using individual VMs but scale sets, so I don't think the same applies to them.

k290 commented on May 22, 2025

For what it's worth, the OP's issue isn't isolated. We have had the same error intermittently on agent version 4.252. The OS is Windows for us.

Jezour1sw commented on May 27, 2025

We're experiencing the same issue in our environment. We're also using Ubuntu-based agents running in a VM Scale Set. The error occurs consistently when the same Key Vault library is referenced in multiple jobs within the same pipeline.
Let me know if there is a recommended workaround or whether a fix is being worked on.

Jezour1sw commented on May 27, 2025 with a screenshot.

German-Alcaor commented on Jun 17, 2025

Although the Microsoft documentation doesn't list Node.js as a requirement, I found that installing it (Node version 24.2.0) on my self-hosted agent resolved the issue for me. Sharing here in case it helps others.
