17.11+ System.IO.IOException: Connection forcibly closed by remote host #178

Open
@Sterlinghv

Issue Description

After upgrading Microsoft.NET.Test.Sdk from version 17.10.0 to 17.11.0, our Azure DevOps test pipeline task encounters a System.IO.IOException at the very beginning of the run.

Steps to Reproduce

  1. Use Microsoft.NET.Test.Sdk version 17.11.0 or above.
  2. Run dotnet test via the DotNetCoreCLI@2 task in an Azure DevOps pipeline:

  - task: DotNetCoreCLI@2
    displayName: dotnet test
    inputs:
      command: test
      projects: <project solution>

Expected Behavior

Tests run as expected, with no IOException at the start.

Actual Behavior

Error seen in the Azure DevOps test pipeline log:

System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.BufferedStream.Flush()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.SocketCommunicationManager.WriteAndFlushToChannel(String rawMessage) in /_/src/Microsoft.TestPlatform.CommunicationUtilities/SocketCommunicationManager.cs:line 413
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.DataCollection.DataCollectionRequestSender.SendAfterTestRunEndAndGetResult(ITestMessageEventHandler runEventsHandler, Boolean isCancelled) in /_/src/Microsoft.TestPlatform.CommunicationUtilities/DataCollectionRequestSender.cs:line 155
   at Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.DataCollection.ProxyDataCollectionManager.<>c__DisplayClass20_0.<AfterTestRunEnd>b__0() in /_/src/Microsoft.TestPlatform.CrossPlatEngine/DataCollection/ProxyDataCollectionManager.cs:line 155
   at Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.DataCollection.ProxyDataCollectionManager.InvokeDataCollectionServiceAction(Action action, ITestMessageEventHandler runEventsHandler) in /_/src/Microsoft.TestPlatform.CrossPlatEngine/DataCollection/ProxyDataCollectionManager.cs:line 288

Environment

Windows Server 2016, inside an Azure DevOps agent pipeline job

Additional Notes

This does not happen on version 17.10.0 or, presumably, earlier. It seems to happen at ProxyDataCollectionManager.cs line 288.

I'm not sure what is being written, or which connection is involved, when the issue happens.

Activity

nohwnd commented on Oct 24, 2024

Member

Hello, this error is a red herring; it means that the other process died unexpectedly. Please collect diagnostic logs and post them here:

https://github.com/Microsoft/vstest/blob/main/docs/diagnose.md

Or, if they contain private information, please create a Visual Studio feedback issue and upload them there; you can share logs there privately.

If you are unable to share them, please look into the .host..log for exceptions. Start from the bottom: the relevant one should be the last exception that happens. Or, if you see just heartbeat messages and then the process dies, it was killed externally.
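The "start from the bottom" advice can be sketched as a small helper that returns the last line of a log mentioning an exception. This is only an illustrative sketch: the function name `last_exception_line` is hypothetical, and it assumes exception records contain the word "Exception" (as .NET exception type names conventionally do), not any particular vstest log layout.

```python
from typing import Iterable, Optional


def last_exception_line(lines: Iterable[str]) -> Optional[str]:
    """Return the last line that mentions an exception, or None if there is none."""
    last = None
    for line in lines:
        # Assumption: exception records contain the word "Exception".
        if "Exception" in line:
            last = line.rstrip("\n")
    return last
```

To use it on a real file, pass an open file object, for example `last_exception_line(open(path, encoding="utf-8", errors="replace"))`. A `None` result would be consistent with the "just heartbeat and then the process dies" case, although that still needs to be confirmed by reading the end of the log.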

srikcgaa2 commented on Jan 25, 2025

@nohwnd I can confirm that this error appeared after I upgraded from 17.9.0 to 17.11.1. It did not appear before the update with the same set of test cases.

nohwnd commented on Jan 27, 2025

Member

Can you provide the diag logs I was asking for above, please?

srikcgaa2 commented on Feb 3, 2025

> Can you provide the diag logs I was asking for above, please?

I created a Microsoft feedback ticket and attached the logs privately:

https://developercommunity.visualstudio.com/t/MicrosoftNETTestSDK-1711-SystemIO/10840657

srikcgaa2 commented on Feb 3, 2025

Today I was also able to see that with 17.10.0, this exception did not occur on the same branch.

v-mykhalchuk commented on Mar 21, 2025

Hello,
We are seeing the same issue when running 'dotnet test' in an ADO pipeline job.

nohwnd commented on Jun 12, 2025

Member

If this is still happening in 17.14 please report this again. Sorry for not finding the root cause.

v-mykhalchuk commented on Jun 12, 2025

@nohwnd I can confirm this still occurs on version 17.14.1.
Are there any details from the VM this runs on that we can examine to get closer to a root cause? Maybe crash logs or diag output from the dotnet test command?
As a starting point, we run this as:
dotnet.exe test MySolution.sln --logger trx --results-directory E:\testOut --no-build --blame --blame-hang --blame-hang-dump-type full --blame-hang-timeout 120000 --configuration release --collect "Code coverage" --settings test.runsettings
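One way to produce the diagnostic logs requested earlier in the thread is to add the `--diag` option of `dotnet test` to the same command. The sketch below builds the argument list in Python so a wrapper script could reuse it. The helper name `with_diag` and the log path `E:\testOut\diag\log.txt` are hypothetical and chosen only for illustration.

```python
from typing import List

# The command from the comment above, split into arguments.
BASE_ARGS = [
    "dotnet.exe", "test", "MySolution.sln",
    "--logger", "trx",
    "--results-directory", r"E:\testOut",
    "--no-build",
    "--blame", "--blame-hang",
    "--blame-hang-dump-type", "full",
    "--blame-hang-timeout", "120000",
    "--configuration", "release",
    "--collect", "Code coverage",
    "--settings", "test.runsettings",
]


def with_diag(args: List[str], log_path: str) -> List[str]:
    """Return a copy of a `dotnet test` argument list with --diag appended."""
    if "--diag" in args:
        # Already requested; leave the existing log path alone.
        return list(args)
    return [*args, "--diag", log_path]


# Hypothetical log location; any writable path should do.
diag_args = with_diag(BASE_ARGS, r"E:\testOut\diag\log.txt")
```

The resulting list can be passed to `subprocess.run(diag_args)` on the agent. Keeping the arguments as a list avoids quoting problems with values such as `"Code coverage"`.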

reopened this on Jun 12, 2025
nohwnd commented on Jun 12, 2025

Member

Yup, diag logs would be great for a start. :)

v-mykhalchuk commented on Jun 13, 2025

@nohwnd I ran a few tests today. The diag option produced a HUGE result in which I did not see anything obviously pointing to an issue, and I will not be able to share it without scrubbing a good deal of sensitive data.
But I have some additional observations.

  1. We are running it in an ADO pipeline with code coverage collection: dotnet.exe test MySolution.sln --logger trx --results-directory E:\testOut --no-build --blame --blame-hang --blame-hang-dump-type full --blame-hang-timeout 120000 --configuration release --collect "Code coverage" --settings test.runsettings
  2. We have a pretty large solution: 135 projects.
  3. There are 45 xUnit unit test projects with about 8k unit tests in total, varying from 2 to 800ish tests per project.
  4. The issue seems to go away when we change the coverage collector settings in the runsettings file. See below.

Original runsettings file that fails:

<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <!-- Configurations for data collectors -->
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="Code Coverage" uri="datacollector://Microsoft/CodeCoverage/2.0" assemblyQualifiedName="Microsoft.VisualStudio.Coverage.DynamicCoverageDataCollector, Microsoft.VisualStudio.TraceCollector, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
        <Configuration>
          <CodeCoverage>
            <ModulePaths>
              <!-- Only calculate coverage for CompanyName.ProductName.*.dll: -->
              <Include>
                <ModulePath>.*\\CompanyName\.ProductName\.[^\\]*\.dll</ModulePath>
              </Include>
            </ModulePaths>

            <!-- We recommend you do not change the following values: -->
            <UseVerifiableInstrumentation>True</UseVerifiableInstrumentation>
            <AllowLowIntegrityProcesses>True</AllowLowIntegrityProcesses>
            <CollectFromChildProcesses>True</CollectFromChildProcesses>
            <CollectAspDotNet>False</CollectAspDotNet>
          </CodeCoverage>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>

When we update ModulePath to point to a single dll, the behavior may change:
<ModulePath>.*\\CompanyName.ProductName.ComponentName.dll</ModulePath>

If I point it to any code project dll, it fails the same way.
When I point it to a unit test project dll or to a nonexistent one, it succeeds.
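To see which modules each ModulePath expression would select, the two patterns can be tried against sample paths. This is a rough sketch only: the paths are hypothetical, and it assumes the collector matches the expression against the whole module path, case-insensitively. Python's `re` is used as a stand-in for the .NET regex engine, which behaves the same way for patterns this simple.

```python
import re

# Patterns copied from the runsettings fragments above.
BROAD = r".*\\CompanyName\.ProductName\.[^\\]*\.dll"
NARROW = r".*\\CompanyName.ProductName.ComponentName.dll"


def is_included(pattern: str, module_path: str) -> bool:
    """Assumed semantics: whole-path, case-insensitive match."""
    return re.fullmatch(pattern, module_path, flags=re.IGNORECASE) is not None


# Hypothetical module paths on a build agent.
CODE_DLL = r"E:\build\CompanyName.ProductName.ComponentName.dll"
TEST_DLL = r"E:\build\CompanyName.ProductName.ComponentName.Tests.dll"
OTHER_DLL = r"E:\build\xunit.core.dll"

for name, pattern in (("broad", BROAD), ("narrow", NARROW)):
    selected = [p for p in (CODE_DLL, TEST_DLL, OTHER_DLL) if is_included(pattern, p)]
    print(name, selected)
```

Under these assumptions the original expression selects both the code dll and its test dll, because `[^\\]*` also spans dots, while the single-dll expression selects only the one module. That only shows which modules would be instrumented; it does not by itself explain the crash.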

Most of our unit test projects are decorated with ExcludeFromCodeCoverage attributes.

Hope this helps.

v-mykhalchuk commented on Jun 18, 2025

hey @nohwnd
Can you please confirm whether the above is helpful, or whether I should continue chasing the diag logs even though there seems to be nothing in them that obviously points to an issue?

nohwnd commented on Jun 18, 2025

Member

If you are unable to share the log, this is what I would do:

  1. Find the log for the testhost (or testhosts) that is crashing. Typically, in the main log (the one that does not have .host. or .datacollector. in its name), you can search for "exitcode:" and see which testhost exited with a non-zero exit code. The entry includes the PID of the testhost; you can use it to find the respective testhost log and to correlate the data collector log as well.

     There will probably be an empty errorMessage entry in this log, meaning that we did not capture any error output from the testhost because it crashed. (Sometimes you can find an error there, for example for a missing runtime version or similar.)

  2. In the testhost log I go to the last lines and check whether there are closing lines saying "terminating" or similar. If there are, the testhost terminated on purpose; if not, and the log just ends in the middle of work, it crashed.

  3. Then I look into the datacollector log to see if code coverage was enabled for this project. If yes, code coverage is often to blame for the crash, due to an access violation exception. If not, I typically try to make a memory dump of the process to see how it crashed.

  4. For that you would need --blame-crash. If this is on Windows and .NET Framework, please also install procdump to your path, e.g. via Chocolatey.
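The first step above (searching the main log for "exitcode:" and picking out non-zero values) could be automated roughly as follows. This is a sketch under assumptions: the function name `nonzero_exits` is hypothetical, and the code assumes the exit code appears as a decimal integer right after the text "exitcode:", which may not hold for every log line.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the exit code is a (possibly negative) decimal integer
# that directly follows the text "exitcode:".
EXIT_CODE = re.compile(r"exitcode:\s*(-?\d+)", re.IGNORECASE)


def nonzero_exits(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line_number, exit_code, text) for every non-zero exit code found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EXIT_CODE.search(line)
        if match and int(match.group(1)) != 0:
            hits.append((number, int(match.group(1)), line.strip()))
    return hits
```

Each returned line can then be read for the testhost PID, which is the key for locating the matching testhost and datacollector logs as described in the steps above.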

        17.11+ System.IO.IOException: Connection forcibly closed by remote host · Issue #178 · microsoft/codecoverage