Frequent GC Triggered by 'Internal Tuning' on Windows node compared to Linux/ARM node in .NET 8.0.16

### Description

We (the [RavenDB](https://github.com/ravendb/ravendb) team) are investigating a GC issue in which a RavenDB cluster node running Windows Server 2022 (`WIN`) experiences significantly more frequent GCs and higher GC pause times than its counterpart running Ubuntu 24.04.2 (`ARM`), despite both nodes having the same workload and configuration.

This is a replica cluster that receives replicated data from the main cluster, with no external requests. Both nodes have identical databases, configurations, and GC settings specified in the `Raven.Server.runtimeconfig.json` file:


```json
{
  "runtimeOptions": {
    "tfm": "net8.0",
    "includedFrameworks": [
      {
        "name": "Microsoft.NETCore.App",
        "version": "8.0.16"
      },
      {
        "name": "Microsoft.AspNetCore.App",
        "version": "8.0.16"
      }
    ],
    "configProperties": {
      "System.GC.Concurrent": true,
      "System.GC.Server": true,
      "System.GC.RetainVM": true,
      "System.Reflection.Metadata.MetadataUpdater.IsSupported": false,
      "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization": false,
      "System.Runtime.TieredPGO": true
    }
  }
}
```

Both nodes have identical hardware configurations (2 cores, 8 GB memory). The difference lies in their operating systems and processor architecture:

- `WIN`: Windows Server 2022 on x64.
- `ARM`: Ubuntu 24.04.2 on ARM64.

The issue usually becomes noticeable after the cluster nodes are restarted (as part of regular RavenDB updates), although it does not reproduce on every restart. Initially, GC behavior is similar on both nodes, but within a few minutes the `WIN` node begins triggering GCs much more frequently. In the GC traces, we consistently see the `WIN` node triggering GCs with the reason **"Internal Tuning"**, whereas the `ARM` node does not exhibit this behavior.


### Configuration

- 2 cores, 8 GB memory  

- `WIN`: Windows Server 2022 (x64)  
- `ARM`: Ubuntu 24.04.2 (ARM64)  

- .NET 8.0.16  

### Analysis

Initially, both nodes show similar GC activity. After a few minutes, however, the GC on `WIN` becomes significantly more frequent, leading to smaller heap sizes for Gen0 and Gen1, and subsequently, a much higher **PauseTimePercentage**:

![Image](https://github.com/user-attachments/assets/9f8acbc3-ccce-4aa1-9d19-8c37807c30a9)

  - GC traces collected using `dotnet-trace collect --profile gc-verbose --name Raven.Server --duration 00:05:00` revealed:  

    - **`WIN` node**: 163 GCs in 5 minutes  
      ![Image](https://github.com/user-attachments/assets/937969a0-8add-4d4d-82f4-9290949c8c90)  

    - **`ARM` node**: 47 GCs in 5 minutes  
      ![Image](https://github.com/user-attachments/assets/6009d7d0-1b39-47bc-8c20-aca0814daad1)  

    - The primary reason for GC on the `WIN` node is "Internal Tuning", while this is absent on the `ARM` node.  
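
For a rough sense of scale, the two trace counts above can be normalized to a GC rate and a mean interval between GCs. The snippet below is only an illustrative calculation on the numbers reported in this issue: the helper name `gcs_per_minute` is hypothetical, and it assumes both traces cover the full 5-minute window passed to `--duration`.

```python
# Illustrative arithmetic on the GC counts from the two 5-minute traces.
# Assumption: each trace covers the full 300-second window.

WINDOW_SECONDS = 5 * 60


def gcs_per_minute(gc_count: int, window_seconds: float) -> float:
    """Normalize a GC count over a trace window to GCs per minute."""
    return gc_count * 60.0 / window_seconds


win_gcs = 163  # GCs observed on the WIN node
arm_gcs = 47   # GCs observed on the ARM node

win_rate = gcs_per_minute(win_gcs, WINDOW_SECONDS)
arm_rate = gcs_per_minute(arm_gcs, WINDOW_SECONDS)

# Mean time between consecutive GCs, in seconds.
win_interval = WINDOW_SECONDS / win_gcs
arm_interval = WINDOW_SECONDS / arm_gcs

print(f"WIN: {win_rate:.1f} GCs/min, one GC every {win_interval:.2f} s")
print(f"ARM: {arm_rate:.1f} GCs/min, one GC every {arm_interval:.2f} s")
print(f"WIN/ARM ratio: {win_rate / arm_rate:.2f}x")
```

Under that assumption, the `WIN` node runs a GC roughly every 2 seconds versus roughly every 6 seconds on `ARM`, about 3.5 times as often.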

---

17 hours later, the issue persists with the `WIN` node still showing higher GC activity:

![Image](https://github.com/user-attachments/assets/be9f94dd-554a-4bde-8162-70f8d7eb7fe8)

  - **Pause Time Percentage (PauseTimePercentage):**  
    - `WIN` node: 4.69%  
    - `ARM` node: Stable and much lower  

  - GC traces for the `WIN` node still show GCs with the "Internal Tuning" reason, but there are also GCs for which no reason is specified:  
    ![Image](https://github.com/user-attachments/assets/99289796-06a4-4812-b44a-6f60f156bee7)  


### Regression?

This does not appear to be a regression, as we have observed similar behavior in the past.


---

We are seeking assistance in understanding:  

1. What does "Internal Tuning" mean in this context, and why might it disproportionately affect the `WIN` node?  
2. Are there GC or runtime optimizations that are platform/architecture-specific that could explain this behavior?  
3. Are there any additional steps we should take to investigate or mitigate this problem?  

Detailed insights on "Internal Tuning" triggers and how they differ between platforms would be greatly appreciated.



