Closed
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), blocking-clean-ci-optional (Blocking optional rolling runs)
Description
Pipeline run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1056248&view=results
Example console log: https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-47f137d3961544ab84/Microsoft.Extensions.Hosting.Unit.Tests/1/console.2a54c506.log?helixlogtype=result
set DOTNET_TieredCompilation=1
set DOTNET_TC_OnStackReplacement=1
set DOTNET_TC_QuickJitForLoops=1
set DOTNET_TC_OnStackReplacement_InitialCounter=1
set DOTNET_OSR_HitLimit=1
...
Starting: Microsoft.Extensions.Hosting.Unit.Tests (parallel test collections = on [4 threads], stop on fail = off)
Microsoft.Extensions.Hosting.Tests.LifecycleTests.CallbackOrder(concurrently: True) [FAIL]
System.InvalidOperationException : Unable to activate type 'Microsoft.Extensions.Logging.LoggerFactory'. The following constructors are ambiguous:
Void .ctor(System.Collections.Generic.IEnumerable`1[Microsoft.Extensions.Logging.ILoggerProvider], Microsoft.Extensions.Options.IOptionsMonitor`1[Microsoft.Extensions.Logging.LoggerFilterOptions], Microsoft.Extensions.Options.IOptions`1[Microsoft.Extensions.Logging.LoggerFactoryOptions], Microsoft.Extensions.Logging.IExternalScopeProvider)
Void .ctor(System.Collections.Generic.IEnumerable`1[Microsoft.Extensions.Logging.ILoggerProvider], Microsoft.Extensions.Logging.LoggerFilterOptions)
This fails on multiple platforms under OSR stress. cc @dotnet/jit-contrib
Activity
AndyAyersMS commented on Jun 3, 2025
I isolated this to … via bisecting DOTNET_JitEnablePatchpointRange. With OSR suppressed for this hash, all the tests fail. From there, it appears to be related somehow to a non-phi-based jump-thread we do: we have a redundant dominated compare, and if we allow this jump threading, the test fails.
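Bisecting a hash range this way is an ordinary binary search over method hashes. The sketch below assumes a single culprit method and a hypothetical fails(lo, hi) callback that reruns the test with patchpoints enabled only for methods whose hash lies in the inclusive range [lo, hi]. It does not model the actual value syntax of DOTNET_JitEnablePatchpointRange.

```python
from typing import Callable


def bisect_culprit(lo: int, hi: int, fails: Callable[[int, int], bool]) -> int:
    """Narrow the inclusive hash range [lo, hi] down to one culprit hash.

    fails(a, b) is a hypothetical callback: it reruns the test with
    patchpoints (and hence OSR) enabled only for method hashes in [a, b]
    and returns True if the test fails. Assumes exactly one culprit.
    """
    if not fails(lo, hi):
        raise ValueError("test passes with the whole range enabled")
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(lo, mid):
            hi = mid      # the culprit lies in the lower half
        else:
            lo = mid + 1  # otherwise it must lie in the upper half
    return lo


# Simulated run: pretend the failing method's hash is 0x5A3C1F07.
culprit = 0x5A3C1F07
print(hex(bisect_culprit(0, 0xFFFFFFFF, lambda a, b: a <= culprit <= b)))  # prints 0x5a3c1f07
```

Each probe halves the range, so a full 32-bit range needs 32 reruns after the initial check. If the failure only shows up when two methods both run with OSR, neither half may fail on its own and this simple search can go astray.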
I have not yet determined whether the issue is upstream and the VN/SSA is incorrect (though that seems unlikely) or the issue is in RBO (also not too likely as this is just a structural opt) or somewhere downstream. We are bypassing a global PHI in BB76 so perhaps that is a contributing factor.
Also not yet clear why this just started happening, though perhaps the recent changes in inlining are involved. There is no stack allocation or physical promotion in this method.
AndyAyersMS commented on Jun 4, 2025
This is a bad interaction between RBO and Assertion Prop (or arguably just a bug in AP).
Before RBO we have this flow graph. Here colored edges are false outcomes. You can see V08.9 can be either null or non-null depending on how the flow evolves from BB51.
RBO jump-threads through BB76 since it has the same pred as BB52, giving the following flow graph, and AP runs over this graph.
Here BB76 is a now-unreachable pred of BB78 (red edge). For unreachable blocks, AP makes all assertions available, which means BB76's out assertion set makes contradictory claims: V08.7 is null and V08.7 is also not null.
Also note there is no PHI arg for the new pred BB52.
AP then walks the PHI in BB78, looking to see if each pred asserts that V08 is non-null, and they all do, and so AP optimizes the branch in BB78.
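The faulty inference can be modeled with plain sets. In this sketch (an illustration, not the JIT's actual data structures), an assertion is an (SSA name, fact) pair, an unreachable pred contributes every assertion, and the phi check only visits preds that have a phi arg. BB_other and V08.8 are hypothetical stand-ins for the remaining preds of BB78, which are not described here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Assertion = Tuple[str, str]  # (SSA name, fact), e.g. ("V08.7", "nonnull")


def available_out(block: str, reachable: Set[str],
                  dataflow_out: Dict[str, FrozenSet[Assertion]],
                  universe: FrozenSet[Assertion]) -> FrozenSet[Assertion]:
    # An unreachable block makes *all* assertions available.
    return dataflow_out[block] if block in reachable else universe


def phi_is_nonnull(phi_args: List[Tuple[str, str]], reachable: Set[str],
                   dataflow_out: Dict[str, FrozenSet[Assertion]],
                   universe: FrozenSet[Assertion]) -> bool:
    # phi_args holds (pred, incoming SSA name) pairs; preds without a phi
    # arg are never consulted.
    return all((ssa, "nonnull") in available_out(pred, reachable, dataflow_out, universe)
               for pred, ssa in phi_args)


universe = frozenset({("V08.7", "null"), ("V08.7", "nonnull"),
                      ("V08.8", "null"), ("V08.8", "nonnull")})
reachable = {"BB52", "BB_other"}                  # BB76 is now unreachable
dataflow_out = {"BB_other": frozenset({("V08.8", "nonnull")})}
phi = [("BB76", "V08.7"), ("BB_other", "V08.8")]  # no arg for new pred BB52
print(phi_is_nonnull(phi, reachable, dataflow_out, universe))  # True (unsound)
```

Because the new pred BB52 has no phi arg, nothing in this check ever looks at the path on which V08 may still be null.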
As for a fix:
I am going to try looking for contradictions first.
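Looking for contradictions could be sketched as follows. This is an assumption-laden model, not the actual fix: it treats any SSA name that is asserted both null and non-null as a sign that a pred's set cannot be trusted, and then declines to conclude anything about the phi. BB_other and V08.8 are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Assertion = Tuple[str, str]  # (SSA name, fact), e.g. ("V08.7", "nonnull")


def has_contradiction(assertions: Iterable[Assertion]) -> bool:
    """True if some SSA name is asserted both null and non-null."""
    facts: Dict[str, Set[str]] = {}
    for name, fact in assertions:
        facts.setdefault(name, set()).add(fact)
    return any({"null", "nonnull"} <= f for f in facts.values())


def phi_is_nonnull_checked(phi_args: List[Tuple[str, str]],
                           out_sets: Dict[str, FrozenSet[Assertion]]) -> bool:
    for pred, ssa in phi_args:
        out = out_sets[pred]
        if has_contradiction(out):
            # A self-contradictory set (e.g. from an unreachable pred) "proves"
            # anything, so give up rather than trust it.
            return False
        if (ssa, "nonnull") not in out:
            return False
    return True


out_sets = {
    "BB76": frozenset({("V08.7", "null"), ("V08.7", "nonnull")}),
    "BB_other": frozenset({("V08.8", "nonnull")}),
}
print(phi_is_nonnull_checked([("BB76", "V08.7"), ("BB_other", "V08.8")], out_sets))  # False
```

Bailing out, rather than just skipping the contradictory pred, matters here: skipping BB76 would still leave the unlisted pred BB52 unexamined.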
AndyAyersMS commented on Jun 4, 2025
FYI @EgorBo: subtle issue in phi analysis during AP.
cc @dotnet/jit-contrib
JIT: fix issue in assertion prop phi inference
jakobbotsch commented on Jun 5, 2025
Is it possible to fix this part instead, for a more future-proof fix? For example, by making AP only consider generated assertions available out of unreachable blocks that have been jump threaded.
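That suggestion changes what an unreachable block contributes rather than filtering afterwards. A minimal sketch of the policy, with hypothetical names and sets; whether a generated-only set is enough to block the bad inference in this case depends on what BB76 itself generates, which the thread does not show.

```python
from typing import Dict, FrozenSet, Set, Tuple

Assertion = Tuple[str, str]  # (SSA name, fact), e.g. ("V08.7", "nonnull")


def available_out(block: str,
                  reachable: Set[str],
                  jump_threaded: Set[str],
                  generated: Dict[str, FrozenSet[Assertion]],
                  dataflow_out: Dict[str, FrozenSet[Assertion]],
                  universe: FrozenSet[Assertion]) -> FrozenSet[Assertion]:
    """Assertions a pred makes available under the suggested policy."""
    if block in reachable:
        return dataflow_out[block]
    if block in jump_threaded:
        # Made unreachable by jump threading: expose only what the block
        # generates itself, not the whole universe.
        return generated.get(block, frozenset())
    # Any other unreachable block keeps the existing "everything" behavior.
    return universe


universe = frozenset({("V08.7", "null"), ("V08.7", "nonnull")})
print(available_out("BB76", set(), {"BB76"}, {}, {}, universe))  # frozenset()
```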
The bug here reminds me of the bug that led to BBF_NO_CSE_IN. We should really consider whether we can run RBO as one of the last phases to avoid these "the data is outdated" issues.
AndyAyersMS commented on Jun 6, 2025
Yes, the whole thing seems fragile.
I don't quite follow what you're suggesting. Seems like the problematic assertion set could come from some (also unreachable) descendant of a jump threaded block so we'd need to somehow track which unreachable blocks were tainted.
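Tracking taint could be done with two graph walks: one to find what is reachable from the entry, and one from each jump-threaded block through successors that are also unreachable. This is a hypothetical sketch over a plain successor map; the block names are made up, and it says nothing about how the JIT would store the result.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_from(entry: str, succs: Dict[str, List[str]]) -> Set[str]:
    """Blocks reachable from entry by following successor edges."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for s in succs.get(block, []):
            if s not in seen:
                seen.add(s)
                work.append(s)
    return seen


def tainted_blocks(entry: str, succs: Dict[str, List[str]],
                   jump_threaded: Iterable[str]) -> Set[str]:
    """Unreachable jump-threaded blocks plus their unreachable descendants."""
    reachable = reachable_from(entry, succs)
    tainted: Set[str] = set()
    work = deque(b for b in jump_threaded if b not in reachable)
    while work:
        block = work.popleft()
        if block in tainted:
            continue
        tainted.add(block)
        for s in succs.get(block, []):
            # Only unreachable blocks get the "all assertions" set, so stop
            # as soon as the walk re-enters reachable code.
            if s not in reachable and s not in tainted:
                work.append(s)
    return tainted


# Hypothetical CFG: T was bypassed by jump threading, U is only reachable via T,
# and Z is unreachable for unrelated reasons.
succs = {"entry": ["A"], "A": ["join"], "T": ["U"], "U": ["join"], "Z": ["join"]}
print(sorted(tainted_blocks("entry", succs, {"T"})))  # ['T', 'U']
```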