SPMI: Timeout during collection of LoaderClassloaderGenerics on win-arm64 #116210

Open
@jakobbotsch


The win-arm64 collection of coreclr_tests with TieredCompilation sometimes times out. See this run:
https://dev.azure.com/dnceng/internal/_build/results?buildId=2721059&view=results

[18:43:52] Loaded 1000 at 11387 per second
[18:43:52] Loaded 2000 at 12543 per second
[18:43:52] Loaded 3000 at 15217 per second
[18:43:52] Loaded 4000 at 16025 per second
[18:43:52] Loaded 5000 at 13997 per second
[18:43:52] Loaded 6000 at 9908 per second
[18:43:52] Loaded 6697, Saved 3792
[18:43:52] Read/Wrote 113 MB @ 157.87 MB/s.
[18:43:52] Cleaning MCH file
[18:43:52] Invoking: C:\h\w\A2C508EB\p\superpmi.exe -p -f C:\h\w\A2C508EB\w\ACE4098A\u\spmi_collect\basefail.mcl C:\h\w\A2C508EB\w\ACE4098A\u\spmi_collect\base.mch C:\h\w\A2C508EB\p\clrjit.dll
[18:43:55] Using child (C:\h\w\A2C508EB\p\superpmi.exe) with args ( C:\h\w\A2C508EB\p\clrjit.dll C:\h\w\A2C508EB\w\ACE4098A\u\spmi_collect\base.mch)
[18:43:55]  failingMCList=C:\h\w\A2C508EB\w\ACE4098A\u\spmi_collect\basefail.mcl
[18:43:55]  workerCount=2, skipCleanup=0.
[18:43:55] Loaded 4189  Jitted 4189  FailedCompile 0 Excluded 0 Missing 0
[18:43:55] Total time: 3062.884600ms
[18:43:55] Verifying MCH file
[18:43:55] Using superpmi.exe from Core_Root: C:\h\w\A2C508EB\p\superpmi.exe
[18:43:55] 
[18:43:55] Temp Location: C:\h\w\A2C508EB\t\tmp97iq987l
[18:43:55] 
[18:43:55] Running SuperPMI replay of C:\h\w\A2C508EB\w\ACE4098A\uploads\coreclr_tests.run.windows.arm64.Checked.mch
[18:43:55] Invoking: C:\h\w\A2C508EB\p\superpmi.exe -v ewi -p -f C:\h\w\A2C508EB\t\tmp97iq987l\coreclr_tests.run.windows.arm64.Checked.mch_fail.mcl -details C:\h\w\A2C508EB\t\tmp97iq987l\coreclr_tests.run.windows.arm64.Checked.mch_details.csv C:\h\w\A2C508EB\p\clrjit.dll C:\h\w\A2C508EB\w\ACE4098A\uploads\coreclr_tests.run.windows.arm64.Checked.mch
['LoaderClassloaderGenerics' END OF WORK ITEM LOG: Command timed out, and was killed]

Note that in other runs it completes, and quickly too (about 7.5 seconds elapsed). For example, here is another log from the same work item:

[20:52:24] Loaded 1000 at 12251 per second
[20:52:24] Loaded 2000 at 12490 per second
[20:52:24] Loaded 3000 at 14980 per second
[20:52:24] Loaded 4000 at 15967 per second
[20:52:24] Loaded 5000 at 13652 per second
[20:52:24] Loaded 6000 at 9655 per second
[20:52:24] Loaded 6687, Saved 3797
[20:52:24] Read/Wrote 105 MB @ 146.72 MB/s.
[20:52:24] Cleaning MCH file
[20:52:24] Invoking: C:\h\w\B1640984\p\superpmi.exe -p -f C:\h\w\B1640984\w\A6550961\u\spmi_collect\basefail.mcl C:\h\w\B1640984\w\A6550961\u\spmi_collect\base.mch C:\h\w\B1640984\p\clrjit.dll
[20:52:27] Using child (C:\h\w\B1640984\p\superpmi.exe) with args ( C:\h\w\B1640984\p\clrjit.dll C:\h\w\B1640984\w\A6550961\u\spmi_collect\base.mch)
[20:52:27]  failingMCList=C:\h\w\B1640984\w\A6550961\u\spmi_collect\basefail.mcl
[20:52:27]  workerCount=2, skipCleanup=0.
[20:52:27] Loaded 4176  Jitted 4176  FailedCompile 0 Excluded 0 Missing 0
[20:52:27] Total time: 3144.440700ms
[20:52:27] Verifying MCH file
[20:52:27] Using superpmi.exe from Core_Root: C:\h\w\B1640984\p\superpmi.exe
[20:52:27] 
[20:52:27] Temp Location: C:\h\w\B1640984\t\tmp06_u5235
[20:52:27] 
[20:52:27] Running SuperPMI replay of C:\h\w\B1640984\w\A6550961\uploads\coreclr_tests.run.windows.arm64.Checked.mch
[20:52:27] Invoking: C:\h\w\B1640984\p\superpmi.exe -v ewi -p -f C:\h\w\B1640984\t\tmp06_u5235\coreclr_tests.run.windows.arm64.Checked.mch_fail.mcl -details C:\h\w\B1640984\t\tmp06_u5235\coreclr_tests.run.windows.arm64.Checked.mch_details.csv C:\h\w\B1640984\p\clrjit.dll C:\h\w\B1640984\w\A6550961\uploads\coreclr_tests.run.windows.arm64.Checked.mch
[20:52:31] Clean SuperPMI replay (4176 contexts processed)
[20:52:31] Replay summary:
[20:52:31]   All replays clean
[20:52:31] Process MCH files for CI
[20:52:31] Generated MCH file: C:\h\w\B1640984\w\A6550961\uploads\coreclr_tests.run.windows.arm64.Checked.mch
[20:52:31] Finish time: 20:52:31
[20:52:31] Elapsed time: 0:00:07.476571

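The timed-out log stops right after the final SuperPMI replay invocation (the "Verifying MCH file" step), which takes about 4 seconds in the successful run. We might be hitting some kind of rare infinite loop or hang there, so we should investigate.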
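One way to narrow this down locally might be to run the replay command under a short watchdog instead of waiting for the work item timeout. Below is a minimal sketch, not part of superpmi.py: `run_with_watchdog` is a hypothetical helper name, and the 60 second budget is an assumption picked because the replay normally finishes in a few seconds.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and return (timed_out, returncode).

    If the process does not exit within timeout_s seconds it is killed.
    Hypothetical helper for illustration; not an existing superpmi.py function.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        # This is the point where a hang was detected. A dump of the process
        # could be captured here, before the kill, to see where it is stuck.
        print(f"timed out after {timeout_s}s: {cmd[0]}", file=sys.stderr)
        proc.kill()
        proc.wait()
        return True, proc.returncode


# Example call, reusing the arguments from the log above (paths are placeholders):
# run_with_watchdog(
#     ["superpmi.exe", "-v", "ewi", "-p", "-f", "fail.mcl",
#      "-details", "details.csv", "clrjit.dll", "coreclr_tests.mch"],
#     timeout_s=60,
# )
```

Since the replay runs with `-p`, superpmi.exe starts child worker processes, and `proc.kill()` only terminates the parent. Any leftover children would need to be cleaned up separately, and they may well be where the hang actually is.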

Edit: Looks like this happens on other platforms as well, e.g. the same run has two win-x86 failures with similar timeouts.

cc @dotnet/jit-contrib

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
