Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark Regressions from Profile Maintenance and Block Layout Changes #113108

Open
amanasifkhalid opened this issue Mar 4, 2025 · 9 comments
Open
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@amanasifkhalid
Copy link
Member

Part of #107749. I've collated all the regression reports assigned to me since #103450 was merged. For now, I'm excluding reports on Tiger machine configs to skip JCC errata-related regressions.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 4, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 4, 2025
@amanasifkhalid amanasifkhalid added tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 4, 2025
@amanasifkhalid amanasifkhalid self-assigned this Mar 4, 2025
@amanasifkhalid amanasifkhalid added this to the 10.0.0 milestone Mar 4, 2025
@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Mar 4, 2025

Notes Recent Score Orig Score arm64 ubuntu Surface Windows Viper Ubuntu Viper Windows Benchmark
#111736 2.65 2.67 1.76
1.79
3.99
3.98
System.Memory.Span(Byte).Reverse(Size: 512)
Unrelated 2.00 2.00 2.00
2.00
PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj
#111736 1.97 1.95 1.43
1.42
2.72
2.68
System.Memory.Span(Char).Reverse(Size: 512)
#111736 1.84 1.85 1.68
1.68
2.02
2.04
System.Memory.Span(Int32).Reverse(Size: 512)
Unrelated 1.81 1.81 1.81
1.81
System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_FirstElementMatches(input: Array)
Noisy 1.70 2.03 1.70
2.03
System.Text.Json.Document.Tests.Perf_DocumentParse.Parse(IsDataIndented: True, TestRandomAccess: True, TestCase: Json400KB)
#111736 1.56 1.06 1.56
1.06
System.Text.Json.Tests.Perf_Reader.ReadMultiSpanSequenceEmptyLoop(IsDataCompact: False, TestCase: DeepTree)
Unrelated 1.48 1.46 0.98
1.20
3.14
2.37
System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512)
Noisy 1.45 1.12 1.45
1.12
System.Text.Json.Node.Tests.Perf_ParseThenWrite.ParseThenWrite(IsDataIndented: True, TestCase: Json400KB)
#111915 1.43 1.47 1.43
1.47
System.Reflection.Invoke.StaticMethod5_arrayNotCached_int_string_struct_class_bool
#111736 1.39 1.39 1.39
1.39
System.Memory.Span(Byte).IndexOfValue(Size: 512)
#111736 1.38 1.39 1.38
1.39
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).BitwiseAnd_Scalar(BufferLength: 128)
#111736 1.35 1.31 1.35
1.31
System.Numerics.Tests.Perf_BigInteger.Equals(arguments: 259 bytes, Same)
#112041 1.33 1.33 1.33
1.33
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_Vectors(BufferLength: 3079)
#112041 1.33 1.33 1.33
1.33
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_ScalarExponent(BufferLength: 3079)
#112041 1.33 1.33 1.33
1.33
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_ScalarBase(BufferLength: 3079)
#112041 1.33 1.32 1.33
1.32
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_Vectors(BufferLength: 128)
#111736 1.33 1.33 1.33
1.33
Benchstone.BenchI.BubbleSort.Test
#112041 1.32 1.32 1.32
1.32
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_ScalarExponent(BufferLength: 128)
#112041 1.32 1.32 1.32
1.32
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Pow_ScalarBase(BufferLength: 128)
#111736 1.32 1.33 1.32
1.33
System.Memory.Span(Int32).LastIndexOfValue(Size: 512)
1.32 1.26 1.32
1.26
System.Memory.Span(Int32).SequenceEqual(Size: 512)
1.31 1.30 1.31
1.30
SciMark2.kernel.benchmarkLU
#103450 1.30 1.29 1.30
1.29
System.Memory.Span(Int32).IndexOfAnyFourValues(Size: 512)
#111736 1.27 1.27 1.27
1.27
System.Memory.Span(Byte).StartsWith(Size: 512)
#112041 1.26 1.26 1.26
1.26
System.MathBenchmarks.Single.AtanPi
#111736 1.25 1.28 1.25
1.28
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Truncate(BufferLength: 128)
#111736 1.25 1.18 1.25
1.18
Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Object_Valid
#111736 1.25 1.24 1.41
1.42
1.11
1.09
System.Buffers.Text.Tests.Base64Tests.ConvertToBase64CharArray(NumberOfBytes: 1000)
Unrelated 1.25 1.60 1.25
1.60
System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromResult
#110277 1.25 1.25 1.24
1.25
1.25
1.25
System.Collections.IterateForNonGeneric(String).ArrayList(Size: 512)
#111736 1.25 1.56 1.25
1.56
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).Max_Scalar(BufferLength: 128)
#111915 1.25 1.22 1.25
1.22
System.Collections.Tests.Add_Remove_SteadyState(String).Queue(Count: 512)
#111736 1.24 1.24 1.24
1.24
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).Add_Scalar(BufferLength: 128)
#103450 1.24 1.25 1.24
1.25
System.Memory.Span(Int32).IndexOfAnyFiveValues(Size: 512)
#111915 1.24 1.23 1.24
1.24
1.23
1.23
System.Collections.IterateForNonGeneric(Int32).ArrayList(Size: 512)
#111736 1.24 1.23 1.24
1.23
System.Memory.Span(Byte).SequenceEqual(Size: 512)
#111736 1.23 1.28 1.23
1.28
System.Memory.Span(Char).SequenceEqual(Size: 512)
#111047 1.23 1.22 1.23
1.22
System.Reflection.Invoke.Field_SetStatic_int
#112041 1.22 1.22 1.22
1.22
System.MathBenchmarks.Single.AcosPi
Regressed by 3-opt and profile changes 1.21 1.31 1.13
1.13
1.29
1.52
Benchstone.BenchI.IniArray.Test
#111915 1.20 1.20 1.20
1.20
System.Tests.Perf_Char.Char_ToUpperInvariant(input: "Hello World!")
#111915 1.19 1.20 1.18
1.16
1.21
1.25
Span.Sorting.QuickSortArray(Size: 512)
#112041 1.19 1.19 1.19
1.19
System.MathBenchmarks.Double.AtanPi
#103450 1.18 1.33 1.19
1.11
1.17
1.60
Span.Sorting.BubbleSortSpan(Size: 512)
#111915 1.18 1.15 1.27
1.21
1.09
1.09
System.Collections.IterateForEachNonGeneric(String).ArrayList(Size: 512)
Unrelated 1.17 1.17 1.17
1.17
System.Numerics.Tests.Perf_VectorConvert.Convert_float_int
#110277 1.17 1.19 1.17
1.19
System.Collections.IterateForEach(String).ImmutableSortedSet(Size: 512)
#110277 1.17 1.16 1.15
1.14
1.19
1.18
System.Collections.IterateFor(Int32).IList(Size: 512)
#112041 1.16 1.16 1.16
1.16
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).AtanPi(BufferLength: 128)
1.16 1.87 1.16
1.87
System.Buffers.Tests.RentReturnArrayPoolTests(Byte).ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True)
#112041 1.16 1.16 1.16
1.16
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).AtanPi(BufferLength: 3079)
#110277 1.16 1.18 1.06
1.08
1.20
1.26
1.23
1.21
Span.Sorting.QuickSortSpan(Size: 512)
#111736 1.16 1.16 1.16
1.16
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).Divide_Vector(BufferLength: 128)
#111736 1.15 1.15 1.15
1.15
System.Text.Perf_Ascii.ToUpper_Bytes_Chars(Size: 128)
#111736 1.15 1.14 1.15
1.14
System.Tests.Perf_Int64.TryFormat(value: 12345)
1.15 1.14 1.15
1.14
System.Collections.CreateAddAndClear(String).ConcurrentQueue(Size: 512)
#111736 1.15 1.30 1.15
1.30
System.Memory.Span(Byte).IndexOfAnyThreeValues(Size: 33)
#111915 1.14 1.07 1.14
1.07
System.Collections.Tests.Perf_PriorityQueue(Int32, Int32).HeapSort(Size: 1000)
#111915 1.14 1.11 1.14
1.11
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 6, Options: NonBacktracking)
1.14 1.14 1.14
1.14
System.Text.Json.Document.Tests.Perf_EnumerateObject.EnumerateProperties(TestCase: StringProperties)
1.14 1.14 1.14
1.14
Span.IndexerBench.CoveredIndex2(length: 1024)
1.14 1.14 1.07
1.07
1.21
1.21
System.MathBenchmarks.Single.AsinPi
1.14 1.14 1.11
1.11
1.17
1.17
System.MathBenchmarks.Double.AsinPi
1.14 1.13 1.14
1.13
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).AtanPi(BufferLength: 128)
1.14 1.11 1.14
1.11
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 10, Options: NonBacktracking)
1.13 1.14 1.13
1.14
Benchstone.BenchF.Trap.Test
1.13 1.12 1.12
1.11
1.14
1.14
System.MathBenchmarks.Double.AcosPi
1.13 1.20 1.13
1.20
XmlDocumentTests.XmlNodeListTests.Perf_XmlNodeList.GetCount
1.12 1.12 1.12
1.12
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).AtanPi(BufferLength: 3079)
1.12 1.13 1.09
1.06
1.16
1.21
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_Vectors(BufferLength: 3079)
1.12 1.12 1.12
1.12
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).Distance(BufferLength: 3079)
1.11 1.11 1.11
1.11
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).Max(BufferLength: 3079)
1.11 1.11 1.11
1.11
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10000, ItemsPerBucket: 5)
1.11 1.10 1.11
1.10
Benchmark.GetChildKeysTests.AddChainedConfigurationEmpty
1.11 1.20 1.05
1.06
1.17
1.36
Benchstone.BenchI.BubbleSort2.Test
1.11 1.11 1.11
1.11
System.Text.Perf_Ascii.ToLower_Bytes_Chars(Size: 128)
1.11 1.08 1.11
1.08
LinqBenchmarks.Count00LinqMethodX
1.11 1.09 1.11
1.09
PerfLabTests.DelegatePerf.MulticastDelegateInvoke(length: 1000)
1.10 1.09 1.10
1.09
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).FusedMultiplyAdd_ScalarMultiplier(BufferLength: 128)
1.10 1.52 1.06
1.71
1.15
1.35
System.Collections.IterateFor(String).ImmutableList(Size: 512)
1.10 1.10 1.10
1.10
System.Text.Json.Tests.Perf_Reader.ReadReturnBytes(IsDataCompact: False, TestCase: DeepTree)
1.10 1.20 1.10
1.20
System.Collections.TryGetValueTrue(SmallClass, SmallClass).ImmutableDictionary(Size: 512)
1.10 1.07 1.10
1.07
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)
1.10 1.14 1.01
1.13
1.15
1.16
1.14
1.14
SeekUnroll.Test(boxedIndex: 3)
1.10 1.07 1.10
1.07
Benchstone.MDBenchI.MDNDhrystone.Test
1.10 1.07 1.10
1.07
System.Tests.Perf_Version.Parse4
1.09 1.15 1.09
1.15
System.Tests.Perf_Enum.ToString_Format_NonFlags(value: Thursday, format: "f")
1.09 1.07 1.07
1.07
1.12
1.07
System.Tests.Perf_Enum.ToString_Flags(value: 32)
1.09 1.09 1.09
1.09
System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options: (, IgnoreCase, True))
1.09 1.09 1.09
1.09
System.Collections.IterateForEachNonGeneric(Int32).ArrayList(Size: 512)
1.09 1.10 1.09
1.10
Benchstone.BenchI.Array2.Test
1.09 1.05 1.09
1.05
System.IO.Tests.Perf_Path.GetDirectoryName
1.09 1.08 1.09
1.08
LinqBenchmarks.Where00ForX
1.09 1.19 1.20
1.20
0.99
1.17
Span.IndexerBench.CoveredIndex3(length: 1024)
1.09 1.11 1.09
1.11
System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options: (en-US, IgnoreCase, True))
1.09 1.09 1.09
1.09
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).BitwiseAnd_Vector(BufferLength: 128)
1.09 1.11 1.07
1.07
1.11
1.16
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_ScalarExponent(BufferLength: 3079)
1.09 1.09 1.09
1.09
System.Linq.Tests.Perf_Enumerable.SelectToList(input: IList)
1.09 1.07 1.09
1.07
System.Collections.IndexerSet(Int32).SortedDictionary(Size: 512)
1.09 1.11 1.06
1.07
1.11
1.16
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_ScalarBase(BufferLength: 128)
1.08 1.11 1.08
1.11
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Single).FusedMultiplyAdd_Vectors(BufferLength: 128)
1.08 1.09 1.08
1.09
LinqBenchmarks.Where00LinqQueryX
1.08 1.08 1.08
1.08
System.Memory.Span(Byte).LastIndexOfAnyValues(Size: 512)
1.08 1.09 1.08
1.09
LinqBenchmarks.Where00LinqMethodX
1.08 1.14 1.08
1.14
System.Buffers.Tests.ReadOnlySequenceTests(Byte).IterateGetPositionArray
1.08 1.13 1.08
1.13
System.Buffers.Tests.RentReturnArrayPoolTests(Byte).MultipleSerial(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: True)
1.08 1.12 1.08
1.12
System.Collections.Sort(IntStruct).List(Size: 512)
1.08 1.09 1.08
1.09
System.Collections.IterateForEach(Int32).FrozenDictionary(Size: 512)
1.08 1.08 1.08
1.08
Burgers.Test3
1.08 1.12 1.00
1.13
1.16
1.17
1.07
1.07
SeekUnroll.Test(boxedIndex: 1)
1.07 1.17 1.07
1.17
System.Threading.Tasks.ValueTaskPerfTest.CreateAndAwait_FromCompletedTask
1.07 1.18 1.07
1.18
System.Tests.Perf_Enum.ToString_Format_Flags_Large(value: All, format: "g")
1.07 1.08 1.07
1.08
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).AddMultiply_Vectors(BufferLength: 128)
1.07 1.09 1.07
1.09
System.Memory.Span(Int32).IndexOfAnyThreeValues(Size: 33)
1.07 1.09 1.07
1.09
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 3, Options: NonBacktracking)
1.07 1.10 1.08
1.11
1.07
1.09
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
1.07 1.07 1.07
1.06
1.08
1.08
System.Text.Json.Document.Tests.Perf_ParseThenWrite.ParseThenWrite(IsDataIndented: True, TestCase: DeepTree)
1.07 1.08 1.07
1.08
System.Net.Tests.Perf_WebUtility.Decode_DecodingRequired
1.07 1.07 1.07
1.07
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_ScalarBase(BufferLength: 3079)
1.07 1.07 1.07
1.07
System.Buffers.Tests.SearchValuesByteTests.ContainsAny(Values: "abcdefABCDEF0123456789")
1.07 1.11 1.07
1.11
System.Collections.ContainsTrueComparer(String).HashSet(Size: 512)
1.07 1.08 1.07
1.08
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8(arguments: UnsafeRelaxed,no (escaping /) required,16)
1.07 1.06 1.07
1.06
System.Collections.TryGetValueTrue(BigStruct, BigStruct).ConcurrentDictionary(Size: 512)
1.07 1.07 1.07
1.07
Benchstone.BenchI.CSieve.Test
1.07 1.09 1.07
1.06
1.07
1.12
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_Vectors(BufferLength: 128)
1.07 1.07 1.07
1.07
System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8(arguments: UnsafeRelaxed,no (escaping /) required,512)
1.07 1.09 1.07
1.10
1.07
1.09
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False))
1.07 1.11 1.07
1.11
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))
1.07 1.06 1.07
1.06
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_ScalarExponent(BufferLength: 128)
1.07 1.07 1.07
1.07
System.Tests.Perf_Version.TryParse4
1.07 1.27 1.24
1.29
0.92
1.26
System.Tests.Perf_Char.Char_IsLower(input: "Good afternoon, Constable!")
1.07 1.11 1.07
1.11
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
1.07 1.07 1.07
1.07
System.Collections.Tests.Perf_PriorityQueue(Int32, Int32).Dequeue_And_Enqueue(Size: 100)
1.06 1.06 1.06
1.06
System.Tests.Perf_Int128.TryParse(value: "-170141183460469231731687303715884105728")
1.06 1.06 1.06
1.06
System.Text.Perf_Ascii.Equals_Bytes_Chars(Size: 128)
1.06 1.06 1.06
1.06
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Sqrt(BufferLength: 128)
1.06 1.06 1.07
1.06
1.06
1.06
System.Tests.Perf_Int128.ParseSpan(value: "-170141183460469231731687303715884105728")
1.06 1.08 1.06
1.08
System.Memory.Span(Byte).IndexOfAnyTwoValues(Size: 512)
1.06 1.06 1.05
1.06
1.07
1.07
System.Linq.Tests.Perf_Enumerable.Average(input: IEnumerable)
1.06 1.05 1.06
1.05
System.Reflection.Invoke.Property_Set_int
1.06 1.15 1.00
1.08
1.00
1.08
1.20
1.32
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))
1.06 1.15 1.06
1.15
System.Text.Json.Tests.Perf_Reader.ReadReturnBytes(IsDataCompact: True, TestCase: LotsOfStrings)
1.06 1.06 1.06
1.07
1.05
1.06
System.Tests.Perf_Int128.Parse(value: "-170141183460469231731687303715884105728")
1.06 1.06 1.06
1.06
System.MathBenchmarks.Single.ExpM1
1.06 1.06 1.06
1.06
1.05
1.06
System.Tests.Perf_Int128.TryParseSpan(value: "-170141183460469231731687303715884105728")
1.06 1.14 1.06
1.14
System.Tests.Perf_Int32.TryParse(value: "2147483647")
1.06 1.06 1.06
1.06
System.IO.Tests.Perf_StreamWriter.WritePartialCharArray(writeLength: 2)
1.06 1.06 1.06
1.06
System.Text.Json.Tests.Perf_Guids.WriteGuids(Formatted: False, SkipValidation: False)
1.06 1.13 1.06
1.09
1.05
1.17
MicroBenchmarks.Serializers.Xml_ToStream(IndexViewModel).XmlSerializer_
1.06 1.06 1.06
1.06
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Single).IndexOfMax(BufferLength: 128)
1.05 1.05 1.05
1.05
Layout.SearchLoops.LoopReturn
1.05 1.05 1.05
1.05
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).MaxMagnitude_Vector(BufferLength: 128)
1.05 1.05 1.05
1.05
System.MathBenchmarks.Double.ExpM1
1.05 1.10 1.05
1.10
System.Collections.CreateAddAndClear(String).ImmutableArray(Size: 512)
1.05 1.21 1.08
1.08
1.11
1.09
0.97
1.51
System.Memory.Span(Int32).IndexOfAnyFiveValues(Size: 33)
1.04 1.10 1.04
1.10
System.Collections.TryGetValueTrue(SmallClass, SmallClass).SortedList(Size: 512)
1.04 1.08 1.07
1.09
1.02
1.07
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 4, Options: NonBacktracking)
1.04 1.20 1.04
1.20
System.Collections.TryGetValueTrue(SmallClass, SmallClass).ImmutableSortedDictionary(Size: 512)
1.04 1.37 0.98
1.63
1.12
1.32
1.02
1.19
System.Collections.IterateFor(Int32).ImmutableList(Size: 512)
#111736 1.03 1.05 1.03
1.05
System.Memory.Span(Char).Clear(Size: 512)
1.03 1.08 1.03
1.08
System.Linq.Tests.Perf_Enumerable.AppendPrepend(input: IEnumerable)
1.03 1.80 1.03
1.80
System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_LastElementMatches(input: Array)
1.03 1.07 1.03
1.07
System.Diagnostics.Perf_Process.GetCurrentProcess
1.03 1.09 1.03
1.09
System.Text.Json.Serialization.Tests.WriteJson(ClassRecord).SerializeToUtf8Bytes(Mode: Reflection)
1.03 1.11 1.03
1.11
System.Collections.IndexerSet(String).SortedList(Size: 512)
1.03 1.09 1.03
1.09
System.Collections.CreateAddAndRemove(String).List(Size: 512)
1.02 2.08 1.02
2.08
System.Threading.Tests.Perf_Interlocked.CompareExchange_int
1.02 1.10 1.02
1.10
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives(Int32).AddMultiply_Vectors(BufferLength: 128)
1.02 1.09 1.02
1.09
Span.Sorting.BubbleSortArray(Size: 512)
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_FirstBitDifferent
1.02 1.08 1.02
1.08
MicroBenchmarks.Serializers.Json_ToStream(Location).DataContractJsonSerializer_
1.02 1.20 1.02
1.20
System.Linq.Tests.Perf_Enumerable.SelectToList(input: Array)
1.02 1.17 1.02
1.17
System.Buffers.Tests.RentReturnArrayPoolTests(Object).ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False)
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_Equal
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_VersusZero
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_LastBitDifferent
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_AllBitsDifferent
1.02 1.22 1.02
1.22
System.Security.Cryptography.Primitives.Tests.Performance.Perf_FixedTimeEquals.FixedTimeEquals_256Bit_CascadingErrors
1.01 1.29 1.04
1.60
1.00
1.18
1.01
1.14
System.Collections.IterateFor(Int32).ImmutableSortedSet(Size: 512)
1.01 1.08 1.01
1.08
SciMark2.kernel.benchMonteCarlo
1.01 1.10 1.01
1.10
System.Collections.IterateForEach(String).SortedSet(Size: 512)
1.01 1.09 1.01
1.09
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10, ItemsPerBucket: 5)
1.01 1.13 1.00
1.19
1.02
1.07
PerfLabTests.LowLevelPerf.ForeachOverList100Elements
1.01 1.54 1.02
1.48
1.00
1.87
1.00
1.33
Struct.SpanWrapper.WrapperSum

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

Starting from the top, the System.Memory.Span.Reverse regressions are quite large, and were likely introduced by #111736. The only interesting change in that PR is we no longer mark blocks unreachable via normal flow as rarely run in optSetBlockWeights. As far as I can tell, this only makes a difference for handler regions.

Currently, profile synthesis will set the weight of handler region entries to something close to zero, but it won't mark them as rarely run. We probably don't want to change this, because marking them as rarely run would block inlining within handlers, which matters for finally regions. If we keep handler regions from being marked as rarely run after optSetBlockWeights, these blocks will be considered by a handful of later phases, in particular CSE. Considering CSEs in handler blocks is probably a poor use of our CSE budget, and we already treat handler regions as cold elsewhere, but #111736 had a decent number of improvements too, so I'm hesitant to reintroduce the previous behavior.

I wasn't able to repro the regression with EgorBot, but I'll take a look locally.

@amanasifkhalid
Copy link
Member Author

I wasn't able to repro the regression with EgorBot, but I'll take a look locally.

I'm able to repro the regression on my Windows AMD machine, but the difference is quite inconsistent. From run to run, the difference is either just a nanosecond, or over 2x. For example:

| Method  | Job        | Toolchain                                                                         | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Allocated | Alloc Ratio |
|-------- |----------- |---------------------------------------------------------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| Reverse | Job-IGPZVU | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe | 512  | 8.496 ns | 0.2086 ns | 0.1742 ns | 8.452 ns | 8.308 ns | 8.898 ns |  1.00 |    0.03 |         - |          NA |
| Reverse | Job-DKUPJI | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe  | 512  | 7.777 ns | 0.1223 ns | 0.1085 ns | 7.746 ns | 7.630 ns | 8.008 ns |  0.92 |    0.02 |         - |          NA |

| Method  | Job        | Toolchain                                                                         | Size | Mean      | Error     | StdDev    | Median    | Min       | Max       | Ratio | RatioSD | Allocated | Alloc Ratio |
|-------- |----------- |---------------------------------------------------------------------------------- |----- |----------:|----------:|----------:|----------:|----------:|----------:|------:|--------:|----------:|------------:|
| Reverse | Job-SIJWDS | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe | 512  | 21.037 ns | 0.2629 ns | 0.2459 ns | 20.927 ns | 20.858 ns | 21.567 ns |  1.00 |    0.02 |         - |          NA |
| Reverse | Job-MXGQIK | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe  | 512  |  8.090 ns | 0.1682 ns | 0.1405 ns |  8.051 ns |  7.973 ns |  8.485 ns |  0.38 |    0.01 |         - |          NA |

I'm failing to find any codegen diffs along the call stack of this benchmark, and the potential fix would likely have its own slew of regressions, so the return on pursuing these regressions likely aren't as high as others that are directly related to layout churn. I'll keep this one in mind for later...

@amanasifkhalid
Copy link
Member Author

PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj regressed and then later improved due to flowgraph/profile churn. Its current regressed status is from #112060, and is tracked in #112657. This is also true for System.Linq.Tests.Perf_Enumerable.SingleWithPredicate_FirstElementMatches(input: Array). I'll remove these from the above chart.

@amanasifkhalid
Copy link
Member Author

System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512) improved on both platforms once late profile repair was enabled. On Viper Windows, it regressed again around 2/27/2025. I can't find an issue tracking this regression, but nothing in the commit range is flowgraph-related.

@amanasifkhalid
Copy link
Member Author

I've annotated the most-regressed benchmarks with culprit PRs based on when the last notable regression occurred.

  • JIT: Fix profile maintenance in optSetBlockWeights, funclet creation #111736 comes up much more often than I expected it to; I suspect it's pessimizing CSE quite a bit.
  • There are a handful of PRs that regressed from the initial 3-opt or 4-opt PRs, and didn't budge all that much since. These likely expose the limitations of our cost model's ability to reflect actual performance.
  • Unsurprisingly, several regressions are attributed to changes in profile data. These could be regressions in block layout as well, and/or regressions elsewhere.

I looked at a few of the regressions from #103450, and for these cases, 3-opt is keen to make bottom-tested loops top-tested. For example, here's the initial layout for System.SpanHelpers:IndexOfAny, which is the hot call in System.Memory.Span.IndexOfAnyFourValues and System.Memory.Span.IndexOfAnyFiveValues:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1      18 [000..011)-> BB03(0.2),BB02(0.8)     ( cond )                     i LIR IBC
BB02 [0032]  1       BB01                  0.80   14 [000..001)-> BB03(1)                 (always)                     i LIR IBC hascall gcsafe
BB03 [0033]  2       BB01,BB02             1      18 [000..022)-> BB05(0.2),BB04(0.8)     ( cond )                     i LIR IBC
BB04 [0037]  1       BB03                  0.80   14 [011..012)-> BB05(1)                 (always)                     i LIR IBC hascall gcsafe
BB05 [0038]  2       BB03,BB04             1      18 [011..025)-> BB06(0.99),BB20(0.01)   ( cond )                     i LIR IBC
BB06 [0002]  1       BB05                  0.99   18 [027..03C)-> BB20(0.00595),BB17(0.994)   ( cond )                     i LIR IBC
BB17 [0043]  2       BB06,BB19            37.70  679 [03C..04D)-> BB19(0.195),BB18(0.805) ( cond )                     i LIR IBC bwd
BB18 [0044]  2       BB10,BB17           152.87 2752 [04D..062)-> BB10(0.995),BB09(0.005) ( cond )                     i LIR IBC bwd
BB10 [0007]  1       BB18                152.11 2738 [064..06C)-> BB18(0.805),BB19(0.195) ( cond )                     i LIR IBC bwd
BB19 [0045]  2       BB10,BB17            36.93  665 [06C..074)-> BB17(0.994),BB20(0.00595)   ( cond )                     i LIR IBC bwd
BB09 [0006]  1       BB18                  0.76   14 [062..064)                           (return)                     i LIR IBC
BB20 [0046]  3       BB06,BB05,BB19        0.24    4 [025..114)                           (return)                     i LIR IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Both loops are bottom-tested, and we fall out of the inner loop into the outer loop. 3-opt transforms this into the following:

Initial layout cost: 3037.297884
Running greedy 3-opt pass.
Swapping partitions [BB17, BB10] and [BB19, BB19] (cost change = -106.408088)
Swapping partitions [BB19, BB10] and [BB20, BB09] (cost change = -0.105988)
Final layout cost: 2930.783808
...

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1      18 [000..011)-> BB03(0.2),BB02(0.8)     ( cond )                     i LIR IBC
BB02 [0032]  1       BB01                  0.80   14 [000..001)-> BB03(1)                 (always)                     i LIR IBC hascall gcsafe
BB03 [0033]  2       BB01,BB02             1      18 [000..022)-> BB05(0.2),BB04(0.8)     ( cond )                     i LIR IBC
BB04 [0037]  1       BB03                  0.80   14 [011..012)-> BB05(1)                 (always)                     i LIR IBC hascall gcsafe
BB05 [0038]  2       BB03,BB04             1      18 [011..025)-> BB06(0.99),BB20(0.01)   ( cond )                     i LIR IBC
BB06 [0002]  1       BB05                  0.99   18 [027..03C)-> BB20(0.00595),BB17(0.994)   ( cond )                     i LIR IBC
BB20 [0046]  3       BB06,BB05,BB19        0.24    4 [025..114)                           (return)                     i LIR IBC
BB09 [0006]  1       BB18                  0.76   14 [062..064)                           (return)                     i LIR IBC
BB19 [0045]  2       BB10,BB17            36.93  665 [06C..074)-> BB17(0.994),BB20(0.00595)   ( cond )                     i LIR IBC bwd
BB17 [0043]  2       BB06,BB19            37.70  679 [03C..04D)-> BB19(0.195),BB18(0.805) ( cond )                     i LIR IBC bwd
BB18 [0044]  2       BB10,BB17           152.87 2752 [04D..062)-> BB10(0.995),BB09(0.005) ( cond )                     i LIR IBC bwd
BB10 [0007]  1       BB18                152.11 2738 [064..06C)-> BB18(0.805),BB19(0.195) ( cond )                     i LIR IBC bwd
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

It's the same case for Span.Sorting.BubbleSortSpan:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        100 [000..00A)-> BB02(1)                 (always)                     i LIR IBC
BB02 [0001]  2       BB01,BB11           168.07   16807 [00A..010)-> BB11(0.00597),BB10(0.994)   ( cond )                     i LIR IBC bwd bwd-target
BB10 [0009]  2       BB02,BB05           27969. 2796903 [010..026)-> BB05(0.0455),BB04(0.954)  ( cond )                     i LIR IBC bwd
BB04 [0003]  1       BB10                26696. 2669566 [026..052)-> BB05(1)                 (always)                     i LIR IBC bwd
BB05 [0004]  2       BB04,BB10           27969. 2796903 [052..05A)-> BB10(0.994),BB11(0.00597)   ( cond )                     i LIR IBC bwd
BB11 [0010]  2       BB02,BB05           168.07   16807 [05A..061)-> BB02(0.994),BB08(0.00595)   ( cond )                     i LIR IBC bwd
BB08 [0007]  1       BB11                  1.00     100 [061..062)                           (return)                     i LIR IBC
BB12 [0011]  0                             0            [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Initial layout cost: 2924440.995680
Running greedy 3-opt pass.
Swapping partitions [BB10, BB04] and [BB05, BB05] (cost change = -77218.213563)
Swapping partitions [BB02, BB04] and [BB11, BB11] (cost change = -16506.722689)
Final layout cost: 2830716.059428

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        100 [000..00A)-> BB02(1)                 (always)                     i LIR IBC
BB11 [0010]  2       BB02,BB05           168.07   16807 [05A..061)-> BB02(0.994),BB08(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  2       BB01,BB11           168.07   16807 [00A..010)-> BB11(0.00597),BB10(0.994)   ( cond )                     i LIR IBC bwd bwd-target
BB05 [0004]  2       BB04,BB10           27969. 2796903 [052..05A)-> BB10(0.994),BB11(0.00597)   ( cond )                     i LIR IBC bwd
BB10 [0009]  2       BB02,BB05           27969. 2796903 [010..026)-> BB05(0.0455),BB04(0.954)  ( cond )                     i LIR IBC bwd
BB04 [0003]  1       BB10                26696. 2669566 [026..052)-> BB05(1)                 (always)                     i LIR IBC bwd
BB08 [0007]  1       BB11                  1.00     100 [061..062)                           (return)                     i LIR IBC
BB12 [0011]  0                             0            [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

For hot loops, 3-opt finds it massively profitable to create fallthrough along a loop backedge, since the entry edges are considerably lower in weight, but this gives a worse loop shape overall. I think we'd be justified in inhibiting the creation of fallthrough along loop backedges if it would make the loop top-tested. A targeted implementation would be to detect loop backedges in ThreeOptLayout::ConsiderEdge, and if the edge is a conditional jump, we don't consider it. I'd prefer ConsiderEdge continue to only check layout invariants and not introduce its own profitability checks, so we could also try tweaking the cost model to give loop backedges a cost of zero such that we don't try to create fallthrough along them, but I don't know all of the ramifications of that just yet.

@amanasifkhalid
Copy link
Member Author

By restricting 3-opt from creating fallthrough along loop backedges, and restricting hot jump compaction from considering loop exit edges (#113101), layout is looking better. For Span.Sorting.BubbleSortSpan:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1      18 [000..011)-> BB03(0.2),BB02(0.8)     ( cond )                     i LIR IBC
BB02 [0032]  1       BB01                  0.80   14 [000..001)-> BB03(1)                 (always)                     i LIR IBC hascall gcsafe
BB03 [0033]  2       BB01,BB02             1      18 [000..022)-> BB05(0.2),BB04(0.8)     ( cond )                     i LIR IBC
BB04 [0037]  1       BB03                  0.80   14 [011..012)-> BB05(1)                 (always)                     i LIR IBC hascall gcsafe
BB05 [0038]  2       BB03,BB04             1      18 [011..025)-> BB06(0.99),BB20(0.01)   ( cond )                     i LIR IBC
BB06 [0002]  1       BB05                  0.99   18 [027..03C)-> BB20(0.00595),BB17(0.994)   ( cond )                     i LIR IBC
BB17 [0043]  2       BB06,BB19            37.95  683 [03C..04D)-> BB19(0.196),BB18(0.804) ( cond )                     i LIR IBC bwd
BB18 [0044]  2       BB10,BB17           152.57 2746 [04D..062)-> BB10(0.995),BB09(0.005) ( cond )                     i LIR IBC bwd
BB10 [0007]  1       BB18                151.81 2733 [064..06C)-> BB18(0.804),BB19(0.196) ( cond )                     i LIR IBC bwd
BB19 [0045]  2       BB10,BB17            37.19  669 [06C..074)-> BB17(0.994),BB20(0.00595)   ( cond )                     i LIR IBC bwd
BB20 [0046]  3       BB06,BB05,BB19        0.24    4 [025..114)                           (return)                     i LIR IBC
BB09 [0006]  1       BB18                  0.76   14 [062..064)                           (return)                     i LIR IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I'm sure these two changes will incur pretty big diffs, but I think they're focused enough that (hopefully) the effects will be predictable. Checking if an edge is a loop exit/backedge requires iterating up to every loop edge in the method, so we don't want to do this lookup every time in ConsiderEdge. We should have space to cache this information in FlowEdge without inflating its size.

@amanasifkhalid
Copy link
Member Author

With #113101, some of the 4-opt regressions have layouts that are further away from a locally-minimal cost. For example, without the change, this is what the layout of Span.Sorting.QuickSortSpan looks like:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight        IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0016]  1                             1     2667265 [???..???)-> BB07(1)                 (always)                     i LIR IBC internal
BB05 [0003]  1       BB07                  6.51 17354584 [028..02C)-> BB06(1)                 (always)                     i LIR IBC bwd bwd-target
BB06 [0004]  2       BB05,BB12             8.64 23046467 [02C..030)-> BB08(0.0478),BB07(0.952)  ( cond )                     i LIR IBC bwd bwd-target
BB07 [0005]  2       BB06,BB01             9.20 24549367 [030..03D)-> BB05(0.707),BB08(0.293) ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB08 [0006]  3       BB06,BB07,BB09       10.38 27679929 [03D..047)-> BB14(0.0918),BB10(0.908)  ( cond )                     i LIR IBC bwd
BB10 [0009]  1       BB08                  9.43 25140181 [047..054)-> BB09(0.774),BB12(0.226) ( cond )                     i LIR IBC bwd bwd-src
BB09 [0007]  1       BB10                  7.29 19448297 [03F..043)-> BB08(1)                 (always)                     i LIR IBC bwd bwd-target
BB12 [0011]  1       BB10                  2.13  5691884 [058..080)-> BB06(1)                 (always)                     i LIR IBC bwd
BB14 [0013]  1       BB08                  0.95  2539748 [084..088)-> BB16(0.212),BB15(0.788) ( cond )                     i LIR IBC
BB15 [0014]  1       BB14                  0.75  2002351 [088..0A9)-> BB16(1)                 (always)                     i LIR IBC
BB16 [0015]  2       BB14,BB15             0.95  2539748 [0A9..0AA)-> BB17(1),BB20(0)         ( cond )                     i LIR IBC
BB17 [0019]  1       BB16                  0.95  2539748 [0A9..0AA)-> BB19(0.2),BB18(0.8)     ( cond )                     i LIR IBC
BB18 [0024]  1       BB17                  0.76  2031798 [0A9..0AA)-> BB19(1)                 (always)                     i LIR IBC hascall gcsafe
BB19 [0025]  2       BB17,BB18             0.95  2539748 [0A9..0B8)-> BB21(1),BB20(0)         ( cond )                     i LIR IBC hascall gcsafe
BB21 [0029]  1       BB19                  0.95  2539748 [0B7..0B8)-> BB23(0.2),BB22(0.8)     ( cond )                     i LIR IBC
BB22 [0034]  1       BB21                  0.76  2031798 [0B7..0B8)-> BB23(1)                 (always)                     i LIR IBC hascall gcsafe
BB23 [0035]  2       BB21,BB22             0.95  2539748 [00A..0C7)                           (return)                     i LIR IBC hascall gcsafe
BB20 [0028]  2       BB16,BB19             0           0 [0B7..0B8)                           (throw )                     i LIR IBC rare hascall gcsafe
BB24 [0037]  0                             0           0 [???..???)                           (throw )                     i LIR IBC rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

With it, instead of BB07 falling into the loop dominated by BB08, BB06 falls into it, despite the latter's edge into BB08 being colder:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight        IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0016]  1                             1     2696193 [???..???)-> BB07(1)                 (always)                     i LIR IBC internal
BB07 [0005]  2       BB06,BB01             9.76 26319504 [030..03D)-> BB05(0.707),BB08(0.293) ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0003]  1       BB07                  6.90 18595325 [028..02C)-> BB06(1)                 (always)                     i LIR IBC bwd bwd-target
BB06 [0004]  2       BB05,BB12             9.20 24799553 [02C..030)-> BB08(0.0443),BB07(0.956)  ( cond )                     i LIR IBC bwd bwd-target
BB09 [0007]  1       BB10                  7.68 20706461 [03F..043)-> BB08(1)                 (always)                     i LIR IBC bwd bwd-target
BB08 [0006]  3       BB06,BB07,BB09       10.92 29447769 [03D..047)-> BB14(0.0862),BB10(0.914)  ( cond )                     i LIR IBC bwd
BB10 [0009]  1       BB08                  9.98 26910689 [047..054)-> BB09(0.769),BB12(0.231) ( cond )                     i LIR IBC bwd bwd-src
BB12 [0011]  1       BB10                  2.30  6204229 [058..080)-> BB06(1)                 (always)                     i LIR IBC bwd
BB14 [0013]  1       BB08                  0.94  2537079 [084..088)-> BB16(0.231),BB15(0.769) ( cond )                     i LIR IBC
BB15 [0014]  1       BB14                  0.72  1949882 [088..0A9)-> BB16(1)                 (always)                     i LIR IBC
BB16 [0015]  2       BB14,BB15             0.94  2537079 [0A9..0AA)-> BB17(1),BB20(0)         ( cond )                     i LIR IBC
BB17 [0019]  1       BB16                  0.94  2537079 [0A9..0AA)-> BB19(0.2),BB18(0.8)     ( cond )                     i LIR IBC
BB18 [0024]  1       BB17                  0.75  2029663 [0A9..0AA)-> BB19(1)                 (always)                     i LIR IBC hascall gcsafe
BB19 [0025]  2       BB17,BB18             0.94  2537079 [0A9..0B8)-> BB21(1),BB20(0)         ( cond )                     i LIR IBC hascall gcsafe
BB21 [0029]  1       BB19                  0.94  2537079 [0B7..0B8)-> BB23(0.2),BB22(0.8)     ( cond )                     i LIR IBC
BB22 [0034]  1       BB21                  0.75  2029663 [0B7..0B8)-> BB23(1)                 (always)                     i LIR IBC hascall gcsafe
BB23 [0035]  2       BB21,BB22             0.94  2537079 [00A..0C7)                           (return)                     i LIR IBC hascall gcsafe
BB20 [0028]  2       BB16,BB19             0           0 [0B7..0B8)                           (throw )                     i LIR IBC rare hascall gcsafe
BB24 [0037]  0                             0           0 [???..???)                           (throw )                     i LIR IBC rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I probably shouldn't be trying to artificially inhibit 3-opt's cost model for such cases, like I am in #113101. For other 4-opt regressions, we have loops with unconditional back edges, and there doesn't seem to be some obviously optimal rotation of the loop. For example, from System.Collections.IterateForNonGeneric<String>.ArrayList:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       7736 [000..00D)-> BB15(0.01),BB27(0.99)   ( cond )                     i LIR IBC
BB27 [0029]  1       BB01                  0.99    7659 [???..???)-> BB15(0.01),BB14(0.99)   ( cond )                     LIR IBC internal
BB14 [0016]  2       BB06,BB27           496.69 3842423 [019..022)-> BB02(0.998),BB30(0.00197)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB14                495.71 3834841 [00D..00E)-> BB31(0),BB04(1)         ( cond )                     i LIR IBC nullcheck bwd bwd-target
BB04 [0011]  1       BB02                495.71 3834841 [00D..00E)-> BB06(1),BB31(0)         ( cond )                     i LIR IBC bwd
BB06 [0013]  1       BB04                495.71 3834841 [00D..019)-> BB14(1)                 (always)                     i LIR IBC idxlen nullcheck bwd
BB15 [0017]  2       BB01,BB27             0.02     154 [???..???)-> BB16(1)                 (always)                     LIR IBC internal
BB16 [0018]  2       BB15,BB25            10.08   78017 [019..019)-> BB17(0),BB18(1)         ( cond )                     i LIR IBC bwd bwd-src
BB18 [0020]  1       BB16                 10.08   78017 [???..???)-> BB19(1)                 (always)                     i LIR IBC internal bwd
BB19 [0021]  2       BB17,BB18            10.08   78017 [019..022)-> BB20(0.998),BB30(0.00197)   ( cond )                     i LIR IBC internal bwd bwd-src
BB20 [0022]  1       BB19                 10.06   77863 [00D..00D)-> BB21(0),BB22(1)         ( cond )                     i LIR IBC bwd bwd-target
BB22 [0024]  1       BB20                 10.06   77863 [00D..00E)-> BB31(0),BB23(1)         ( cond )                     i LIR IBC nullcheck bwd
BB23 [0025]  1       BB22                 10.06   77863 [00D..00E)-> BB24(1),BB31(0)         ( cond )                     i LIR IBC bwd
BB24 [0026]  1       BB23                 10.06   77863 [00D..00E)-> BB25(1)                 (always)                     i LIR IBC idxlen nullcheck bwd
BB25 [0027]  2       BB21,BB24            10.06   77863 [00D..019)-> BB16(1)                 (always)                     i LIR IBC internal bwd
BB30 [0032]  2       BB14,BB19             1.00    7736 [022..024)                           (return)                     i LIR IBC
BB17 [0019]  1       BB16                  0          0 [???..???)-> BB19(1)                 (always)                     i LIR IBC rare internal hascall gcsafe bwd
BB21 [0023]  1       BB20                  0          0 [???..???)-> BB25(1)                 (always)                     i LIR IBC rare internal hascall gcsafe bwd
BB31 [0033]  4       BB02,BB04,BB22,BB23   0          0 [00D..00E)                           (throw )                     i LIR IBC rare gcsafe newobj bwd
BB32 [0034]  0                             0          0 [???..???)                           (throw )                     i LIR IBC rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

In this case, both loop heads BB14 and BB16 are reached by unconditional loop back edges. It might be worthwhile reviving #109346, though since the diffs from that were hard to triage, it may be easier to reintroduce the unconditional-to-conditional branch opts we used to run with the old layout. They sort of duplicate loop inversion, but because they run late in the frontend, diffs should be easier to parse.

amanasifkhalid added a commit that referenced this issue Mar 7, 2025
Part of #113108. When converting an unconditional jump to a test block into a duplicate test block, fgOptimizeBranch flips the condition (unnecessarily -- I plan to fix this someday), which means we need to derive the new true edge's likelihood from the cloned false edge, and vice versa. Now that late profile repair is enabled, if we get this wrong, the flipped edge likelihoods will flatten loop weights, and inflate non-loop weights. I suspect fixing this will also revert some of the regressions from late profile repair.
amanasifkhalid added a commit that referenced this issue Mar 8, 2025
Part of #107749. This is part of my broader goal of phasing out fgReorderBlocks entirely, but there are a few transformations we may want to do right before block layout. In particular, I noticed in #113108 (comment) that some regressed benchmarks have loops with unconditional backedges; in the absence of better loop inversion, we might want to run our unconditional-to-conditional branch duplication optimization right before layout to improve branch behavior in loops. We can put such optimizations in this pre-layout phase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

1 participant