Description
The ProbabilisticMap
(bloom filter) implementation we use in CoreLib to speed up IndexOfAny(char[])
operations has a vectorized path (Sse41.IsSupported || AdvSimd.Arm64.IsSupported
).
The process is split into two parts: 1) building the map, and 2) searching through text using said map.
The actual data structure we end up with depends on whether vectorization is supported. That is, we must make the same decision re: vectorization support when building the map and when we're searching.
If they don't agree (e.g. SetCharBit
assumes these intrinsics aren't supported, but IsCharBitSet
does) we will report wrong results.
This situation is called out in BOTR docs: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/vectors-and-intrinsics.md#code-review-and-analyzer-rules-for-code-written-in-systemprivatecorelibdll
Within a single function that uses platform intrinsics, unless marked with the
CompExactlyDependsOn
attribute it must behave identically regardless of whether IsSupported returns true or not. This allows the R2R compiler to compile with a lower set of intrinsics support, and yet expect that the behavior of the function will remain unchanged in the presence of tiered compilation.
The issue is that if we do apply CompExactlyDependsOn
to these methods, we'll run into a viral attribute explosion.
The code analyzer in corelib will tell us that ProbabilisticMap.Contains
should be marked, then all its callers like string.MakeSeparatorListAny
, then string.SplitInternal
, then all string.Split
overloads, then ...
What should we do in such situations?
Just mark the two root methods (SetCharBit
, IsCharBitSet
) as AggressiveOptimization
?