Closed
Description
Currently memset is unrolled and optimized within the body of the loop instead of being hoisted out of it, which is suboptimal when the unroll factor is smaller than the loop's trip count: https://godbolt.org/z/dK3a7xjPW.
This also means that with -fno-unroll-loops the memsets are not optimized at all.
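To make the intended transformation concrete, here is a minimal sketch of a hand-hoisted memset. It is a hypothetical reduction, not the code behind the godbolt link: the names `fill_in_loop`, `fill_hoisted`, `versions_agree`, and `self_check` are made up for illustration, and it assumes the destination, value, and length are loop-invariant and that nothing else in the loop touches the buffer. A loop this trivial may already be simplified by other passes, so it only illustrates the equivalence that makes hoisting legal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduction: the same memset is executed on every iteration. */
void fill_in_loop(char *buf, size_t len, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        memset(buf, 0, len); /* destination, value, and length never change */
    }
}

/* Hand-hoisted equivalent: a single memset, guarded so that a loop with
   zero iterations still leaves the buffer untouched. */
void fill_hoisted(char *buf, size_t len, int iterations) {
    if (iterations > 0) {
        memset(buf, 0, len);
    }
}

/* Returns 1 if both versions leave identical bytes in a 32-byte buffer
   that was pre-filled with a non-zero pattern. */
int versions_agree(int iterations) {
    char a[32], b[32];
    memset(a, 0x5A, sizeof a);
    memset(b, 0x5A, sizeof b);
    fill_in_loop(a, 16, iterations);
    fill_hoisted(b, 16, iterations);
    return memcmp(a, b, sizeof a) == 0;
}

/* Small self-check covering the zero-trip, single-trip, and many-trip cases. */
void self_check(void) {
    assert(versions_agree(0));
    assert(versions_agree(1));
    assert(versions_agree(100));
}
```

The `iterations > 0` guard matters: without it, the hoisted version would write to the buffer even when the original loop never runs.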
Previously discussed in #143015:
The optimization happens as a result of unrolling, so it is affected by target-dependent heuristics. It would be legal to do it independently of unrolling by hoisting the memset out of the loop, it's just not implemented. It does work for a plain store, implemented here I believe:
llvm-project/llvm/lib/Transforms/Scalar/LICM.cpp
Line 1269 in 470f456
It could be extended to the memset case.