
opt: add ReplaceMaskedMemOps pass #2836

Merged: 1 commit merged into ispc:main on May 16, 2024

Conversation

@nurmukhametov (Collaborator) commented Apr 16, 2024

It traverses the bitcode looking for masked stores and loads whose mask has the first half turned on and the second half turned off. We can safely replace the stores with narrow unmasked stores, and the loads with narrow unmasked loads followed by a shuffle with the passthrough value. This can help the back-end generate better code (no extra spills, narrower register assignment).

This fixes the last part of issue #2611.
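For illustration, here is a hedged C++ sketch of the store side of the transformation (hypothetical helper, not the code in this PR); the load side is analogous, with a narrow unmasked load whose result is shuffled together with the passthrough value:

// Sketch only: replace a masked store whose mask is all-ones in the first
// half and all-zeros in the second half with a plain store of the low half.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

llvm::Value *storeActiveLowerHalf(llvm::IRBuilder<> &B, llvm::Value *value, llvm::Value *ptr) {
    auto *vecTy = llvm::cast<llvm::FixedVectorType>(value->getType());
    unsigned N = vecTy->getNumElements();
    // Gather element indices [0, N/2) to shuffle out the active low half.
    llvm::SmallVector<int, 16> lowHalf;
    for (unsigned i = 0; i < N / 2; ++i)
        lowHalf.push_back(static_cast<int>(i));
    llvm::Value *narrow = B.CreateShuffleVector(value, lowHalf, "low_half");
    // The upper lanes were masked off, so an unmasked narrow store is safe.
    return B.CreateStore(narrow, ptr);
}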

@nurmukhametov (Collaborator, Author)

The test from #2719 was added.

@turinevgeny (Collaborator) left a comment

Would it make sense to add unit LIT tests? E.g., for a given input IR, check the new pass output.

@nurmukhametov (Collaborator, Author)

> Would it make sense to add unit LIT tests? E.g., for a given input IR, check the new pass output.

It would, but at the moment I don't see such tests for other passes in src/opt, nor the ability to run a specific pass on its own over input IR.

@dbabokin (Collaborator) left a comment

Does this PR supersede #2809?

src/opt.cpp Outdated
@@ -731,6 +731,18 @@ void ispc::Optimize(llvm::Module *module, int optLevel) {

optPM.addFunctionPass(PeepholePass());
optPM.addFunctionPass(llvm::ADCEPass());
optPM.addFunctionPass(ReplaceHalfMaskedMemOpsPass());
Collaborator

Is it possible to move it earlier to avoid extra cleanup passes?

Collaborator

I agree, it should be an earlier pass, to open more optimization doors.

Collaborator Author

I moved it earlier.
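For context, a hedged sketch of the ordering under discussion (the surrounding passes are illustrative, not the exact final pipeline):

// Running the replacement before the cleanup passes lets PeepholePass and
// ADCE simplify whatever the new narrow loads and stores expose.
optPM.addFunctionPass(ReplaceHalfMaskedMemOpsPass()); // moved earlier
optPM.addFunctionPass(PeepholePass());
optPM.addFunctionPass(llvm::ADCEPass());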

// The masked.load and masked.store intrinsics are directly mapped to machine
// instructions with the specified full width of vector values being loaded or
// stored. This transformation allows the backend to generate shorter vector
// loads and stores avoidind extra spills.
Collaborator

Suggested change
// loads and stores avoidind extra spills.
// loads and stores avoiding extra spills.

Collaborator Author

Done

// The masked.load and masked.store intrinsics are directly mapped to machine
// instructions with the specified full width of vector values being loaded or
// stored. This transformation allows the backend to generate shorter vector
// loads and stores avoidind extra spills.
Collaborator

Is it only about shorter memory ops, or is it also about shorter math operations? I assumed it's both.

Also, what do you mean by "spill" here? These are reads/writes of user-visible memory, while spills refer to storing/restoring from a temporary memory location when the register allocator runs out of registers.

Collaborator Author

Changed to "This transformation allows the backend to generate shorter vector memory operations and corresponding math operations, avoiding extra spills of temporary values to memory".


// Verify that every mask element in the upper half of the constant vector is zero (turned off).
auto N = CV->getType()->getNumElements();
for (auto i = N / 2; i < N; i++) {
llvm::Constant *E = CV->getAggregateElement(i);
if (!E || !llvm::isa<llvm::ConstantInt>(E) || !llvm::cast<llvm::ConstantInt>(E)->isZero()) {
Collaborator

Is it possible to use llvm::all_of in this function?

Collaborator Author

I am not quite sure how to use it for aggregates.
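For reference, a hedged sketch (helper name hypothetical) of driving llvm::all_of with an index range via llvm::seq, which sidesteps the lack of ordinary iterators on constant aggregates:

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Sequence.h"
#include "llvm/IR/Constants.h"

// True when every mask element in the upper half of the constant vector is zero.
bool upperHalfIsOff(const llvm::Constant *CV, unsigned N) {
    return llvm::all_of(llvm::seq(N / 2, N), [CV](unsigned i) {
        auto *E = llvm::dyn_cast_or_null<llvm::ConstantInt>(CV->getAggregateElement(i));
        return E && E->isZero();
    });
}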

llvm::Value *lBitcastPointerType(llvm::IRBuilder<> &B, llvm::Value *ptr, llvm::Value *value) {
auto *vecType = llvm::cast<llvm::VectorType>(value->getType());
auto *newPtrType = llvm::PointerType::get(vecType, 0 /* TODO! */);
// TODO! opaque pointer is no-op here, any special handling?
Collaborator

Does it make sense to check if the pointer is opaque and skip the bitcast in this case?

Collaborator Author

As I understand it, no bitcast is generated in that case. See tests/lit-tests/2611.ll.

return B.CreateBitCast(ptr, newPtrType);
}
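A hedged sketch of the explicit skip the reviewer suggested; with opaque pointers the source and destination types compare equal, so this matches the folding behavior the author relies on:

// If the types already match (always true under opaque pointers), there is
// nothing to cast; CreateBitCast would fold to a no-op and return ptr anyway.
if (ptr->getType() == newPtrType)
    return ptr;
return B.CreateBitCast(ptr, newPtrType);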

llvm::Constant *lShrinkConstVec(llvm::LLVMContext &context, llvm::Value *originalValue) {
Collaborator

It would be clearer if the method accepted an llvm::Constant* argument.

Collaborator Author

I agree, fixed


for (auto CI : loadsToReplace) {
lReplaceMaskedLoad(builder, CI);
}
Collaborator

Should llvm::PreservedAnalyses::all() be returned in case both storesToReplace and loadsToReplace are empty?

Collaborator Author

Done
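A minimal sketch of the early-out discussed above (assumed shape, not necessarily the exact merged code):

// Nothing matched, so nothing changed: report that all analyses are preserved.
if (storesToReplace.empty() && loadsToReplace.empty())
    return llvm::PreservedAnalyses::all();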


// This function replaces, e.g.,
//
// %ptr = bitcast %v8_uniform_FVector4f* %Result.i to <8 x float>*
Collaborator

What would the generated code look like if the original N = 4 and we cut it down to loading <2 x float>? We need to ensure it wouldn't be worse than just masked.load/store.v4f32.

Collaborator Author

Added tests/lit-tests/2611-2.ispc for this case.

src/opt/ReplaceHalfMaskedMemOps.cpp: three more review threads resolved (outdated)

llvm::Value *lMergeVectors(llvm::IRBuilder<> &B, llvm::Value *firstVector, llvm::Value *secondVector,
llvm::Twine &name) {
auto *firstVecType = llvm::cast<llvm::VectorType>(firstVector->getType());
Collaborator

Why do you use llvm::cast and not llvm::dyn_cast?

auto *firstVecType = llvm::dyn_cast<llvm::VectorType>(firstVector->getType());
auto *secondVecType = llvm::dyn_cast<llvm::VectorType>(secondVector->getType());

Collaborator Author

Done

Collaborator

I still see a mix of llvm::dyn_cast and llvm::cast in the code. That's not a problem if you're certain of the type you are casting to and the program logic guarantees it is correct; if not, I suggest using llvm::dyn_cast.

Collaborator Author

I have changed all llvm::cast to llvm::dyn_cast.
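For readers unfamiliar with the distinction: llvm::cast asserts on a type mismatch, while llvm::dyn_cast returns nullptr, so its result must be checked. A hedged sketch of the checked pattern:

auto *firstVecType = llvm::dyn_cast<llvm::VectorType>(firstVector->getType());
auto *secondVecType = llvm::dyn_cast<llvm::VectorType>(secondVector->getType());
if (!firstVecType || !secondVecType)
    return nullptr; // bail out instead of asserting the way llvm::cast would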

src/opt/ReplaceHalfMaskedMemOps.cpp: one more review thread resolved (outdated)
@nurmukhametov nurmukhametov mentioned this pull request May 2, 2024
@nurmukhametov force-pushed the fix-2611-v3 branch 3 times, most recently from 2ba1c81 to 053a80a on May 9, 2024 15:37
@nurmukhametov (Collaborator, Author)

> Does this PR supersede #2809?

Yes, it does.

@nurmukhametov (Collaborator, Author)

> Would it make sense to add unit LIT tests? E.g., for a given input IR, check the new pass output.

This is addressed by #2845 and tests/lit-tests/2611.ll.

@nurmukhametov changed the title from "WIP: opt: add ReplaceHalfMaskedMemOps pass" to "opt: add ReplaceMaskedMemOps pass" on May 9, 2024
@nurmukhametov force-pushed the fix-2611-v3 branch 2 times, most recently from 60739be to 66213c9 on May 10, 2024 15:01
@aneshlya (Collaborator)

Please format lit-tests with clang-format.

It traverses bitcode for masked stores that have the turned-off second
half and the turned-on first half. We can safely replace them with
narrow unmasked stores, and loads with a following shuffle with the
passthrough value. This can help the back-end to generate better code
(no extra spills, assigning narrow registers).
@nurmukhametov (Collaborator, Author)

> Please format lit-tests with clang-format.

Done

@turinevgeny (Collaborator) left a comment

LGTM!


namespace ispc {

bool lIsPowerOf2(unsigned n) { return (n > 0) && !(n & (n - 1)); }
Collaborator

There are such functions in LLVM, but using them would require additional headers and libraries to link.
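For reference, the LLVM counterpart is llvm::isPowerOf2_32 from llvm/Support/MathExtras.h; a hedged sketch of using it in place of the hand-rolled check:

#include "llvm/Support/MathExtras.h"

// Same semantics as the hand-rolled version, including returning false for 0.
bool lIsPowerOf2(unsigned n) { return llvm::isPowerOf2_32(n); }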

@nurmukhametov merged commit 66c8e1d into ispc:main on May 16, 2024
61 checks passed
@nurmukhametov linked an issue on May 16, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues:

Efficient codegen for narrower register widths