[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

deven-amd · 2019-10-10T17:49:42Z

This is the first of four PRs to add support for the ROCm backend for AMD GPUs. This PR adds the pass to generate the HSA Code objects corresponding the GPU kernels. This PR builds on the work done by @whchung in PR #63

This PR is a follow-up to request in PR #169. It addresses all but two of the code-review comments made in that PR for the ConvertKernelFuncToHSACO.cpp file.

[ROCm] Adding the ROCM backend implementation for GPU dialect + the mlir-rocm-runner utility #169 (comment)

Why are you wrapping the vector in a unique_ptr instead of just using vector by value?

There are a bunch of places where the code could be simplified, for instance:
   const char data[] = "HSACO";
   return std::make_unique<std::vector<char>>(data, data + sizeof(data) - 1);
would likely become:
  return "HSACO";

Two reasons for this.

The binary blob owned by this variable could potentially be large, and hence pass-by-pointer instead of pass-by-value (Note that the "HSACO" string is only returned in the "test" scenario)
Consistency with the CUDA side (where we OwnedCubin, which is also unique_ptr wrapper on a vector of chars)

[ROCm] Adding the ROCM backend implementation for GPU dialect + the mlir-rocm-runner utility #169 (comment)

Do we have to go through files? Isn't lld usable as a library?

At least for the time being, it is easier to go through files. Dumping out the .ll file is helpful for debugging purposes, so that is something we would like to keep regardless. Also I am still trying to figure out how to invoke lld as an API call, within the mlir setup :(

@joker-eph @whchung

test/Conversion/GPUToROCM/lower-amdgpu-kernel-to-hsaco.mlir

include/mlir/Conversion/GPUToROCM/GPUToROCMPass.h

lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp

include/mlir/Conversion/GPUToROCM/GPUToROCMPass.h

joker-eph · 2019-10-10T19:10:25Z

CMakeLists.txt

+  set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
+else()
+  set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
+endif()


This isn't correct I believe: at the moment you have a dependency on the environment (lld, ROCm runtime bitcode libraries, etc.)

As mentioned elsewhere: there should be a step detecting the environment (and we should probably put this behind an opt-in CMake flag by the way) and generating a rocm_config.h that can be included in the code.

I have address this in the CMakeLists.txt file in the GPUToROCM dir

Thanks! This is much better :)

…nel function

deven-amd

@joker-eph , I have pushed out a new rebased commit that addresses the feedback you provided.

Please re-review....thanks

deven

deven-amd · 2019-10-17T23:31:34Z

CMakeLists.txt

+  set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
+else()
+  set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
+endif()


I have address this in the CMakeLists.txt file in the GPUToROCM dir

deven-amd · 2019-10-24T13:32:07Z

@joker-eph , gentle ping

joker-eph · 2019-10-24T18:29:47Z

@joker-eph , gentle ping

Sorry, I missed the update (and LLVM Dev Meeting happened). I'm catching up on everything and will get to it this week. Ping on Monday if you don't see an update!

joker-eph · 2019-10-27T02:31:26Z

include/mlir/Conversion/GPUToROCM/CMakeLists.txt

+  if (EXISTS ${ROCM_INSTALL_DIR})
+    message("-- ROCm Install Dir - ${ROCM_INSTALL_DIR}")
+  else()
+    message(SEND_ERROR "-- NOT FOUND : ROCm Install Dir - ${ROCM_INSTALL_DIR}")


We shouldn't issue an error when this will be the normal behavior (a complete config on a machine without ROCM).

We should differentiate when the user is setting the value and when it is the default one. When the user provides the path to ROCM and we don't find it, we should just use FATAL_ERROR, otherwise a normal STATUS should be enough.

Can you also set the variables to empty in the else branch?

Finally, if ROCM is correctly detected, it seems like we should set a variable so that we can distinguish on it in the code and in lit tests as well: basically we have two levels that we should distinguish:

AMDGPU backend is built-in: we can run the test pass.

We have 1) and ROCM is installed/configured on the system. In this case we can run another test that exercise the linker invocation as well.

In lit you have config.run_rocm_tests at the moment, which is only indicating 1) above. We should have another flag for 2), and a macro in the code as well ideally.

joker-eph · 2019-10-27T02:32:05Z

include/mlir/Conversion/GPUToROCM/ROCMConfig.h.in

+#define MLIR_CONVERSION_GPUTOROCM_ROCMCONFIG_H_
+
+/// The code to generate the HSACO binary blobs (corresponding the GPU kernels)
+/// assumes the presense of ROCm libraries/utilities. The location of these


Suggested change

/// assumes the presense of ROCm libraries/utilities. The location of these

/// assumes the presence of ROCm libraries/utilities. The location of these

joker-eph · 2019-10-27T02:36:56Z

include/mlir/Conversion/GPUToROCM/GPUToROCMPass.h

+static constexpr const char *kHSACOAnnotation = "amdgpu.hsaco";
+static constexpr const char *kHSACOGetterAnnotation = "amdgpu.hsacogetter";
+static constexpr const char *kHSACOGetterSuffix = "_hsaco";
+static constexpr const char *kHSACOStorageSuffix = "_hsaco_cst";


I didn't find a use for kHSACOGetterSuffix and kHSACOStorageSuffix?

Also are all these definition sintended to be defined in the public header of are they more internal detail of the pass (and could be in the implementation file).

joker-eph · 2019-10-27T02:59:09Z

CMakeLists.txt

+  set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
+else()
+  set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
+endif()


Thanks! This is much better :)

joker-eph · 2019-10-27T03:24:58Z

lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp

+                                    llvm::TargetMachine::CGFT_ObjectFile);
+  codegenPasses.run(llvmModule);
+  isabinFileStream.flush();
+


I would try to keep maximum coverage for the test, since the pass is only available when the AMDGPU backend is built in, we should be able to get until there for the basic test. We could just reload the binary file isabinFilename print it as hexadecimal and return.

Something like:

if (testMode) { auto hsacoFileOrError = llvm::MemoryBuffer::getFileAsStream(isabinFilename); if ((ec = hsacoFileOrError.getError())) return {}; std::unique_ptr<llvm::MemoryBuffer> hsacoFile = std::move(hsacoFileOrError.get()); SmallVector<char, 1> buffer; { llvm::raw_svector_ostream os(buffer); for (auto char : hascoFile.getBuffer()) os.write_hex(char); } return std::make_unique<std::vector<char>>(buffer.begin(), buffer.end()); }

The test would be able to check that we actually correctly generate an ELF binary (by checking the ELF magic in hexadecimal) and cover everything but the linker when ROCM is not available.

joker-eph · 2019-10-27T03:28:38Z

lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp

+
+static PassRegistration<GpuKernelToHSACOPass>
+    pass("test-kernel-to-hsaco",
+         "Convert all kernel functions to HSA code object blobs");


With a macro that define if ROCM is installed or not, we can then conditionally register another version of the pass "rocm-hsaco-lowering" or something like which actually run the linker as well, and add a test for this.

googlebot added the cla: yes label Oct 10, 2019

joker-eph reviewed Oct 10, 2019

View reviewed changes

[ROCm] Adding pass to generate the HSACO binary blob from the GPU ker…

a65e8e3

…nel function

deven-amd commented Oct 17, 2019

View reviewed changes

joker-eph reviewed Oct 27, 2019

View reviewed changes

joker-eph force-pushed the master branch 3 times, most recently from 48dcae0 to 3722f03 Compare December 26, 2019 04:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

deven-amd commented Oct 10, 2019

joker-eph Oct 10, 2019

deven-amd Oct 17, 2019

joker-eph Oct 27, 2019

deven-amd left a comment

deven-amd Oct 17, 2019

deven-amd commented Oct 24, 2019

joker-eph commented Oct 24, 2019

joker-eph Oct 27, 2019

joker-eph Oct 27, 2019

joker-eph Oct 27, 2019

joker-eph Oct 27, 2019

joker-eph Oct 27, 2019

joker-eph Oct 27, 2019

	/// assumes the presense of ROCm libraries/utilities. The location of these
	/// assumes the presence of ROCm libraries/utilities. The location of these

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

Are you sure you want to change the base?

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

Conversation

deven-amd commented Oct 10, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deven-amd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deven-amd commented Oct 24, 2019

joker-eph commented Oct 24, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment