
Add support for AMDGPU #6434

Open · galeselee opened this issue Oct 26, 2022 · 13 comments

Labels: amdgpu (AMDGPU backend), discussion (Welcome discussion!), feature request (Suggest an idea on this project)

Comments

galeselee (Contributor) commented Oct 26, 2022

Purpose
I find that Taichi users have a demand for an AMDGPU backend (related issue: #4586).
There is also a discussion of the AMD GPU backend in #412.
Thus, I would like to add an AMDGPU backend to the compiler so that my 6900XT can be utilized.

Solution
The goal is to give Taichi preliminary support for AMDGPU via LLVM-14 and ROCm.
I plan to complete the following steps one by one:

  1. Add the relevant dependencies in cmake.
  2. Build a specific version of the Taichi runtime module (taichi/runtime/llvm/runtime_module) for AMDGPU. During the implementation, I found that I couldn't reuse the CPU's LLVM IR directly the way Taichi does for CUDA: I needed to do some addrspace conversions, so I temporarily used hipcc to handle these issues. (The final approach taken is to hack the address space, so we don't need an extra runtime.cpp.)
  3. Add interaction between Taichi and the ROCm library functions. The CUDA backend uses some driver functions; it's convenient to replace these with functions provided by ROCm, such as hipMemcpy and hipModuleLaunchKernel (see the sketch after this list).
  4. Add a codegen skeleton (like taichi/codegen/cuda).
  5. Convert LLVM IR to AMDGPU ISA / hsaco (ref link).
  6. Add unit test configuration for AMDGPU.
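For concreteness, here is a hedged sketch (not Taichi's actual code) of how the CUDA driver calls from step 3 map onto the HIP module API; the file name `kernel.hsaco`, the kernel name `my_kernel`, and the launch dimensions are placeholders, and error checking is omitted.

```
#include <hip/hip_runtime.h>
#include <cstddef>

// Hypothetical helper: copy input to the device, load a compiled hsaco
// module, and launch one kernel from it via the HIP module API.
void launch_from_hsaco(void *device_dst, const void *host_src, size_t bytes) {
  // cuMemcpyHtoD -> hipMemcpy
  hipMemcpy(device_dst, host_src, bytes, hipMemcpyHostToDevice);

  // cuModuleLoad / cuModuleGetFunction -> hipModuleLoad / hipModuleGetFunction
  hipModule_t module;
  hipModuleLoad(&module, "kernel.hsaco");
  hipFunction_t func;
  hipModuleGetFunction(&func, module, "my_kernel");

  // cuLaunchKernel -> hipModuleLaunchKernel
  void *args[] = {&device_dst};
  hipModuleLaunchKernel(func,
                        /*gridDim=*/128, 1, 1,
                        /*blockDim=*/256, 1, 1,
                        /*sharedMemBytes=*/0,
                        /*stream=*/nullptr, args, nullptr);
}
```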
galeselee added the feature request, discussion, and amdgpu labels on Oct 26, 2022
galeselee (Contributor, Author) commented:

It seems better to add cmake at the end (after step 5).

galeselee added a commit that referenced this issue Oct 28, 2022
Issue: #6434

### Brief Summary
Add a PR tag for amdgpu.
ITCJ commented Nov 16, 2022

Amazing work.

I'm also considering this task, since our new HPC is using the Sugon gtx106 (known as Vega 20). I am quite interested; is there any possibility of cooperating?

Besides, are you also working on the ASC 2022 competition? I noticed that you forked DeePMD-kit, which is one of the ASC 2022 tasks.

galeselee (Contributor, Author) commented Nov 16, 2022

> I'm also considering this task, since our new HPC is using the Sugon gtx106 (known as Vega 20). I am quite interested; is there any possibility of cooperating?

@ITCJ Certainly! I have now implemented a prototype of the AMDGPU backend on an RDNA2-arch card (don't worry, there is still plenty of work to do). If you are interested, here is the code.
By the way, I don't know anything about the Sugon gtx106 :-). To my knowledge, GTX is a product line from NVIDIA. Do you mean a DCU?

ITCJ commented Nov 16, 2022

> @ITCJ Certainly! I have now implemented a prototype of the AMDGPU backend on an RDNA2-arch card (don't worry, there is still plenty of work to do). If you are interested, here is the code. By the way, I don't know anything about the Sugon gtx106 :-). To my knowledge, GTX is a product line from NVIDIA. Do you mean a DCU?

@galeselee Yeah, it's a Sugon DCU. I rechecked rocminfo and found it is gfx906. I was confused by that name for a while but never checked rocminfo again, ORZ.

  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc-:xnack-

Also, would you mind if I add you on WeChat? I have sent my WeChat ID and QR code to your edu email.

feisuzhu pushed a commit that referenced this issue Nov 24, 2022
Issue: #6434

### Brief Summary
This Dockerfile is derived from Dockerfile.ubuntu.20.04. The differences are
as follows:
1. a different base docker image
2. python3-pip and mesa-common-dev are dropped, since they are already
installed in the base image
3. the vulkan module is removed (the amdgpu backend is not yet adapted to vulkan)
4. the user is added to the video group (some Ubuntu-20.04 setups also
require the render group)
galeselee added a commit that referenced this issue Dec 6, 2022
Issue: #6434

### Brief Summary
Add the docker logic for AMDGPU CI:
1. The LLVM used, which only contains the AMDGPU and X86 targets, comes from
the docker image.
2. Using an AMDGPU inside docker requires that `/dev/kfd` and the directory
`/dev/dri` be mounted. The `dev` user has no permission to access these
character devices by default.
3. `TI_WITH_CUDA=OFF`
4. `TI_RUN_RELEASE_TESTS=OFF`
5. Currently only CPU-related tests are run.
galeselee added a commit that referenced this issue Dec 30, 2022
Issue: #6434

### Brief Summary
1. This is a special part of the Taichi runtime module for the `AMDGPU`
backend. The `LLVM IR` that Taichi's runtime module generates with `clang++`
differs in its memory allocation from the CPU-generated `LLVM IR`.
The following is an example.
```
C/C++ code
void func(int *a, int *b) {
    *a = *b;
}
x86_64 backend LLVM IR
define dso_local void @cpu_func(i32* %0, i32* %1) #2 {
  %3 = alloca i32*, align 8
  %4 = alloca i32*, align 8
  store i32* %0, i32** %3, align 8
  store i32* %1, i32** %4, align 8
  %5 = load i32*, i32** %4, align 8
  %6 = load i32, i32* %5, align 4
  %7 = load i32*, i32** %3, align 8
  store i32 %6, i32* %7, align 4
  ret void
}
__global__ function on AMDGPU
define protected amdgpu_kernel void @global_func(i32 addrspace(1)* %0, i32 addrspace(1)* %1) #4 {
  %3 = alloca i32*, align 8, addrspace(5)
  %4 = alloca i32*, align 8, addrspace(5)
  %5 = alloca i32*, align 8, addrspace(5)
  %6 = alloca i32*, align 8, addrspace(5)
  %7 = addrspacecast i32* addrspace(5)* %3 to i32**
  %8 = addrspacecast i32* addrspace(5)* %4 to i32**
  %9 = addrspacecast i32* addrspace(5)* %5 to i32**
  %10 = addrspacecast i32* addrspace(5)* %6 to i32**
  %11 = addrspacecast i32 addrspace(1)* %0 to i32*
  store i32* %11, i32** %7, align 8
  %12 = load i32*, i32** %7, align 8
  %13 = addrspacecast i32 addrspace(1)* %1 to i32*
  store i32* %13, i32** %8, align 8
  %14 = load i32*, i32** %8, align 8
  store i32* %12, i32** %9, align 8
  store i32* %14, i32** %10, align 8
  %15 = load i32*, i32** %10, align 8
  %16 = load i32, i32* %15, align 4
  %17 = load i32*, i32** %9, align 8
  store i32 %16, i32* %17, align 4
  ret void
}
__device__ function on AMDGPU
define hidden void @device_func(i32* %0, i32* %1) #2 {
  %3 = alloca i32*, align 8, addrspace(5)
  %4 = alloca i32*, align 8, addrspace(5)
  %5 = addrspacecast i32* addrspace(5)* %3 to i32**
  %6 = addrspacecast i32* addrspace(5)* %4 to i32**
  store i32* %0, i32** %5, align 8
  store i32* %1, i32** %6, align 8
  %7 = load i32*, i32** %6, align 8
  %8 = load i32, i32* %7, align 4
  %9 = load i32*, i32** %5, align 8
  store i32 %8, i32* %9, align 4
  ret void
}
```
2. There are some differences around `alloca` instructions, specifically
their address spaces (for `AMDGPU`,
[this](https://llvm.org/docs/AMDGPUUsage.html#address-spaces) is
helpful). I have not found documentation describing how to write
correct `LLVM IR` for `AMDGPU`, so I relied on observing the `LLVM IR`
generated by `clang++/hipcc`. We need to handle both the arguments of a
`__global__` function and its `alloca` instructions (specifying the
addrspace of each `alloca` and performing the addrspace casts), while for a
`__device__` function we do not need to touch the function arguments
(see the sketch below).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
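As a rough illustration of the addrspace handling described in point 2, here is a minimal sketch using the LLVM 14-era C++ `IRBuilder` API; it is not Taichi's actual pass, and the helper name is hypothetical.

```
#include "llvm/IR/IRBuilder.h"

// Illustrative only: create an alloca in AMDGPU's private address space (5)
// and cast the result back to a generic pointer, mirroring the
// `alloca ..., addrspace(5)` + `addrspacecast` pairs in the IR above.
llvm::Value *make_private_alloca(llvm::IRBuilder<> &builder,
                                 llvm::Type *elem_ty) {
  llvm::AllocaInst *slot =
      builder.CreateAlloca(elem_ty, /*AddrSpace=*/5, nullptr, "slot");
  return builder.CreateAddrSpaceCast(
      slot, llvm::PointerType::get(elem_ty, /*AddressSpace=*/0),
      "slot.generic");
}
```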
galeselee added a commit that referenced this issue Dec 30, 2022
Issue: #6434

Add some unit tests at the AMDGPU RHI level.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
galeselee added a commit that referenced this issue Dec 30, 2022
Issue: #6434

### Brief Summary
It contains four parts (`driver`, `context`, `device` and
`caching_allocator`). The code is similar to `cuda/rhi`; however, there
are still some differences between `amdgpu/rhi` and `cuda/rhi`:
#### context
1. The method of obtaining the hardware version
2. `Context::launch`
#### driver
1. ROCm/hip internal functions
#### cmake
The current cmake build system is sufficient to support the unit
tests in #6597.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
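For readers unfamiliar with the `driver` part, the usual pattern is to bind the HIP symbols from `libamdhip64.so` at runtime; below is a hypothetical dlopen/dlsym sketch of that idea (the typedef and names are illustrative, not Taichi's real wrapper).

```
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

// Illustrative function-pointer type matching hipMalloc's C signature.
using hipMalloc_t = int (*)(void **ptr, size_t size);

int main() {
  void *lib = dlopen("libamdhip64.so", RTLD_NOW | RTLD_LOCAL);
  if (!lib) {
    std::fprintf(stderr, "HIP runtime not found: %s\n", dlerror());
    return 1;
  }
  // Resolve one symbol by name, as a driver wrapper would for each API.
  auto hip_malloc = reinterpret_cast<hipMalloc_t>(dlsym(lib, "hipMalloc"));
  void *dev_ptr = nullptr;
  int err = hip_malloc(&dev_ptr, 1024);  // 0 == hipSuccess
  std::printf("hipMalloc returned %d, ptr=%p\n", err, dev_ptr);
  dlclose(lib);
  return 0;
}
```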
@turbo0628 turbo0628 modified the milestones: v1.5.0, v1.6.0 Mar 9, 2023
@feisuzhu feisuzhu modified the milestones: v1.6.0, v1.7.0 May 11, 2023
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…7023)

Issue: taichi-dev#6434

### Brief Summary
These unit tests are for taichi-dev#6486.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Add an exclusive way to process kernel arguments.
2. Update the `launch` API in `AMDGPUContext`.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
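A plausible shape for the argument-passing path mentioned above, sketched with the plain HIP module API: the packed buffer goes through the `extra` parameter of `hipModuleLaunchKernel`. `func`, the launch dimensions, and the helper are assumptions, not Taichi's actual implementation.

```
#include <hip/hip_runtime.h>
#include <cstddef>

// Illustrative: pass one packed argument buffer via the `extra` mechanism.
void launch_with_packed_args(hipFunction_t func, void *arg_buf,
                             size_t arg_size) {
  void *config[] = {
      HIP_LAUNCH_PARAM_BUFFER_POINTER, arg_buf,
      HIP_LAUNCH_PARAM_BUFFER_SIZE, &arg_size,
      HIP_LAUNCH_PARAM_END};
  // kernelParams must be null when `extra` carries the argument buffer.
  hipModuleLaunchKernel(func, 64, 1, 1, 256, 1, 1, 0, nullptr,
                        /*kernelParams=*/nullptr, config);
}
```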
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
This part contains the `CHI IR -> LLVM IR` stage. As with part 0, the
skeleton is similar to `cuda`. Some important notes follow.
1. `ocml` is the math library, analogous to `libdevice` on `nvidia`. It
provides most of the functions we need; the remainder, such as `sgn` and
`abs`, have been hand-written.
2. Kernel parameter passing differs from `CUDA`: `extra_args` is the only
form the device kernel accepts.
3. It still uses `nvidia` jargon (e.g. `sm`, `block` and `grid`).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
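To make note 1 concrete, here is a hedged sketch of declaring and calling an `ocml` routine from generated IR via the LLVM C++ API; `__ocml_sin_f32` is a symbol from ROCm's device library, but the helper itself is illustrative.

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

// Illustrative: declare float __ocml_sin_f32(float) and emit a call to it,
// the ocml analogue of calling a libdevice intrinsic on the CUDA side.
llvm::Value *emit_sin_f32(llvm::Module &m, llvm::IRBuilder<> &builder,
                          llvm::Value *x /* float */) {
  llvm::FunctionCallee ocml_sin = m.getOrInsertFunction(
      "__ocml_sin_f32",
      llvm::FunctionType::get(builder.getFloatTy(), {builder.getFloatTy()},
                              /*isVarArg=*/false));
  return builder.CreateCall(ocml_sin, {x});
}
```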
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary

1. This is the third part of adding the amdgpu backend: the runtime part
of the implementation. The main job of the runtime is turning LLVM IR into
GCN ISA objects and hsaco (a file format that can be accepted by the
module-launch API provided by hip).

2. After calling the relevant APIs to generate the GCN ISA object, the
LLVM linker (ld.lld) needs to be invoked to produce the hsaco file
format, so there is a command-line call `ld.lld -shared xxx.o -o
xxx.hsaco` in the code, and the temporarily generated files are stored in
the `/tmp/taichi_hsaco/` folder (see the sketch after the listing below).

3. To deal with multiple `hsaco` files being generated at the same time, a
random number is used to name the generated files: `JITSessionAMDGPU` has a
`random_num_` and a `tmp_dir_` which are assigned when the
`JITSessionAMDGPU` instance is created. Each `ti.kernel` is divided into
offload tasks, each of which is compiled into a separate `hsaco` file. A
random number bound to the `hsaco` file is obtained when the `hsaco` file
is generated. Here is an example of the files after running
`ti example mpm128`:
```
taichi_hsaco/
└── 4858208420434830779
    ├── taichi_amdgcn_10476395765980093855.hsaco
    ├── taichi_amdgcn_10476395765980093855.o
    ├── taichi_amdgcn_11369096326162657620.hsaco
    ├── taichi_amdgcn_11369096326162657620.o
    ├── taichi_amdgcn_11700031850871498261.hsaco
    ├── taichi_amdgcn_11700031850871498261.o
    ├── taichi_amdgcn_14803499569653867868.hsaco
    ├── taichi_amdgcn_14803499569653867868.o
    ├── taichi_amdgcn_14949458395707884954.hsaco
    ├── taichi_amdgcn_14949458395707884954.o
    ├── taichi_amdgcn_15955762247261446379.hsaco
    ├── taichi_amdgcn_15955762247261446379.o
    ├── taichi_amdgcn_16891452471041191610.hsaco
    ├── taichi_amdgcn_16891452471041191610.o
    ├── taichi_amdgcn_17615766226135707772.hsaco
    ├── taichi_amdgcn_17615766226135707772.o
    ├── taichi_amdgcn_18033844193337069056.hsaco
    ├── taichi_amdgcn_18033844193337069056.o
    ├── taichi_amdgcn_5951151729973841331.hsaco
    ├── taichi_amdgcn_5951151729973841331.o
    ├── taichi_amdgcn_6012043323411824926.hsaco
    ├── taichi_amdgcn_6012043323411824926.o
    ├── taichi_amdgcn_6796840558965541322.hsaco
    ├── taichi_amdgcn_6796840558965541322.o
    ├── taichi_amdgcn_6835984424286808860.hsaco
    ├── taichi_amdgcn_6835984424286808860.o
    ├── taichi_amdgcn_7872622170129629907.hsaco
    ├── taichi_amdgcn_7872622170129629907.o
    ├── taichi_amdgcn_8760441738982760858.hsaco
    ├── taichi_amdgcn_8760441738982760858.o
    ├── taichi_amdgcn_9006625347419529255.hsaco
    └── taichi_amdgcn_9006625347419529255.o
```

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
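A minimal sketch of the link-and-load step from points 1-2, assuming `ld.lld` is on the PATH; the path scheme mirrors the listing above, but the helper itself is hypothetical and skips error reporting.

```
#include <hip/hip_runtime.h>
#include <cstdlib>
#include <string>

// Illustrative: turn an emitted object into an hsaco shared object with
// ld.lld, then load it through the HIP module API.
hipModule_t link_and_load(const std::string &stem) {
  // e.g. stem = "/tmp/taichi_hsaco/<run-id>/taichi_amdgcn_<rand>"
  std::string cmd = "ld.lld -shared " + stem + ".o -o " + stem + ".hsaco";
  if (std::system(cmd.c_str()) != 0)
    return nullptr;
  hipModule_t module = nullptr;
  hipModuleLoad(&module, (stem + ".hsaco").c_str());
  return module;
}
```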
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Put the amdgpu-related bitcode files in external/amdgpu_libdevice (so
Taichi only needs libamdhip64.so).
2. Link the amdgpu-related bc files and the Taichi module together
(precisely, clone the bc files into the Taichi module); see the sketch
below.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
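As an illustration of point 2, here is a hedged sketch of cloning device-library bitcode into the main module with `llvm::Linker`; the bitcode file name is a placeholder.

```
#include <memory>
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Support/SourceMgr.h"

// Illustrative: parse one device-library .bc file and clone it into the
// main module. linkModules returns true on error, so negate for success.
bool link_device_bitcode(llvm::Module &main_module, llvm::LLVMContext &ctx) {
  llvm::SMDiagnostic err;
  std::unique_ptr<llvm::Module> dev_lib =
      llvm::parseIRFile("external/amdgpu_libdevice/ocml.bc", err, ctx);
  if (!dev_lib)
    return false;
  return !llvm::Linker::linkModules(main_module, std::move(dev_lib));
}
```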
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Update `runtime/llvm/runtime_module/runtime.cpp` to enable the amdgpu
backend.
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Refactor `add_struct_for_func` into an llvm_pass.
2. Replace `bool spmd` with `string spmd` to distinguish `amdgpu` from
`nvgpu`.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Enable the amdgpu API in Taichi (except struct-for).

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Fix amdgpu backend bugs:
    a. codegen typos
    b. add support for more types in `sgn` (see the sketch below)
    c. use a temporary method to handle the `runtime.cpp` error
2. Enable the amdgpu backend Python unit tests:
    a. because `print` is not yet supported, all `print`-related tests are
temporarily disabled on the amdgpu backend.
    b. there is still something wrong in the `gdar_mpm` and `ad_if` tests.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
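As a concrete example of the kind of hand-written `sgn` fallback referenced in 1.b, here is a minimal `IRBuilder` sketch for the i32 case using the classic `(x > 0) - (x < 0)` formulation; it is illustrative, not the actual patch.

```
#include "llvm/IR/IRBuilder.h"

// Illustrative: integer sgn(x) = (x > 0) - (x < 0), emitted as IR.
llvm::Value *emit_sgn_i32(llvm::IRBuilder<> &builder, llvm::Value *x) {
  llvm::Value *zero = builder.getInt32(0);
  llvm::Value *gt = builder.CreateICmpSGT(x, zero);  // x > 0
  llvm::Value *lt = builder.CreateICmpSLT(x, zero);  // x < 0
  return builder.CreateSub(builder.CreateZExt(gt, builder.getInt32Ty()),
                           builder.CreateZExt(lt, builder.getInt32Ty()),
                           "sgn");
}
```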
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

### Brief Summary
1. Currently only the default (event) profiler is available on AMDGPU.
2. Screenshot:
https://user-images.githubusercontent.com/47965866/217734581-4c7f7fa7-4d17-4243-b4bd-0a70d1c88f4a.png

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
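For reference, an event-based timer of the sort a default (event) profiler relies on can be written entirely with public HIP runtime calls; this sketch is illustrative, not Taichi's profiler code.

```
#include <hip/hip_runtime.h>

// Illustrative: measure elapsed GPU time around a kernel launch region
// using HIP events, returning milliseconds.
float time_region(hipStream_t stream) {
  hipEvent_t start, stop;
  hipEventCreate(&start);
  hipEventCreate(&stop);
  hipEventRecord(start, stream);
  // ... launch the kernel(s) to be profiled on `stream` here ...
  hipEventRecord(stop, stream);
  hipEventSynchronize(stop);
  float ms = 0.0f;
  hipEventElapsedTime(&ms, start, stop);
  hipEventDestroy(start);
  hipEventDestroy(stop);
  return ms;
}
```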
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#6434

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@liuyunpu commented:

Is there any plan to support the CDNA or CDNA2 architecture? I hope to run my Taichi code on MI-series GPUs.

galeselee (Contributor, Author) commented Sep 28, 2023

> Is there any plan to support the CDNA or CDNA2 architecture? I hope to run my Taichi code on MI-series GPUs.

Actually, you can run your Taichi code on a GPU with the CDNA or CDNA2 architecture with a little modification. (I have tested some cases on an MI210.)

@liuyunpu commented:

How? I thought the current Taichi doesn't support AMD GPUs.

galeselee (Contributor, Author) commented:

> How? I thought the current Taichi doesn't support AMD GPUs.

It looks like the amdgpu backend hasn't been released yet; you can either do the compilation yourself (I can sort out how to compile it if you need help) or try using the vulkan backend.

galeselee (Contributor, Author) commented Oct 12, 2023

> How? I thought the current Taichi doesn't support AMD GPUs.

  1. The current Taichi AMDGPU backend is based on ROCm, so the first thing you need to do is install ROCm. (The official docs may be helpful.)

  2. Build LLVM for the AMDGPU backend:
    2.1. Download the llvm source code.
    2.2. Compile:

    cmake .. -DLLVM_ENABLE_RTTI:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_INCLUDE_BENCHMARKS=OFF -DCMAKE_INSTALL_PREFIX=llvm_binary_path
    make -j `nproc`
    make install

    2.3. Check: there should be four folders (bin, include, lib and share) under the CMAKE_INSTALL_PREFIX.

  3. Download the taichi source code:
    git clone --recursive https://github.com/taichi-dev/taichi.git

  4. Prepare the environment to compile Taichi:
    virtualenv -p python3 your_env_path
    pip install -r requirements_dev.txt

  5. Compile:
    TAICHI_CMAKE_ARGS="-DTI_WITH_VULKAN=OFF -DTI_WITH_OPENGL=OFF -DTI_BUILD_TESTS=ON -DTI_BUILD_EXAMPLES=OFF -DCMAKE_PREFIX_PATH=llvm_binary_path/lib/cmake -DCMAKE_CXX_COMPILER=clang++ -DTI_WITH_AMDGPU=ON" python setup.py develop
    Please choose the backends to compile according to your needs (e.g. if you need the VULKAN backend, pass -DTI_WITH_VULKAN=ON).

  6. Test:
    pip install -r requirements_test.txt
    cd tests
    python run_test.py -a amdgpu -k mpm88
    If your output is similar to the one below, then you have finished the compilation.

    (screenshot of the mpm88 test passing)
    Please enjoy!

@expenses commented:

I've managed to build Taichi for AMDGPU, but one problem I ran into is that recent (5.3+) versions of ROCm require PCIe atomics, which are not available on my setup. The most recent version, 5.7, includes a workaround for setups without PCIe atomics: https://rocm.docs.amd.com/en/latest/release.html#non-hostcall-hip-printf.

I had a go at building Taichi against LLVM 17, which is the LLVM version that ROCm 5.7 works with. I haven't been able to test those changes because they require the ROCm 5.7 runtime to be installed, and that isn't out in the Arch Linux repositories yet. Here are the changes I've had to make: expenses@800c693.

I'm not sure if this information is super useful, but it might help some people.

@jim19930609 jim19930609 removed this from the v1.7.0 milestone Oct 25, 2023
@GZGavinZhao commented:

I'd just like to note that the AMDGPU backend (TI_ARCH=amdgpu) is already working with ROCm 6.0 on my gfx1032 GPU. It passes all (Python) tests and in many cases is more stable / crashes less than the Vulkan and OpenGL backends (albeit at the cost of a slightly slower startup).

Currently the latest LLVM version Taichi officially supports is LLVM 15. I have LLVM 16 working locally, but LLVM 17+ needs some thought because of the need (or not?) to migrate to LLVM's new pass manager infrastructure (e.g. PassManagerBuilder no longer exists).

galeselee (Contributor, Author) commented:

@GZGavinZhao Thanks for sharing; I'm very glad to hear it. I have previous experience migrating from a lower LLVM version to 14. Some of the LLVM-related code needs to be changed according to the LLVM docs as the LLVM API changes, like the PassManagerBuilder you mentioned.
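For anyone attempting the LLVM 17 migration discussed above: the replacement for `PassManagerBuilder` is the new pass manager's `PassBuilder`. The following generic sketch follows the shape of LLVM's own new-pass-manager documentation; it is not Taichi code.

```
#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"

// Illustrative: build and run the default O2 pipeline with the new PM.
void optimize_with_new_pm(llvm::Module &m) {
  llvm::PassBuilder pb;
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);
  llvm::ModulePassManager mpm =
      pb.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  mpm.run(m, mam);
}
```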
