Pointers for NVPTX support #10064
Intrinsics can be added by Lines 472 to 475 in 969bcb6
According to the LLVM NVPTX page,
@gwenzek FYI, on GitHub you can press
I've started a prototype on gwenzek#1
This will generate a good-looking .ptx. The main thing that surprised me is that I had to modify codegen.zig even though I don't want to implement PTX generation there. I've also created a new output format ".ptx" for the linker, as done for the SPIR-V or asm architectures.
The path you're going down now with gwenzek#1 is implementing your own code generation for nvptx in stage 2. If you don't want to do that, you shouldn't need to create an entire custom link format, or enable nvptx in codegen.zig. In fact, can you try just running
@Snektron I just retried without the change in codegen.zig, and it works; I must have used the wrong command at the beginning. I think a refactoring could make this cleaner, by having Lines 73 to 76 in 310f3df
btw upstream stage2 yields:
I've started looking into how to add support for the PTX intrinsics mentioned above. The default way seems to be to declare the LLVM intrinsic as an `extern fn`. Other approaches have been discussed in #7702, #4466.

```zig
export fn hello(out: [*c]u8) void {
    out.* = 72 + @intCast(u8, threadIdX());
    __syncthreads();
}

extern fn @"llvm.nvvm.barrier0"() void;

inline fn __syncthreads() void {
    @"llvm.nvvm.barrier0"();
}
```

But for some reason LLVM, called from Zig, crashes when doing something similar for reading the special registers:

```zig
extern fn @"llvm.nvvm.read.ptx.sreg.tid.x"() i32;

inline fn threadIdX() i32 {
    return @"llvm.nvvm.read.ptx.sreg.tid.x"();
}

export fn hello(out: [*c]u8) void {
    out.* = 72 + @intCast(u8, threadIdX());
}
```

The error message:
Interestingly, pasting the debug output into a

```
// .globl hello          // -- Begin function hello
.visible .func hello(
    .param .b64 hello_param_0
)                        // @hello
{
    .reg .b16 %rs<3>;
    .reg .b32 %r<2>;
    .reg .b64 %rd<2>;
// %bb.0:                // %Entry
    ld.param.u64 %rd1, [hello_param_0];
    mov.u32 %r1, %tid.x;
    cvt.u16.u32 %rs1, %r1;
    add.s16 %rs2, %rs1, 72;
    st.u8 [%rd1], %rs2;
    ret;
// -- End function
}
```

Anyway, given that the "assembly" corresponding to reading from a register is pretty simple, I also tried to generate it with inline assembly:

```zig
asm volatile ("mov.u32 \t$0, %tid.x;"
    : [ret] "=r" (-> i32)
);
```

So stay tuned, because I feel I'm making progress ^^
I've opened #12878 to update the backend to 0.10. I feel we are in a good place now, so I'll close this issue. Thanks to the progress of the self-hosted compiler, we can now generate debug information in the PTX files. The assembly syntax also now works as documented, so it's simpler to use the special PTX registers. And I can use the same Zig binary for building the device and host code, which is pretty exciting. See https://github.com/gwenzek/cudaz/tree/e8895596009c689300fe7c7193fa2dbf7db07629 for user code using this Zig branch.
@gwenzek Is it possible to experiment with this on the latest Zig (since it looks like all PRs have been merged)? I'm just curious, since Andrew added it to the 0.12.0 milestone, so is it not on main yet?
It's in the 0.10 milestone.
Support for the ptx backend is kind of a work in progress, but it should work for some programs already. You can check out this repository for some pointers on how to get started, but I think the Zig parts are a little bit out of date. In general it should mostly work. |
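To get started, a minimal build sketch may help. This is my own hypothetical build.zig fragment, not from the repository mentioned above; the std.Build API has changed between Zig versions, so the field names below follow recent Zig and may need adjusting:

```zig
const std = @import("std");

// Hypothetical build.zig: compile a Zig kernel for the nvptx64-cuda
// target and install the emitted assembly (PTX) as kernel.ptx.
// The source path and names are placeholders.
pub fn build(b: *std.Build) void {
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .nvptx64,
        .os_tag = .cuda,
    });

    const kernel = b.addObject(.{
        .name = "kernel",
        .root_source_file = b.path("src/kernel.zig"),
        .target = target,
        .optimize = .ReleaseFast,
    });

    // For the nvptx backend the "assembly" output is the PTX module.
    const install_ptx = b.addInstallFile(kernel.getEmittedAsm(), "kernel.ptx");
    b.getInstallStep().dependOn(&install_ptx.step);
}
```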
The NVPTX backend is currently Tier 4 in Zig, yet there might still be people interested in improving support for it (like myself).
Here I'm sharing a few pointers provided by @Snektron to get started, complemented by information I've gleaned from the LLVM and Nvidia documentation.
I've also started a branch with an ongoing implementation: gwenzek#1.
I'm learning both about the Zig compiler and LLVM at the same time, so beware!
Address spaces
The PTX format defines a VM for the GPU and an ISA. The VM makes use of different memory spaces.
Address space support has been added to stage 2, so you'll need to use stage 2.
Only a few generic address spaces have been defined, but more should be added for Nvidia GPUs:
zig/lib/std/builtin.zig
Lines 172 to 177 in 70ef9bc
And defaults here:
zig/src/target.zig
Lines 573 to 577 in 70ef9bc
(for nvptx you probably want the .constant addrspace for constants, for example, and also .local for locals, etc.)
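To illustrate how these could surface in user code once added, here is a hypothetical sketch of mine (it assumes .constant and .shared variants get added to std.builtin.AddressSpace for this target; the syntax follows stage 2's addrspace qualifier on declarations):

```zig
// Hypothetical nvptx kernel-side globals, assuming .constant and
// .shared address spaces exist for this target.

// Read-only lookup table placed in constant memory.
const lut: [4]u8 addrspace(.constant) = .{ 1, 2, 3, 4 };

// Per-block scratch tile placed in shared memory.
var tile: [128]f32 addrspace(.shared) = undefined;
```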
Then convert Zig address spaces to llvm address spaces here:
zig/src/codegen/llvm.zig
Line 717 in 70ef9bc
The Zig bindings already have definitions for the Nvidia memory spaces:
zig/src/codegen/llvm/bindings.zig
Lines 1283 to 1290 in 70ef9bc
To better understand what Zig needs to do you can look at a sample kernel written in LLVM IR:
https://llvm.org/docs/NVPTXUsage.html#the-kernel
I guess it would also be helpful to be able to generate LLVM IR from an arbitrary kernel;
apparently Clang can do so: https://www.llvm.org/docs/CompileCudaWithLLVM.html
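For comparison, the sample kernel from NVPTXUsage.html could look roughly like the following in Zig. This is my own sketch: it assumes a PTX kernel calling convention (such as the PtxKernel variant later added to std.builtin.CallingConvention), a .global address space, and an inline-asm register read, none of which are confirmed by this issue's snippets:

```zig
// Hypothetical Zig port of the LLVM NVPTXUsage sample kernel:
// each thread copies one f32 from `in` to `out`.

// Assumed helper reading the %tid.x special register.
inline fn threadIdX() u32 {
    return asm volatile ("mov.u32 \t$0, %tid.x;"
        : [ret] "=r" (-> u32)
    );
}

export fn kernel(
    in: [*]addrspace(.global) const f32,
    out: [*]addrspace(.global) f32,
) callconv(.PtxKernel) void {
    const i = threadIdX();
    out[i] = in[i];
}
```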
Also, one would need to implement support for the CUDA special variables (gridDim, blockDim, blockIdx, threadIdx, ...), which are stored in special registers:
https://llvm.org/docs/NVPTXUsage.html#id7
as well as the intrinsic for the block barrier, __syncthreads.
I'm not sure how to do that yet.