Should @core.extern be part of the libdevice interface? #4509

Open
int3 opened this issue Aug 13, 2024 · 1 comment
int3 (Contributor) commented Aug 13, 2024

I've been thinking about how to support some libdevice operations that don't map cleanly to existing vector math libraries on the CPU. It seems to me that things like isnan could be implemented reasonably cleanly in Triton itself. E.g. for the fp32 case:

```python
@jit
def isnan(arg0):
    # NaN iff the exponent bits are all ones and the mantissa is nonzero,
    # i.e. the absolute bit pattern exceeds that of +inf (0x7f800000).
    return (arg0.to(core.dtype("uint32"), bitcast=True) & 0x7fffffff) > 0x7f800000
```

However, `libdevice.isnan`, as defined in the interface file `triton/language/extra/libdevice.py`, is marked not as `@jit` but as `@core.extern`. So we can't use the above syntax; instead, we have to pass the `_builder` argument explicitly:

```python
@core.extern
def isnan(arg0, _builder):
    bits = arg0.to(core.dtype("uint32"), bitcast=True, _builder=_builder)
    return bits.__and__(0x7fffffff, _builder=_builder).__gt__(0x7f800000, _builder=_builder)
```

which is pretty ugly.

So I'm wondering whether the choice of `@core.extern` vs. `@jit` should be part of the libdevice interface. On one hand, it seems like an implementation detail; on the other hand, a function's "calling convention" is traditionally part of its interface. Making it an implementation detail seems quite doable: we would simply replace the invoke-time dispatch function with something that does the mapping at libdevice-module-creation time.
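For concreteness, here is a minimal sketch of what creation-time mapping could look like. Every name in it (`NATIVE_IMPLS`, `build_libdevice`, `extern_decls`) is invented for illustration and is not existing Triton API:

```python
# Hypothetical: native @jit implementations that a backend prefers over the
# @core.extern declarations in the interface file.
NATIVE_IMPLS = {"isnan": isnan}

def build_libdevice(extern_decls):
    # extern_decls maps each name to its @core.extern declaration. The
    # extern-vs-jit decision is resolved once, while the libdevice module's
    # namespace is assembled, rather than on every invocation; callers see
    # a uniform namespace either way.
    ns = dict(extern_decls)
    ns.update(NATIVE_IMPLS)
    return ns
```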

Another approach would be to write a simple AST transform that inserts these `_builder` arguments, so that declaring an extern function with an implicit builder would look something like

```python
@auto_builder
@core.extern
def isnan(arg0):
    ...
```
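A minimal sketch of such a transform, assuming it is handed a plain Python function (composing with `@core.extern` would additionally require access to the wrapped function); `auto_builder` and `_AppendBuilder` are invented names:

```python
import ast
import functools
import inspect
import textwrap

class _AppendBuilder(ast.NodeTransformer):
    """Append `_builder=_builder` to every call in the function body.

    Deliberately naive: a real transform would have to skip calls that do
    not accept `_builder` (e.g. `core.dtype(...)`) and also rewrite operators
    (BinOp/Compare nodes) into explicit `__and__`/`__eq__` calls.
    """
    def visit_Call(self, node):
        self.generic_visit(node)
        node.keywords.append(
            ast.keyword(arg="_builder", value=ast.Name(id="_builder", ctx=ast.Load())))
        return node

def auto_builder(fn):
    src = textwrap.dedent(inspect.getsource(fn))
    tree = ast.parse(src)
    fndef = tree.body[0]
    fndef.decorator_list = []  # don't re-apply decorators when we recompile
    # Give the rewritten function an explicit keyword-only `_builder` parameter.
    fndef.args.kwonlyargs.append(ast.arg(arg="_builder"))
    fndef.args.kw_defaults.append(ast.Constant(value=None))
    _AppendBuilder().visit(tree)
    ast.fix_missing_locations(tree)
    ns = {}
    exec(compile(tree, "<auto_builder>", "exec"), fn.__globals__, ns)
    return functools.wraps(fn)(ns[fn.__name__])
```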

This might be simpler / less invasive. Would love to hear your thoughts.

int3 (Contributor, Author) commented Aug 13, 2024

> Another approach would be to write a simple AST transform that inserts these `_builder` arguments

I'm actually not sure if this is correct / safe; are we guaranteed that the function will get JIT-compiled? Or do we have to actually mark it with @jit? Looking at the MLIR, it looks like things get JIT-compiled regardless of whether we add @jit, but I'm not sure if it's just working by accident.

int3 added a commit to int3/triton-cpu that referenced this issue Aug 19, 2024
This is motivated by triton-lang#4509. The crux of the problem is that the Triton
code generator needs to inspect a function's arguments / attributes /
types in order to determine how it should be called. This meant that
"implementation details" like whether a function is a builtin needed to
be exposed in the "interface" `tl.extra.libdevice` module, instead of
just residing in `tl.extra.cuda.libdevice`. Moreover, this meant that
libdevice functions marked as `@core.extern` in the interface could not be
implemented via JitFunctions.

Allowing each backend to provide its own module map solves this problem
as the code generator can inspect the actual function implementation.
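For readers following along, the backend hook described above might look roughly like this; the method name, class, and import paths are my reconstruction from the commit message, not verified API:

```python
from triton.backends.compiler import BaseBackend
from triton.language.extra.cuda import libdevice

class CUDABackend(BaseBackend):
    # Only the new hook is sketched; the rest of the backend is elided.
    def get_module_map(self):
        # Map the interface module onto this backend's implementation module,
        # so the code generator inspects the real function objects
        # (JITFunction vs. @core.extern) when deciding how to emit a call.
        return {"triton.language.extra.libdevice": libdevice}
```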
ThomasRaoux pushed a commit that referenced this issue Aug 22, 2024
Jokeren pushed a commit that referenced this issue Aug 24, 2024