-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX][Docs] [NFC] Update docs on intrinsics #133136
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-nvptx Author: Durgadoss R (durga4github) ChangesRecently, we have added a set of complex intrinsics on This patch captures the key learnings from our experience Full diff: https://github.com/llvm/llvm-project/pull/133136.diff 1 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 621879fc5648b..1efa72b649f0d 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -153,6 +153,68 @@ Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``
NVPTX Intrinsics
================
+Guidelines on NVPTX Intrinsic design
+------------------------------------
+
+The NVPTX intrinsics are used to model instructions in the PTX ISA.
+While simpler intrinsics can represent certain features effectively,
+more complex instructions like TMA and MMA are not as straightforward
+to model. A single variant of these complex instructions can expand
+into hundreds of intrinsics. Additionally, any expansion in the
+corresponding ISA can exponentially increase these numbers, making it
+difficult to manage them in the IR and backend passes. Therefore,
+a careful design of intrinsic interfaces can ease maintenance and
+contribute to a sustainable, long-term solution.
+
+Below are a set of guidelines that may help in choosing
+an appropriate design for the complex intrinsics:
+
+1. If there are only a few intrinsics, prefer a flat design
+ where the intrinsic name encodes all relevant details,
+ and includes only the arguments used by the actual instruction.
+2. As the number of intrinsics grows, it is desirable to consolidate
+ them. NVPTX uses a 'flags'-based design where each flag argument
+ represents one set of instruction modifiers. These flags are
+ compile-time integer constants.
+
+3. When an intrinsic uses flags, document it with details of the
+ flag usage in the ``NVPTXUsage.rst`` file.
+4. Use i1 for boolean flags and i8 for other flag types.
+5. Annotate all flag arguments with ImmArg<ArgIdx<>>.
+6. Place the flag arguments at the end of the (actual)argument list.
+
+7. Identify the key features of an intrinsic and distinguish between
+ first-order and supplementary information. Typically, encoding the
+ first-order information in the intrinsic name while using flags
+ for supplementary details improves readability.
+ For example:
+
+ i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature,
+ whereas an optional scaling applied to matrices is relatively secondary.
+
+ ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order
+ information, while features like an optional cache hint tend to be
+ secondary.
+
+8. If there are invalid combinations within a set of modifiers, avoid
+ encoding them as flags, as much as possible. This helps reduce the
+ need for error handling of unsupported cases in the backend.
+ For example, some 'cvt' intrinsics support only a subset of the
+ possible rounding modes; so it is preferable not to encode the
+ rounding modes as flags.
+9. Similarly, when there are invalid combinations across a set of
+ modifiers, avoid encoding them as flags to prevent additional
+ complexity in error handling.
+
+10. Maintain a consistent design within an intrinsic family, including
+ argument ordering as well as the usage and ordering of flags.
+11. When designing an intrinsic corresponding to an instruction or its variant,
+ consider the entire instruction family. This may reveal common features
+ that can be modelled consistently across the family.
+
+In summary, strive to balance the aspects mentioned above, to achieve
+a scalable design with maximum readability.
+
Reading PTX Special Registers
-----------------------------
|
f4e5cea
to
27c4e4a
Compare
@Artem-B , Could you please help with a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK overall, but could use more details on a few aspects.
Recently, we have added a set of complex intrinsics on TMA, tcgen05 and Cvt family of instructions. This patch captures the key learnings from our experience so far and documents them as guidelines for future design. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
27c4e4a
to
9976a6f
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/5972 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/12122 Here is the relevant piece of the build log for the reference
|
Recently, we have added a set of complex intrinsics on
the TMA, tcgen05, and Cvt family of instructions.
This patch captures the key learnings from our experience
so far and documents them as guidelines for future design.