Closed
Labels: enhancement (New feature or request)
Description
Summary
Add load/store words for 32-bit and narrower memory types (i32, f32, f16, bf16, i8, i16) for both global and shared memory. Values are widened to the native i64 stack cell on load and narrowed on store.
Motivation
GPU workloads use a variety of data widths:
- f32/i32: The native compute precision for GPUs. f32 is the baseline for ML; i32 is the standard integer width.
- f16/bf16: Half-precision types used for bandwidth-efficient storage. Most ML inference and training uses these for activations and weights.
- i8/i16: Used in quantized models and integer indexing.
Currently, `@`/`!` load and store i64 and `F@`/`F!` load and store f64, but real GPU kernels also need to access 32-bit and narrower memory types. The stack remains `memref<256xi64>` because GPU pointers are 64-bit and must fit in a stack cell. Narrower values are therefore widened to i64 when loaded and narrowed when stored.
Design
Load/store words
All words take an address (i64) from the stack and either load a value (widened to i64) or store a value (narrowed from i64).
| Global | Shared | Memory type | Load widening | Store narrowing |
|---|---|---|---|---|
| `@` / `!` | `S@` / `S!` | i64 | (unchanged) | (unchanged) |
| `F@` / `F!` | `SF@` / `SF!` | f64 | (unchanged) | (unchanged) |
| `HF@` / `HF!` | `SHF@` / `SHF!` | f16 | extf f16→f64, bitcast f64→i64 | bitcast i64→f64, truncf f64→f16 |
| `BF@` / `BF!` | `SBF@` / `SBF!` | bf16 | extf bf16→f64, bitcast f64→i64 | bitcast i64→f64, truncf f64→bf16 |
| `I8@` / `I8!` | `SI8@` / `SI8!` | i8 | extsi i8→i64 | trunci i64→i8 |
| `I16@` / `I16!` | `SI16@` / `SI16!` | i16 | extsi i16→i64 | trunci i64→i16 |
| `I32@` / `I32!` | `SI32@` / `SI32!` | i32 | extsi i32→i64 | trunci i64→i32 |
| `F32@` / `F32!` | `SF32@` / `SF32!` | f32 | extf f32→f64, bitcast f64→i64 | bitcast i64→f64, truncf f64→f32 |
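The widening and narrowing columns can be illustrated with a small Python model. This is only a sketch of the intended semantics, not the MLIR lowering: the helper names are hypothetical, a little-endian byte layout is assumed, and a 64-bit stack cell is modelled as an unsigned Python integer. bf16 is left out because Python's `struct` module has no bf16 format.

```python
import struct

MASK64 = (1 << 64) - 1  # a stack cell is modelled as an unsigned 64-bit pattern


def extsi(value, bits):
    """Sign-extend a `bits`-wide integer to a 64-bit cell (load widening)."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):      # sign bit set: value is negative
        value -= 1 << bits
    return value & MASK64


def trunci(cell, bits):
    """Keep only the low `bits` bits of a 64-bit cell (store narrowing)."""
    return cell & ((1 << bits) - 1)


def f64_to_cell(x):
    """bitcast f64 -> i64: reinterpret the 8 bytes, no numeric conversion."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def cell_to_f64(cell):
    """bitcast i64 -> f64."""
    return struct.unpack("<d", struct.pack("<Q", cell & MASK64))[0]


def load_f32(raw4):
    """F32@ model: extf f32 -> f64, then bitcast f64 -> i64."""
    (x,) = struct.unpack("<f", raw4)   # Python floats are f64, so this widens exactly
    return f64_to_cell(x)


def store_f32(cell):
    """F32! model: bitcast i64 -> f64, then truncf f64 -> f32.

    Note: struct raises OverflowError for values too large for f32,
    which is a limitation of this model, not a claim about truncf.
    """
    return struct.pack("<f", cell_to_f64(cell))


def load_f16(raw2):
    """HF@ model: extf f16 -> f64, then bitcast f64 -> i64."""
    (x,) = struct.unpack("<e", raw2)
    return f64_to_cell(x)


def store_f16(cell):
    """HF! model: bitcast i64 -> f64, then truncf f64 -> f16."""
    return struct.pack("<e", cell_to_f64(cell))
```

Because loads sign-extend, an i8 byte `0xFF` becomes the cell value -1, and storing that cell back with `trunci` recovers `0xFF`. Floating-point values widened from f16 or f32 are exactly representable in f64, so a load followed by a store of the same type round-trips the original bytes.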
What does NOT change
- Arithmetic: All operations remain i64/f64 as they are today.
- Stack: Stays `memref<256xi64>` with i64 cells.
- Existing words: `@`/`!` (i64), `F@`/`F!` (f64), `S@`/`S!`, `SF@`/`SF!` remain unchanged.
- Kernel parameters: Still declared as i64/f64 in `\!` headers.
- CELLS: Still 8 (sizeof i64).
Implementation notes
- Each new word needs a dialect op in `ForthOps.td` and a conversion pattern in `ForthToMemRef.cpp`.
- The parser (`ForthToMLIR.cpp`) maps each word name to the corresponding op.
- Shared variants use the same shared memory infrastructure as existing `S@`/`S!`/`SF@`/`SF!`.
- f16 and bf16 are MLIR builtin types (`f16`, `bf16`); no dialect extension needed.
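The word names in the table follow a regular scheme: an optional `S` for shared memory, a type prefix, and `@` for load or `!` for store. The Python sketch below enumerates that scheme as a lookup table. It is illustrative only: the actual parser lives in `ForthToMLIR.cpp`, and the function names and the `(space, type, kind)` tuple used here are hypothetical stand-ins for the real dialect ops.

```python
from itertools import product

# Type prefix -> memory type, taken from the load/store word table.
TYPE_PREFIXES = {
    "": "i64",
    "F": "f64",
    "HF": "f16",
    "BF": "bf16",
    "I8": "i8",
    "I16": "i16",
    "I32": "i32",
    "F32": "f32",
}
SPACES = [("", "global"), ("S", "shared")]
KINDS = [("@", "load"), ("!", "store")]


def build_word_table():
    """Map each word name to a (memory space, memory type, kind) descriptor."""
    table = {}
    for (space_prefix, space), (type_prefix, mem_type), (suffix, kind) in product(
        SPACES, TYPE_PREFIXES.items(), KINDS
    ):
        table[space_prefix + type_prefix + suffix] = (space, mem_type, kind)
    return table


WORDS = build_word_table()


def lookup(word):
    """Return the descriptor for a load/store word, or None if it is not one."""
    return WORDS.get(word)
```

For example, `lookup("SI16!")` returns `("shared", "i16", "store")`. The table holds 32 entries in total: 8 memory types, 2 memory spaces, and 2 directions.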