[mlir][linalg] Update pack and unpack documentation #143903


Merged

merged 3 commits into llvm:main from update-linalg-pack-unpack-doc on Jun 27, 2025

Conversation

chrsmcgrr
Contributor

  • Clarified the `inner_dims_pos` attribute in the case of high-dimensionality tensors.
  • Added 5D examples to showcase the use cases that triggered this update.
  • Added a reminder for linalg.unpack that the number of elements is not required to be the same between input and output, since padding is dropped.

I encountered some odd variations of linalg.pack and linalg.unpack while working on some TFLite models, and the definition in the documentation did not match what I saw pass IR verification.

The following changes reconcile those differences.

@llvmbot
Member

llvmbot commented Jun 12, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Christopher McGirr (chrsmcgrr)

Changes
  • Clarified the `inner_dims_pos` attribute in the case of high-dimensionality tensors.
  • Added 5D examples to showcase the use cases that triggered this update.
  • Added a reminder for linalg.unpack that the number of elements is not required to be the same between input and output, since padding is dropped.

I encountered some odd variations of linalg.pack and linalg.unpack while working on some TFLite models, and the definition in the documentation did not match what I saw pass IR verification.

The following changes reconcile those differences.


Full diff: https://github.com/llvm/llvm-project/pull/143903.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td (+40-11)
  • (modified) mlir/test/Dialect/Linalg/invalid.mlir (+11)
  • (modified) mlir/test/Dialect/Linalg/named-ops.mlir (+63)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
index 1e48a5e3a20ee..fef1900be62ea 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
@@ -94,11 +94,13 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
     and optionally transposes the tiled source tensor dimensions.
 
     `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions that are
-    being tiled, where `0 < k <= n`. The order of the dimensions matters:
-     - The tiled dimensions (of size `inner_tiles`) are added to the end of the result
-    tensor in the order in which they appear in `inner_dims_pos`.
+    being tiled, where `0 < k <= n`.
      - `inner_dims_pos[i]` specifies the source tensor dimension tiled by
-    `inner_tiles[i]`.
+    `inner_tiles[i]` where `0 <= i < k`.
+     - the resulting tiled source dimension maps to an outer dimension of the
+     packed tensor in the order the non-tiled dimension appeared in the source
+     tensor, i.e. `shape(result)[inner_dims_pos[i]]` is equal to
+     `shape(source)[inner_dims_pos[i]] / inner_tiles[i]`.
 
     `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
     correspond to the least significant ("inner") result tensor dimension sizes,
@@ -117,6 +119,16 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
         into %dest : tensor<128x256xf32> -> tensor<16x8 x 8x32 xf32>
     //                                             \  /   \  /
     //                                       outer dims  inner dims
+    // CHW to CHWhw
+    %0 = linalg.pack %source inner_dims_pos = [2, 1] inner_tiles = [4, 2]
+        into %dest : tensor<1x8x16xf32> -> tensor<1x2x4 x 4x2 xf32>
+    //                                             \  /   \  /
+    //                                       outer dims  inner dims
+    // HCW to HCWhw
+    %0 = linalg.pack %source inner_dims_pos = [2, 0] inner_tiles = [4, 2]
+        into %dest : tensor<20x1x12xf32> -> tensor<10x1x3 x 4x2xf32>
+    //                                              \  /      \ /
+    //                                  Outer Dims: 10x1x3    Inner Dims: 4x2
     ```
 
     `outer_dims_perm` (optional) specifies a permutation for the outer
@@ -246,12 +258,14 @@ def Linalg_UnPackOp : Linalg_RelayoutOp<"unpack"> {
     The "unpack" operation converts a source tensor of rank `n` with a tiled and
     packed layout to a result tensor of rank `n - k`.
 
-    `inner_dims_pos` (mandatory) specifies `k` source tensor dimensions with
-    which the last `k` source tensor dimensions are combined, where
-    `0 < k <= n/2`. Each `inner_dims_pos` element must be `>= 0` and `< n - k`.
-    The order of the dimensions in `inner_dims_pos` matters: dimension
-    `inner_dims_pos[i]` is combined with dimension `n - k + i` (assuming that
-    `outer_dims_perm` is not specified).
+    `inner_dims_pos` (mandatory) specifies `k` result tensor dimensions that
+    were tiled with the `inner_tiles` to create the packed source tensor. The
+    source tensor dimensions can be combined given `inner_dims_pos` as follows:
+    the inner tile `shape(source)[n-k+i]` is combined with
+    `shape(source)[inner_dims_pos[i]]` where `0 <= i < k` and stored at
+    `shape(result)[inner_dims_pos[i]]`. The remaining dimensions are
+    `shape(result)[j] = shape(source)[j]` where `0 <= j < n-k` and `j` is not in
+    the set of `inner_dims_pos` indices.
 
     `inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
     correspond to the least significant ("inner") source tensor dimension sizes.
@@ -266,7 +280,11 @@ def Linalg_UnPackOp : Linalg_RelayoutOp<"unpack"> {
     dimensions. If specified, it must have `n - k` elements. If specified, this
     permutation is applied before combining any dimensions.
 
-    Example:
+    Note, the amount of elements in the source (packed tensor) and the result
+    (unpacked) can be unequal, i.e. `SizeOf(source) >= SizeOf(result)`. As
+    the unpack operation may drop any padding introduced by the pack operation.
+
+    Examples:
 
     ```mlir
     // NCnc to NC:
@@ -277,6 +295,17 @@ def Linalg_UnPackOp : Linalg_RelayoutOp<"unpack"> {
     %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
         inner_tiles = [8, 32] into %dest
         : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
+
+    // CHW to CHWhw:
+    %0 = linalg.unpack %source inner_dims_pos = [2, 1] inner_tiles = [4, 2]
+        into %dest : tensor<1x3x2x4x2xf32> -> tensor<1x5x7xf32>
+    //                        /     \
+    //          Outer Dims: 1x3x2  Inner Dims: 4x2
+    // HCW to HCWhw
+    %0 = linalg.unpack %source inner_dims_pos = [2, 0] inner_tiles = [4, 2]
+        into %dest : tensor<10x1x3 x 4x2xf32> -> tensor<20x1x12xf32>
+    //                       /         \
+    //          Outer Dims: 10x1x3    Inner Dims: 4x2
     ```
   }];
   let arguments = (ins AnyRankedTensor:$source,
diff --git a/mlir/test/Dialect/Linalg/invalid.mlir b/mlir/test/Dialect/Linalg/invalid.mlir
index c0c5f785e856b..fcca6dfadf723 100644
--- a/mlir/test/Dialect/Linalg/invalid.mlir
+++ b/mlir/test/Dialect/Linalg/invalid.mlir
@@ -1824,6 +1824,17 @@ func.func @unpack_invalid_outer_dims_perm(%source: tensor<128x256xf32>, %dest: t
 
 // -----
 
+// Here we have the source tensor being tiled as: `source[1] / 32` and `source[0] / 16` but the inner_dims_pos does not imply
+// a transpose of the outer dimensions for the result tensor. The tiled dimensions appear in the result tensor in the order
+// they appear in the source tensor, i.e. 16x4x32x16
+func.func @pack_invalid_result_shape(%input: tensor<256x128xf32>, %output: tensor<4x16x32x16xf32>) -> tensor<4x16x32x16xf32> {
+  // expected-error@+1 {{the shape of output is not large enough to hold the packed data. Expected at least 'tensor<16x4x32x16xf32>', got 'tensor<4x16x32x16xf32>'}}
+  %0 = linalg.pack %input inner_dims_pos = [1, 0] inner_tiles = [32, 16] into %output : tensor<256x128xf32> -> tensor<4x16x32x16xf32>
+  return %0 : tensor<4x16x32x16xf32>
+}
+
+// -----
+
 func.func @pack_invalid(%input: tensor<256x128xf32>, %output: tensor<8x8x32x16xf32>) -> tensor<8x8x32x16xf32> {
   // expected-error@+1 {{the shape of output is not large enough to hold the packed data. Expected at least 'tensor<8x8x16x32xf32>', got 'tensor<8x8x32x16xf32>'}}
   %0 = linalg.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 32] into %output : tensor<256x128xf32> -> tensor<8x8x32x16xf32>
diff --git a/mlir/test/Dialect/Linalg/named-ops.mlir b/mlir/test/Dialect/Linalg/named-ops.mlir
index 470bc1c78640c..b21b234bc7841 100644
--- a/mlir/test/Dialect/Linalg/named-ops.mlir
+++ b/mlir/test/Dialect/Linalg/named-ops.mlir
@@ -2771,6 +2771,69 @@ func.func @pad_and_pack_partially_dynamic(%source: tensor<?x?xf32>, %dest: tenso
 
 // -----
 
+func.func @pack_descending_inner_dims_with_padding(%source: tensor<1x5x7xf32>, %dest: tensor<1x3x2x4x2xf32>, %pad: f32) -> tensor<1x3x2x4x2xf32> {
+  %0 = linalg.pack %source padding_value(%pad : f32) inner_dims_pos = [2, 1] inner_tiles = [4, 2] into %dest : tensor<1x5x7xf32> -> tensor<1x3x2x4x2xf32>
+  return %0 : tensor<1x3x2x4x2xf32>
+}
+
+// CHECK-LABEL: func.func @pack_descending_inner_dims_with_padding(
+// CHECK-SAME:  %[[SOURCE:.*]]: tensor<1x5x7xf32>,
+// CHECK-SAME:  %[[DEST:.*]]: tensor<1x3x2x4x2xf32>,
+// CHECK-SAME:  %[[PAD:.*]]: f32)
+// CHECK:       %{{.*}} = linalg.pack
+// CHECK-SAME:      inner_dims_pos = [2, 1]
+// CHECK-SAME:      inner_tiles = [4, 2]
+// CHECK-SAME:      into %[[DEST]] : tensor<1x5x7xf32> -> tensor<1x3x2x4x2xf32>
+
+// -----
+
+// The function suffix "with_padding" refers to the padding that was introduced by the pack operation. But here
+// we are dropping the padding. Creating a tensor with less elements than what we started with.
+func.func @unpack_descending_inner_dims_with_padding(%source: tensor<1x3x2x4x2xf32>, %dest: tensor<1x5x7xf32>) -> tensor<1x5x7xf32> {
+  %0 = linalg.unpack %source inner_dims_pos = [2, 1] inner_tiles = [4, 2] into %dest : tensor<1x3x2x4x2xf32> -> tensor<1x5x7xf32>
+  return %0 : tensor<1x5x7xf32>
+}
+
+// CHECK-LABEL: func.func @unpack_descending_inner_dims_with_padding(
+// CHECK-SAME:  %[[SOURCE:.*]]: tensor<1x3x2x4x2xf32>,
+// CHECK-SAME:  %[[DEST:.*]]: tensor<1x5x7xf32>)
+// CHECK:       %{{.*}} = linalg.unpack
+// CHECK-SAME:      inner_dims_pos = [2, 1]
+// CHECK-SAME:      inner_tiles = [4, 2]
+// CHECK-SAME:      into %[[DEST]] : tensor<1x3x2x4x2xf32> -> tensor<1x5x7xf32>
+
+// -----
+
+func.func @pack_non_adjacent_inner_dims(%source: tensor<20x1x12xf32>, %dest: tensor<10x1x3x4x2xf32>) -> tensor<10x1x3x4x2xf32> {
+  %0 = linalg.pack %source inner_dims_pos = [2, 0] inner_tiles = [4, 2] into %dest : tensor<20x1x12xf32> -> tensor<10x1x3x4x2xf32>
+  return %0 : tensor<10x1x3x4x2xf32>
+}
+
+// CHECK-LABEL: func.func @pack_non_adjacent_inner_dims(
+// CHECK-SAME:  %[[SOURCE:.*]]: tensor<20x1x12xf32>,
+// CHECK-SAME:  %[[DEST:.*]]: tensor<10x1x3x4x2xf32>)
+// CHECK:       %{{.*}} = linalg.pack
+// CHECK-SAME:      inner_dims_pos = [2, 0]
+// CHECK-SAME:      inner_tiles = [4, 2]
+// CHECK-SAME:      into %[[DEST]] : tensor<20x1x12xf32> -> tensor<10x1x3x4x2xf32>
+
+// -----
+
+func.func @unpack_non_adjacent_inner_dims(%source: tensor<10x1x3x4x2xf32>, %dest: tensor<20x1x12xf32>) -> tensor<20x1x12xf32> {
+  %0 = linalg.unpack %source inner_dims_pos = [2, 0] inner_tiles = [4, 2] into %dest : tensor<10x1x3x4x2xf32> -> tensor<20x1x12xf32>
+  return %0 : tensor<20x1x12xf32>
+}
+
+// CHECK-LABEL: func.func @unpack_non_adjacent_inner_dims(
+// CHECK-SAME:  %[[SOURCE:.*]]: tensor<10x1x3x4x2xf32>,
+// CHECK-SAME:  %[[DEST:.*]]: tensor<20x1x12xf32>)
+// CHECK:       %{{.*}} = linalg.unpack
+// CHECK-SAME:      inner_dims_pos = [2, 0]
+// CHECK-SAME:      inner_tiles = [4, 2]
+// CHECK-SAME:      into %[[DEST]] : tensor<10x1x3x4x2xf32> -> tensor<20x1x12xf32>
+
+// -----
+
 func.func @unpack_fully_dynamic(%source: tensor<?x?x?x?xf32>, %dest: tensor<?x?xf32>, %tile_n : index, %tile_m : index) -> tensor<?x?xf32> {
   %0 = linalg.unpack %source inner_dims_pos = [0, 1] inner_tiles = [%tile_n, %tile_m] into %dest : tensor<?x?x?x?xf32> -> tensor<?x?xf32>
   return %0 : tensor<?x?xf32>

@chrsmcgrr
Contributor Author

@banach-space here are the documentation changes we discussed. Please comment if they do not align with your expectations.

@rengolin requested review from hanhanW and adam-smnk on June 12, 2025 15:01
@banach-space
Contributor

Hey @chrsmcgrr - thanks for taking on the tricky task of improving the docs for pack and unpack. This is long overdue!

I’ve left some suggestions inline, mostly based on past experience and what (in my opinion) makes the current version a bit hard to follow - mainly too much prose and not enough code or math. That said, if others feel differently, I’m happy to iterate.

Goes without saying, this is already a massive improvement 🙏🏻

- `inner_dims_pos[i]` specifies the source tensor dimension tiled by
`inner_tiles[i]`.
`inner_tiles[i]` where `0 <= i < k`.
- the resulting tiled source dimension maps to an outer dimension of the
Contributor

  1. Since this is a sentence (full stop at the end), start with a capital letter.
  2. Introduce inner_tiles before this bullet point. Otherwise, this is referring to a concept yet to be introduced.
  3. This sentence is a bit complex to parse. I have no good suggestion for how to improve it - the underlying logic is just very tricky. In cases like this, maths/code/diagrams are better than words. I would remove text and use code instead: `shape(result)[inner_dims_pos[i]] = shape(source)[inner_dims_pos[i]] / inner_tiles[i]`. You could add: "The following relationship for the tiled dimensions holds: ...".
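
A quick worked check of that relationship, using the NC to NCnc example already in the op documentation (the shapes are taken from the diff above; this is illustrative only):

%0 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
// shape(result)[0] = shape(source)[0] / inner_tiles[0] = 128 / 8  = 16
// shape(result)[1] = shape(source)[1] / inner_tiles[1] = 256 / 32 = 8
// The trailing dimensions are the tiles themselves: 8x32.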

Comment on lines -98 to -99
- The tiled dimensions (of size `inner_tiles`) are added to the end of the result
tensor in the order in which they appear in `inner_dims_pos`.
Contributor

I find this particular sentence helpful. Would you mind keeping it? You could rewrite it using code: shape(result)[rank(result) + i] = inner_tiles[i] for 0 <= i < k.

Contributor Author

Sure, the "order in which they appear in inner_dims_pos" is what threw me off. That is why I removed it. I will add the code example.

The order of the dimensions in `inner_dims_pos` matters: dimension
`inner_dims_pos[i]` is combined with dimension `n - k + i` (assuming that
`outer_dims_perm` is not specified).
`inner_dims_pos` (mandatory) specifies `k` result tensor dimensions that
Contributor

The meaning of result vs source is getting confusing when comparing linalg.pack and linalg.unpack, so here's a small suggestion.

Suggested change
`inner_dims_pos` (mandatory) specifies `k` result tensor dimensions that
`inner_dims_pos` (mandatory) specifies `k` result tensor (i.e. unpacked tensor) dimensions that

`outer_dims_perm` is not specified).
`inner_dims_pos` (mandatory) specifies `k` result tensor dimensions that
were tiled with the `inner_tiles` to create the packed source tensor. The
source tensor dimensions can be combined given `inner_dims_pos` as follows:
Contributor

Suggested change
source tensor dimensions can be combined given `inner_dims_pos` as follows:
source tensor (i.e. packed tensor) dimensions can be combined given `inner_dims_pos` as follows:

Comment on lines 283 to 285
Note, the amount of elements in the source (packed tensor) and the result
(unpacked) can be unequal, i.e. `SizeOf(source) >= SizeOf(result)`. As
the unpack operation may drop any padding introduced by the pack operation.
Contributor

  1. I'd keep this shorter and favour code over text.
  2. Personally, I feel that it makes more sense to say that the packing op may drop elements and, as an outcome of that, the resulting tensor might be smaller than the source tensor.
  3. I suggest more explicit NumElementsOf instead of SizeOf ("size" is a bit context-sensitive).
Suggested change
Note, the amount of elements in the source (packed tensor) and the result
(unpacked) can be unequal, i.e. `SizeOf(source) >= SizeOf(result)`. As
the unpack operation may drop any padding introduced by the pack operation.
Note, the unpack operation may drop any padding introduced by the pack operation and hence the following holds `NumElementsOf(source) >= NumElementsOf(result)`.
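
To make that inequality concrete with the padded 5D example from the diff (a worked count, illustrative only):

// pack pads tensor<1x5x7xf32> up to full 4x2 tiles:
//   source (unpacked): 1 * 5 * 7         = 35 elements
//   packed:            1 * 3 * 2 * 4 * 2 = 48 elements (13 of them padding)
// unpack drops that padding again, so
//   NumElementsOf(source) = 48 >= NumElementsOf(result) = 35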

Comment on lines 270 to 265
`inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
correspond to the least significant ("inner") source tensor dimension sizes.
Contributor

Did you mean to delete this? That's currently repeated above.

Contributor Author

Not really. But now, I have rephrased and shuffled the text a bit. Does that help?

Comment on lines 264 to 268
the inner tile `shape(source)[n-k+i]` is combined with
`shape(source)[inner_dims_pos[i]]` where `0 <= i < k` and stored at
`shape(result)[inner_dims_pos[i]]`. The remaining dimensions are
`shape(result)[j] = shape(source)[j]` where `0 <= j < n-k` and `j` is not in
the set of `inner_dims_pos` indices.
Contributor

Could you reduce text and perhaps use code instead? For example, it is not clear what "combined" means.

Contributor Author

No problem, I will re-write this.

Comment on lines 1827 to 1829
// Here we have the source tensor being tiled as: `source[1] / 32` and `source[0] / 16` but the inner_dims_pos does not imply
// a transpose of the outer dimensions for the result tensor. The tiled dimensions appear in the result tensor in the order
// they appear in the source tensor, i.e. 16x4x32x16
Contributor

Hm, isn't this example simply missing outer_dims_perm = [1, 0]? So yes, the order of the outer dims should not be permuted.

Contributor Author

This example was more to reinforce what the operation does: reading the documentation, I found it difficult to follow the semantics, given the interplay between inner_dims_pos, inner_tiles and outer_dims_perm.

I can remove the example if you like.

Contributor

I can remove the example if you like.

I do like this example, so let's keep it. My comment was more along the lines of: "Perhaps this long description could be trimmed to something shorter referring to the missing outer_dims_perm"? Here's a specific suggestion:

The outer dims in the output tensor are incorrectly/unexpectedly transposed. This could be fixed by adding `outer_dims_perm = [1, 0]` (the default value assumes no transpose).
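
For reference, a sketch of the valid variant that this comment alludes to (the same shapes as the new invalid.mlir case, with outer_dims_perm = [1, 0] added; illustrative only, not part of the patch):

%0 = linalg.pack %input outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [32, 16]
    into %output : tensor<256x128xf32> -> tensor<4x16x32x16xf32>
// The un-permuted outer dims 16x4 become 4x16 under outer_dims_perm, matching the destination type.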

being tiled, where `0 < k <= n`. The order of the dimensions matters:
- The tiled dimensions (of size `inner_tiles`) are added to the end of the result
tensor in the order in which they appear in `inner_dims_pos`.
being tiled, where `0 < k <= n`.
Contributor

I just found that k can equal zero, which makes it a transpose op. Can you update the doc since you're touching it?

func.func @foo(%src: tensor<1x2x3xf32>, %dest: tensor<3x2x1xf32>) -> tensor<3x2x1xf32> {
  %0 = linalg.pack %src outer_dims_perm = [2, 1, 0] inner_dims_pos = [] inner_tiles = [] into %dest : tensor<1x2x3xf32> -> tensor<3x2x1xf32>
  return %0 : tensor<3x2x1xf32>
}

Contributor

It's a pretty interesting example, maybe it deserves a test case in named-ops.mlir?

Contributor

We already use this form in data-layout propagation and decomposition tests. But yeah, we can add a test to named-ops.mlir since it is special.

https://github.com/search?q=repo%3Allvm%2Fllvm-project%20%22inner_dims_pos%20%3D%20%5B%5D%22&type=code

Contributor Author

I will add the example :)

* Clarified the `inner_dims_pos` attribute in the case of high
dimensionality tensors.
* Added 5D examples to showcase the use cases that triggered this
update.
* Added a reminder for linalg.unpack that the number of elements is not
required to be the same between input/output due to padding being
dropped.

I encountered some odd variations of `linalg.pack` and
`linalg.unpack` while working on some TFLite models and the
definition in the documentation did not match what I saw pass in IR
verification.

The following changes reconcile those differences.

Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
@chrsmcgrr force-pushed the update-linalg-pack-unpack-doc branch from ec054a3 to 565ee3e on June 26, 2025 14:12
@chrsmcgrr
Contributor Author

I've incorporated your feedback. Let me know what you think.

@banach-space (Contributor) left a comment

Thanks for the updates, Chris!

This is much clearer now and I like the overall structure. There are some fine details that are not clear to me, see my questions inline. Could you take a look and see what you think?

Thanks!

@@ -2771,6 +2771,69 @@ func.func @pad_and_pack_partially_dynamic(%source: tensor<?x?xf32>, %dest: tenso

// -----

func.func @pack_descending_inner_dims_with_padding(%source: tensor<1x5x7xf32>, %dest: tensor<1x3x2x4x2xf32>, %pad: f32) -> tensor<1x3x2x4x2xf32> {
Contributor

Wouldn't "transposed" be more descriptive? Similar comment for the example below.

Suggested change
func.func @pack_descending_inner_dims_with_padding(%source: tensor<1x5x7xf32>, %dest: tensor<1x3x2x4x2xf32>, %pad: f32) -> tensor<1x3x2x4x2xf32> {
func.func @pack_transposed_inner_dims_with_padding(%source: tensor<1x5x7xf32>, %dest: tensor<1x3x2x4x2xf32>, %pad: f32) -> tensor<1x3x2x4x2xf32> {

Contributor Author

True, I will change the name.

`inner_tiles` (mandatory) specifies `k` tile sizes. These tile sizes
correspond to the least significant ("inner") result tensor dimension sizes,
in the same order. Tile sizes can be static or dynamic.

`inner_dims_pos` (mandatory) specifies `k` source tensor dimensions that are
being tiled, where `0 <= k <= n`.
Contributor

I've just realised that this was incorrect. Please double-check :)

Suggested change
being tiled, where `0 <= k <= n`.
being tiled, where `0 <= k < n`.

Contributor

Why is it incorrect? I think you can pack all the dimensions? E.g.,

func.func @pack_nc_to_ncnc(%source: tensor<128x256xf32>, %dest: tensor<4x16x32x16xf32>) -> tensor<128x256xf32> {
  %0 = linalg.pack %source inner_dims_pos = [0, 1] inner_tiles = [32, 16] into %dest : tensor<128x256xf32> -> tensor<4x16x32x16xf32>
  %1 = tensor.empty() : tensor<128x256xf32>
  %2 = linalg.unpack %0 inner_dims_pos = [0, 1] inner_tiles = [32, 16] into %1 : tensor<4x16x32x16xf32> -> tensor<128x256xf32>
  return %2 : tensor<128x256xf32>
}

Contributor

I think you can pack all the dimensions?

Yes, since there are n dimensions, the indices should go from 0 to n - 1, right? So 0 <= k < n, not 0 <= k <= n.

Hopefully not having an off-by-one moment here 😊

Contributor

The "pack" operation converts a source tensor of rank `n` into a result

tensor of rank n + k with a tiled and packed layout (maybe with padding)
and optionally transposes the tiled source tensor dimensions.

If we read from the start, I think k indicates the number of dimensions to pack, which is also the number of elements in inner_dims_pos. So it is 0 <= k <= n.

You are right that the indices should go from 0 to n-1, and we may say that in the following statement like

  `inner_tiles[i]` where `0 <= i < k`. All the values in `inner_dims_pos` are within [0, n).

Contributor Author

I've incorporated the last statement @hanhanW into the docs.

packed source tensor. The source tensor (i.e. packed tensor) dimensions can
be unpacked given `inner_dims_pos` as follows.
- For `0 <= i < k` the following relationship holds:
`shape(result)[inner_dims_pos[i]] = shape(source)[n-k+i] + shape(source)[inner_dims_pos[i]]`.
Contributor

If there's / in tensor.pack, then there should be * in tensor.unpack, right? Also, because of padding, it should be <= rather than =, right? Please double check :)

Suggested change
`shape(result)[inner_dims_pos[i]] = shape(source)[n-k+i] + shape(source)[inner_dims_pos[i]]`.
`shape(result)[inner_dims_pos[i]] <= shape(source)[n-k+i] * shape(source)[inner_dims_pos[i]]`.

Contributor Author

Totally right, it should be `*`, and I did forget about the padding case. The new relationship holds now.
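
Spelling the corrected relationship out on the padded CHW example from the diff (an illustrative check):

// unpack: tensor<1x3x2x4x2xf32> -> tensor<1x5x7xf32>, inner_dims_pos = [2, 1], n = 5, k = 2
// i = 0: shape(result)[2] = 7 <= shape(source)[3] * shape(source)[2] = 4 * 2 = 8
// i = 1: shape(result)[1] = 5 <= shape(source)[4] * shape(source)[1] = 2 * 3 = 6
// Both are strict inequalities because the original pack padded 5x7 up to 6x8.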

Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
@chrsmcgrr
Contributor Author

I've addressed the new feedback. Please have a look @banach-space @adam-smnk @hanhanW

@chrsmcgrr requested a review from banach-space on June 27, 2025 07:21
@banach-space (Contributor) left a comment

LGTM, thanks for improving this, it's so much clearer now!

@hanhanW (Contributor) left a comment

Thanks

@hanhanW merged commit 29b1054 into llvm:main on Jun 27, 2025
7 checks passed