In [118]:
import cutlass
import cutlass.cute as cute

import tract
from tract.categories import Nest_morphism, NestedTuple, Tuple_morphism, Tuple
from tract.test_utils import *
from tract.layout_utils import *

# Categorical Foundations for CuTe Layouts

## Introduction

In GPU programming, performance depends critically on how data is stored and accessed in memory. While the data we care about is typically multi-dimensional, the GPU’s memory is fundamentally one-dimensional. This means that when we want to load, store, or otherwise manipulate data, we need to map its multi-dimensional logical coordinates to one-dimensional physical coordinates. This mapping, known as a layout, is essential for reading from and writing to memory correctly and efficiently. Moreover, with respect to the GPU’s SIMT execution model, layouts are used to describe and manipulate partitionings of threads over data. This is important to ensure optimized memory access patterns and correct invocation of specialized hardware instructions, such as those used to target tensor cores.

CUTLASS pioneered a novel approach to layouts that features both shape and stride tuples of arbitrary nesting and depth and a “layout algebra” built from fundamental operations such as composition, complementation, logical division, and logical product. These CuTe layouts are remarkably expressive: for example, they can describe the partitioning patterns for every generation of tensor core instructions. They are also fascinating from a mathematical perspective, since they feature an unusual and subtle notion of function composition that demands a theoretical explanation.

In a new paper, we develop a robust mathematical theory underlying this approach, connecting CuTe layouts and their algebra to the theory of categories and operads and developing a new graphical calculus of layout diagrams for computing their operations. A PDF copy of our paper is linked in the companion blog post, which may be found [here](https://research.colfax-intl.com/categorical-foundations-for-cute-layouts/).

In this notebook, we illustrate how to use the `tract` module to work with morphisms in the categories **Tuple** and **Nest**, and how operations on these morphisms match the corresponding operations on CuTe layouts.

## Constructing tuples and nested tuples

In [119]:
S = (2,2,2)
T = ((2,2),(5,5))
U = ((2,2),4,(9,(3,3)))
print("S =", S)
print("T =", T)
print("U =", U)

S = (2, 2, 2)
T = ((2, 2), (5, 5))
U = ((2, 2), 4, (9, (3, 3)))


To create a tuple of length 1, we must include a trailing comma; without it, Python reads the parentheses as mere grouping, so `(10)` is simply the integer `10`:

In [120]:
S = (10,)
T = (10)
print("S =", S)
print("T =", T)

S = (10,)
T = 10


## Constructing layouts and morphisms

We construct layouts $L = S : D$ in `cute` as `L = cute.make_layout(shape=S, stride=D)`, as seen in these examples.

In [121]:
@cute.jit
def construct_example_layouts():
    A = cute.make_layout(shape=((4,4),4), stride=((16,1),4))
    B = cute.make_layout(shape=(8,64), stride=(64,1))
    C = cute.make_layout(shape=100, stride=2)
    print("A =", A)
    print("B =", B)
    print("C =", C)
construct_example_layouts()

A = ((4,4),4):((16,1),4)
B = (8,64):(64,1)
C = 100:2
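The layout function of $L = S : D$ sends a linear index $i$ to an offset: decompose $i$ colexicographically (first mode varying fastest) into coordinates over the flattened shape, then take the dot product with the flattened stride. The pure-Python sketch below, which does not use `cute`, evaluates the layouts above this way; the helper names `flatten` and `layout_fn` are our own and hypothetical.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_fn(shape, stride, i):
    """Evaluate the layout shape:stride at linear index i (colexicographic order)."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (i % s) * d  # coordinate along this mode times its stride
        i //= s
    return offset


# B = (8,64):(64,1): index 1 is coordinate (1,0) and index 8 is coordinate (0,1).
print(layout_fn((8, 64), (64, 1), 1))            # 64
print(layout_fn((8, 64), (64, 1), 8))            # 1
# A = ((4,4),4):((16,1),4): index 5 has coordinates ((1,1),0).
print(layout_fn(((4, 4), 4), ((16, 1), 4), 5))   # 17
# C = 100:2 doubles its input.
print(layout_fn(100, 2, 7))                      # 14
```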


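We construct a nested tuple morphism
$$
f: S \to T
$$
lying over $\alpha$ in `tract` by specifying `f = tract.make_morphism(domain=S, codomain=T, map=alpha)`. As the examples below indicate, `map` lists, for each entry of the flattened domain, the 1-based position of the codomain entry to which it is sent, with `0` standing for the basepoint $*$.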

In [122]:
f = tract.make_morphism(domain=(4,4), codomain=(4,2,4), map=(1,3))
g = tract.make_morphism(domain=(2,2,2,2), codomain=(2,2,2,2), map=(1,0,4,2))
h = tract.make_morphism(domain=(16,(4,4),(4,4)), codomain=(16,4,4), map=(1,2,0,3,0))
k = tract.random_Nest_morphism(min_value = 2, max_value = 16, max_length = 4)
print(f"{'f':<3}= {f}")
print(f"{'g':<3}= {g}")
print(f"{'h':<3}= {h}")
print(f"{'k':<3}= {k}")

f  = (4,4) --(1, 3)--> (4,2,4)
g  = (2,2,2,2) --(1, 0, 4, 2)--> (2,2,2,2)
h  = (16,(4,4),(4,4)) --(1, 2, 0, 3, 0)--> (16,4,4)
k  = 6 --(1,)--> 6
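To make the `map` convention concrete, here is a minimal pure-Python validity check. It encodes our reading of the conventions (an assumption, not `tract`'s code): every domain entry not sent to the basepoint must equal the codomain entry it lands on, and no two such entries may land on the same codomain entry. The function `is_tuple_morphism` is hypothetical.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def is_tuple_morphism(domain, codomain, alpha):
    """Hypothetical check that alpha defines a morphism domain -> codomain.

    alpha[i] is the 1-based position in the flattened codomain that the i-th
    flattened domain entry is sent to, or 0 for the basepoint *.
    """
    S, T = flatten(domain), flatten(codomain)
    if len(alpha) != len(S):
        return False
    hits = [a for a in alpha if a != 0]
    if len(hits) != len(set(hits)):  # injective away from the basepoint
        return False
    # Every entry not sent to * must equal the codomain entry it lands on.
    return all(a == 0 or (1 <= a <= len(T) and s == T[a - 1])
               for s, a in zip(S, alpha))


print(is_tuple_morphism((4, 4), (4, 2, 4), (1, 3)))                          # True
print(is_tuple_morphism((16, (4, 4), (4, 4)), (16, 4, 4), (1, 2, 0, 3, 0)))  # True
print(is_tuple_morphism((4, 4), (4, 2, 4), (1, 2)))                          # False: 4 != 2
```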


## Translating between tractable layouts and morphisms

We define an ordering $\preceq$ on integer pairs $s:d$ by 

$$s:d \preceq s':d' \quad \Leftrightarrow \quad d<d'\text{ or }(d = d' \text{ and } s \leq s')$$
Suppose $L$ is a layout, and write $L^\flat = (s_1,\dots,s_m):(d_1,\dots,d_m)$ for its flattening. We say $L$ is **tractable** if for any $1 \leq i,j \leq m$ with $i \neq j$, the following condition holds:
$$\text{if } s_i:d_i \preceq s_j:d_j \text{ and } d_i,d_j \neq 0\text{, then } s_id_i \text{ divides } d_j.$$
We can check whether $L$ is tractable with `tract.is_tractable(L)`.

In [123]:
@cute.jit
def test_is_tractable():
    A = cute.make_layout(shape=(2,2,2), stride=(1,2,4))
    B = cute.make_layout(shape=(2,2,2), stride=(1,7,4))
    A_is_tractable = tract.is_tractable(A)
    B_is_tractable = tract.is_tractable(B)
    print("A =", A)
    print(f"A is tractable: {A_is_tractable}")
    print("B =", B)
    print(f"B is tractable: {B_is_tractable}")
test_is_tractable()

A = (2,2,2):(1,2,4)
A is tractable: True
B = (2,2,2):(1,7,4)
B is tractable: False
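The tractability condition translates directly into a double loop over pairs of distinct flattened modes. The sketch below is a pure-Python transcription of the definition above, not `tract`'s implementation; `precedes` and `is_tractable_flat` are hypothetical names.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def precedes(a, b):
    """The order s:d <= s':d': compare strides first, then shapes."""
    (s, d), (s2, d2) = a, b
    return d < d2 or (d == d2 and s <= s2)


def is_tractable_flat(shape, stride):
    """Check: for i != j with nonzero strides, s_i:d_i <= s_j:d_j implies s_i*d_i | d_j."""
    modes = list(zip(flatten(shape), flatten(stride)))
    for i, (si, di) in enumerate(modes):
        for j, (sj, dj) in enumerate(modes):
            if i != j and di != 0 and dj != 0 and precedes((si, di), (sj, dj)):
                if dj % (si * di) != 0:
                    return False
    return True


print(is_tractable_flat((2, 2, 2), (1, 2, 4)))  # True
print(is_tractable_flat((2, 2, 2), (1, 7, 4)))  # False: 2*4 = 8 does not divide 7
```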


If $L$ is a tractable layout, then we can construct the standard representation $f_L$ with `tract.compute_morphism(L)`:

In [124]:
@cute.jit
def construct_standard_representation():
    L = cute.make_layout(shape=(2,2,2), stride=(1,2,4))
    f_L = tract.compute_morphism(L)
    print(f"{'L':<5}= {L}")
    print(f"{'f_L':<5}= {f_L}")
construct_standard_representation()

L    = (2,2,2):(1,2,4)
f_L  = (2,2,2) --(1, 2, 3)--> (2,2,2)
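One way to picture the standard representation: visit the nonzero-stride modes in $\preceq$ order, append each shape to the codomain, and insert an extra "gap" entry whenever the next stride jumps past what has been covered so far. The sketch below assumes this reading of the construction; the gap handling in particular is our assumption, and `standard_representation` is a hypothetical name that works on flat layouts only.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def standard_representation(shape, stride):
    """Return (codomain, alpha) for a tractable layout shape:stride (flat sketch)."""
    S, D = flatten(shape), flatten(stride)
    # Nonzero-stride modes in the order s:d <= s':d' (stride first, then shape).
    order = sorted((i for i in range(len(S)) if D[i] != 0),
                   key=lambda i: (D[i], S[i]))
    codomain, alpha, covered = [], [0] * len(S), 1  # zero strides map to *
    for i in order:
        if D[i] % covered != 0:
            raise ValueError("layout is not tractable")
        gap = D[i] // covered
        if gap > 1:
            codomain.append(gap)    # codomain entry not hit by any mode
        codomain.append(S[i])
        alpha[i] = len(codomain)    # 1-based position of this mode's entry
        covered = S[i] * D[i]
    return tuple(codomain), tuple(alpha)


print(standard_representation((2, 2, 2), (1, 2, 4)))          # ((2, 2, 2), (1, 2, 3))
print(standard_representation(((5, 5), 8), ((1, 40), 5)))     # ((5, 8, 5), (1, 3, 2))
```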


If $f$ is a nested tuple morphism (in particular, a tuple morphism), we can construct the layout $L_f$ encoded by $f$ with `tract.compute_layout(f)`.

In [125]:
@cute.jit
def compute_layout_example():
    f = tract.make_morphism(domain=((5,5),8), codomain=(5,8,5), map=(1,3,2))
    L_f = tract.compute_layout(f)
    print(f"{'f':<5}= {f}")
    print(f"{'L_f':<5}= {L_f}")
    g = tract.random_Nest_morphism()
    L_g = tract.compute_layout(g)
    print(f"{'g':<5}= {g}")
    print(f"{'L_g':<5}= {L_g}")
compute_layout_example()

f    = ((5,5),8) --(1, 3, 2)--> (5,8,5)
L_f  = ((5,5),8):((1,40),5)
g    = 39 --(3,)--> (48,(118,39),((((112,67)),1),(1,1)))
L_g  = 39:5664
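Both outputs above follow a simple rule: the stride of the $i$-th domain entry is the product of all codomain entries strictly before position $\alpha(i)$, and it is $0$ when $\alpha(i) = *$. The hypothetical helper `layout_of_morphism` below reproduces the two layouts printed above; unlike `tract.compute_layout`, it returns a flattened shape and so forgets the domain's nesting.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_of_morphism(domain, codomain, alpha):
    """Flat (shape, stride) of L_f: stride = product of codomain entries before alpha[i]."""
    prefix = [1]
    for t in flatten(codomain):
        prefix.append(prefix[-1] * t)   # prefix[k] = product of the first k entries
    stride = tuple(prefix[a - 1] if a != 0 else 0 for a in alpha)
    return flatten(domain), stride


print(layout_of_morphism(((5, 5), 8), (5, 8, 5), (1, 3, 2)))
# ((5, 5, 8), (1, 40, 5))
print(layout_of_morphism(39, (48, (118, 39), (((112, 67), 1), (1, 1))), (3,)))
# ((39,), (5664,))
```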


## Composition

When defined, this operation produces a layout $B \circ A$ from a pair of layouts $A$ and $B$. We can compute the composition $B \circ A$ in `cute` as `cute.composition(B, A)`:

In [126]:
@cute.jit
def composition_example():
    A = cute.make_layout(shape=((4,4),4), stride=((16,1),4))
    B = cute.make_layout(shape=(8,64), stride=(64,1))
    B_o_A = cute.composition(B,A)
    print(f"{'A':<5}= {A}")
    print(f"{'B':<5}= {B}")
    print(f"{'B∘A':<5}= {B_o_A}")
composition_example()

A    = ((4,4),4):((16,1),4)
B    = (8,64):(64,1)
B∘A  = ((4,4),(2,2)):((2,64),(256,1))
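The defining property of layout composition is $(B \circ A)(i) = B(A(i))$ for every index $i$ in the domain of $A$. The following self-contained check (no `cute` needed; `layout_fn` is a hypothetical helper) confirms this pointwise for the layouts printed above, even though the shape of $B \circ A$ refines the shape of $A$.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_fn(shape, stride, i):
    """Evaluate the layout shape:stride at linear index i (colexicographic order)."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (i % s) * d
        i //= s
    return offset


A = (((4, 4), 4), ((16, 1), 4))
B = ((8, 64), (64, 1))
B_o_A = (((4, 4), (2, 2)), ((2, 64), (256, 1)))  # result printed by cute.composition

size_A = 64
agrees = all(layout_fn(*B_o_A, i) == layout_fn(*B, layout_fn(*A, i))
             for i in range(size_A))
print(agrees)  # True
```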


If $f$ and $g$ are composable nested tuple morphisms, we can compute the composition $g \circ f$ in `tract` as `tract.compose(f, g)`; note that $f$, the morphism applied first, is passed first:

In [127]:
f = tract.make_morphism(domain=((2,2),(2,2)), codomain=((2,2,2),(2,2,2)), map=(3,2,6,5))
g = tract.make_morphism(domain=((2,2,2),(2,2,2)), codomain=(2,2,2,2), map=(1,0,2,0,3,4))
g_o_f = tract.compose(f, g)
print(f"{'f':<5}= {f}")
print(f"{'g':<5}= {g}")
print(f"{'g∘f':<5}= {g_o_f}")

f    = ((2,2),(2,2)) --(3, 2, 6, 5)--> ((2,2,2),(2,2,2))
g    = ((2,2,2),(2,2,2)) --(1, 0, 2, 0, 3, 4)--> (2,2,2,2)
g∘f  = ((2,2),(2,2)) --(2, 0, 4, 3)--> (2,2,2,2)
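At the level of the underlying maps, $g \circ f$ lies over $\beta \circ \alpha$, where the basepoint is sent to the basepoint. With the `map` convention used above, this is a one-line lookup; `compose_maps` is a hypothetical helper that reproduces the map printed for $g \circ f$.

```python
def compose_maps(alpha, beta):
    """Map of g∘f, where f lies over alpha and g lies over beta (0 denotes *)."""
    # Entry i goes to alpha[i] under f, then to beta[alpha[i] - 1] under g.
    return tuple(beta[a - 1] if a != 0 else 0 for a in alpha)


print(compose_maps((3, 2, 6, 5), (1, 0, 2, 0, 3, 4)))  # (2, 0, 4, 3)
```

The domain of $g \circ f$ is the domain of $f$ and its codomain is the codomain of $g$, so only the map needs computing.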


Composition of nested tuple morphisms is compatible with layout composition, as illustrated below.

In [128]:
@cute.jit
def composition_compatibility_example():
    f,g = tract.random_composable_Nest_morphisms(min_length=0, max_length=6, max_value=64)
    A = tract.compute_layout(f)
    B = tract.compute_layout(g)
    B_o_A = cute.composition(B,A)

    g_o_f = tract.compose(f, g)
    C = tract.compute_layout(g_o_f)
    print(f"{'B∘A':<5}= {B_o_A}")
    print(f"{'C':<5}= {C}")
composition_compatibility_example()


B∘A  = 30:0
C    = 30:0
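The cell above uses random morphisms. As a deterministic complement, the pure-Python check below takes the fixed $f$ and $g$ from the earlier composition cell and verifies that $L_{g \circ f}(i) = L_g(L_f(i))$ for every index $i$. All helpers are hypothetical re-implementations, not `tract` functions.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_fn(shape, stride, i):
    """Evaluate the layout shape:stride at linear index i (colexicographic order)."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (i % s) * d
        i //= s
    return offset


def layout_of_morphism(domain, codomain, alpha):
    """Flat (shape, stride) of L_f: stride = product of codomain entries before alpha[i]."""
    prefix = [1]
    for t in flatten(codomain):
        prefix.append(prefix[-1] * t)
    return flatten(domain), tuple(prefix[a - 1] if a != 0 else 0 for a in alpha)


def compose_maps(alpha, beta):
    """Map of g∘f, where f lies over alpha and g lies over beta (0 denotes *)."""
    return tuple(beta[a - 1] if a != 0 else 0 for a in alpha)


f_dom, f_cod, alpha = ((2, 2), (2, 2)), ((2, 2, 2), (2, 2, 2)), (3, 2, 6, 5)
g_cod, beta = (2, 2, 2, 2), (1, 0, 2, 0, 3, 4)

L_f = layout_of_morphism(f_dom, f_cod, alpha)                        # (2,2,2,2):(4,2,32,16)
L_g = layout_of_morphism(f_cod, g_cod, beta)                         # (2,...,2):(1,0,2,0,4,8)
L_gf = layout_of_morphism(f_dom, g_cod, compose_maps(alpha, beta))   # (2,2,2,2):(2,0,8,4)

compatible = all(layout_fn(*L_gf, i) == layout_fn(*L_g, layout_fn(*L_f, i))
                 for i in range(16))
print(compatible)  # True
```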


## Coalesce

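This operation produces from $A$ a layout $\text{coal}(A)$ that is, in a precise sense, of *minimal complexity*: it is a flat layout with the same layout function as $A$ in which no two adjacent modes can be merged further. We can compute $\text{coal}(A)$ in `cute` with `cute.coalesce(A)`.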

In [129]:
@cute.jit 
def coalesce_example():
     A = cute.make_layout(shape = ((2,2),(2,2),(5,5)), stride = ((1,2),(16,32),(64,640)))
     coal_A = cute.coalesce(A)
     print(f"{'A':<7}= {A}")
     print(f"{'coal_A':<7}= {coal_A}")
coalesce_example()

A      = ((2,2),(2,2),(5,5)):((1,2),(16,32),(64,640))
coal_A = (4,20,5):(1,16,640)
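The merging rule behind coalesce is local: after flattening, drop modes of size 1, and fuse neighbouring modes $s_0:d_0$ and $s_1:d_1$ into $s_0 s_1 : d_0$ whenever $d_1 = s_0 d_0$. The sketch below implements this rule in plain Python (the name `coalesce_flat` is hypothetical) and reproduces the output above. It also returns `1:0` for the empty layout, matching the definition of coalesce used in our paper.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def coalesce_flat(shape, stride):
    """Drop size-1 modes and merge neighbours s0:d0, s1:d1 whenever d1 == s0*d0."""
    out = []
    for s, d in zip(flatten(shape), flatten(stride)):
        if s == 1:
            continue
        if out and d == out[-1][0] * out[-1][1]:
            out[-1] = (out[-1][0] * s, out[-1][1])  # extend the previous mode
        else:
            out.append((s, d))
    if not out:
        return (1,), (0,)  # the empty layout coalesces to 1:0
    return tuple(s for s, _ in out), tuple(d for _, d in out)


print(coalesce_flat(((2, 2), (2, 2), (5, 5)), ((1, 2), (16, 32), (64, 640))))
# ((4, 20, 5), (1, 16, 640))
```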


There is also a **relative coalesce** operation (sometimes called **by-mode coalesce**) $A \mapsto \text{coal}(A, S)$, which takes as additional input a nested tuple $S$ that is *refined* by the shape of $A$, and coalesces $A$ separately over each entry of $S$. We can compute $\text{coal}(A, S)$ in `cute` with `cute.coalesce(A, target_profile=S)`.

In [130]:
@cute.jit
def relative_coalesce_example():
     A = cute.make_layout(shape = ((2,2),(3,3),(5,5)), stride = ((1,2),(4,12),(36,180)))
     S = ((2,2),9,25)
     coal_A_over_S = cute.coalesce(A,target_profile=S)
     print(f"{'A':<15}= {A}")
     print(f"{'S':<15}= {S}")
     print(f"{'coal_A_over_S':<15}= {coal_A_over_S}")
relative_coalesce_example()

A              = ((2,2),(3,3),(5,5)):((1,2),(4,12),(36,180))
S              = ((2, 2), 9, 25)
coal_A_over_S  = ((2,2),9,25):((1,2),4,36)
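A way to read the relative coalesce: walk the leaves of $S$, let each leaf consume the consecutive flattened modes of $A$ whose sizes multiply to that leaf, and coalesce each group on its own while keeping the nesting of $S$. The sketch below assumes this reading and that $A$ has no size-1 modes; `relative_coalesce` is a hypothetical name, not the `cute` implementation.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def coalesce_group(modes):
    """Coalesce a flat list of (shape, stride) pairs; size-1 modes are dropped."""
    out = []
    for s, d in modes:
        if s == 1:
            continue
        if out and d == out[-1][0] * out[-1][1]:
            out[-1] = (out[-1][0] * s, out[-1][1])
        else:
            out.append((s, d))
    return out or [(1, 0)]


def relative_coalesce(shape, stride, profile):
    """Coalesce shape:stride leaf-by-leaf according to the nested tuple `profile`."""
    modes = list(zip(flatten(shape), flatten(stride)))
    pos = 0

    def walk(p):
        nonlocal pos
        if isinstance(p, tuple):
            parts = [walk(q) for q in p]
            return tuple(s for s, _ in parts), tuple(d for _, d in parts)
        group, size = [], 1
        while size < p:  # consume the modes whose sizes multiply to the leaf p
            s, d = modes[pos]
            pos += 1
            group.append((s, d))
            size *= s
        if size != p:
            raise ValueError("the shape does not refine the profile")
        merged = coalesce_group(group)
        if len(merged) == 1:
            return merged[0]
        return tuple(s for s, _ in merged), tuple(d for _, d in merged)

    return walk(profile)


print(relative_coalesce(((2, 2), (3, 3), (5, 5)), ((1, 2), (4, 12), (36, 180)),
                        ((2, 2), 9, 25)))
# (((2, 2), 9, 25), ((1, 2), 4, 36))
```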


If $f$ is a nested tuple morphism, we may form its coalescence $\text{coal}(f)$. We compute $\text{coal}(f)$ in `tract` as `f.coalesce()`.

In [131]:
f = tract.make_morphism(domain=(2,2,10,10), codomain=(2,2,2,10,10), map=(1,2,4,5))
coal_f = f.coalesce()
print(f"{'f':<7}= {f}")
print(f"{'coal_f':<7}= {coal_f}")

f      = (2,2,10,10) --(1, 2, 4, 5)--> (2,2,2,10,10)
coal_f = (4,100) --(1, 3)--> (4,2,100)


Coalesce of nested tuple morphisms is compatible with layout coalesce, as illustrated below.

In [132]:
@cute.jit
def coalesce_compatibility_example():
    f = random_Nest_morphism(max_length = 5, max_value = 32)
    coal_f = f.coalesce()
    A = tract.compute_layout(f)
    coal_A = cute.coalesce(A)
    B = tract.compute_layout(coal_f)
    print(f"{'coal_A':<7}= {coal_A}")
    print(f"{'B':<7}= {B}")
coalesce_compatibility_example()

coal_A = 783:1
B      = 783:1
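The compatibility can also be checked by hand on the fixed morphism $f$ from the previous cell: $L_f = (2,2,10,10):(1,2,8,80)$ coalesces to $(4,100):(1,8)$, which is exactly the layout encoded by $\text{coal}(f) = (4,100) \xrightarrow{(1,3)} (4,2,100)$. The pure-Python sketch below carries out this comparison; the helpers are hypothetical re-implementations, not `tract` or `cute` calls.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_of_morphism(domain, codomain, alpha):
    """Flat (shape, stride) of L_f: stride = product of codomain entries before alpha[i]."""
    prefix = [1]
    for t in flatten(codomain):
        prefix.append(prefix[-1] * t)
    return flatten(domain), tuple(prefix[a - 1] if a != 0 else 0 for a in alpha)


def coalesce_flat(shape, stride):
    """Drop size-1 modes and merge neighbours s0:d0, s1:d1 whenever d1 == s0*d0."""
    out = []
    for s, d in zip(flatten(shape), flatten(stride)):
        if s == 1:
            continue
        if out and d == out[-1][0] * out[-1][1]:
            out[-1] = (out[-1][0] * s, out[-1][1])
        else:
            out.append((s, d))
    if not out:
        return (1,), (0,)
    return tuple(s for s, _ in out), tuple(d for _, d in out)


L_f = layout_of_morphism((2, 2, 10, 10), (2, 2, 2, 10, 10), (1, 2, 4, 5))
L_coal_f = layout_of_morphism((4, 100), (4, 2, 100), (1, 3))
print(L_f)                              # ((2, 2, 10, 10), (1, 2, 8, 80))
print(coalesce_flat(*L_f) == L_coal_f)  # True
```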


Note that, as discussed in our paper, there is one case in which the CuTe implementation of coalesce does not coincide with our definition of coalesce: the empty layout `E = ():()`. For this layout, `cute.coalesce(E)` returns `():()`, while our definition yields $\text{coal}(E) = 1:0$. This is no real obstacle, since both layouts have size 1 and identical layout functions, each sending $0 \mapsto 0$.

## Complement

When defined, this operation produces a layout $\text{comp}(A, N)$ from a layout $A$ and a positive integer $N$. We can compute $\text{comp}(A, N)$ in `cute` with `cute.complement(A, N)`:

In [133]:
@cute.jit
def complement_example():
     A = cute.make_layout(shape = ((2,2),(2,2)), stride = ((8,2),(64,256)))
     comp_A = cute.complement(A,4096)
     print(f"{'A':<7}= {A}")
     print(f"{'comp_A':<7}= {comp_A}")
complement_example()

A      = ((2,2),(2,2)):((8,2),(64,256))
comp_A = (2,2,4,2,8):(1,4,16,128,512)
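Intuitively, $\text{comp}(A, N)$ fills the "gaps" left by the modes of $A$: sort the modes by stride, and each time the next stride $d$ exceeds the extent $c$ covered so far, insert a mode $(d/c):c$, finishing with $(N/c):c$. The sketch below implements this in plain Python under the usual divisibility assumptions (it raises otherwise) and reproduces the output above. The name `complement_flat` is hypothetical, and the sketch does not coalesce its result.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def complement_flat(shape, stride, N):
    """Sketch of comp(A, N): fill the gaps left by A's modes, in stride order, up to N."""
    modes = sorted((d, s) for s, d in zip(flatten(shape), flatten(stride))
                   if s != 1 and d != 0)
    out, covered = [], 1
    for d, s in modes:
        if d % covered != 0:
            raise ValueError("complement is not defined")
        if d // covered > 1:
            out.append((d // covered, covered))  # gap below this mode
        covered = s * d
    if N % covered != 0:
        raise ValueError("complement is not defined")
    if N // covered > 1:
        out.append((N // covered, covered))      # remaining room up to N
    if not out:
        return (1,), (0,)
    return tuple(s for s, _ in out), tuple(d for _, d in out)


print(complement_flat(((2, 2), (2, 2)), ((8, 2), (64, 256)), 4096))
# ((2, 2, 4, 2, 8), (1, 4, 16, 128, 512))
```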


If $f$ is a nested tuple morphism, we may form the complement $f^c$ of $f$. We compute $f^c$ in `tract` as `f.complement()`.

In [134]:
f = tract.make_morphism(domain=(2,2), codomain=(2,5,2,5), map=(1,3))
comp_f = f.complement()
print(f"{'f':<7}= {f}")
print(f"{'comp_f':<7}= {comp_f}")

f      = (2,2) --(1, 3)--> (2,5,2,5)
comp_f = (5,5) --(2, 4)--> (2,5,2,5)
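In this example, $f^c$ simply collects the codomain entries that $f$ does not hit, in order, and sends each one to its own position. The hypothetical helper `complement_morphism` below encodes this reading for a flat codomain and reproduces the output above; it is a sketch, not `tract`'s implementation.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def complement_morphism(codomain, alpha):
    """(domain, map) of f^c: the codomain entries missed by alpha, sent to their positions."""
    T = flatten(codomain)
    missed = [j for j in range(1, len(T) + 1) if j not in alpha]
    return tuple(T[j - 1] for j in missed), tuple(missed)


print(complement_morphism((2, 5, 2, 5), (1, 3)))  # ((5, 5), (2, 4))
```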


Complements of nested tuple morphisms are compatible with layout complements, as illustrated below.

In [135]:
@cute.jit
def complement_compatibility_example():
    f = random_complementable_Nest_morphism()
    N = f.codomain.size()
    comp_f = f.complement().coalesce()
    A = tract.compute_layout(f)
    comp_A = cute.complement(A,N)
    B = tract.compute_layout(comp_f)
    print(f"{'comp_A':<7}= {comp_A}")
    print(f"{'B':<7}= {B}")
complement_compatibility_example()

comp_A = 82944:1
B      = 82944:1
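For a deterministic instance, take the fixed $f = (2,2) \xrightarrow{(1,3)} (2,5,2,5)$ from the earlier cell. Then $L_f = (2,2):(1,10)$, and its complement with respect to $N = 100$ equals $L_{f^c} = (5,5):(2,20)$ on the nose. The pure-Python sketch below checks this; all helpers are hypothetical re-implementations rather than `tract` or `cute` functions.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_of_morphism(domain, codomain, alpha):
    """Flat (shape, stride) of L_f: stride = product of codomain entries before alpha[i]."""
    prefix = [1]
    for t in flatten(codomain):
        prefix.append(prefix[-1] * t)
    return flatten(domain), tuple(prefix[a - 1] if a != 0 else 0 for a in alpha)


def complement_flat(shape, stride, N):
    """Sketch of comp(A, N): fill the gaps left by A's modes, in stride order, up to N."""
    modes = sorted((d, s) for s, d in zip(flatten(shape), flatten(stride))
                   if s != 1 and d != 0)
    out, covered = [], 1
    for d, s in modes:
        if d % covered != 0:
            raise ValueError("complement is not defined")
        if d // covered > 1:
            out.append((d // covered, covered))
        covered = s * d
    if N % covered != 0:
        raise ValueError("complement is not defined")
    if N // covered > 1:
        out.append((N // covered, covered))
    if not out:
        return (1,), (0,)
    return tuple(s for s, _ in out), tuple(d for _, d in out)


def complement_morphism(codomain, alpha):
    """(domain, map) of f^c: the codomain entries missed by alpha, sent to their positions."""
    T = flatten(codomain)
    missed = [j for j in range(1, len(T) + 1) if j not in alpha]
    return tuple(T[j - 1] for j in missed), tuple(missed)


codomain, alpha = (2, 5, 2, 5), (1, 3)
L_f = layout_of_morphism((2, 2), codomain, alpha)   # (2,2):(1,10)
comp_A = complement_flat(*L_f, 100)                 # 100 = size of the codomain
fc_domain, fc_map = complement_morphism(codomain, alpha)
L_fc = layout_of_morphism(fc_domain, codomain, fc_map)
print(comp_A, L_fc)  # ((5, 5), (2, 20)) ((5, 5), (2, 20))
```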


## Logical Division

When defined, this operation produces a layout $A \oslash B$ from a pair of layouts $A$ and $B$. We compute $A \oslash B$ in `cute` with `cute.logical_divide(A, B)`.

In [136]:
@cute.jit
def logical_divide_example():
    A = cute.make_layout((64,32), stride = (32,1))
    B = cute.make_layout((4,4), stride = (1,64))
    quotient = cute.logical_divide(A,B)
    print(f"{'A':<9}= {A}")
    print(f"{'B':<9}= {B}")
    print(f"{'quotient':<9}= {quotient}")
logical_divide_example()

A        = (64,32):(32,1)
B        = (4,4):(1,64)
quotient = ((4,4),(16,8)):((32,1),(128,4))
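In CuTe, logical division is defined as $A \oslash B = A \circ (B, \text{comp}(B, |A|))$, where $|A|$ is the size of $A$. The pure-Python sketch below builds $(B, \text{comp}(B, |A|))$ for the example above and checks pointwise that the printed quotient agrees with $A$ composed with it. The helpers `layout_fn` and `complement_flat` are hypothetical re-implementations, not `cute` calls.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_fn(shape, stride, i):
    """Evaluate the layout shape:stride at linear index i (colexicographic order)."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (i % s) * d
        i //= s
    return offset


def complement_flat(shape, stride, N):
    """Sketch of comp(A, N): fill the gaps left by A's modes, in stride order, up to N."""
    modes = sorted((d, s) for s, d in zip(flatten(shape), flatten(stride))
                   if s != 1 and d != 0)
    out, covered = [], 1
    for d, s in modes:
        if d // covered > 1:
            out.append((d // covered, covered))
        covered = s * d
    if N // covered > 1:
        out.append((N // covered, covered))
    return tuple(s for s, _ in out), tuple(d for _, d in out)


A = ((64, 32), (32, 1))
B = ((4, 4), (1, 64))
size_A = 64 * 32
C = complement_flat(*B, size_A)                 # ((16, 8), (4, 256))
L = ((B[0], C[0]), (B[1], C[1]))                # the layout (B, comp(B, |A|))
quotient = (((4, 4), (16, 8)), ((32, 1), (128, 4)))  # printed by cute.logical_divide

matches = all(layout_fn(*quotient, i) == layout_fn(*A, layout_fn(*L, i))
              for i in range(size_A))
print(C, matches)  # ((16, 8), (4, 256)) True
```

This sketch omits the divisibility checks, so it assumes the complement is defined.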


If $f$ and $g$ are nested tuple morphisms and $g$ divides $f$, then we may form the logical division $f \oslash g$. We compute $f \oslash g$ in `tract` with `tract.logical_divide(f, g)`. 

In [137]:
f = tract.make_morphism(domain=(4, 8, 4, 8), codomain=(4, 8, 4, 8), map=(1, 2, 3, 4))
g = tract.make_morphism(domain=(4, 4), codomain=(4, 8, 4, 8), map=(1, 3))
quotient = tract.logical_divide(f, g)
print(f"{'f':<10}= {f}")
print(f"{'g':<10}= {g}")
print(f"{'quotient':<10}= {quotient}")

f         = (4,8,4,8) --(1, 2, 3, 4)--> (4,8,4,8)
g         = (4,4) --(1, 3)--> (4,8,4,8)
quotient  = ((4,4),(8,8)) --(1, 3, 2, 4)--> (4,8,4,8)
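The output above is consistent with reading $f \oslash g$ as $f \circ (g, g^c)$, mirroring the layout formula $A \oslash B = A \circ (B, \text{comp}(B, |A|))$: the domain pairs the domain of $g$ with that of $g^c$, and the map composes $\alpha_g$ followed by $\alpha_{g^c}$ with the map of $f$. The sketch below assumes this reading (it is our assumption about the construction, checked only against the example above), handles flat codomains only, and uses hypothetical helper names.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def compose_maps(alpha, beta):
    """Map of g∘f, where f lies over alpha and g lies over beta (0 denotes *)."""
    return tuple(beta[a - 1] if a != 0 else 0 for a in alpha)


def complement_morphism(codomain, alpha):
    """(domain, map) of f^c: the codomain entries missed by alpha, sent to their positions."""
    T = flatten(codomain)
    missed = [j for j in range(1, len(T) + 1) if j not in alpha]
    return tuple(T[j - 1] for j in missed), tuple(missed)


def logical_divide_morphism(f_map, g_domain, g_codomain, g_map):
    """Sketch: f ⊘ g = f ∘ (g, g^c), where the codomain of g is the domain of f."""
    gc_domain, gc_map = complement_morphism(g_codomain, g_map)
    domain = (g_domain, gc_domain)
    return domain, compose_maps(tuple(g_map) + gc_map, f_map)


print(logical_divide_morphism((1, 2, 3, 4), (4, 4), (4, 8, 4, 8), (1, 3)))
# (((4, 4), (8, 8)), (1, 3, 2, 4))
```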


Logical division of nested tuple morphisms is compatible with logical division of layouts, as illustrated below. 

In [138]:
@cute.jit
def logical_division_compatibility_example():
    f,g = random_divisible_Nest_morphisms()
    A = tract.compute_layout(f)
    B = tract.compute_layout(g)
    # A ⊘ B = A ∘ (B, comp(B, size(A))), written out explicitly and then coalesced
    A_div_B = cute.coalesce(cute.composition(A,concatenate(B,cute.complement(B,cute.size(A)))))
    f_div_g = tract.logical_divide(f, g)
    C = cute.coalesce(tract.compute_layout(f_div_g))
    print(f"{'A_div_B':<8}= {A_div_B}")
    print(f"{'C':<8}= {C}")
logical_division_compatibility_example()

A_div_B = (6,7):(0,9)
C       = (6,7):(0,9)


## Logical Product

When defined, this operation produces a layout $A \otimes B$ from a pair of layouts $A$ and $B$. We compute $A \otimes B$ in `cute` as `cute.logical_product(A, B)`. 

In [139]:
@cute.jit
def logical_product_example():
    A = cute.make_layout((3,10,10), stride = (200,1,20))
    B = cute.make_layout((2,2), stride = (1,2))
    product = cute.logical_product(A,B)
    print(f"{'A':<9}= {A}")
    print(f"{'B':<9}= {B}")
    print(f"{'product':<9}= {product}")
logical_product_example()

A        = (3,10,10):(200,1,20)
B        = (2,2):(1,2)
product  = ((3,10,10),(2,2)):((200,1,20),(10,600))
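In CuTe, the logical product is $A \otimes B = (A, \text{comp}(A, |A| \cdot \text{cosize}(B)) \circ B)$. For the example above, $|A| = 300$ and $\text{cosize}(B) = 4$, and the sketch below (pure Python; `layout_fn` and `complement_flat` are hypothetical helpers) checks that the second mode of the printed product agrees pointwise with $\text{comp}(A, 1200) \circ B$.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def layout_fn(shape, stride, i):
    """Evaluate the layout shape:stride at linear index i (colexicographic order)."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (i % s) * d
        i //= s
    return offset


def complement_flat(shape, stride, N):
    """Sketch of comp(A, N): fill the gaps left by A's modes, in stride order, up to N."""
    modes = sorted((d, s) for s, d in zip(flatten(shape), flatten(stride))
                   if s != 1 and d != 0)
    out, covered = [], 1
    for d, s in modes:
        if d // covered > 1:
            out.append((d // covered, covered))
        covered = s * d
    if N // covered > 1:
        out.append((N // covered, covered))
    return tuple(s for s, _ in out), tuple(d for _, d in out)


A = ((3, 10, 10), (200, 1, 20))
B = ((2, 2), (1, 2))
size_A = 3 * 10 * 10
cosize_B = layout_fn(*B, 3) + 1        # B has nonnegative strides, so its max is at the last index
C = complement_flat(*A, size_A * cosize_B)
second = ((2, 2), (10, 600))           # second mode of the printed product

matches = all(layout_fn(*second, j) == layout_fn(*C, layout_fn(*B, j)) for j in range(4))
print(cosize_B, C, matches)  # 4 ((2, 2), (10, 600)) True
```

As with the other complement sketches, divisibility is assumed rather than checked.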


If $f$ and $g$ are nested tuple morphisms that are product-admissible, we may form the logical product $f \otimes g$. We compute $f \otimes g$ in `tract` with `tract.logical_product(f, g)`. 

In [140]:
f = tract.make_morphism(domain=(2, 2), codomain=(2, 2, 5, 5), map=(1, 2))
g = tract.make_morphism(domain=(5,5), codomain=(5,5), map=(2,1))
product = tract.logical_product(f, g)
print(f"{'f':<10}= {f}")
print(f"{'g':<10}= {g}")
print(f"{'product':<10}= {product}")

f         = (2,2) --(1, 2)--> (2,2,5,5)
g         = (5,5) --(2, 1)--> (5,5)
product   = ((2,2),(5,5)) --(1, 2, 4, 3)--> (2,2,5,5)
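The output above is consistent with reading $f \otimes g$ as $(f, f^c \circ g)$, mirroring the layout formula: the first block of the map is $\alpha_f$, and the second sends each entry of $g$ through $g$ and then through $f^c$. The sketch below assumes this reading and takes "product-admissible" to mean at least that the codomain of $g$ is the domain of $f^c$. Both are our assumptions, checked only against the examples in this notebook, and the helper names are hypothetical.

```python
def flatten(t):
    """Flatten a nested tuple of ints (a bare int counts as a 1-tuple)."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for sub in t for x in flatten(sub))


def compose_maps(alpha, beta):
    """Map of g∘f, where f lies over alpha and g lies over beta (0 denotes *)."""
    return tuple(beta[a - 1] if a != 0 else 0 for a in alpha)


def complement_morphism(codomain, alpha):
    """(domain, map) of f^c: the codomain entries missed by alpha, sent to their positions."""
    T = flatten(codomain)
    missed = [j for j in range(1, len(T) + 1) if j not in alpha]
    return tuple(T[j - 1] for j in missed), tuple(missed)


def logical_product_morphism(f_domain, f_codomain, f_map, g_domain, g_codomain, g_map):
    """Sketch: f ⊗ g = (f, f^c ∘ g), assuming the codomain of g is the domain of f^c."""
    fc_domain, fc_map = complement_morphism(f_codomain, f_map)
    if flatten(g_codomain) != flatten(fc_domain):
        raise ValueError("f and g are not product-admissible")
    return (f_domain, g_domain), tuple(f_map) + compose_maps(g_map, fc_map)


print(logical_product_morphism((2, 2), (2, 2, 5, 5), (1, 2), (5, 5), (5, 5), (2, 1)))
# (((2, 2), (5, 5)), (1, 2, 4, 3))
```

With `g_map = (2, 0)` the same sketch gives the map `(1, 2, 4, 0)`, matching the `(1,2,4,*)` label in the TikZ output of the next section.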


The logical product of nested tuple morphisms is compatible with the logical product of layouts, as illustrated below.

In [141]:
@cute.jit
def logical_product_compatibility_example():
    f,g = random_product_admissible_Nest_morphisms(min_value = 2, max_value = 32, max_length = 5)
    A = tract.compute_layout(f)
    B = tract.compute_layout(g)
    A_prod_B = cute.logical_product(A,B)

    f_prod_g = tract.logical_product(f, g)
    C = tract.compute_layout(f_prod_g)
    print(f"{'A_prod_B':<9}= {A_prod_B}")
    print(f"{'C':<9}= {C}")
logical_product_compatibility_example()

A_prod_B = (30,19):(1,0)
C        = (30,19):(1,0)


## Printing Nested Tuple Diagrams with TikZ

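We provide a utility to convert a nested tuple morphism $f$ to TikZ code that can be compiled with any standard LaTeX distribution. The bare `tikzpicture` printed below uses the `\mapArrow` macro, so the enclosing document must load TikZ and define `\mapArrow` in its preamble, as the full-document output of `tract.layout_to_tikz` further below does.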

In [142]:
f = tract.make_morphism(domain=(2, 2), codomain=(2, 2, 5, 5), map=(1, 2))
g = tract.make_morphism(domain=(5,5), codomain=(5,5), map=(2,0))
product = tract.logical_product(f, g)

print(tract.morphism_to_tikz(product)) # equivalently, product.to_tikz()


\begin{tikzpicture}[
    entry/.style={minimum width=5mm, minimum height=7mm, inner sep = 2pt},
    maparrow/.style={|->}
]
\def\colspacing{3}
\def\rowspacing{0.8}

\node[entry] (s1) at (2.20, 0.00) {2};
\node[entry] (s2) at (2.20, 0.80) {2};
\node[entry] (s3) at (2.20, 1.60) {5};
\node[entry] (s4) at (2.20, 2.40) {5};

\node[entry] (m1) at (0.00, 0.00) {4};
\node[entry] (m2) at (0.00, 0.80) {25};

% Trees
\coordinate (j0) at (1.10, 0.40);
\draw (j0) -- (m1.east);
\draw (s1.west) -- (j0);
\draw (s2.west) -- (j0);
\coordinate (j1) at (1.10, 2.00);
\draw (j1) -- (m2.east);
\draw (s3.west) -- (j1);
\draw (s4.west) -- (j1);

\node[entry] (t1) at (5.20, 0.00) {2};
\node[entry] (t2) at (5.20, 0.80) {2};
\node[entry] (t3) at (5.20, 1.60) {5};
\node[entry] (t4) at (5.20, 2.40) {5};

\mapArrow{s1}{t1};
\mapArrow{s2}{t2};
\mapArrow{s3}{t4};
\node at (2.60, -0.8) {$((2,2),(5,5)) \xrightarrow{(1,2,4,*)} (2,2,5,5)$};

\end{tikzpicture}



Compiling this code gives the following figure:

![nest_morphism_example](images/nest_morphism_to_tikz_example.png)

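We can also produce such a diagram directly from a CuTe layout, labeled by the layout itself, with `tract.layout_to_tikz`. Passing `full_doc=True` emits a complete, compilable LaTeX document, including the preamble.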

In [143]:
@cute.jit
def tikz_layout_test():
    L = cute.make_layout(shape=((4, 8, 4), (2, 2, 8)),
                         stride=((128, 1, 16), (64, 8, 512)))
    print(tract.layout_to_tikz(L, full_doc=True))
tikz_layout_test()


\documentclass{standalone}
\usepackage{tikz}
\usepackage{amsmath}
\usetikzlibrary{arrows.meta, positioning}
\newcommand{\mapArrow}[2]{\draw[maparrow] (#1.east) -- (#2.west);}
\begin{document}


\begin{tikzpicture}[
    entry/.style={minimum width=5mm, minimum height=7mm, inner sep = 2pt},
    maparrow/.style={|->}
]
\def\colspacing{3}
\def\rowspacing{0.8}

\node[entry] (s1) at (2.20, 0.00) {4};
\node[entry] (s2) at (2.20, 0.80) {8};
\node[entry] (s3) at (2.20, 1.60) {4};
\node[entry] (s4) at (2.20, 2.40) {2};
\node[entry] (s5) at (2.20, 3.20) {2};
\node[entry] (s6) at (2.20, 4.00) {8};

\node[entry] (m1) at (0.00, 0.00) {128};
\node[entry] (m2) at (0.00, 0.80) {32};

% Trees
\coordinate (j0) at (1.10, 0.80);
\draw (j0) -- (m1.east);
\draw (s1.west) -- (j0);
\draw (s2.west) -- (j0);
\draw (s3.west) -- (j0);
\coordinate (j1) at (1.10, 3.20);
\draw (j1) -- (m2.east);
\draw (s4.west) -- (j1);
\draw (s5.west) -- (j1);
\draw (s6.west) -- (j1);

\node[entry] (t1) at (5.20, 0.00) {8};
\node[e

Compiling this code gives the following output:

![layout to tikz example](images/layout_to_tikz_example.png)

We can also produce a diagram representing a mutual refinement of two nested tuples with `tract.mutual_refinement_to_tikz`.

In [144]:
@cute.jit
def tikz_mutual_refinement_test():
    MR1 = tract.mutual_refinement_to_tikz(NestedTuple((6, 6)), NestedTuple((12, 3, 6)))
    MR2 = tract.mutual_refinement_to_tikz(NestedTuple((5, 6, 2, 6)), NestedTuple((10, 360)))
    print(MR1)
    print(MR2)
tikz_mutual_refinement_test()


\begin{tikzpicture}[
    entry/.style={minimum width=5mm, minimum height=7mm, inner sep = 2pt},
    maparrow/.style={|->}
]
\def\colspacing{3}
\def\rowspacing{0.8}

\node[entry] (s1) at (0.000,0.000) {6};
\node[entry] (s2) at (0.000,0.800) {2};
\node[entry] (s3) at (0.000,1.600) {3};
\node[entry] (s4) at (0.000,2.400) {6};

% Left-side roots and tree for P
\node[entry] (Lm1) at (-2.500,0.000) {6};
\draw (s1.west) -- (Lm1.east);
\node[entry] (Lm2) at (-2.500,0.800) {6};
\coordinate (Lj0) at (-1.250,1.200);
\draw (s2.west) -- (Lj0);
\draw (s3.west) -- (Lj0);
\draw (Lj0) -- (Lm2.east);

% Right-side roots and tree for Q
\node[entry] (Rm1) at (2.500,0.000) {12};
\coordinate (Rj0) at (1.250,0.400);
\draw (s1.east) -- (Rj0);
\draw (s2.east) -- (Rj0);
\draw (Rj0) -- (Rm1.west);
\node[entry] (Rm2) at (2.500,0.800) {3};
\draw (s3.east) -- (Rm2.west);
\node[entry] (Rm3) at (2.500,1.600) {6};
\draw (s4.east) -- (Rm3.west);

\node at (0.000, -0.8) {$(6,(2,3)) \quad ((6,2),3,6)$};

\end{tikzpictur

Compiling this code gives the following pair of diagrams:

![mutual refinement example](images/mutual_refinement_tikz_example.png)