[Draft] One hierarchical view is all you need #3988

Closed
wants to merge 3 commits into from


@YichengDWu YichengDWu commented Mar 29, 2024

It's still a draft, but it's a better abstraction.

Let's see what the View in this PR can do.

from tinygrad.shape.view2 import View

a = View.create((10,10)); print(a); print(a.render());
# View(shape=(10, 10), strides=(10, 1), continuous=True)
# ((idx0*10)+idx1)

a = a.permute((1,0)); print(a); print(a.render());
# View(shape=(10, 10), strides=(1, 10), continuous=False)
# ((idx1*10)+idx0)

a = a.reshape((5,2,5,2)); print(a); print(a.render());
# View(shape=(5, 2, 5, 2), strides=(2, 1, 20, 10), continuous=False)
# ((idx0*2)+(idx2*20)+(idx3*10)+idx1)

a = a.reshape((100,)); print(a); print(a.render());
# View(shape=((10, 10),), strides=((1, 10),), continuous=False)
# (((idx0%10)*10)+(idx0//10))

a = a.permute((1,0)); print(a); print(a.render());
# View(shape=(10, 10), strides=(10, 1), continuous=True)
# ((idx0*10)+idx1)

assert a.continuous == True

This means that using symbolic expressions to track indices across multiple views is redundant. A single view records all the information about the data layout.

I will add conceptual explanations later (need to go to sleep).

This PR does not modify any existing source code yet; it is purely an addition, because the source code of ShapeTracker is spaghetti.

@YichengDWu YichengDWu marked this pull request as draft March 29, 2024 06:43
Contributor

Changes

Name                           Lines    Diff    Tokens/Line    Diff
---------------------------  -------  ------  -------------  ------
tinygrad/shape/view2.py          101    +101            9.9    +9.9
tinygrad/shape/int_tuple.py       86     +86           10.2   +10.2


total lines changes: +187

@geohot
Collaborator

geohot commented Mar 29, 2024

I'll believe this is real when you have all the tests passing with the new View class

@YichengDWu
Author

It is real. I will post the mathematics behind this with flawless logic.

As for passing all tests, it's a bit tricky, because the source code may contain cases of "in order to achieve A we implemented B and added tests for B". If I solve A directly, there is no need for B, and identifying every such B can be time-consuming.

@geohot
Collaborator

geohot commented Mar 29, 2024

Going to close this PR. No need to pass the view specific tests, but all models and training should work. Feel free to reopen when they do.

@geohot geohot closed this Mar 29, 2024
@YichengDWu
Author

YichengDWu commented Mar 29, 2024

I'll be moving the day after tomorrow and might not have access to a computer for a while. I'm posting the underlying principles here so that anyone interested can take a look or help accelerate the refactoring.

First glance

Hierarchical views are a strict superset of normal views. For example, it is impossible for a normal view to represent the following data layout:

0 4 1 5
2 6 3 7

But it can be represented with a hierarchical view:

shape:   (2, (2, 2))
strides: (2, (1, 4))

The second dimension has two dimensions folded into it. If we fix the first index to 0 and keep only the folded second dimension, we get

shape:  (2, 2)
strides: (1, 4)

which is

0 4 
1 5

aka the first row of the original data layout.
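
The layout above can be reproduced with a small sketch. This is not the code from view2.py; `flatten` and `layout` are hypothetical helpers that assume offsets are the nested inner product of coordinate and strides, and that coordinates are visited in row-major order.

```python
from itertools import product

def flatten(t):
    """Flatten a nested int tuple (or a bare int) into a flat tuple of ints."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for item in t for x in flatten(item))

def layout(shape, strides):
    """Memory offset of every coordinate, visited in row-major order."""
    dims, strs = flatten(shape), flatten(strides)
    return [sum(c * s for c, s in zip(crd, strs))
            for crd in product(*(range(d) for d in dims))]

target = [0, 4, 1, 5, 2, 6, 3, 7]

# The hierarchical view produces the layout shown above.
assert layout((2, (2, 2)), (2, (1, 4))) == target

# Fixing the first index to 0 leaves the folded (2, 2) sub-view: the first row.
assert layout((2, 2), (1, 4)) == [0, 4, 1, 5]

# Brute force over small integer strides: no flat (2, 4) view matches.
flat_matches = [(a, b) for a in range(-8, 9) for b in range(-8, 9)
                if layout((2, 4), (a, b)) == target]
assert flat_matches == []
```

The brute-force check only covers strides in a small range, but the reason generalizes: a flat row has offsets 0, b, 2b, 3b, which can never be 0, 4, 1, 5.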

Coordinate lists

In the following text, I will use the term "int tuple" to represent a nested tuple of integers.

We say two int tuples are compatible if they have the same hierarchical structure. For example, ((2,3), 4) is compatible with ((5,6), 8).

A coordinate list is a lexicographically ordered list of integers or compatible int tuples. For example,

{
 (0, 0), (0, 1), (0, 2), (0, 3),
 (1, 0), (1, 1), (1, 2), (1, 3)
}
{
 (0, (0, 0)), (0, (0, 1)), (0, (1, 0)), (0, (1, 1)),
 (1, (0, 0)), (1, (0, 1)), (1, (1, 0)), (1, (1, 1))
}

This kind of order reflects the row-major memory layout.
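
A coordinate list can be generated mechanically from a shape. The sketch below uses a hypothetical helper `coords` (not part of the PR) and relies on `itertools.product`, which varies the last position fastest and therefore yields lexicographic order.

```python
from itertools import product

def coords(shape):
    """Enumerate the coordinates of a nested shape in lexicographic order."""
    if isinstance(shape, int):
        # A bare integer n stands for the linear coordinates 0 .. n-1.
        return list(range(shape))
    # Recurse into each mode, then combine; the last mode varies fastest.
    return list(product(*(coords(s) for s in shape)))

flat = coords((2, 4))
nested = coords((2, (2, 2)))

print(flat)    # starts (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), ...
print(nested)  # starts (0, (0, 0)), (0, (0, 1)), (0, (1, 0)), ...
```

Both lists have 8 entries, and entry `i` of one corresponds to entry `i` of the other.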

I will use the notation $C_{(2,4)}$ for the first coordinate list above and $C_{(2,(2,2))}$ for the second one. The meaning of this notation is self-evident.

We can then define a partial order on the coordinate lists themselves by factoring dimensions: $C_P\leq C_S$ when $S$ can be obtained from $P$ by splitting dimensions into factors. For example, $C_{(2,4)}\leq C_{(2,(2,2))}$ since 4 can be factored into 2 by 2. And

$$ C_{36}\leq C_{(6,6)} \leq C_{((2,3), 6)}\leq C_{((2,3), (2,3))} $$

$$ C_{36}\leq C_{(4,9)} \leq C_{(4, (3,3))}\leq C_{((2,2),(3,3))} $$

Shape

A view contains a shape and a strides (I hate using the plural). They are both int tuples.

We can think of a view as a function that maps valid coordinates to an offset in memory.

The function is defined by taking the (nested) inner product of a coordinate and the strides.

For example,

shape:   (2, (2, 2))
strides: (2, (1, 4))
coordinate: (1, (1, 0))
offset: <(1, (1, 0)), (2, (1, 4))> -> 3

But we should also be able to use a linear coordinate to index into a view.

0 4 1 5
2 6 3 7

We know that 3 is the 7th element, which is zero-based position 6, so if we pass the linear coordinate 6 into the view (as a function) we should also get 3 as the output.

More interestingly, if we treat the view as a 2D tensor of shape (2, 4), the coordinate (1, 2) will also lead us to 3.

Now we introduce the following definition:

A coordinate family generated by a shape $S$ is a set of coordinate lists defined by

$$G(S)=\{C_P \mid C_P\leq C_S\}$$

For example,

$$ S=(2,(2,2)), \quad G(S)=\{C_{8},C_{(2,4)}, C_{(2,(2,2))}\} $$

Observations: $C_S\in G(S)$ and $C_{|S|}\in G(S)$.
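
The family $G(S)$ can be enumerated with a short sketch. It assumes one particular reading of the partial order, namely that a coarser shape is obtained by collapsing whole nested modes into their product; `size` and `family` are hypothetical names, not functions from the PR.

```python
from itertools import product
from math import prod

def size(shape):
    """Total number of elements of a nested shape."""
    return shape if isinstance(shape, int) else prod(size(s) for s in shape)

def family(shape):
    """Every shape P with C_P <= C_shape, under the collapsing assumption."""
    if isinstance(shape, int):
        return [shape]
    # Either collapse this whole level into a single integer ...
    collapsed = [size(shape)]
    # ... or keep the level and coarsen each child independently.
    kept = list(product(*(family(s) for s in shape)))
    return collapsed + kept

S = (2, (2, 2))
G = family(S)
print(G)  # [8, (2, 4), (2, (2, 2))]

# The two observations: S itself and the fully linear shape |S| are members.
assert S in G and size(S) in G
```

Under this reading, `family(((2, 3), (2, 3)))` also contains `(6, 6)` and `((2, 3), 6)`, matching the first chain of inequalities given earlier.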

A coordinate $c$ of a shape $S$ is an element of some coordinate list $C_P \in G(S)$.

Therefore, (1, (1, 0)), (1, 2) and 6 are all valid coordinates of the shape (2,(2,2)), and all three address the same element.

It turns out there is a bijective map between any two coordinate lists in $G(S)$.

It suffices to show that there is a bijective map between $C_{|S|}$ and any $C_P\in G(S)$, because then

$$C_{P_1}\leftrightarrow C_{|S|} \leftrightarrow C_{P_2}$$

for any $C_{P_1}, C_{P_2}\in G(S)$.

The proof is constructive: it is given by a specific algorithm. crd2idx takes you from $C_{P_1}$ to $C_{|S|}$ and idx2crd takes you from $C_{|S|}$ to $C_{P_2}$.

For example,

crd2idx((1,(1,0)), (2, (2,2))) = 6
idx2crd(6, (2, 4)) = (1, 2)
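
A minimal sketch of the two maps follows. The names crd2idx and idx2crd come from the text above, but the bodies are an assumption (zero-based, row-major numbering, with the coordinate compatible with the shape) and are not copied from int_tuple.py.

```python
from math import prod

def size(shape):
    """Total number of elements of a nested shape."""
    return shape if isinstance(shape, int) else prod(size(s) for s in shape)

def crd2idx(crd, shape):
    """Row-major linear index of a (possibly nested) coordinate within shape."""
    if isinstance(shape, int):
        return crd
    idx = 0
    for c, s in zip(crd, shape):
        # Shift what we have so far past this mode, then add its own index.
        idx = idx * size(s) + crd2idx(c, s)
    return idx

def idx2crd(idx, shape):
    """Inverse of crd2idx: unpack a linear index into a coordinate of shape."""
    if isinstance(shape, int):
        return idx
    crd = []
    for s in reversed(shape):
        # Peel off the fastest-varying mode first.
        idx, rem = divmod(idx, size(s))
        crd.append(idx2crd(rem, s))
    return tuple(reversed(crd))

idx = crd2idx((1, (1, 0)), (2, (2, 2)))
print(idx)                   # 6
print(idx2crd(idx, (2, 4)))  # (1, 2)

# Round trip over the whole shape: the two maps are mutually inverse.
S = (2, (2, 2))
assert all(crd2idx(idx2crd(i, S), S) == i for i in range(size(S)))
```

With zero-based numbering, (1, (1, 0)) lands at linear index 6, which unpacks to (1, 2) in the flat shape (2, 4).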

Reshape

Given a view $v$ with shape $S_1$ and stride $D_1$, reshaping it to a new shape $S_2$ means for a coordinate $crd\in C_{S_2}$, we do

idx = crd2idx(crd, S2)
new_crd = idx2crd(idx, S1)
offset = v(new_crd)
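
The three steps can be tried on the permuted 10x10 view from the PR description. This is a sketch restricted to flat shapes; `view` and `reshaped_offset` are hypothetical helpers, and row-major numbering is assumed.

```python
def crd2idx(crd, shape):
    """Row-major linear index of a flat coordinate."""
    idx = 0
    for c, s in zip(crd, shape):
        idx = idx * s + c
    return idx

def idx2crd(idx, shape):
    """Unpack a linear index into a flat coordinate of shape."""
    crd = []
    for s in reversed(shape):
        idx, c = divmod(idx, s)
        crd.append(c)
    return tuple(reversed(crd))

def view(crd, strides):
    """A view as a function: inner product of coordinate and strides."""
    return sum(c * s for c, s in zip(crd, strides))

def reshaped_offset(crd, new_shape, old_shape, old_strides):
    idx = crd2idx(crd, new_shape)        # coordinate in S2 -> linear index
    new_crd = idx2crd(idx, old_shape)    # linear index -> coordinate in S1
    return view(new_crd, old_strides)    # offset = v(new_crd)

# Permuted (10, 10) view with strides (1, 10), reshaped to (100,).
for i in range(100):
    assert reshaped_offset((i,), (100,), (10, 10), (1, 10)) == (i % 10) * 10 + i // 10
```

The assertion reproduces the rendered expression `(((idx0%10)*10)+(idx0//10))` from the PR description.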

If we have multiple reshape operations, we simply repeat this process.

As I understand it, this is why you need to maintain multiple views, and why a symbolic coordinate is additionally required to record the final expression.

The purpose of this PR is to completely discard this complexity.

We consider reshape as function composition.

$$(v\circ v_2)(\text{crd})=v(v_2(\text{crd}))=\text{offset}$$

Here $v_2$ is defined to be a continuous view with shape $S_2$. Then, the computation process of this composition is completely consistent with the above algorithm.

The trick is to find a single new view equal to $v\circ v_2$. Then we can simply maintain the new view and discard $v$ and $v_2$.
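
As a sanity check, the composed strides printed in the PR description can be compared against the step-by-step chain. This sketch does not derive the composed view; it only verifies, under the row-major assumption and with hypothetical helper names, that one view gives the same offsets as the chain.

```python
from itertools import product

def linear_index(crd, dims):
    """Row-major linear index of a flat coordinate."""
    idx = 0
    for c, d in zip(crd, dims):
        idx = idx * d + c
    return idx

def unravel(idx, dims):
    """Unpack a linear index into a flat coordinate of dims."""
    crd = []
    for d in reversed(dims):
        idx, c = divmod(idx, d)
        crd.append(c)
    return tuple(reversed(crd))

def offset(crd, strides):
    return sum(c * s for c, s in zip(crd, strides))

v_shape, v_strides = (10, 10), (1, 10)   # the permuted view v
s2 = (5, 2, 5, 2)                        # target shape of the reshape
composed_strides = (2, 1, 20, 10)        # as printed in the PR description

# Chain: crd in S2 -> linear index -> crd in S1 -> offset, versus one view.
for crd in product(*(range(d) for d in s2)):
    chained = offset(unravel(linear_index(crd, s2), v_shape), v_strides)
    assert chained == offset(crd, composed_strides)

# Reshaping to (100,) folds v's shape into one mode: shape ((10, 10),),
# strides ((1, 10),). The linear index is unpacked through the folded shape.
for i in range(100):
    assert offset(unravel(i, (10, 10)), (1, 10)) == (i % 10) * 10 + i // 10
```

Both loops pass, so for these two reshapes a single (possibly hierarchical) view carries the same information as the pair $v$, $v_2$.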


2 participants