[Draft] One hierarchical view is all you need #3988
I'll believe this is real when you have all the tests passing with the new View class.
It is real. I will post the mathematics behind it with flawless logic. As for passing all the tests, that's a bit tricky, because the source code may contain patterns like "In order to achieve A we implemented B and added tests for B". If I solve A directly, B is no longer needed, but identifying every such B can be time-consuming.
Going to close this PR. There's no need to pass the view-specific tests, but all models and training should work. Feel free to reopen when they do.
I'll be moving the day after tomorrow and might not have access to a computer for a while, so I'm posting the underlying principles here for anyone who wants to take a look or help speed up the refactoring.

First glance
Hierarchical views are a strict superset of normal views. For example, a normal view cannot represent the following data layout:
It can, however, be represented with a hierarchical view:
The second dimension has two dimensions folded into it. If we slice along the second dimension, we get
which is
i.e. the first row of the original data layout.

Coordinate lists
In the following text, I will use the term "int tuple" for a nested tuple of integers. Two int tuples are compatible if they have the same hierarchical structure; for example, ((2,3), 4) is compatible with ((5,6), 8). A coordinate list is a lexicographically ordered list of integers or of compatible int tuples. For example, […]
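The two definitions above can be sketched in Python, the language tinygrad is written in. This is a minimal illustration, not code from this PR: `compatible` and `coord_list` are hypothetical helper names, and an int tuple is modeled as a Python `int` or a nested `tuple` of ints.

```python
from itertools import product
from typing import Union

IntTuple = Union[int, tuple]  # an int or a nested tuple of ints

def compatible(a: IntTuple, b: IntTuple) -> bool:
    """Two int tuples are compatible if they share the same nesting structure."""
    if isinstance(a, int) and isinstance(b, int):
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(compatible(x, y) for x, y in zip(a, b))
    return False

def coord_list(shape: IntTuple) -> list:
    """All valid coordinates of `shape`, in lexicographic (row-major) order."""
    if isinstance(shape, int):
        return list(range(shape))
    # product() varies its last argument fastest, which is lexicographic order
    return list(product(*(coord_list(s) for s in shape)))

print(compatible(((2, 3), 4), ((5, 6), 8)))  # True
print(compatible(((2, 3), 4), (2, (3, 4))))  # False
print(coord_list((2, (2, 2)))[:5])
```

Every coordinate in `coord_list((2, (2, 2)))` is compatible with the shape `(2, (2, 2))` itself, which is what lets a whole list share one hierarchical structure.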
This lexicographic order reflects the row-major memory layout. I will use the notation […]. We can then define a partial order on the coordinate lists themselves via prime factorization. For example, […]

Shape
A view contains a […]

We can think of a view as a function that maps valid coordinates to an offset in memory. The function is defined by taking the (nested) inner product of a coordinate and the strides. For example, […]
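A minimal sketch of a view as a function, assuming a view is simply a (shape, strides) pair of compatible int tuples. `inner_product` is a hypothetical name, and the strides (2, (1, 4)) are chosen here for illustration; they are not the layout from the original comment. The resulting rows (0, 4, 1, 5) and (2, 6, 3, 7) have no single constant stride, so a flat (2, 4) view could not produce them.

```python
from typing import Union

IntTuple = Union[int, tuple]  # an int or a nested tuple of ints

def inner_product(crd: IntTuple, strides: IntTuple) -> int:
    """Nested inner product of a coordinate with compatible strides, i.e. a memory offset."""
    if isinstance(crd, int):
        assert isinstance(strides, int), "coordinate and strides must be compatible"
        return crd * strides
    assert isinstance(strides, tuple) and len(crd) == len(strides), "incompatible"
    return sum(inner_product(c, s) for c, s in zip(crd, strides))

# Hypothetical hierarchical view: shape (2, (2, 2)) with strides (2, (1, 4)).
strides = (2, (1, 4))
rows = [[inner_product((i, (j, k)), strides) for j in range(2) for k in range(2)]
        for i in range(2)]
print(rows)                                 # [[0, 4, 1, 5], [2, 6, 3, 7]]
print(inner_product((1, (1, 0)), strides))  # 3
```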
But we should also be able to use a linear coordinate to index into a view.
We know that 3 is the element at linear index 7, so if we pass the linear coordinate 7 into the view (as a function), we should also get 3 as the output. More interestingly, if we treat the view as a 2D tensor, the coordinate (1,3) also leads us to 3. We now introduce the following definition: a coordinate family generated by a shape […]
Observations: a coordinate […]
Therefore, […]
It turns out that there is a bijective map between any two coordinate lists in […]. It suffices to show that there is a bijective map between […]
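A minimal sketch of such a bijection, assuming row-major (lexicographic) order as stated above. The names `crd2idx` and `idx2crd` come from the text, but this implementation is a hypothetical reconstruction, and `size` and `coord_list` are illustrative helpers. Each coordinate list is mapped onto the range 0..size-1; composing one list's map with the inverse of another's gives the bijection between any two lists in the family.

```python
from itertools import product
from math import prod
from typing import Union

IntTuple = Union[int, tuple]  # an int or a nested tuple of ints

def size(shape: IntTuple) -> int:
    """Number of elements described by a (possibly nested) shape."""
    return shape if isinstance(shape, int) else prod(size(s) for s in shape)

def coord_list(shape: IntTuple) -> list:
    """Valid coordinates of `shape` in lexicographic (row-major) order."""
    if isinstance(shape, int):
        return list(range(shape))
    return list(product(*(coord_list(s) for s in shape)))

def crd2idx(crd: IntTuple, shape: IntTuple) -> int:
    """Row-major linear index of `crd` within `shape`.

    An int coordinate is taken as a linear index into its (possibly nested)
    mode, so coarser coordinates such as 7 or (1, 3) work for (2, (2, 2)).
    """
    if isinstance(crd, int):
        return crd
    idx = 0
    for c, s in zip(crd, shape):
        idx = idx * size(s) + crd2idx(c, s)
    return idx

def idx2crd(idx: int, shape: IntTuple) -> IntTuple:
    """Inverse of crd2idx: split a linear index into a coordinate of `shape`."""
    if isinstance(shape, int):
        return idx
    crd = []
    for s in reversed(shape):  # the last mode varies fastest in row-major order
        n = size(s)
        crd.append(idx2crd(idx % n, s))
        idx //= n
    return tuple(reversed(crd))

print(crd2idx((1, (1, 1)), (2, (2, 2))))  # 7
print(idx2crd(7, (2, 4)))                 # (1, 3)
print(crd2idx((1, 3), (2, (2, 2))))       # 7, a coarser coordinate from the same family
```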
The proof is constructive: it gives a specific algorithm. For example,
crd2idx((1,(1,1)), (2,(2,2))) = 7
idx2crd(7, (2,4)) = (1,3)

Reshape
Given a view v with shape S1 that is reshaped to a shape S2, a coordinate crd of the new shape is resolved to a memory offset as follows:
idx = crd2idx(crd, S2)
new_crd = idx2crd(idx, S1)
offset = v(new_crd)
If we have multiple reshape operations, we simply repeat this process. As I understand it, this is why you need to maintain multiple views, plus a symbolic coordinate to record the final expression. The purpose of this PR is to discard this complexity completely. We consider […]
The trick is to find a new view, namely […]
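The reshape steps above can be sketched by treating a view as a Python callable. This is a hypothetical illustration: `make_view`, `reshape`, and the strides (2, (1, 4)) are choices made here, and `crd2idx`/`idx2crd` assume row-major order. Each `reshape` just wraps the previous function, which mirrors "repeat this process"; it does not compute the single new view that this PR is after.

```python
from math import prod
from typing import Callable, Union

IntTuple = Union[int, tuple]
View = Callable[[IntTuple], int]  # coordinate -> memory offset

def size(shape: IntTuple) -> int:
    return shape if isinstance(shape, int) else prod(size(s) for s in shape)

def crd2idx(crd: IntTuple, shape: IntTuple) -> int:
    # row-major; an int coordinate is a linear index into its (sub)shape
    if isinstance(crd, int):
        return crd
    idx = 0
    for c, s in zip(crd, shape):
        idx = idx * size(s) + crd2idx(c, s)
    return idx

def idx2crd(idx: int, shape: IntTuple) -> IntTuple:
    if isinstance(shape, int):
        return idx
    crd = []
    for s in reversed(shape):
        crd.append(idx2crd(idx % size(s), s))
        idx //= size(s)
    return tuple(reversed(crd))

def inner_product(crd: IntTuple, strides: IntTuple) -> int:
    if isinstance(crd, int):
        return crd * strides
    return sum(inner_product(c, s) for c, s in zip(crd, strides))

def make_view(shape: IntTuple, strides: IntTuple) -> View:
    """A view as a function; any coordinate in the shape's family is accepted."""
    def v(crd: IntTuple) -> int:
        # normalize to the view's own hierarchical coordinate first
        return inner_product(idx2crd(crd2idx(crd, shape), shape), strides)
    return v

def reshape(v: View, s1: IntTuple, s2: IntTuple) -> View:
    """Reshape a view of shape s1 to s2 using the three steps above."""
    assert size(s1) == size(s2), "reshape must preserve the number of elements"
    def v2(crd: IntTuple) -> int:
        idx = crd2idx(crd, s2)
        new_crd = idx2crd(idx, s1)
        return v(new_crd)
    return v2

v = make_view((2, (2, 2)), (2, (1, 4)))  # hypothetical strides
print(v(7), v((1, 3)), v((1, (1, 1))))   # 7 7 7
r1 = reshape(v, (2, (2, 2)), (4, 2))
r2 = reshape(r1, (4, 2), 8)              # a second reshape adds another layer
print([r2(i) for i in range(8)])         # [0, 4, 1, 5, 2, 6, 3, 7]
```

Because `v` normalizes its input, the linear coordinate 7, the 2D coordinate (1, 3), and the hierarchical coordinate (1, (1, 1)) all reach the same offset, which is the property described earlier for linear coordinates.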
It's still a draft, but it's a better abstraction.
Let's see what the View in this PR can do.
This means that using symbolic to track indices across multiple views is redundant: a view records all the information about the data layout.
I will add conceptual explanations later (need to go to sleep).
So far this PR doesn't modify any existing source code; it only adds new code, because the source code of ShapeTracker is spaghetti.