-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug] Update children_offsets & stride info to align as elem_stride #4345
Conversation
Currently we run 2 passes to generate snode tree layout in opengl codegen. 1. One pass from leaf to root so that we can calculate the real size & elem_stride for each snode and sort the children from small to large. 2. Another pass from root to leaf so that we can align as elem_stride. Note we cannot eliminate the first pass due to https://github.com/taichi-dev/taichi/blob/master/taichi/backends/opengl/struct_opengl.cpp#L50. The fix for the alignment issue was simple: we used children_offsets information in the codegen but didn't update it in the second pass. This PR fixes the issue and also reorganizes the code a bit to make it easier to read and understand. ghstack-source-id: 958b5ed2293af69ab1e2a8e7f734a651fab37f36 Pull Request resolved: #4345
/rebase |
bbef0e8
to
f35d8de
Compare
There seems to be a discrepancy between testing.yml and release.yml. For example, my opengl PR taichi-dev#4345 fails at vulkan backend on m1. Also today's nightly failed. So unifying them here to debug...
There seems to be a discrepancy between testing.yml and release.yml. For example, my opengl PR #4345 fails at vulkan backend on m1. Also today's nightly failed. So unifying them here to debug...
/rebase |
Currently we run 2 passes to generate snode tree layout in opengl codegen. 1. One pass from leaf to root so that we can calculate the real size & elem_stride for each snode and sort the children from small to large. 2. Another pass from root to leaf so that we can align as elem_stride. Note we cannot eliminate the first pass due to https://github.com/taichi-dev/taichi/blob/master/taichi/backends/opengl/struct_opengl.cpp#L50. The fix for the alignment issue was simple: we used children_offsets information in the codegen but didn't update it in the second pass. This PR fixes the issue and also reorganizes the code a bit to make it easier to read and understand. [ghstack-poisoned]
f35d8de
to
cf92af5
Compare
✔️ Deploy Preview for docsite-preview canceled. 🔨 Explore the source changes: cf92af5 🔍 Inspect the deploy log: https://app.netlify.com/sites/docsite-preview/deploys/621366933a524f00070f024e |
ohhhhh nice finally this passed CI ... btw what I did to get out of the infinite rerun loop was simply delete the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
if (snode.type == SNodeType::root) { | ||
snode_map_.at(snode.node_type_name).mem_offset_in_root = 0; | ||
} | ||
void OpenglStructCompiler::align_as_elem_stride(const SNode &snode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picky: is it possible to 1) extract this algo into a function (without the GL-dependent part), and 2) add a unit test for it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing! Will send a followup PR tmr or Wed !
Guess it does highlight the need for a containerized Win CI env.. |
And a containerized macos CI.... ;) |
Stack from ghstack:
Currently we run 2 passes to generate snode tree layout in opengl
codegen.
elem_stride for each snode and sort the children from small to large.
Note we cannot eliminate the first pass due to
https://github.com/taichi-dev/taichi/blob/master/taichi/backends/opengl/struct_opengl.cpp#L50.
The fix for the alignment issue was simple: we used children_offsets
information in the codegen but didn't update it in the second pass.
This PR fixes the issue and also reorganizes the code a bit to make it
easier to read and understand.