New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenGL] dynamic range for-loop support #785
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits..
if (ker->rse.has_value()) ker->num_groups = (*ker->rse)((const char *)gtmp_base); | ||
TI_DEBUG("kernel [{}] num_groups = {}", ker->kernel_name, ker->num_groups); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specify dynamic work group size for dyn-range-kernels (thanks to the GL feature enable me).
Is this better than grid-stride-loop?
@k-ye @yuanming-hu Does other backends enables us to dynamically change work group size? If so we may make use of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For metal, I don't think so .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For CUDA, I guess the answer is no. A seemingly related feature is https://devblogs.nvidia.com/cuda-dynamic-parallelism-api-principles/
My feeling is grid-strided loops are usually good enough.
TODO: remove unneeded check_opengl_error's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Most of my comments are nits or TODOs, and i think we're very close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM! (with two more nits..)
Thanks, @archibate, and @k-ye. Sorry about not participating in this PR activity - I got quite occupied by my research project this week. I'm merging this in now. |
Related issue = #... (if any)
[Click here for the format server]