-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vulkan] Add buffer device address (physical pointers) support & other improvements #4221
Conversation
✔️ Deploy Preview for docsite-preview ready! 🔨 Explore the source changes: 79a4ef6 🔍 Inspect the deploy log: https://app.netlify.com/sites/docsite-preview/deploys/62030421737f9200075336cb 😎 Browse the preview: https://deploy-preview-4221--docsite-preview.netlify.app |
/format |
/format |
if (buffer.type == BufferType::Args) { | ||
buffer_binding_map_[key] = 0; | ||
buffer_value_map_[key] = args_buffer_value_; | ||
return args_buffer_value_; | ||
} | ||
|
||
if (buffer.type == BufferType::Rets) { | ||
buffer_binding_map_[key] = 1; | ||
buffer_value_map_[key] = ret_buffer_value_; | ||
return ret_buffer_value_; | ||
} | ||
|
||
int binding = binding_head_++; | ||
buffer_binding_map_[key] = binding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand whats happening here. If we return early for "Args" and "Rets", doesn't that mean buffer_argument
won't be called for these two buffers? E.g. they would miss the "Block" decoration. This was also the case in #4205, and I've been getting complaints from tint because of that.
Also, wouldn't this cause Args and Rets to be bound to the same binding point as other buffers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe the "Block" decoration should be added to buffer_struct_argument
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing about #4205 and this PR that's a bit problematic: In 4205, compile_args_struct()
gets called regardless of whether or not there're any arguments. You fixed that in this PR by returning immediately if num args = 0, but not every offloaded task in the kernel is guaranteed to use those args. So you could end with a offloadedTask/shader that declares the args buffer as an input, but never use it. This might be fine with Vulkan (i'm not sure), but it's causing a lot of trouble for webgpu stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call "compile_args" only when we notice a ArgLoadStmt
. Same for return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decoration is added when calling "compile_args_struct" where it calls the “buffer_struct_argument" or "uniform_struct_argument"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it in run
rn, afterwards we will have a prepass that scans all the required features / structures, and then we can clean up all the initialization code (instead of the one pass approach we are using rn). If we want to move it to visit(ArgLoadStmt)
rn we will need an if statement to avoid generating multiple parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And an if statement is bad because?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not an elegant solution either. Proper fix will happen when we have a prepass featuer scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not that ugly either. we know that a visit(ArgLoad) exists iff you need the arg structs. so please move it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think that's the right way. We shouldn't scatter global section initialization and type definitions across all the statements. We will add a prepass soon once the Metal random seed PR is merged.
Other than the previously mentioned stuff, LGTM. |
/format |
… into bobcao3/vulkan-more
/format |
/format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
If we are using uniform buffer for args, how do we handle writing to an external array? Or is the external arrays implemented as Ndarray already?
/format |
It's the same as NDarray (truely any_arr) |
… into bobcao3/vulkan-more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM!
Related issue = #3807
Buffer device address is a unique Vulkan capability of directly addressing the memory using a pointer, and the ability to cast pointers to uint64 (i.e. generic pointers)
Codegen is modified so that ext arrays and ndarrays can be addressed directly instead of using bindings. This also removes an shifting because the pointers are byte addressed. This also means aliasing is not needed if buffer device address is supported.
This change moved the args to a uniform buffer which is loaded and cached in a separate specialized constants buffer on the GPU. The constants buffer supports broadcasting (i.e. a single scalar load is issued and results are broadcasted to all lanes in the wave) This also allows tools like spirv-opt to optimize code involving args (e.g. index calculation) as uniform buffer is known to be constant during execution.