Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vulkan] Add buffer device address (physical pointers) support & other improvements #4221

Merged
merged 22 commits into from
Feb 9, 2022

Conversation

bobcao3
Copy link
Collaborator

@bobcao3 bobcao3 commented Feb 7, 2022

Related issue = #3807

  1. Buffer device address

Buffer device address is a unique Vulkan capability of directly addressing the memory using a pointer, and the ability to cast pointers to uint64 (i.e. generic pointers)

Codegen is modified so that ext arrays and ndarrays can be addressed directly instead of using bindings. This also removes an shifting because the pointers are byte addressed. This also means aliasing is not needed if buffer device address is supported.

  1. Uniform buffer / constants buffer args

This change moved the args to a uniform buffer which is loaded and cached in a separate specialized constants buffer on the GPU. The constants buffer supports broadcasting (i.e. a single scalar load is issued and results are broadcasted to all lanes in the wave) This also allows tools like spirv-opt to optimize code involving args (e.g. index calculation) as uniform buffer is known to be constant during execution.

@netlify
Copy link

netlify bot commented Feb 7, 2022

✔️ Deploy Preview for docsite-preview ready!

🔨 Explore the source changes: 79a4ef6

🔍 Inspect the deploy log: https://app.netlify.com/sites/docsite-preview/deploys/62030421737f9200075336cb

😎 Browse the preview: https://deploy-preview-4221--docsite-preview.netlify.app

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 7, 2022

/format

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 7, 2022

/format

Comment on lines 1685 to 1698
if (buffer.type == BufferType::Args) {
buffer_binding_map_[key] = 0;
buffer_value_map_[key] = args_buffer_value_;
return args_buffer_value_;
}

if (buffer.type == BufferType::Rets) {
buffer_binding_map_[key] = 1;
buffer_value_map_[key] = ret_buffer_value_;
return ret_buffer_value_;
}

int binding = binding_head_++;
buffer_binding_map_[key] = binding;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand whats happening here. If we return early for "Args" and "Rets", doesn't that mean buffer_argument won't be called for these two buffers? E.g. they would miss the "Block" decoration. This was also the case in #4205, and I've been getting complaints from tint because of that.

Also, wouldn't this cause Args and Rets to be bound to the same binding point as other buffers?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe the "Block" decoration should be added to buffer_struct_argument?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing about #4205 and this PR that's a bit problematic: In 4205, compile_args_struct() gets called regardless of whether or not there're any arguments. You fixed that in this PR by returning immediately if num args = 0, but not every offloaded task in the kernel is guaranteed to use those args. So you could end with a offloadedTask/shader that declares the args buffer as an input, but never use it. This might be fine with Vulkan (i'm not sure), but it's causing a lot of trouble for webgpu stuff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call "compile_args" only when we notice a ArgLoadStmt. Same for return values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decoration is added when calling "compile_args_struct" where it calls the “buffer_struct_argument" or "uniform_struct_argument"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it in run rn, afterwards we will have a prepass that scans all the required features / structures, and then we can clean up all the initialization code (instead of the one pass approach we are using rn). If we want to move it to visit(ArgLoadStmt) rn we will need an if statement to avoid generating multiple parts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And an if statement is bad because?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an elegant solution either. Proper fix will happen when we have a prepass featuer scan.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not that ugly either. we know that a visit(ArgLoad) exists iff you need the arg structs. so please move it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think that's the right way. We shouldn't scatter global section initialization and type definitions across all the statements. We will add a prepass soon once the Metal random seed PR is merged.

@AmesingFlank
Copy link
Collaborator

Other than the previously mentioned stuff, LGTM.
Also, thanks for removing aliasing, it helps not only moltenvk but also made life easier for me when dealing with wgsl.

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 8, 2022

/format

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 8, 2022

/format

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 8, 2022

/format

Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

If we are using uniform buffer for args, how do we handle writing to an external array? Or is the external arrays implemented as Ndarray already?

taichi/codegen/spirv/spirv_codegen.cpp Outdated Show resolved Hide resolved
@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 8, 2022

/format

@bobcao3
Copy link
Collaborator Author

bobcao3 commented Feb 8, 2022

Cool!

If we are using uniform buffer for args, how do we handle writing to an external array? Or is the external arrays implemented as Ndarray already?

It's the same as NDarray (truely any_arr)

@bobcao3 bobcao3 requested a review from k-ye February 9, 2022 01:13
Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM!

@bobcao3 bobcao3 merged commit 7705f68 into taichi-dev:master Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants