StdDescriptorPool revamp #1943

Merged: 4 commits into vulkano-rs:master on Aug 7, 2022

Conversation

@marc0246 (Contributor) commented Aug 5, 2022:

Changelog:

- Changes to `StdDescriptorPool`:
  - Now implemented lock-less, using thread-local storage.
  - **Breaking** removed `Device::standard_descriptor_pool` in favor of `Device::with_standard_descriptor_pool` (see the migration sketch after this list).
- **Breaking** `DescriptorPool::allocate` now takes `&Arc<DescriptorSetLayout>` instead of `&DescriptorSetLayout`.
- **Breaking** `SingleLayoutDescSetPool::new` now returns `Result`.
- Added `SingleLayoutVariableDescSetPool`.
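
For illustration, a hedged sketch of the migration implied by the breaking changes above, written as comments because the exact signatures are inferred from this PR's description rather than copied from vulkano (`layout` and `variable_descriptor_count` are placeholder names):

```rust
// Before (removed): an Arc clone of the shared pool was handed out.
//
//     let pool: Arc<StdDescriptorPool> = device.standard_descriptor_pool();
//
// After: the thread-local pool is only borrowed for the duration of a
// closure, and `allocate` takes the layout by `&Arc<DescriptorSetLayout>`:
//
//     device.with_standard_descriptor_pool(|pool| {
//         pool.allocate(&layout, variable_descriptor_count)
//     })
```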

Seeing as StdDescriptorPool was originally using three layers of Mutex, and the allocation algorithm wasn't very optimized, I felt the need to address these and some other issues.

  • Addressing the locks was the easy part: they got the same treatment as in Make StandardCommandPool lockless (#1939).
  • Implementing DescriptorPool on Arc<StdDescriptorPool> now made little sense, since the StdDescriptorPools are all thread-local and DescriptorPool takes &mut self. The trait is now implemented on StdDescriptorPool directly, and Device::standard_descriptor_pool has been removed: it previously handed out Arc clones, which would now be redundant overhead. Instead, Device::with_standard_descriptor_pool can be used, which still gives the user full access to the StdDescriptorPool without the need for those Arc clones. I will be working on consistency with StandardCommandPool in an upcoming PR.
  • Regarding the allocation strategy, I was pretty sure the amazing work that was done on SingleLayoutDescSetPool would perform best, but I tried some other ideas only to conclude that this is indeed the case.
    • To allocate one set, SingleLayoutDescSetPool took 58 ns on my machine, in contrast to the current strategy employed in StdDescriptorPool, which takes 3800 ns with no contention. There were just some small details I thought could maximize the performance of SingleLayoutDescSetPool, and addressing them reduced the time to 38 ns. Not a big difference, but that's what happens when there's not much to improve in the first place!
    • The next question was what to do about variable descriptor counts. Apart from pre-allocating a fixed number of descriptor sets like SingleLayoutDescSetPool does and reusing those, I believe the next-best performing strategy is to reuse the Vulkan pools: allocate from a pool until it's full, then yeet it and grab a new one, and once the pool is no longer needed, reset it as a whole and put it back in the queue (see the sketch after this list). This of course has the advantage that the Vulkan implementation can use a simple bump allocator, and no pool fragmentation can occur. It also saves some cycles, since pool creation is expensive. So this is the approach I took for variable descriptor counts, but I think we could absolutely have one pool per count per layout for the performance; if someone wants to do that, I'm all for it. This approach clocks in at 104 ns for me.
    • Updating StdDescriptorPool was now as easy as using the newly added SingleLayoutVariableDescSetPool together with SingleLayoutDescSetPool to support the whole DescriptorPool API. The result is that allocating a fixed-count descriptor set takes 104 ns and a variable-count one takes 171 ns for me, meaning the new StdDescriptorPool adds about 66 ns of overhead on top. I don't like this one bit, and I have a good idea where this overhead is coming from, so expect future PRs.
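
As a rough illustration of the pool-recycling strategy above, here is a minimal, self-contained sketch with placeholder types (`RawPool` stands in for a `VkDescriptorPool` that only tracks remaining capacity); the actual vulkano implementation differs in detail and only resets a pool once every set allocated from it has been dropped:

```rust
use std::collections::VecDeque;

// How many sets each pool is created with (placeholder value).
const POOL_CAPACITY: u32 = 256;

// Stand-in for a VkDescriptorPool: a bump allocator that only tracks how
// many sets it has left.
struct RawPool {
    remaining: u32,
}

struct PoolRecycler {
    current: RawPool,
    // Exhausted pools that have been reset as a whole, queued for reuse.
    reserve: VecDeque<RawPool>,
}

impl PoolRecycler {
    fn new() -> Self {
        PoolRecycler {
            current: RawPool { remaining: POOL_CAPACITY },
            reserve: VecDeque::new(),
        }
    }

    // Allocate one descriptor set: bump the current pool, and once it runs
    // dry, swap in a recycled pool, or create a fresh one (the expensive
    // path this scheme tries to avoid).
    fn allocate_set(&mut self) {
        if self.current.remaining == 0 {
            let fresh = self
                .reserve
                .pop_front()
                .unwrap_or(RawPool { remaining: POOL_CAPACITY });
            let mut exhausted = std::mem::replace(&mut self.current, fresh);
            // The real implementation would wait until every set from
            // `exhausted` has been dropped before resetting it (via
            // vkResetDescriptorPool); this sketch resets eagerly for brevity.
            exhausted.remaining = POOL_CAPACITY;
            self.reserve.push_back(exhausted);
        }
        self.current.remaining -= 1;
    }
}

fn main() {
    let mut recycler = PoolRecycler::new();
    for _ in 0..1000 {
        recycler.allocate_set();
    }
}
```

Because sets are never freed individually, the driver can hand them out with a simple bump allocator and no fragmentation can occur, which is where the cheap allocation path comes from.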

The review thread below concerns this excerpt:

```rust
Self::new_with_pool(layout, 0, &mut pool, descriptor_writes)

layout
    .device()
    .clone()
```
Contributor:

I notice here and a few other places, you clone the device Arc. But since with_standard_descriptor_pool takes a reference, the clone isn't needed?

@marc0246 (author) replied Aug 7, 2022:

Unfortunately it is, because the closure takes ownership of layout, and without cloning the device, the layout would still be borrowed. I thought about this, and it is possible to fix it by changing the argument to &Arc<DescriptorSetLayout>; the only reason I did not do this is that I didn't want to break the API. But indeed I would like this change.
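
For illustration, here is a hypothetical reduction of that borrow conflict. None of these types are vulkano's; `with_pool` stands in for `with_standard_descriptor_pool`, and `consume` stands in for the closure body that takes ownership of the layout:

```rust
use std::sync::Arc;

struct Device;
struct Layout {
    device: Arc<Device>,
}

impl Layout {
    fn device(&self) -> &Arc<Device> {
        &self.device
    }
}

// Stand-in for `with_standard_descriptor_pool`: borrows the device while
// running the closure.
fn with_pool<T>(_device: &Arc<Device>, f: impl FnOnce() -> T) -> T {
    f()
}

// Stand-in for a closure body that takes ownership of the layout.
fn consume(_layout: Arc<Layout>) {}

fn demo(layout: Arc<Layout>) {
    // Does not compile: `layout.device()` borrows `layout` while the closure
    // simultaneously moves it out (error E0505), with or without `move`.
    // with_pool(layout.device(), move || consume(layout));

    // Cloning the device Arc ends the borrow before the move:
    let device = layout.device().clone();
    with_pool(&device, move || consume(layout));
}

fn main() {
    demo(Arc::new(Layout { device: Arc::new(Device) }));
}
```

This is also why adding `move`, as asked below, doesn't help: the problem is the simultaneous borrow and move of `layout`, not how the closure captures it.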

Contributor:

Would adding `move` to the closure work?

@marc0246 (author):

No, sadly.

@Rua (Contributor) commented Aug 7, 2022:

The runtime_array example is no longer working, and I get a validation error when it runs:

```
VUID-VkWriteDescriptorSet-dstArrayElement-00321(ERROR / SPEC): msgNum: -1862672269 - Validation Error: [ VUID-VkWriteDescriptorSet-dstArrayElement-00321 ] Object 0: handle = 0x5641be4dee08, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; | MessageID = 0x90f9e073 | vkUpdateDescriptorSets() pDescriptorWrites[0] failed write update validation for VkDescriptorSet 0x5641be4dee08[] with error: Attempting write update to VkDescriptorSet 0x5641be4dee08[] allocated with VkDescriptorSetLayout 0x5641bdf75eb0[] binding index #0 array element 0 with 2 writes but variable descriptor size is 0. The Vulkan spec states: The sum of dstArrayElement and descriptorCount must be less than or equal to the number of array elements in the descriptor set binding specified by dstBinding, and all applicable consecutive bindings, as described by consecutive binding updates (https://vulkan.lunarg.com/doc/view/1.3.216.0/linux/1.3-extensions/vkspec.html#VUID-VkWriteDescriptorSet-dstArrayElement-00321)
Objects: 1
[0] 0x5641be4dee08, type: 23, name: NULL
```

@marc0246 (author) commented Aug 7, 2022:

Oh wow, thank you for catching that! What's really weird is that this worked fine on my system even with this huge issue.

@marc0246 (author) commented Aug 7, 2022:

Forgot to mention that the issue has been fixed.

@Rua (Contributor) commented Aug 7, 2022:

> What's really weird is that this worked fine on my system even with this huge issue.

That's the nasty thing about undefined behaviour. Sometimes it just works, and you never notice!

@Rua merged commit 8f69034 into vulkano-rs:master on Aug 7, 2022.
Rua added a commit referencing this pull request on Aug 7, 2022.