Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lots of conflicts with reused uniform buffers in 0.22 #1557

Closed
faulesocke opened this issue Apr 15, 2021 · 36 comments · May be fixed by #1562
Closed

Lots of conflicts with reused uniform buffers in 0.22 #1557

faulesocke opened this issue Apr 15, 2021 · 36 comments · May be fixed by #1562

Comments

@faulesocke
Copy link

With 0.22 I get a lot of validation errors when recording draw commands. The offending code is always similar to this:

for mesh in &obj.meshes {
    let set = Arc::new(
        PersistentDescriptorSet::start(layout.clone())
            .add_buffer(light.uniform_buffer())
            .unwrap()
            .add_buffer(model_ub.clone())
            .unwrap()
            .build()
            .unwrap(),
    );

    builder
        .draw_indexed(
            shadow_pipeline.clone(),
            &dynamic_state,
            mesh.vb.clone(),
            mesh.ib.clone(),
            set,
            (),
            vec![],
        )
        .unwrap();
}

i.e. some loop that records draw commands and re-uses the same uniform buffer(s) over and over again. The error is always of the form

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: SyncCommandBufferBuilderError(Conflict { command1_name: "vkCmdBindDescriptorSets", command1_param: "Buffer bound to descriptor 0 of set 0", command1_offset: 4, command2_name: "vkCmdBindDescriptorSets", command2_param: "Buffer bound to descriptor 0 of set 0", command2_offset: 8 })', src/main.rs:533:42

The error is returned by the draw_indexed call.
This code worked in 0.21 but stopped working when I upgraded to 0.22. I tried upgrading to 0.22 because I had similar issues (also conflicts but with images instead of buffers and only in a new piece of code that isn't involved here) and thought the overhauled image code in 0.22 could solve my problems. But now I have buffer problems :D

Is there already a fix for this? I'm also fine with patching up vulkano myself and just remove some validation code for now. Would be nice if someone can give me a hint where this could come from.

You can probably reproduce this issue when porting the examples to 0.22.

@Eliah-Lakhin
Copy link
Contributor

@faulesocke Can you show shader's layout, please?

@faulesocke
Copy link
Author

Sure:

#version 450 core

#include <light.inc.glsl>

layout(binding = 0) uniform Light {
    DirectionalLight light;
};

layout(binding = 1) uniform Model {
    mat4 model;
    mat4 normal;
} model;


layout(location = 0) in vec3 position;
layout(location = 1) in vec3 normal;
layout(location = 2) in vec2 texcoord;
layout(location = 3) in vec3 tangent;
layout(location = 4) in vec3 bitangent;

@Rua
Copy link
Contributor

Rua commented Apr 16, 2021

Could you run your code through cargo expand and see what kind of code is being generated for this shader? In particular, whether the descriptor is being marked as readonly.

@faulesocke
Copy link
Author

That must be the problem! None of the uniform buffers are marked readonly.

#[allow(unsafe_code)]
    unsafe impl PipelineLayoutDesc for MainLayout {
        fn num_sets(&self) -> usize {
            1usize
        }
        fn num_bindings_in_set(&self, set: usize) -> Option<usize> {
            match set {
                0usize => Some(2usize),
                _ => None,
            }
        }
        fn descriptor(&self, set: usize, binding: usize) -> Option<DescriptorDesc> {
            match (set, binding) {
                (0usize, 0usize) => Some(DescriptorDesc {
                    ty: DescriptorDescTy::Buffer(DescriptorBufferDesc {
                        dynamic: None,
                        storage: false,
                    }),
                    array_count: 1u32,
                    stages: self.0.clone(),
                    readonly: false,
                }),
                (0usize, 1usize) => Some(DescriptorDesc {
                    ty: DescriptorDescTy::Buffer(DescriptorBufferDesc {
                        dynamic: None,
                        storage: false,
                    }),
                    array_count: 1u32,
                    stages: self.0.clone(),
                    readonly: false,
                }),
                _ => None,
            }
        }

@Rua
Copy link
Contributor

Rua commented Apr 16, 2021

If you can give me the full invocation of shader! and any extra files needed, I can look into this.

@faulesocke
Copy link
Author

Here you are:

mod shadow_vs {
    vulkano_shaders::shader! {
        ty: "vertex",
        path: "shaders/dir_shadow.vert.glsl",
        include: ["shaders/include/"],
    }
}

The dir_shadow.vert.glsl:

#version 450 core

#include <light.inc.glsl>

layout(binding = 0) uniform Light {
    DirectionalLight light;
};

layout(binding = 1) uniform Model {
    mat4 model;
    mat4 normal;
} model;


layout(location = 0) in vec3 position;
layout(location = 1) in vec3 normal;
layout(location = 2) in vec2 texcoord;
layout(location = 3) in vec3 tangent;
layout(location = 4) in vec3 bitangent;


void main() {
    gl_Position = light.viewproj * model.model * vec4(position, 1.0);
}

and light.inc.glsl:

struct DirectionalLight {
    mat4 view;
    mat4 proj;
    mat4 viewproj;
    vec3 direction;
    vec3 color;
};

I guess you can minimalize/remove the vertex input in order to reproduce the issue. Thank you very much for your effort!

@Rua
Copy link
Contributor

Rua commented Apr 16, 2021

On my system with the latest Vulkano master, these descriptors are marked as readonly. So that should be all you need to do.

@faulesocke
Copy link
Author

Migrating to current master fixed this problem, thank you. However, I still have an unresolved conflict with image bindings. I will investigate this and report back here when I know more.

@Rua
Copy link
Contributor

Rua commented Apr 16, 2021

If they are descriptors with the image2D or similar specifier in GLSL, then they are treated as storage images on the Vulkan side, and potentially writable. Is that the case with your code? If so, adding readonly in the GLSL declaration should solve it.

@faulesocke
Copy link
Author

@Rua no they are just sampler2D descriptors.

I tracked my error down to the following situation: When I try to bind the same image to two different descriptor sets (with different layouts) and issue two draw commands, one for each descriptor set then SyncCommandBufferBuilder::append_command returns an Err, because:

  1. the resource was already used in a previous command (vkCmdBindDescriptorSets)
  2. the resources memory.exclusive is set. In this case the comments already state that a collision was detected. I believe this is not correct.
  3. I'm not sure what the logic is trying to check afterwards

Should I open a new issue to discuss this problem, since it's separate from the others?

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

If exclusive is being set, then that's probably because the descriptors aren't readonly. Maybe this is the same problem, just one that isn't fixed yet? You use cargo expand again and see if that's the case.

@faulesocke
Copy link
Author

Well when using sampler2D it should always be readonly right?

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

It certainly should, but who knows if Vulkano currently agrees with us! :D

@faulesocke
Copy link
Author

faulesocke commented Apr 20, 2021

I've checked cargo expand again and this time the descriptors are readonly:

                fn descriptor(&self, set: usize, binding: usize) -> Option<DescriptorDesc> {
                    match (set, binding) {
                        (0usize, 2usize) => Some(DescriptorDesc {
                            ty: DescriptorDescTy::CombinedImageSampler(DescriptorImageDesc {
                                sampled: true,
                                dimensions: DescriptorImageDescDimensions::TwoDimensional,
                                format: None,
                                multisampled: false,
                                array_layers: DescriptorImageDescArray::Arrayed {
                                    max_layers: None,
                                },
                            }),
                            array_count: 1u32,
                            stages: self.0.clone(),
                            readonly: true,
                        }),
                        (0usize, 1usize) => Some(DescriptorDesc {
                            ty: DescriptorDescTy::CombinedImageSampler(DescriptorImageDesc {
                                sampled: true,
                                dimensions: DescriptorImageDescDimensions::TwoDimensional,
                                format: None,
                                multisampled: false,
                                array_layers: DescriptorImageDescArray::Arrayed {
                                    max_layers: None,
                                },
                            }),
                            array_count: 1u32,
                            stages: self.0.clone(),
                            readonly: true,
                        }),
// ...

Oh, and I lied, I was actually using sampler2DArray. Maybe that is where my problems come from?

Edit: Here is the GLSL for the above descriptors:

layout(binding = 0) uniform sampler2DArray gbuffer_diffuse;
layout(binding = 1) uniform sampler2DArray gbuffer_normals;

@faulesocke
Copy link
Author

Where in the code is this exclusive flag set?

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

It's set in https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/commands.rs#L2745-L2816. As you can see it's literally just the inverse of the readonly flag. There are other places in that file where it's also set.

@faulesocke
Copy link
Author

Okay, in https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/base.rs#L563 the append_command method checks for potential conflicts. I've added dbg! statements and here is what I get:

[.../vulkano/src/command_buffer/synced/base.rs:569] memory.exclusive = false
[.../vulkano/src/command_buffer/synced/base.rs:570] entry.get().memory.exclusive = true
[.../vulkano/src/command_buffer/synced/base.rs:571] entry.get().current_layout = ShaderReadOnlyOptimal
[.../vulkano/src/command_buffer/synced/base.rs:571] start_layout = ShaderReadOnlyOptimal

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

Ok, that means that this descriptor is correct, but it's colliding with a previous access somewhere that requires exclusive access. In your first post, you gave this message:

Conflict {
    command1_name: "vkCmdBindDescriptorSets",
    command1_param: "Buffer bound to descriptor 0 of set 0",
    command1_offset: 4,
    command2_name: "vkCmdBindDescriptorSets",
    command2_param: "Buffer bound to descriptor 0 of set 0",
    command2_offset: 8,
}

command1 here is that previous command, which appears to be another descriptor set. The offsets indicate which command in the sequence is the culprit, but they are expressed in terms of raw Vulkan commands, not necessarily in calls on AutoCommandBufferBuilder. What puzzles me here is why SyncCommandBufferBuilder doesn't just insert a pipeline barrier and call it a day. Is there perhaps a render pass begin/end in between the two?

@faulesocke
Copy link
Author

The command buffer contains only two draw commands, but of course this results in more vkCmd* calls. The two draw commands use different pipelines with different descriptor layouts. Both draw commands are in the same renderpass.

Perhaps it's time to mention, that I'm using a slightly modified version of the AttachmentImage that allows me to have multiple array layers in my draw targets. These AttachmentImages get bound to the descriptors which cause the problems.

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

What do the descriptors for the first command look like? And are these perhaps being marked as not readonly by shader!?

@faulesocke
Copy link
Author

The first command has only 2 descriptors, I've already checked them, they are correctly marked as readonly:

fn descriptor(&self, set: usize, binding: usize) -> Option<DescriptorDesc> {
    match (set, binding) {
        (0usize, 0usize) => Some(DescriptorDesc {
            ty: DescriptorDescTy::CombinedImageSampler(DescriptorImageDesc {
                sampled: true,
                dimensions: DescriptorImageDescDimensions::TwoDimensional,
                format: None,
                multisampled: false,
                array_layers: DescriptorImageDescArray::Arrayed {
                    max_layers: None,
                },
            }),
            array_count: 1u32,
            stages: self.0.clone(),
            readonly: true,
        }),
        (0usize, 1usize) => Some(DescriptorDesc {
            ty: DescriptorDescTy::CombinedImageSampler(DescriptorImageDesc {
                sampled: true,
                dimensions: DescriptorImageDescDimensions::TwoDimensional,
                format: None,
                multisampled: false,
                array_layers: DescriptorImageDescArray::NonArrayed,
            }),
            array_count: 1u32,
            stages: self.0.clone(),
            readonly: true,
        }),
        _ => None,
    }
}

The glsl source for these is:

layout(binding = 0) uniform sampler2DArray gbuffer_diffuse;
layout(binding = 1) uniform sampler2D ssao_buffer;

The second command has 6 descriptors in total and I only gave the first two in my post 30 minutes before (for readability). However, as you may guess, the same image (and the same sampler object) is bound to the descriptor binding 0 in both draw commands.

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

The only thing left that I see, that could be causing this, is the code for handling image layout transitions. If the image is initialised, but not in the correct layout for a command, then a transition is inserted before the command, and this counts as exclusive access: https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/base.rs#L736

@faulesocke
Copy link
Author

This has to be the case here! I just inserted some more dbg! statements to see what the conflicting resource entry looks like. See yourself:

[.../vulkano/vulkano/src/command_buffer/synced/base.rs:556] entry.get() = ResourceState {
    memory: PipelineMemoryAccess {
        stages: PipelineStages {
            top_of_pipe: false,
            draw_indirect: false,
            vertex_input: false,
            vertex_shader: false,
            tessellation_control_shader: false,
            tessellation_evaluation_shader: false,
            geometry_shader: false,
            fragment_shader: true,
            early_fragment_tests: false,
            late_fragment_tests: false,
            color_attachment_output: false,
            compute_shader: false,
            transfer: false,
            bottom_of_pipe: false,
            host: false,
            all_graphics: false,
            all_commands: false,
        },
        access: AccessFlagBits {
            indirect_command_read: false,
            index_read: false,
            vertex_attribute_read: false,
            uniform_read: false,
            input_attachment_read: false,
            shader_read: true,
            shader_write: false,
            color_attachment_read: false,
            color_attachment_write: false,
            depth_stencil_attachment_read: false,
            depth_stencil_attachment_write: false,
            transfer_read: false,
            transfer_write: false,
            host_read: false,
            host_write: false,
            memory_read: false,
            memory_write: false,
        },
        exclusive: true,
    },
    exclusive_any: true,
    initial_layout: ColorAttachmentOptimal,
    current_layout: ShaderReadOnlyOptimal,
}

@faulesocke
Copy link
Author

But why is this transition happening? ColorAttachmentOptimal should be the correct layout, no need to change it to ShaderReadOnlyOptimal, right?

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

The code at https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/commands.rs#L2777-L2796 is what chooses the layout for an image, and it does so based on the descriptor type. Since you're using a combined image sampler descriptor, it chooses the layout that the image returns for combined image samplers. The image has control over this though, so if you made your own custom image, then you should look at how you implemented descriptor_layouts on the ImageAccess trait.

@faulesocke
Copy link
Author

I just copied the implementation from vulkanos AttachmentImage and indeed it returns ShaderReadOnlyOptimal for all descriptor types. I've read up in the vulkan docs, and COLOR_ATTACHMENT_OPTIMAL is for attaching to framebuffers. So ShaderReadOnlyOptimal should be correct.

So the error must be somewhere else.

@faulesocke
Copy link
Author

I've also changed my code to the following: Now I issue two draw commands, but both with the same pipeline (of course same descriptor set layout) but different descriptor sets (I create a new one for each draw call). I keep getting the same error.

This is the only place in my code, I believe, where I use an AttachmentImage in more than one draw call in the same command buffer. I will try changing the code to see whether the new descriptor set is really the cause of my problems and then report back.

@faulesocke
Copy link
Author

Wow, this really fixed it. So the problem really is drawing with the same image in two different descriptor sets.

@faulesocke
Copy link
Author

Could this have something to do with the new ImageView? Previously (when it did not work) I re-created the ImageView for every draw call. Now I (obviously) only create it once.

@Rua
Copy link
Contributor

Rua commented Apr 20, 2021

ImageView doesn't really have anything to do with layouts or accesses, that's all done at the ImageAccess level.

However, what I do see is that ImageAccess has other methods related to layouts. In particular, for AttachmentImage, the initial_layout_requirement and final_layout_requirement functions return the value of self.attachment_layout. This value is set in the constructor here: https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/image/attachment.rs#L414-L418. What this means is that at the start of a command buffer, the layout is always expected to be ColorAttachmentOptimal, and the command buffer is also required to put it back in that layout at the end.

What I see happening here is that SyncCommandBufferBuilder assumes that the image starts off in ColorAttachmentOptimal when it's used in that command buffer for the first time, but the descriptor set wants it in ShaderReadOnlyOptimal. So it adds a pipeline barrier for a layout transition before the first draw call. This counts as a write to the image. In principle, when two accesses conflict, a barrier should be automatically inserted, which "protects" the accesses on both sides of the barrier from each other. So it's somewhat puzzling that it doesn't insert a barrier before the second draw call, but sees it as an unresolvable conflict instead.

@faulesocke
Copy link
Author

faulesocke commented Apr 21, 2021

I managed to reproduce the issue with vulkano master and a somewhat minimal example. To view it, please checkout my fork at https://github.com/faulesocke/vulkano/tree/bug then switch to the bug branch, go to the examples directory and run cargo run --bin bug.

Now I am absolutely certain that this has nothing to do with my custom version of the AttachmentImage.

@Rua
Copy link
Contributor

Rua commented Apr 21, 2021

Thank you, that will help a lot. I'll have a look soon.

@faulesocke
Copy link
Author

No hurry. I will investigate it further as well. Already learned a lot about the vulkano internals.

@faulesocke
Copy link
Author

Okay, it looks like the logic is somewhat fundamentally broken here. First, let me explain why it works when I only have one descriptor set: In this case no second vkCmdBindDescriptorSets is generated so the offending code isn't even triggered. That's okay, since when no new descriptor set is bound, no additional barriers are required.

Now why it doesn't work with two different sets:
In this case the first vkCmdBIndDescriptorSets generates a resource entry for the AttachmentImage that is bound to the descriptor. However, that resource entry gets modified in https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/base.rs#L736 where the exclusive flag is set to true. This is because:

  1. the initial layout requirement (which is ShaderReadOnlyOptimal) differs from the actual layout the resource is in (ColorAttachmentOptimal).
  2. The layout is already initialized.

In this case the exclusive flag gets set in the PipelineMemoryAccess object which is inserted a few lines later. Vulkano also correctly adds a image memory barrier to transition the image to the desired layout.

Now when the second call to AutoCommandBufferBuilder::draw comes, a new vkCmdBindDescriptorSets is generated with the same resource. In https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/base.rs#L553 the code finds that the resource is already added, which might cause a potential conflict. Next it is check whether the current or the already existing resource have the exclusive flag set. I believe this flag indicates when a write operation occours. Since this is the case for the existing resource, the code assumes that now this is a definitive conflict that must be resolved.

However this is actually not true. The image is already in the desired layout because of the previously inserted barrier. No further barriers are required.

The code then tries to insert further barriers to resolve the conflict but before this makes one important check in https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/synced/base.rs#L595: When the command, which caused the exclusive resource usage comes after the latest enter renderpass command it assumes that the conflict can't be resolved since it thinks the image is written in the same pass as it is read. Then the error get's returned.

In my opinion, this should be fixed by not setting the exclusive flag in the first place, i.e. not counting layout transitions as write operations. Does anyone know, why this design decision was made?

@Rua
Copy link
Contributor

Rua commented Apr 21, 2021

Nice investigation work! Unfortunately, I don't know why it was made this way. Most of Vulkano was written by Tomaka, who is unfortunately no longer an active contributor. I've done some work on SyncCommandBufferBuilder myself in order to handle the execute_commands command, and it's definitely not easy to understand both what it's doing and why it's doing it.

The subtleties of Vulkan synchronisation are also notoriously difficult. You mention that the layout-transitioning barrier already acts as a write + barrier combination, so that anything that comes after will always see the image in the right layout. If that's the case, to what degree is a layout transition considered exclusive? On one side, it has a barrier built in, so I assume it doesn't conflict with operations on either side of the barrier. On the other side though, it does write the image, so if the image is accessed concurrently on another queue then that is definitely a conflict. Barriers only protect on a single queue.

There's a lot that I need to figure out before I fully understand how this is supposed to work. But your research helps a lot. And if you manage to make a PR that would be even better!

@Rua
Copy link
Contributor

Rua commented May 14, 2022

Is this still a problem or can it be closed?

@Rua Rua closed this as completed Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants