Potential synchronization bug (flashes in Halo 3) in the Nvidia driver on Ampere #2030

Open
6 of 7 tasks
Triang3l opened this issue Jun 6, 2022 · 3 comments

@Triang3l
Member

Triang3l commented Jun 6, 2022

Validation

Describe what's going wrong

This is more of a reminder to report this issue to Nvidia than a report of a bug on our side, most likely.

In Halo 3, as of 55a91af (as well as the earliest version available on GitHub Releases), on the Direct3D 12 backend, with the RTV/DSV render target implementation and resolution scaling disabled, flashes of various colors appear on the screen randomly, in both the main menu and gameplay.

Screenshot of a pink flash

Sometimes this issue takes the form of large bloom blobs (usually purple, though sometimes they pick up pixels of other colors); sometimes white rectangles with sharp edges appear on the screen. These seem to leak the contents of some render targets (especially ones related to bloom) from before some draw (the purple color appears in the EDRAM tile padding on the right side during certain bloom passes), which suggests that some synchronization is missing.

This issue is reproducible on Nvidia GPUs with the Ampere architecture (however, it hasn't been tested on Turing and Volta). Specifically, for me, on the GeForce RTX 3080 Ti, in all my tests on Windows 11 build 22000.675, on the following driver versions (with the default 3D settings with no overrides, after a clean installation):

  • 512.95 (May 24, 2022 — Game Ready Driver)
  • 512.96 (May 23, 2022 — Studio Driver)
  • 472.12 (September 20, 2021 — Game Ready Driver)

@ZolaKluke has also confirmed this on the GeForce RTX 3090.

If frames are captured in PIX or RenderDoc, the bug can be seen in the final screenshot, but when the capture is analyzed, it's not visible in the Present input or anywhere before (at least I wasn't able to reproduce it there by switching between commands). Also, it doesn't appear in RenderDoc if replay looping is launched.

PIX warning analysis and GPU-based validation don't report any issues related to this. The only warning is a missing UNORDERED_ACCESS state for the shared memory buffer in draw commands. That warning comes from dynamic switching between the SRV + index buffer and the UAV when memexport is used in the draws, because of which both the SRV and the UAV are always bound for the shared memory buffer; it is not related to render target resolves. It's still something to investigate, though: certain optimizations in the driver may rely on actual binding information, so we should try binding null descriptors instead of the unused ones. I've checked the resource state transitions happening near resolves and texture loads, and they all seem to be correct.

On the code side, I've also tried inserting a UAV barrier before every transition from the UAV state, as well as not merging multiple barriers into one ResourceBarrier command, but neither has helped.
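For anyone who wants to repeat those two experiments, here is a minimal sketch of the barrier logic. It uses stand-in types so that it compiles without `d3d12.h`; the names `Barrier`, `PushTransition` and `BuildSubmissions` are hypothetical and are not Xenia's actual code. In a real Direct3D 12 build, the entries would be `D3D12_RESOURCE_BARRIER` structures of type `D3D12_RESOURCE_BARRIER_TYPE_UAV` or `D3D12_RESOURCE_BARRIER_TYPE_TRANSITION`, and each batch would be passed to one `ID3D12GraphicsCommandList::ResourceBarrier` call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types (assumptions for illustration only). They loosely mirror
// D3D12_RESOURCE_BARRIER but are NOT the real API structures.
enum class BarrierType { kTransition, kUav };

enum class ResourceState : uint32_t {
  kUnorderedAccess,
  kNonPixelShaderResource,
  kRenderTarget,
  kCopySource,
};

struct Barrier {
  BarrierType type;
  int resource;          // Hypothetical resource handle.
  ResourceState before;  // Only meaningful for transitions.
  ResourceState after;   // Only meaningful for transitions.
};

// Queues a state transition. If the resource is leaving the UAV state, a UAV
// barrier for the same resource is queued first (the first experiment).
void PushTransition(std::vector<Barrier>& pending, int resource,
                    ResourceState before, ResourceState after) {
  assert(before != after);
  if (before == ResourceState::kUnorderedAccess) {
    pending.push_back({BarrierType::kUav, resource, before, before});
  }
  pending.push_back({BarrierType::kTransition, resource, before, after});
}

// Groups the queued barriers into batches, where each batch stands for one
// ResourceBarrier call. With merge == false, every barrier gets its own
// call (the second experiment).
std::vector<std::vector<Barrier>> BuildSubmissions(
    const std::vector<Barrier>& pending, bool merge) {
  std::vector<std::vector<Barrier>> submissions;
  if (pending.empty()) {
    return submissions;
  }
  if (merge) {
    submissions.push_back(pending);
    return submissions;
  }
  for (const Barrier& barrier : pending) {
    submissions.push_back(std::vector<Barrier>{barrier});
  }
  return submissions;
}
```

Queuing one transition out of the UAV state and one unrelated transition gives three barriers in total, submitted either as a single batch or as three separate calls depending on `merge`.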

If we don't find the cause of the issue on our side, we need a quick way for driver developers to reproduce it: most likely frame trace stream replaying with output to a window, as well as starting and stopping tracing. We would then send Nvidia a frame trace and the replay application, along with the source code commit hash and the build instructions.

Describe what should happen

Screen-space effects should be rendered correctly and in a way that's stable between frames, just like on other hardware this has been tested on (Nvidia GeForce GTX 1070 — Pascal architecture, Intel UHD Graphics 630, AMD Radeon RX Vega 7).

If applicable, provide a callstack here, especially for crashes

No response

If applicable, upload a logfile and link it here

No response

@Triang3l
Member Author

Triang3l commented Jun 7, 2022

This also happens, with slightly different-looking flashes, with the rasterizer-ordered view render backend implementation, though less frequently, probably because draws take longer.

@Triang3l
Member Author

Unfortunately, separating the shared memory bindings into an SRV-only table and a UAV-only table did not help (even though it eliminated the remaining correctness issues reported by PIX), so this looks like purely a driver bug.

@Triang3l
Member Author

Triang3l commented May 3, 2023

I'm not experiencing this on driver 531.18 with my local build (which has some memexport changes, but they're probably not related).
