This tutorial demonstrates how to use the render passes API to implement simple deferred shading.
Render passes are a feature of next-generation APIs that allows applications to define rendering commands in a way that better maps to the tiled-deferred rendering architectures used by virtually all mobile platforms. Unlike the immediate rendering architectures typical of desktop platforms, tiled-deferred renderers split the screen into small tiles (e.g. 64x64 pixels; the actual size depends on multiple factors including render target format, fast memory size, GPU vendor, etc.) and perform rendering operations tile after tile. This allows the GPU to keep all data in a fast GPU-local cache, which is both faster and more power-efficient. When the GPU is done processing one tile, it flushes the data to main memory and moves on to the next tile.
Render passes were introduced to give applications explicit control over tile operations. A good mental model of a render pass is a set of operations that the GPU performs in the local tile cache before flushing the data to main memory and moving on to the next tile.
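To make the tile mental model concrete, here is a small C++ sketch (not engine code) that splits a render target into square tiles the way a tiled-deferred GPU conceptually walks the screen. The `TileRect` type, the `SplitIntoTiles` function and the fixed square tile size are hypothetical simplifications; real hardware chooses tile dimensions itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tile rectangle in pixels (not an engine type).
struct TileRect
{
    uint32_t x, y, width, height;
};

// Splits a width x height render target into tileSize x tileSize tiles, row by row.
// Tiles on the right and bottom edges are clipped to the render target bounds.
std::vector<TileRect> SplitIntoTiles(uint32_t width, uint32_t height, uint32_t tileSize)
{
    assert(tileSize > 0);
    std::vector<TileRect> tiles;
    for (uint32_t y = 0; y < height; y += tileSize)
    {
        for (uint32_t x = 0; x < width; x += tileSize)
        {
            const uint32_t w = (width - x < tileSize) ? width - x : tileSize;
            const uint32_t h = (height - y < tileSize) ? height - y : tileSize;
            tiles.push_back({x, y, w, h});
        }
    }
    return tiles;
}
```

With a 1920x1080 target and the 64x64 example size, this yields 30 x 17 = 510 tiles, with the bottom row clipped to 56 pixels high. In this model, a render pass is everything the GPU does inside one of these rectangles before moving on to the next.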
A render pass is defined by the following key components:
- Render pass attachments, which are the set of texture views used within the render pass. Every attachment defines how its contents should be treated at the beginning of the render pass (load operation) as well as at the end of the render pass (store operation). An attachment can be used as an output (render target or depth-stencil) in one subpass and as an input to other subpasses. A render pass can also perform multisample resolve operations at the end of a subpass.
- Subpasses. A render pass has one or more subpasses. Every subpass defines a subset of render pass attachments that are used as output attachments, input attachments, and resolve attachments.
- Subpass dependencies, which define execution and memory dependencies between subpasses (e.g. when an attachment written as a render target in one subpass is read as an input attachment in the next).
Diligent Engine enables applications to use and intermix the render target API and the render passes API.
While the former is a more implicit approach, the latter is more explicit and requires more effort from the application
developers. Most importantly, no state transitions are allowed within a render pass. As a result, an application must not use
RESOURCE_STATE_TRANSITION_MODE_TRANSITION with any command while a render pass is active.
This tutorial demonstrates a simple deferred shading renderer implemented using the render passes API. The render pass consists of two subpasses. The first subpass is a G-buffer pass: it renders the scene and populates two buffers, color and depth. The second subpass is a lighting pass: it renders light volumes and applies simple distance-based lighting to the G-buffer. Using the render passes API lets the driver reorder the operations and fuse the G-buffer pass and the lighting pass into a single tile operation, thus avoiding the need to store the intermediate G-buffer data to main memory and read it back.
To create a render pass, we need to prepare an instance of the RenderPassDesc struct.
But first we need to define some auxiliary data.
The first piece of information we need to define is the set of render pass attachments. In this tutorial we will use 4 attachments:
- Color buffer
- Depth Z
- Depth buffer
- Final color buffer
constexpr Uint32 NumAttachments = 4;
RenderPassAttachmentDesc Attachments[NumAttachments];
The first attachment is the color G-Buffer:
Attachments[0].Format = TEX_FORMAT_RGBA8_UNORM;
Attachments[0].InitialState = RESOURCE_STATE_RENDER_TARGET;
Attachments[0].FinalState = RESOURCE_STATE_INPUT_ATTACHMENT;
Attachments[0].LoadOp = ATTACHMENT_LOAD_OP_CLEAR;
Attachments[0].StoreOp = ATTACHMENT_STORE_OP_DISCARD;
Notice that we must specify the initial state that the corresponding texture will be in
before the render pass begins, as well as the final state it will be in after the render pass ends.
Also notice that as the load operation we specify ATTACHMENT_LOAD_OP_CLEAR. This tells
the driver that the old contents of the texture are not needed and should not be loaded from main
memory. Similarly, as the store operation we use ATTACHMENT_STORE_OP_DISCARD, which instructs
the driver to discard all the data after the end of the render pass, thus avoiding the need to
write it back to main memory.
The second attachment holds the normalized device Z coordinate. Note that we can't extract it from the depth buffer (attachment 2), because we can't use the depth buffer as both a depth-stencil attachment and an input attachment during the second (lighting) subpass.
Attachments[1].Format = TEX_FORMAT_R32_FLOAT;
Attachments[1].InitialState = RESOURCE_STATE_RENDER_TARGET;
Attachments[1].FinalState = RESOURCE_STATE_INPUT_ATTACHMENT;
Attachments[1].LoadOp = ATTACHMENT_LOAD_OP_CLEAR;
Attachments[1].StoreOp = ATTACHMENT_STORE_OP_DISCARD;
Note again that we use ATTACHMENT_LOAD_OP_CLEAR and ATTACHMENT_STORE_OP_DISCARD as the load and store operations.
The third attachment is the depth buffer:
Attachments[2].Format = DepthBufferFormat;
Attachments[2].InitialState = RESOURCE_STATE_DEPTH_WRITE;
Attachments[2].FinalState = RESOURCE_STATE_DEPTH_WRITE;
Attachments[2].LoadOp = ATTACHMENT_LOAD_OP_CLEAR;
Attachments[2].StoreOp = ATTACHMENT_STORE_OP_DISCARD;
The last attachment is the final buffer where the shaded result will be written to:
Attachments[3].Format = m_pSwapChain->GetDesc().ColorBufferFormat;
Attachments[3].InitialState = RESOURCE_STATE_RENDER_TARGET;
Attachments[3].FinalState = RESOURCE_STATE_RENDER_TARGET;
Attachments[3].LoadOp = ATTACHMENT_LOAD_OP_CLEAR;
Attachments[3].StoreOp = ATTACHMENT_STORE_OP_STORE;
Note that unlike the previous attachments, this time we use ATTACHMENT_STORE_OP_STORE because
we need to keep the final image to display it on the screen.
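The effect of these load and store operations on main-memory bandwidth can be estimated with a rough model. The sketch below is hypothetical: `LoadOp`, `StoreOp` and `EstimateTraffic` mirror the idea of ATTACHMENT_LOAD_OP_* / ATTACHMENT_STORE_OP_* but are not the engine's types, and the model ignores framebuffer compression, MSAA and any driver-internal traffic.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mirrors of the engine's load/store operations (not the real enums).
enum class LoadOp { Load, Clear, Discard };
enum class StoreOp { Store, Discard };

struct AttachmentTraffic
{
    uint32_t bytesPerPixel; // assumed texel size of the attachment format
    LoadOp   load;
    StoreOp  store;
};

struct MemoryTraffic
{
    uint64_t bytesRead    = 0; // main memory -> tile memory
    uint64_t bytesWritten = 0; // tile memory -> main memory
};

// Estimates main-memory traffic of one render pass over a width x height target.
// Only LOAD reads from memory and only STORE writes back; CLEAR and DISCARD stay on chip.
MemoryTraffic EstimateTraffic(const std::vector<AttachmentTraffic>& attachments, uint32_t width, uint32_t height)
{
    const uint64_t pixels = static_cast<uint64_t>(width) * height;
    MemoryTraffic  traffic;
    for (const AttachmentTraffic& a : attachments)
    {
        const uint64_t bytes = pixels * a.bytesPerPixel;
        if (a.load == LoadOp::Load)
            traffic.bytesRead += bytes;
        if (a.store == StoreOp::Store)
            traffic.bytesWritten += bytes;
    }
    return traffic;
}
```

For the four attachments above at 1920x1080, assuming 4 bytes per pixel for each of them (a 32-bit depth format is an assumption, since DepthBufferFormat is not fixed here), the model reports no reads and 1920 x 1080 x 4 = 8,294,400 bytes written, all of it the final color buffer.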
As discussed above, the render pass will have two subpasses. The first subpass is the G-buffer pass, the second one is the lighting pass:
constexpr Uint32 NumSubpasses = 2;
SubpassDesc Subpasses[NumSubpasses];
The first subpass uses attachments 0 and 1 as render targets and attachment 2 as the depth-stencil buffer:
AttachmentReference RTAttachmentRefs0[] =
{
{0, RESOURCE_STATE_RENDER_TARGET},
{1, RESOURCE_STATE_RENDER_TARGET}
};
AttachmentReference DepthAttachmentRef0 = {2, RESOURCE_STATE_DEPTH_WRITE};
Subpasses[0].RenderTargetAttachmentCount = _countof(RTAttachmentRefs0);
Subpasses[0].pRenderTargetAttachments = RTAttachmentRefs0;
Subpasses[0].pDepthStencilAttachment = &DepthAttachmentRef0;
The AttachmentReference struct defines the attachment index
as well as its state during the subpass.
The second subpass uses attachments 0 and 1 as input attachments, attachment 2 as the depth-stencil buffer, and attachment 3 as the render target:
AttachmentReference RTAttachmentRefs1[] =
{
{3, RESOURCE_STATE_RENDER_TARGET}
};
AttachmentReference DepthAttachmentRef1 = {2, RESOURCE_STATE_DEPTH_WRITE};
AttachmentReference InputAttachmentRefs1[] =
{
{0, RESOURCE_STATE_INPUT_ATTACHMENT},
{1, RESOURCE_STATE_INPUT_ATTACHMENT}
};
Subpasses[1].RenderTargetAttachmentCount = _countof(RTAttachmentRefs1);
Subpasses[1].pRenderTargetAttachments = RTAttachmentRefs1;
Subpasses[1].pDepthStencilAttachment = &DepthAttachmentRef1;
Subpasses[1].InputAttachmentCount = _countof(InputAttachmentRefs1);
Subpasses[1].pInputAttachments = InputAttachmentRefs1;
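The attachment references above must be consistent with the attachment array. A hypothetical checker (`SubpassModel`, `kNone` and `ValidateSubpass` are illustrative names, not engine API) might verify that every index is in range and that no attachment is read as an input while also being written in the same subpass. Extending the depth-buffer restriction mentioned earlier to render targets is a simplifying assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Marks "no depth-stencil attachment" in this hypothetical model.
constexpr uint32_t kNone = ~0u;

// Simplified subpass: attachment indices grouped by the role they play.
struct SubpassModel
{
    std::vector<uint32_t> renderTargets;
    std::vector<uint32_t> inputs;
    uint32_t              depthStencil = kNone;
};

// Returns true if every reference is within [0, attachmentCount) and no input
// attachment is also used as a render target or depth-stencil in the same subpass.
bool ValidateSubpass(const SubpassModel& sp, uint32_t attachmentCount)
{
    for (uint32_t rt : sp.renderTargets)
        if (rt >= attachmentCount)
            return false;
    if (sp.depthStencil != kNone && sp.depthStencil >= attachmentCount)
        return false;
    for (uint32_t in : sp.inputs)
    {
        if (in >= attachmentCount || in == sp.depthStencil)
            return false;
        for (uint32_t rt : sp.renderTargets)
            if (in == rt)
                return false;
    }
    return true;
}
```

In this model the G-buffer subpass is `{{0, 1}, {}, 2}` and the lighting subpass is `{{3}, {0, 1}, 2}`; both pass, whereas adding attachment 2 to the lighting subpass inputs is rejected.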
Each subpass defines the states of all its attachments, and the
attachments are transitioned between these states when going from one subpass to the
next. However, besides attachment states, a render pass must also specify
execution dependencies.
In our specific example, attachments 0 and 1 are used as render targets in the
first subpass and as input attachments in the second. So we need to specify
a dependency from the ACCESS_FLAG_RENDER_TARGET_WRITE access type performed by the
PIPELINE_STAGE_FLAG_RENDER_TARGET pipeline stage of subpass 0 to the
ACCESS_FLAG_SHADER_READ access type performed by the PIPELINE_STAGE_FLAG_PIXEL_SHADER
pipeline stage of subpass 1.
SubpassDependencyDesc Dependencies[1];
Dependencies[0].SrcSubpass = 0;
Dependencies[0].DstSubpass = 1;
Dependencies[0].SrcStageMask = PIPELINE_STAGE_FLAG_RENDER_TARGET;
Dependencies[0].DstStageMask = PIPELINE_STAGE_FLAG_PIXEL_SHADER;
Dependencies[0].SrcAccessMask = ACCESS_FLAG_RENDER_TARGET_WRITE;
Dependencies[0].DstAccessMask = ACCESS_FLAG_SHADER_READ;
Execution dependencies are a complex topic that is beyond the scope of this tutorial.
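While the full semantics of execution dependencies are out of scope, the basic rule for spotting where one is needed can be sketched. The hypothetical function below (not part of the engine) reports every (src, dst) subpass pair where dst reads, as an input attachment, an attachment that an earlier subpass src writes. It only detects this read-after-write case, which is the one covered by the tutorial's single dependency; a real renderer may need further dependencies, for example when two subpasses write the same depth attachment.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified subpass description.
struct SubpassIO
{
    std::vector<uint32_t> written; // render target and depth-stencil attachments
    std::vector<uint32_t> inputs;  // input attachments
};

// Returns (src, dst) pairs where subpass dst reads, as an input attachment,
// an attachment written by an earlier subpass src.
std::vector<std::pair<uint32_t, uint32_t>> FindInputAttachmentDeps(const std::vector<SubpassIO>& subpasses)
{
    std::vector<std::pair<uint32_t, uint32_t>> deps;
    const uint32_t count = static_cast<uint32_t>(subpasses.size());
    for (uint32_t dst = 0; dst < count; ++dst)
    {
        for (uint32_t src = 0; src < dst; ++src)
        {
            bool readsSrcOutput = false;
            for (uint32_t in : subpasses[dst].inputs)
                for (uint32_t w : subpasses[src].written)
                    readsSrcOutput = readsSrcOutput || (in == w);
            if (readsSrcOutput)
                deps.emplace_back(src, dst);
        }
    }
    return deps;
}
```

For the tutorial's two subpasses this finds exactly one pair, (0, 1), which corresponds to Dependencies[0].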
Finally, when we have all the pieces that describe the render pass,
we can populate the RenderPassDesc structure and create the render
pass object:
RenderPassDesc RPDesc;
RPDesc.Name = "Deferred shading render pass desc";
RPDesc.AttachmentCount = _countof(Attachments);
RPDesc.pAttachments = Attachments;
RPDesc.SubpassCount = _countof(Subpasses);
RPDesc.pSubpasses = Subpasses;
RPDesc.DependencyCount = _countof(Dependencies);
RPDesc.pDependencies = Dependencies;
m_pDevice->CreateRenderPass(RPDesc, &m_pRenderPass);
Creating a pipeline state object that uses an explicit render pass is
mostly the same as creating a PSO that uses render targets, with
one difference: the PSO description should set the pRenderPass
and SubpassIndex members:
PSOCreateInfo.GraphicsPipeline.pRenderPass = m_pRenderPass;
PSOCreateInfo.GraphicsPipeline.SubpassIndex = 0;
Note that when pRenderPass is not null, all render target
formats as well as the depth-stencil format must be TEX_FORMAT_UNKNOWN,
and the number of render targets must be 0.
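The rule above lends itself to a simple validation sketch. `GraphicsPipelineModel`, `TexFormat` and `IsValidForRenderPass` are hypothetical stand-ins, not the engine's pipeline description types; the check that SubpassIndex is smaller than the subpass count is an extra assumption added for illustration.

```cpp
#include <cstdint>

// Hypothetical format enum; Unknown plays the role of TEX_FORMAT_UNKNOWN.
enum class TexFormat { Unknown, RGBA8_UNORM, R32_FLOAT, D32_FLOAT };

// Hypothetical subset of a graphics pipeline description.
struct GraphicsPipelineModel
{
    const void* pRenderPass      = nullptr;
    uint32_t    SubpassIndex     = 0;
    uint32_t    NumRenderTargets = 0;
    TexFormat   RTVFormats[8]    = {}; // value-initialized to TexFormat::Unknown
    TexFormat   DSVFormat        = TexFormat::Unknown;
};

// With an explicit render pass, formats must be Unknown and NumRenderTargets must be 0.
bool IsValidForRenderPass(const GraphicsPipelineModel& gp, uint32_t subpassCount)
{
    if (gp.pRenderPass == nullptr)
        return true; // the rule only applies when a render pass is used
    if (gp.NumRenderTargets != 0 || gp.DSVFormat != TexFormat::Unknown)
        return false;
    for (TexFormat f : gp.RTVFormats)
        if (f != TexFormat::Unknown)
            return false;
    return gp.SubpassIndex < subpassCount; // assumed extra sanity check
}
```

Under this model, the G-buffer PSO (SubpassIndex 0) and a lighting PSO (SubpassIndex 1) would both leave all format fields at their defaults.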
The only backend that currently supports input attachments natively is Vulkan, and subpass inputs can only be declared in GLSL. To define subpass inputs in the shader, use the following syntax:
layout(input_attachment_index = 0, binding = 0) uniform highp subpassInput g_SubpassInputColor;
layout(input_attachment_index = 1, binding = 1) uniform highp subpassInput g_SubpassInputDepthZ;
In the shader, use the subpassLoad function to load the subpass data:
float Depth = subpassLoad(g_SubpassInputDepthZ).r;
vec3 Color = subpassLoad(g_SubpassInputColor).rgb;
Note that the subpassLoad function does not take a position because it is implicitly defined by
the position of the current fragment.
In all other backends, input attachments should be defined as regular textures and accessed accordingly:
Texture2D<float4> g_SubpassInputColor;
SamplerState g_SubpassInputColor_sampler;
Texture2D<float4> g_SubpassInputDepthZ;
SamplerState g_SubpassInputDepthZ_sampler;
...
float Depth = g_SubpassInputDepthZ.Load(int3(PSIn.Pos.xy, 0)).r;
float3 Color = g_SubpassInputColor.Load(int3(PSIn.Pos.xy, 0)).rgb;
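The HLSL fallback works because a pixel shader's position input holds the pixel center, (x + 0.5, y + 0.5) for pixel (x, y) in the default non-multisampled case, and converting it to int truncates the fractional part. The CPU-side sketch below models only that conversion; `Int2` and `PixelCenterToTexel` are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical integer texel coordinate.
struct Int2
{
    int x, y;
};

// Converts a pixel-center position to the integer texel coordinate expected by Load().
// Truncation toward zero equals floor() only for non-negative inputs, which holds
// for positions inside the render target.
Int2 PixelCenterToTexel(float posX, float posY)
{
    assert(posX >= 0.f && posY >= 0.f);
    return {static_cast<int>(posX), static_cast<int>(posY)};
}
```

So the fragment covering pixel (10, 3) sees a position of (10.5, 3.5) and loads texel (10, 3), the same texel that subpassLoad reads implicitly.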
The final part of the render passes API is the framebuffer. The framebuffer encapsulates
the actual textures that will be used as attachments in the render pass. The framebuffer
must use exactly the same number of attachments as the render pass, and the texture view
formats must exactly match the corresponding render pass attachment formats.
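A minimal sketch of this compatibility rule, using a hypothetical `ViewFormat` enum and `IsFramebufferCompatible` helper rather than the engine's types:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical format enum (not the engine's TEXTURE_FORMAT).
enum class ViewFormat { RGBA8_UNORM, R32_FLOAT, D32_FLOAT, BGRA8_UNORM };

// A framebuffer is compatible if it has the same number of attachments as the
// render pass and each view format matches the attachment format at the same index.
bool IsFramebufferCompatible(const std::vector<ViewFormat>& renderPassFormats,
                             const std::vector<ViewFormat>& viewFormats)
{
    if (renderPassFormats.size() != viewFormats.size())
        return false;
    for (std::size_t i = 0; i < viewFormats.size(); ++i)
    {
        if (renderPassFormats[i] != viewFormats[i])
            return false;
    }
    return true;
}
```

The check is positional, so the views passed to the framebuffer must appear in the same order as the Attachments array of the render pass; swapping the color and depth Z views, for example, would fail even though the set of formats is the same.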
To create a framebuffer, prepare a FramebufferDesc structure and call the
IRenderDevice::CreateFramebuffer method:
ITextureView* pAttachments[] =
{
pColorBuffer->GetDefaultView(TEXTURE_VIEW_RENDER_TARGET),
pDepthZBuffer->GetDefaultView(TEXTURE_VIEW_RENDER_TARGET),
pDepthBuffer->GetDefaultView(TEXTURE_VIEW_DEPTH_STENCIL),
pDstRenderTarget
};
FramebufferDesc FBDesc;
FBDesc.Name = "G-buffer framebuffer";
FBDesc.pRenderPass = m_pRenderPass;
FBDesc.AttachmentCount = _countof(pAttachments);
FBDesc.ppAttachments = pAttachments;
RefCntAutoPtr<IFramebuffer> pFramebuffer;
m_pDevice->CreateFramebuffer(FBDesc, &pFramebuffer);
There are three main render pass commands: BeginRenderPass, NextSubpass,
and EndRenderPass.
BeginRenderPass, as the name suggests, begins a render pass and starts
the first subpass. To begin a render pass, besides the render pass itself,
we also need to specify a framebuffer as well as clear values for all attachments
that use the ATTACHMENT_LOAD_OP_CLEAR load operation:
BeginRenderPassAttribs RPBeginInfo;
RPBeginInfo.pRenderPass = m_pRenderPass;
RPBeginInfo.pFramebuffer = pFramebuffer;
OptimizedClearValue ClearValues[4];
// Color
ClearValues[0].Color[0] = 0.f;
ClearValues[0].Color[1] = 0.f;
ClearValues[0].Color[2] = 0.f;
ClearValues[0].Color[3] = 0.f;
// Depth Z
ClearValues[1].Color[0] = 1.f;
ClearValues[1].Color[1] = 1.f;
ClearValues[1].Color[2] = 1.f;
ClearValues[1].Color[3] = 1.f;
// Depth buffer
ClearValues[2].DepthStencil.Depth = 1.f;
// Final color buffer
ClearValues[3].Color[0] = 0.0625f;
ClearValues[3].Color[1] = 0.0625f;
ClearValues[3].Color[2] = 0.0625f;
ClearValues[3].Color[3] = 0.f;
RPBeginInfo.pClearValues = ClearValues;
RPBeginInfo.ClearValueCount = _countof(ClearValues);
RPBeginInfo.StateTransitionMode = RESOURCE_STATE_TRANSITION_MODE_TRANSITION;
m_pImmediateContext->BeginRenderPass(RPBeginInfo);
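Clear values are matched to attachments by index. In Vulkan, the clear value array must extend past the largest attachment index that uses a CLEAR load operation, and entries for other attachments are ignored; the sketch below assumes the same convention applies to BeginRenderPassAttribs. `AttachmentLoad` and `RequiredClearValueCount` are hypothetical, and stencil load operations are not modeled.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mirror of ATTACHMENT_LOAD_OP_* (not the engine enum).
enum class AttachmentLoad { Load, Clear, Discard };

// Clear values are indexed by attachment number, so the array must reach the
// highest-indexed attachment that is cleared. Entries for other attachments are ignored.
uint32_t RequiredClearValueCount(const std::vector<AttachmentLoad>& loadOps)
{
    uint32_t count = 0;
    for (uint32_t i = 0; i < static_cast<uint32_t>(loadOps.size()); ++i)
    {
        if (loadOps[i] == AttachmentLoad::Clear)
            count = i + 1;
    }
    return count;
}
```

All four attachments in this tutorial use the clear load operation, so 4 values are required, which matches the size of ClearValues above.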
In the first subpass of our render pass, we render the scene. Then
we call NextSubpass to move to the lighting subpass and draw
the lights. Finally, we call EndRenderPass to finish the render pass:
m_pImmediateContext->BeginRenderPass(RPBeginInfo);
DrawScene();
m_pImmediateContext->NextSubpass();
ApplyLighting();
m_pImmediateContext->EndRenderPass();
A very important aspect of render passes that needs to be mentioned again is that
state transitions are not allowed between BeginRenderPass and EndRenderPass calls.
The tutorial explicitly transitions all resources it uses to the correct states during initialization:
StateTransitionDesc Barriers[] =
{
{m_pShaderConstantsCB, RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_CONSTANT_BUFFER, true},
{m_CubeVertexBuffer, RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_VERTEX_BUFFER, true},
{m_CubeIndexBuffer, RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_INDEX_BUFFER, true},
{m_pLightsBuffer, RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_VERTEX_BUFFER, true},
{m_CubeTextureSRV->GetTexture(), RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_SHADER_RESOURCE, true}
};
m_pImmediateContext->TransitionResourceStates(_countof(Barriers), Barriers);
and then uses the RESOURCE_STATE_TRANSITION_MODE_VERIFY mode with every call that requires a state transition mode.
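The rules for the three render pass commands can be summarized as a small state machine. `RenderPassTracker` and `TransitionMode` are hypothetical validation helpers, not engine classes: the tracker rejects TRANSITION mode while a pass is active and, as Vulkan requires, only allows EndRenderPass once the last subpass has been reached (assumed here to hold in Diligent Engine as well).

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical mirror of RESOURCE_STATE_TRANSITION_MODE_* values.
enum class TransitionMode { None, Transition, Verify };

// Tracks Begin/Next/End calls and the transition mode of other commands.
class RenderPassTracker
{
public:
    explicit RenderPassTracker(uint32_t subpassCount) : m_SubpassCount{subpassCount} {}

    void BeginRenderPass()
    {
        if (m_Active)
            throw std::logic_error("render pass already active");
        m_Active  = true;
        m_Subpass = 0;
    }
    void NextSubpass()
    {
        if (!m_Active || m_Subpass + 1 >= m_SubpassCount)
            throw std::logic_error("no next subpass");
        ++m_Subpass;
    }
    void EndRenderPass()
    {
        if (!m_Active || m_Subpass + 1 != m_SubpassCount)
            throw std::logic_error("render pass can only end in its last subpass");
        m_Active = false;
    }
    // Called for any command that takes a state transition mode.
    void OnCommand(TransitionMode mode) const
    {
        if (m_Active && mode == TransitionMode::Transition)
            throw std::logic_error("state transitions are not allowed inside a render pass");
    }
    uint32_t CurrentSubpass() const { return m_Subpass; }
    bool     IsActive() const { return m_Active; }

private:
    uint32_t m_SubpassCount = 0;
    uint32_t m_Subpass      = 0;
    bool     m_Active       = false;
};
```

This mirrors the approach taken by the tutorial: TRANSITION is only used outside the pass (at initialization and in BeginRenderPass itself), while commands recorded inside the pass use VERIFY.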
Diligent Engine's render passes API largely resembles Vulkan, so the Vulkan specification provides the most comprehensive description. Arm Software maintains a list of Vulkan best practices for mobile developers that covers attachment load/store operations, attachment layout transitions, and subpasses.