University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4
- Xueyin Wan
- Tested on: Windows 10 x64, i7-6700K @ 4.00GHz 16GB, GTX 970 4096MB (Personal Desktop)
- Compiled with Visual Studio 2013 and CUDA 7.5
Use CUDA to implement a simplified rasterized graphics pipeline, similar to the OpenGL pipeline.
My CUDA Rasterizer Features
About Graphics Pipeline
- Vertex shading
- Primitive assembly with support for triangles read from buffers of index and vertex data
- Fragment shading
- A depth buffer for storing and depth testing fragments
- Fragment-to-depth-buffer writing (with atomics for race avoidance)
- Lambert and Blinn-Phong shading
- UV texture mapping with bilinear texture filtering
- UV texture mapping with perspective correct texture coordinates
- Support rasterizing Triangles, Lines, Points
- Super Sampling Anti-Aliasing (SSAA)
- Correct color interpolation between points on a primitive
Rasterization converts vector graphics (primitives such as triangles) into a raster image, that is, a grid of pixels. It is widely used in real-time rendering: modern 3D rendering APIs such as OpenGL, DirectX (Microsoft), and Vulkan (Khronos, relatively new) are all built around rasterization.
Unlike ray tracing (the technique used in my last project), rasterization never shoots rays. The graphics pipeline consists of the following stages, listed in order:
1. Vertex Assembly
Pull together a vertex from one or more buffers
2. Vertex Shader
Transform incoming vertex position from model to clip coordinates
Model Space -> World Space -> View Space -> Clip Space -> Normalized Device Coordinates (NDC) Space -> Viewport (Screen/Window) Space
3. Primitive Assembly
A vertex shader processes one vertex. Primitive assembly groups vertices forming one primitive, e.g., a triangle, line, etc.
4. Rasterization
Determine what pixels a primitive overlaps
5. Fragment Shader
Shades the fragment by simulating the interaction of light and material. Different shading schemes can be applied here: Blinn-Phong, Lambert, Non-Photorealistic Rendering (NPR), etc.
6. Per-Fragment Tests
Choose one candidate to fill the framebuffer! Common techniques: depth test, scissor test, etc.
7. Blending
Combine fragment color with framebuffer color
8. Framebuffer
Write color to framebuffer, a.k.a. tell each pixel its color :)
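The depth test in stage 6 has to be race-free when many fragments target the same pixel at once. Below is a minimal host-side C++ sketch of that idea, not this project's actual kernel code: `DepthBuffer`, `quantizeDepth`, `atomicMinDepth` and `depthTest` are hypothetical names, and the compare-exchange loop stands in for CUDA's built-in `atomicMin`. It assumes depth is already normalized to [0, 1] and that smaller means closer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical depth buffer: one integer depth per pixel, cleared to "infinitely far".
struct DepthBuffer {
    int width, height;
    std::vector<std::atomic<std::uint32_t>> depth;
    DepthBuffer(int w, int h)
        : width(w), height(h), depth(static_cast<std::size_t>(w) * h) {
        for (auto& d : depth) d.store(std::numeric_limits<std::uint32_t>::max());
    }
};

// Map a depth in [0, 1] to an integer so integer ordering matches depth ordering.
// Integer atomics are used because an atomic minimum on floats is not generally available.
inline std::uint32_t quantizeDepth(float z) {
    if (z < 0.0f) z = 0.0f;
    if (z > 1.0f) z = 1.0f;
    return static_cast<std::uint32_t>(static_cast<double>(z) * 4294967295.0);
}

// Atomic minimum via a compare-exchange loop.
// Returns true if `candidate` replaced the stored value.
inline bool atomicMinDepth(std::atomic<std::uint32_t>& slot, std::uint32_t candidate) {
    std::uint32_t current = slot.load();
    while (candidate < current) {
        // On failure, compare_exchange_weak reloads `current`, and the loop re-checks.
        if (slot.compare_exchange_weak(current, candidate)) return true;
    }
    return false;
}

// Returns true if the fragment at (x, y) with depth z is the closest one seen so far.
inline bool depthTest(DepthBuffer& buffer, int x, int y, float z) {
    return atomicMinDepth(buffer.depth[static_cast<std::size_t>(y) * buffer.width + x],
                          quantizeDepth(z));
}
```

Passing the test only means the fragment was the closest at that moment; a later, closer fragment can still replace it. A common way to handle this is to resolve depth first and write fragment data only for the fragment whose depth matches the final stored value.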
Showcase My Result
###UV texture mapping with bilinear texture filtering
Bilinear texture filtering gives a visibly better result because each lookup takes more texture information into account: it blends the four texels nearest to the sample point instead of picking a single one.
|Without Bilinear Texture Filtering||With Bilinear Texture Filtering|
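As a rough illustration of what bilinear filtering computes, here is a small C++ sketch. The `Texture` and `Color` types, the clamp-to-edge addressing, and the convention that texel centers sit at half-integer positions are all assumptions made for this example, not details taken from the project's code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Color { float r, g, b; };

// Hypothetical texture: row-major texels, width * height entries.
struct Texture {
    int width, height;
    std::vector<Color> texels;

    // Clamp-to-edge addressing: out-of-range coordinates reuse the border texel.
    Color at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[y * width + x];
    }
};

// Linear blend between two colors, t in [0, 1].
inline Color mix(const Color& a, const Color& b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

// Bilinear lookup: blend the four texels surrounding (u, v), with u and v in [0, 1].
inline Color sampleBilinear(const Texture& tex, float u, float v) {
    // Continuous texel coordinates, shifted so texel centers land on integers.
    const float x = u * tex.width - 0.5f;
    const float y = v * tex.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;  // horizontal blend weight
    const float fy = y - y0;  // vertical blend weight

    const Color top    = mix(tex.at(x0, y0),     tex.at(x0 + 1, y0),     fx);
    const Color bottom = mix(tex.at(x0, y0 + 1), tex.at(x0 + 1, y0 + 1), fx);
    return mix(top, bottom, fy);
}
```

On a 2x1 texture with a black and a white texel, sampling halfway between the two texel centers yields mid-gray, whereas a nearest-texel lookup would snap to one of the two.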
###UV texture mapping with perspective correct texture coordinates
With perspective correctness, the texture coordinates are computed in a more sophisticated way than plain screen-space interpolation: we interpolate the texture coordinates and then apply the perspective divide, which accounts for the depth of each vertex.
|Without Perspective Correctness||With Perspective Correctness|
###Point Representation & Line Representation
|Point Representation||Line Representation|
###Super Sampling Anti-Aliasing (SSAA)
Render at a higher resolution by dividing each pixel into several smaller sub-pixels, then average the sub-pixel colors to get the final color that is written to the framebuffer.
|No SSAA||SSAA = 2||SSAA = 4|
###Correct color interpolation between points on a primitive
I use the barycentric coordinates of each fragment with respect to the triangle's vertices v0, v1 and v2 to blend the three vertex colors into the fragment color.
To visualize this, I use each vertex's normal in the view (eye) coordinate system as its color.
The two images below come from a bug where I accidentally used the first triangle's vertex normals, which created an interesting result. I put them here for fun.
|First Triangle Normal Interpolation||First Triangle Normal Interpolation|
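The barycentric interpolation described in this section can be sketched in C++ as below, together with the screen-clamped bounding box that limits which pixels are tested per triangle. All names here (`Vec2`, `Color`, `Box`, `barycentric`, `boundingBox`, and so on) are hypothetical and chosen for this example; the project's own helpers may differ.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Color { float r, g, b; };
struct Box { int minX, minY, maxX, maxY; };

// Twice the signed area of triangle (a, b, c).
inline float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Barycentric weights of p with respect to tri[0..2].
// Returns false for a degenerate (zero-area) triangle.
inline bool barycentric(const Vec2 tri[3], Vec2 p, float w[3]) {
    const float area = signedArea2(tri[0], tri[1], tri[2]);
    if (area == 0.0f) return false;
    w[0] = signedArea2(p, tri[1], tri[2]) / area;
    w[1] = signedArea2(tri[0], p, tri[2]) / area;
    w[2] = signedArea2(tri[0], tri[1], p) / area;
    return true;
}

// A point is inside the triangle when no weight is negative.
inline bool isInside(const float w[3]) {
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

// Blend the three vertex colors with the barycentric weights.
inline Color interpolateColor(const float w[3], const Color c[3]) {
    return { w[0] * c[0].r + w[1] * c[1].r + w[2] * c[2].r,
             w[0] * c[0].g + w[1] * c[1].g + w[2] * c[2].g,
             w[0] * c[0].b + w[1] * c[1].b + w[2] * c[2].b };
}

// Axis-aligned bounding box of the triangle in pixels, clamped to the screen.
inline Box boundingBox(const Vec2 tri[3], int width, int height) {
    const float minX = std::min(tri[0].x, std::min(tri[1].x, tri[2].x));
    const float minY = std::min(tri[0].y, std::min(tri[1].y, tri[2].y));
    const float maxX = std::max(tri[0].x, std::max(tri[1].x, tri[2].x));
    const float maxY = std::max(tri[0].y, std::max(tri[1].y, tri[2].y));
    return { std::max(0, static_cast<int>(std::floor(minX))),
             std::max(0, static_cast<int>(std::floor(minY))),
             std::min(width - 1, static_cast<int>(std::ceil(maxX))),
             std::min(height - 1, static_cast<int>(std::ceil(maxY))) };
}
```

A rasterizer would loop over the pixels in the box, compute the weights at each pixel center, skip pixels that fail `isInside`, and shade the rest with `interpolateColor`. For the triangle (0,0), (4,0), (0,4) with red, green and blue vertex colors, the point (1,1) has weights 0.5, 0.25 and 0.25.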
###Time breakdown by rasterizer stage (comparison between lines, points and triangles)
From the picture below, we can see that when triangles are the primitive, the rasterization stage takes most of the time, because all of the fragment buffer calculations and writes happen there, especially the Axis-Aligned Bounding Box (AABB) calculation. For lines and points, the rasterization stage does far fewer fragment buffer calculations, so the render stage, which fills the framebuffer, accounts for a larger share of the time.
###Comparison between Super-Sampling Anti-Aliasing (SSAA) and no SSAA
SSAA is known to be expensive. It needs a larger fragment buffer, framebuffer and depth buffer, so the amount of computation multiplies before the samples are averaged and sent to the PBO. The results here are based on CesiumMilkTruck.gltf, and my table reports time and FPS. As the SSAA level increases, each frame takes more time to render.
###Comparison between Bilinear Texture Filtering and Perspective-Correct Texture Coordinates
Bilinear filtering increases the render stage's share of the whole CUDA launch, since each texture lookup has to consider the surrounding texels to improve correctness (the same idea of taking several samples as in supersampling). Perspective correctness increases the rasterization stage's share, since it adds more interpolation and calculation to the rasterization procedure.
| |Bilinear enabled|Perspective correctness enabled|Neither enabled|
|---|---|---|---|
|Rasterization stage, percentage of GPU time (%)|55.69|59.36|53.98|
|Render stage, percentage of GPU time (%)|2.06|1.5|1.76|
|Bilinear Open||Perspective Open|