CUDA Path Tracer
University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Trung Le
- Windows 10 Home, i7-4790 CPU @ 3.60GHz 12GB, GTX 980 Ti (Personal desktop)
In this project, I implemented a fully functional path tracer on top of the provided CUDA starter code, with the following features:
- Stream compaction on terminated paths during bounces
- Diffuse shading
- Specular reflection
- Specular transmission / Refraction
- Mesh support (.obj format)
- BVH acceleration data structure with stack-less BVH traversal
Diffuse shading is computed by bouncing the next path segment from a diffuse surface with a cosine-weighted random ray direction. The color of the bounced ray is multiplied with the original color using Lambert's law according to the rendering equation.
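As an illustration of the cosine-weighted bounce described above, here is a minimal host-side C++ sketch. It is not the project's kernel code: `Vec3` and `cosineSampleHemisphere` are hypothetical names, the direction is returned in a local frame whose +z axis is the surface normal, and the two random numbers are assumed to be uniform in [0, 1).

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Cosine-weighted direction in a local frame whose +z axis is the surface normal.
// u1 and u2 are assumed to be uniform random numbers in [0, 1).
Vec3 cosineSampleHemisphere(float u1, float u2) {
    const float kTwoPi = 6.28318530718f;
    const float r = std::sqrt(u1);                          // radius on the unit disk
    const float phi = kTwoPi * u2;                          // angle on the unit disk
    const float z = std::sqrt(std::max(0.0f, 1.0f - u1));   // lift the disk point onto the hemisphere
    return { r * std::cos(phi), r * std::sin(phi), z };
}
```

With this sampling the probability density is cos(theta) / pi, which cancels the cosine term and the 1 / pi of a Lambertian surface, so the path color only needs to be multiplied by the surface color.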
Similarly, specular reflection bounces the next path segment using glm::reflect with the surface's normal.
For refraction, the next path segment's direction is computed using glm::refract with a given index of refraction.
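For readers without GLM at hand, the sketch below writes out the formulas that the GLSL-style reflect and refract functions evaluate. The names `reflectDir` and `refractDir` are hypothetical stand-ins, both inputs are assumed to be normalized, and `eta` is the ratio of the indices of refraction (incident medium over transmitted medium). A zero vector signals total internal reflection; falling back to reflection in that case is a common choice, not necessarily what this project does.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 scale(const Vec3& a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Mirror the incident direction I about the normal N: I - 2 (N . I) N.
Vec3 reflectDir(const Vec3& I, const Vec3& N) {
    return sub(I, scale(N, 2.0f * dot(N, I)));
}

// Snell's law in vector form. Returns the zero vector on total internal reflection.
Vec3 refractDir(const Vec3& I, const Vec3& N, float eta) {
    const float d = dot(N, I);
    const float k = 1.0f - eta * eta * (1.0f - d * d);
    if (k < 0.0f) return { 0.0f, 0.0f, 0.0f };   // no transmitted ray exists
    return sub(scale(I, eta), scale(N, eta * d + std::sqrt(k)));
}
```

A caller would typically pass `eta = 1 / ior` when entering the object and `eta = ior` when leaving it, flipping the normal in the second case.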
From left to right:
- [IOR - 2.5] Mixed 50% yellow specular reflective and 50% refractive diamond
- [IOR - 1.5] Refractive water
- [IOR - 1.62] Refractive crown glass
- [IOR - 1.77] Refractive sapphire
- [IOR - 2.5] Refractive diamond
You can see the effect of different indices of refraction for each sphere.
Path segments that hit the light source or miss every object are terminated using stream compaction on the list of path segments. In this implementation, I used thrust::partition to move the path segments that have terminated to the right of the dev_paths array and those that are still bouncing to the left. At the end, all the pixel colors are computed during final gathering.
Stream compaction clearly improved performance for my scene. At larger resolutions it is even more effective, since there is a larger percentage of paths whose remaining bounces we no longer have to compute.
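The partition step can be pictured with a host-side analogue that uses std::partition, which has the same contract as thrust::partition: elements that satisfy the predicate end up in front of those that do not. This is only a sketch; the `PathSegment` fields and the `compactPaths` helper are assumptions made for illustration, not the project's actual definitions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down path segment.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;   // 0 once the path hit a light or left the scene
};

// Moves live paths to the front and returns how many are still bouncing.
// Terminated paths stay in the array so final gathering can still read them.
std::size_t compactPaths(std::vector<PathSegment>& paths) {
    auto firstDead = std::partition(paths.begin(), paths.end(),
        [](const PathSegment& p) { return p.remainingBounces > 0; });
    return static_cast<std::size_t>(firstDead - paths.begin());
}
```

Each bounce then only needs to process the first `compactPaths(paths)` elements, which is where the savings in the timings below come from.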
800x800 scene with 8 bounces
| With stream compaction | Without stream compaction |
|---|---|
| 11.668 ms/bounce | 16.958 ms/bounce |
1600x900 scene with 8 bounces
| With stream compaction | Without stream compaction |
|---|---|
| 15.565 ms/bounce | 36.998 ms/bounce |
I also did a quick comparison between thrust::remove_if and thrust::partition: on average, thrust::remove_if takes ~1.206 ms/bounce while thrust::partition takes ~1.759 ms/bounce.
Arbitrary mesh loading and rendering
For mesh loading, I used tinyobjloader by syoyo to parse .obj files. scene.cpp handles the loading by calling tinyobjloader with triangulation enabled and appending the results to the scene's geometry and material lists. In essence, each new geometry is of type TRIANGLE with a default transformation. If no .mtl file is attached to the .obj file, a default material is assigned to the new geometry.
For ray/triangle intersection, I used the fast, minimum-storage ray/triangle intersection by Möller and Trumbore. This method doesn't require computing the plane equation on the fly or storing it, which saves memory for triangle meshes. See comments of
I also tried to use glm::intersectRayTriangle but encountered a CUDA kernel launch error.
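For reference, a compact host-side version of the Möller-Trumbore test looks roughly like the sketch below. It is written from the published algorithm rather than copied from this project; `Vec3`, `intersectTriangle`, and the epsilon value are assumptions. Note that only the three vertices are needed, with no precomputed plane.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// On a hit, t is the distance along the ray and (u, v) are barycentric coordinates.
bool intersectTriangle(const Vec3& orig, const Vec3& dir,
                       const Vec3& v0, const Vec3& v1, const Vec3& v2,
                       float& t, float& u, float& v) {
    const float kEps = 1e-7f;                    // assumed tolerance
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < kEps) return false;     // ray is parallel to the triangle
    const float invDet = 1.0f / det;
    const Vec3 s = sub(orig, v0);
    u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    v = dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * invDet;
    return t > kEps;                             // reject hits behind the origin
}
```

The hit point can be recovered as `orig + t * dir`, and a flat normal as the normalized cross product of the two edges.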
In this scene, a torus mesh is rendered inside a Cornell box.
Hierarchical spatial data structures
I implemented a bounding volume hierarchy (BVH) to accelerate the ray intersection test by first checking the bounding volume of the scene and its subtrees before performing a ray intersection with the primitives. The BVH is constructed on the CPU and traversed on the GPU during path tracing. The traversal uses the stack-less approach described by Hapala et al. (2011). The BVH can be enabled by setting isBVHEnabled = true, and its visualization by setting isBVHVisualizationEnabled = true.
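The bounding-volume check is usually a ray/axis-aligned-box "slab" test. The following host-side sketch shows one common formulation; the function name `rayHitsBox` and the plain float-array interface are assumptions made for illustration and do not reflect the project's BBox class.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

// Slab test: intersect the ray with the three pairs of axis-aligned planes and
// check that the resulting parameter intervals overlap in front of the origin.
bool rayHitsBox(const float orig[3], const float dir[3],
                const float boxMin[3], const float boxMax[3]) {
    float tNear = 0.0f;
    float tFar = std::numeric_limits<float>::infinity();
    for (int axis = 0; axis < 3; ++axis) {
        if (dir[axis] == 0.0f) {
            // Ray is parallel to this slab: it must start between the two planes.
            if (orig[axis] < boxMin[axis] || orig[axis] > boxMax[axis]) return false;
            continue;
        }
        const float inv = 1.0f / dir[axis];
        float t0 = (boxMin[axis] - orig[axis]) * inv;
        float t1 = (boxMax[axis] - orig[axis]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false;   // intervals no longer overlap
    }
    return true;
}
```

Because a miss on an interior node's box rules out every primitive below it, this cheap test is what lets the traversal skip whole subtrees.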
For this stack-less traversal to work, the BVH needs to be constructed with the following requirements:
- Binary BVH tree with exactly two children (also called siblings), nearChild and farChild. All primitives are stored at leaf nodes.
- Each node has a pointer to parent.
- Each inner node has a unique traversal order for a given ray, from near child to far child. This order can differ between rays but has to stay the same for the same ray. The subtrees are sorted along the maximum extent axis of their child nodes.
- Internal nodes store only a bounding box.
I constructed the BVH tree in two steps: first building the tree recursively with BVHNode*, then flattening the structure into a BVHNodeDev data array to feed into a kernel launch for parallel processing. This was done to make debugging easier.
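The flattening step can be sketched as a depth-first walk that appends each node to an array and records indices instead of pointers. The `TreeNode` and `FlatNode` types below are hypothetical simplifications (bounding boxes are omitted) and should not be read as the actual BVHNode or BVHNodeDev layouts.

```cpp
#include <memory>
#include <vector>

// Hypothetical pointer-based node, as it might look after recursive construction.
struct TreeNode {
    std::unique_ptr<TreeNode> nearChild;
    std::unique_ptr<TreeNode> farChild;
    int primitive = -1;   // index of the primitive for leaves, -1 otherwise
};

// Hypothetical index-based node that can be copied to the GPU as a plain array.
struct FlatNode {
    int parent;      // -1 for the root
    int nearChild;   // -1 for leaves
    int farChild;    // -1 for leaves
    int primitive;   // -1 for interior nodes
};

// Appends the subtree rooted at node to out and returns the index of its root.
int flatten(const TreeNode* node, int parent, std::vector<FlatNode>& out) {
    const int index = static_cast<int>(out.size());
    out.push_back({ parent, -1, -1, node->primitive });
    if (node->nearChild) {
        const int child = flatten(node->nearChild.get(), index, out);
        out[index].nearChild = child;   // assign after recursion; out may have reallocated
    }
    if (node->farChild) {
        const int child = flatten(node->farChild.get(), index, out);
        out[index].farChild = child;
    }
    return index;
}
```

Storing the parent index in every node is what makes the stack-less traversal possible later, since the walk can always climb back up without remembering where it came from.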
With the parent pointers, we can now iterate through the BVH structure using simple state logic. There are only three traversal states in which a node can be entered:
- From its parent
- From its sibling (from nearChild to farChild)
- From its children (out from farChild)
At each state, it can be determined where to transition for the next iteration. If the node is an interior node, a ray/box intersection is performed to decide whether to continue into its subtree. If it is a leaf node, a ray/primitive intersection is performed instead, which returns the point of intersection and the surface normal.
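The three-state logic can be sketched on the CPU as follows. To keep the example short, it makes strong simplifying assumptions: bounding boxes are 1D intervals, the "ray" is a single coordinate `x`, the near child is always the first child, and the root is an interior node at index 0. The `Node` layout and the `traverse` function are hypothetical and are not the project's traverseBVH.

```cpp
#include <vector>

struct Node {
    float lo, hi;     // 1D stand-in for a bounding box
    int parent;       // -1 for the root
    int nearChild;    // -1 for leaves
    int farChild;     // -1 for leaves
    int primitive;    // valid for leaves only
};

enum class State { FromParent, FromSibling, FromChild };

// Returns the primitives of all leaves whose interval contains x, without a stack.
std::vector<int> traverse(const std::vector<Node>& nodes, float x) {
    std::vector<int> hits;
    const int root = 0;
    auto overlaps = [&](int i) { return nodes[i].lo <= x && x <= nodes[i].hi; };
    auto isLeaf = [&](int i) { return nodes[i].nearChild < 0; };
    if (!overlaps(root)) return hits;

    int current = nodes[root].nearChild;
    State state = State::FromParent;
    while (true) {
        if (state == State::FromChild) {
            if (current == root) return hits;               // whole tree finished
            const int parent = nodes[current].parent;
            if (current == nodes[parent].nearChild) {
                current = nodes[parent].farChild;           // near subtree done, visit sibling
                state = State::FromSibling;
            } else {
                current = parent;                           // far subtree done, keep climbing
            }
            continue;
        }
        const bool hit = overlaps(current);
        if (hit && !isLeaf(current)) {
            current = nodes[current].nearChild;             // descend into the near child
            state = State::FromParent;
            continue;
        }
        if (hit) hits.push_back(nodes[current].primitive);  // leaf: test the primitive
        if (state == State::FromParent) {
            current = nodes[nodes[current].parent].farChild; // near child handled, go to sibling
            state = State::FromSibling;
        } else {
            current = nodes[current].parent;                 // far child handled, go up
            state = State::FromChild;
        }
    }
}
```

In this sketch the FromParent and FromSibling cases share their box test and leaf handling and differ only in where they go next, which is one way of merging logic across states.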
|Torus colored||Torus BVH|
Cornell box scene - scene file - with BVH enabled:
|Cornell box colored||Cornell box BVH|
Even with the BVH enabled, however, the scene still takes too long to render. For the Mario model with 2835 vertices and 2979 normals:
| | With BVH | Without BVH |
|---|---|---|
| Average | 320.757 ms/bounce | 284.9704822 ms/bounce |
My implementation of traverseBVH still uses too many branches for each state. When threads diverge widely while traversing the BVH nodes, each warp still has to execute all the branch cases. This is not an efficient implementation of stack-less BVH traversal. Some common logic could be shared between the states, which would greatly reduce branch divergence. Additionally, my BVHNodeDev still stores quite a bit of extra information that could be packed more efficiently to save memory for each thread.
This feature uses the BBox class for axis-aligned bounding volumes, the BVHNode class for BVH construction on the CPU, and the BVHNodeDev class for iterative BVH traversal on the GPU using CUDA.
General information for CUDA device
- Device name: GeForce GTX 980 Ti
- Compute capability: 5.2
- Compute mode: Default
- Clock rate: 1076000
- Integrated: 0
- Device copy overlap: Enabled
- Kernel execution timeout: Enabled
Memory information for CUDA device
- Total global memory: 6442450944
- Total constant memory: 65536
- Multiprocessor count: 22
- Shared memory per multiprocessor: 98304
- Registers per multiprocessor: 65536
- Max threads per multiprocessor: 2048
- Max grid dimensions: [2147483647, 65535, 65535]
- Max threads per block: 1024
- Max registers per block: 65536
- Max thread dimensions: [1024, 1024, 64]
- Threads per block: 512
Motion blur (incomplete)
For motion blur, I store the previous frames and draw the current frames on top of them. However, this feature isn't fully completed. For it to work properly, I need to clear out the previous frames after a certain amount of time, and also only start saving previous frames when there are camera changes.
- wahoo.obj and dodecahedron.obj are test scene files taken from CIS460, Fall 2016.
- The rest of the .obj files and .mtl files are from tinyobjloader.
I implemented a custom
Camera class to better handle the viewer and to compute the
viewProj for the
- Left mouse drag to pan left/right/up/down
- Right mouse drag to zoom in/out
- Middle mouse drag to change look at point
Obj file path
Ideally, the file path for the .obj file should be given through the scene files or as a command-line argument, but I haven't implemented this yet. Please update the file path in scene.cpp to point to the desired .obj file in /src/scenes/obj/.
Shader file path
I implemented a custom ShaderProgram class to render BVH bounding boxes using OpenGL. Please verify and, if necessary, update the shader file paths in preview.cpp so that they point to the correct vertex and fragment shaders in /src/glsl/.
The following files are added to
"bbox.h" "bbox.cpp" "bvh.h" "bvh.cpp" "shaderProgram.h" "shaderProgram.cpp" "camera.h" "camera.cpp"