
CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

  • Trung Le
  • Windows 10 Home, i7-4790 CPU @ 3.60GHz 12GB, GTX 980 Ti (Personal desktop)

Stream compaction

In this project, I built a fully functional path tracer on top of the provided CUDA starter code, with the following features:

  1. Stream compaction on terminated paths during bounces
  2. Diffuse shading
  3. Specular reflection
  4. Specular transmission / Refraction
  5. Mesh support (.obj format)
  6. BVH acceleration data structure with stack-less BVH traversal

Diffuse shading

Diffuse shading is computed by bouncing the next path segment off a diffuse surface in a cosine-weighted random direction. The color carried by the path is then multiplied by the surface color, following Lambert's law in the rendering equation.
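As a rough illustration of the sampling step, here is a minimal CPU-side sketch of cosine-weighted hemisphere sampling. The `Vec3` type and the function name `sampleCosineHemisphere` are placeholders for this example, not the names used in the project, and the random numbers `u1` and `u2` are assumed to come from whatever generator the caller uses.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// n: unit surface normal; u1, u2: uniform random numbers in [0, 1).
// Returns a unit direction in the hemisphere around n, distributed with
// probability density cos(theta) / pi.
Vec3 sampleCosineHemisphere(Vec3 n, float u1, float u2) {
    const float kTwoPi = 6.28318530718f;
    // Sample a point on the unit disk, then project it up onto the hemisphere.
    float r = std::sqrt(u1);
    float phi = kTwoPi * u2;
    float x = r * std::cos(phi);
    float y = r * std::sin(phi);
    float z = std::sqrt(1.0f - u1);

    // Build an orthonormal basis (t, b, n) around the normal. The helper axis
    // is chosen so that it is never parallel to n.
    Vec3 helper = (std::fabs(n.x) < 0.5f) ? Vec3{1.0f, 0.0f, 0.0f} : Vec3{0.0f, 1.0f, 0.0f};
    Vec3 t = normalize(cross(n, helper));
    Vec3 b = cross(n, t);

    return { t.x * x + b.x * y + n.x * z,
             t.y * x + b.y * y + n.y * z,
             t.z * x + b.z * y + n.z * z };
}
```

With this distribution, the cosine term of the rendering equation and the sampling density cancel for a Lambertian surface, so each bounce only needs to multiply the path color by the surface albedo.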

Alt text

Specular reflection

Similarly, for specular reflection the direction of the next path segment is computed by mirroring the incoming direction about the surface normal with glm::reflect.
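For reference, the mirror direction that glm::reflect returns is I - 2 (N · I) N for an incident direction I and a unit normal N. The sketch below reproduces that formula without GLM; `Vec3` and `reflectDir` are names made up for this example.

```cpp
struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// incident: direction of the incoming ray (pointing toward the surface).
// normal:   unit surface normal.
// Returns the mirrored direction incident - 2 * dot(normal, incident) * normal.
Vec3 reflectDir(Vec3 incident, Vec3 normal) {
    float d = 2.0f * dot(normal, incident);
    return { incident.x - d * normal.x,
             incident.y - d * normal.y,
             incident.z - d * normal.z };
}
```

For example, a ray travelling along (1, -1, 0) that hits a floor with normal (0, 1, 0) leaves along (1, 1, 0).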

Alt text

Specular transmission / Refraction

For refraction, the next path segment's direction is computed using glm::refract with a given index of refraction.

Alt text

From left to right:

  • [IOR - 2.5] Mixed 50% yellow specular reflective and 50% refractive diamond
  • [IOR - 1.5] Refractive water
  • [IOR - 1.62] Refractive crown glass
  • [IOR - 1.77] Refractive sapphire
  • [IOR - 2.5] Refractive diamond

You can see the effect of different indices of refraction for each sphere.
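The refraction step can be sketched without GLM as follows. This mirrors the formula that glm::refract implements, including returning a zero vector on total internal reflection. `Vec3` and `refractDir` are placeholder names, and `eta` is the ratio of the index of refraction on the incident side to the one on the transmitted side (so 1 / IOR when entering a sphere from air, an assumption about how the caller sets it up).

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// incident: unit direction of the incoming ray (pointing toward the surface).
// normal:   unit surface normal on the side the ray arrives from.
// eta:      IOR of the incident medium divided by IOR of the transmitted medium.
// Returns the refracted direction, or (0, 0, 0) on total internal reflection.
Vec3 refractDir(Vec3 incident, Vec3 normal, float eta) {
    float cosI = dot(normal, incident);
    float k = 1.0f - eta * eta * (1.0f - cosI * cosI);
    if (k < 0.0f) {
        return { 0.0f, 0.0f, 0.0f };  // total internal reflection
    }
    float s = eta * cosI + std::sqrt(k);
    return { eta * incident.x - s * normal.x,
             eta * incident.y - s * normal.y,
             eta * incident.z - s * normal.z };
}
```

A ray that hits the surface head-on passes straight through regardless of `eta`, while a grazing ray leaving a dense medium (for instance `eta = 1.5`) yields the zero vector and would need to be handled as a reflection by the caller.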

Stream compaction

Path segments that hit a light source or do not hit any object are terminated by applying stream compaction to the list of path segments. In this implementation, I used thrust::partition to move the path segments that have terminated to the right of the dev_paths array and those that are still bouncing to the left. At the end, all the pixel colors are computed during final gathering.
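The partitioning step can be sketched on the CPU with std::partition, which takes the same (first, last, predicate) arguments as thrust::partition. This is only a stand-in for the device code: the `PathSegment` fields and the function name `compactPaths` below are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical path record: only the fields needed for this sketch.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;  // 0 once the path has terminated
};

// Moves the paths that are still bouncing to the front of the first
// numActive entries and returns how many of them there are. Terminated
// paths stay in the array (after the returned count), so their colors are
// still available for the final gather.
std::size_t compactPaths(std::vector<PathSegment>& paths, std::size_t numActive) {
    auto first = paths.begin();
    auto last = first + static_cast<std::ptrdiff_t>(numActive);
    auto mid = std::partition(first, last, [](const PathSegment& p) {
        return p.remainingBounces > 0;
    });
    return static_cast<std::size_t>(mid - first);
}
```

Each bounce would then only launch work for the first `compactPaths(...)` entries. Keeping the terminated paths in place is the reason a partition fits here: a remove-style compaction leaves the elements past the new end unspecified.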

Stream compaction clearly improved performance for my scene. It is even more effective at higher resolutions, where there is a larger share of terminated paths whose bounces no longer need to be computed: the time per bounce dropped by about 31% at 800x800 and by about 58% at 1600x900.

800x800 scene with 8 bounces

| With stream compaction | Without stream compaction |
| --- | --- |
| 11.668 ms/bounce | 16.958 ms/bounce |

1600x900 scene with 8 bounces

| With stream compaction | Without stream compaction |
| --- | --- |
| 15.565 ms/bounce | 36.998 ms/bounce |

I also did a quick comparison between thrust::remove_if and thrust::partition: on average, thrust::remove_if takes ~1.206 ms/bounce while thrust::partition takes ~1.759 ms/bounce.

Arbitrary mesh loading and rendering

For mesh support, I used tinyobjloader by syoyo to parse .obj files. Scene::loadSceneFromObj in scene.cpp handles the loading: it calls tinyobjloader with triangulation enabled and appends the result to the scene's geometry and material lists. Each new geometry is of type TRIANGLE with the default transformation. If no .mtl file accompanies the .obj file, a default material is assigned to the new geometry.

For ray/triangle intersection, I used the fast, minimum-storage ray/triangle intersection algorithm by Möller and Trumbore. This method does not require computing the triangle's plane equation on the fly or storing it, which saves memory for triangle meshes. See the comments on triangleIntersectionTest in intersections.h.
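A minimal sketch of the Möller-Trumbore test is shown below. It is a generic CPU version written for illustration, not a copy of triangleIntersectionTest; `Vec3`, `intersectTriangle`, and the epsilon value are choices made for this example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Tests the ray orig + t * dir against triangle (v0, v1, v2).
// On a hit, writes the ray parameter t and the barycentric coordinates (u, v)
// of the hit point, and returns true. Both sides of the triangle count as hits.
bool intersectTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2,
                       float& t, float& u, float& v) {
    const float kEps = 1e-7f;
    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < kEps) return false;  // ray is parallel to the triangle

    float invDet = 1.0f / det;
    Vec3 s = sub(orig, v0);
    u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;

    Vec3 q = cross(s, e1);
    v = dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;

    t = dot(e2, q) * invDet;
    return t > kEps;  // reject hits behind the ray origin
}
```

The hit point is `orig + t * dir`, and the same `(u, v)` can be used to interpolate per-vertex normals as `(1 - u - v) * n0 + u * n1 + v * n2`.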

I also tried to use glm::intersectRayTriangle but encountered a CUDA kernel launch error.

Alt text

In this scene, a torus mesh is rendered inside a Cornell box.

Alt text

Source obj file.

Hierarchical spatial data structures

I implemented a bounding volume hierarchy (BVH) to accelerate the ray intersection test by first checking the bounding volume of the scene and its subtrees before performing a ray intersection with the primitives. The BVH is constructed on the CPU and traversed on the GPU during path tracing. The traversal uses the stack-less approach described by Hapala et al. (2011). The BVH can be enabled by setting isBVHEnabled = true, and its visualization by setting isBVHVisualizationEnabled = true.

For this stack-less traversal to work, the BVH needs to be constructed with the following requirements:

  1. The BVH is a binary tree: every inner node has exactly two children (siblings of each other), nearChild and farChild. All primitives are stored at leaf nodes.
  2. Each node has a pointer to its parent.
  3. Each inner node has a unique traversal order for a given ray, from near child to far child. This order can differ between rays but must stay the same for the same ray. The subtrees are sorted along the maximum-extent axis of the node's children.

Alt text

  4. Internal nodes store only a bounding box.

I constructed the BVH in two steps: first I built the tree recursively with BVHNode*, then I flattened it into a BVHNodeDev array that is passed to the kernel for parallel traversal. This was done to make debugging easier.
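The flattening step might look roughly like the sketch below, which walks a pointer-based tree in pre-order and replaces pointers with array indices. `HostNode`, `FlatNode`, and their fields are hypothetical stand-ins; the actual layout of BVHNode and BVHNodeDev is not shown here.

```cpp
#include <vector>

// Hypothetical pointer-based node, as built recursively on the CPU.
struct HostNode {
    HostNode* left;
    HostNode* right;
    int prim;  // primitive id for leaves, -1 for inner nodes
};

// Hypothetical flat node: children and parent are indices into one array,
// which can be copied to the GPU in a single transfer.
struct FlatNode {
    int parent;
    int left;
    int right;
    int prim;
};

// Appends the subtree rooted at n to out in pre-order and returns the index
// of n in out. Pass parent = -1 for the root.
int flatten(const HostNode* n, int parent, std::vector<FlatNode>& out) {
    int idx = static_cast<int>(out.size());
    out.push_back({ parent, -1, -1, n->prim });
    if (n->left) {
        int l = flatten(n->left, idx, out);  // may reallocate out
        out[idx].left = l;
    }
    if (n->right) {
        int r = flatten(n->right, idx, out);
        out[idx].right = r;
    }
    return idx;
}
```

Storing the parent index alongside the child indices is what lets a traversal walk back up the tree without keeping a stack.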

With the parent pointers, we can now iterate through the BVH using simple state logic. A node can be entered in only three ways:

  1. From its parent
  2. From its sibling (from nearChild to farChild)
  3. From its children (out from farChild)

Alt text

In each state, the next transition can be determined locally. If the current node is an interior node, a ray/box intersection is performed to decide whether to continue into its subtree. If it is a leaf node, a ray/primitive intersection is performed instead, which returns the point of intersection and the surface normal at that point.
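The three-state logic above can be sketched on the CPU as follows. This is an illustrative version of the stack-less scheme, not the project's traverseBVH: the `Node` layout, the `axis` field, and the rule "the near child is the left child when the ray direction is non-negative along the split axis" are all assumptions, and a leaf is "processed" by recording its primitive id instead of running a ray/primitive test.

```cpp
#include <algorithm>
#include <vector>

struct Ray { float o[3], d[3]; };
struct Node {
    float lo[3], hi[3];       // axis-aligned bounding box
    int parent, left, right;  // indices into the flat array, -1 if none
    int axis;                 // split axis used to order the children
    int prim;                 // primitive id for leaves, -1 for inner nodes
};
enum State { FromParent, FromSibling, FromChild };

// Slab test: does the ray hit the node's box at some t >= 0?
static bool hitBox(const Ray& r, const Node& n) {
    float t0 = 0.0f, t1 = 1e30f;
    for (int a = 0; a < 3; ++a) {
        if (r.d[a] == 0.0f) {
            if (r.o[a] < n.lo[a] || r.o[a] > n.hi[a]) return false;
            continue;
        }
        float tn = (n.lo[a] - r.o[a]) / r.d[a];
        float tf = (n.hi[a] - r.o[a]) / r.d[a];
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
    }
    return t0 <= t1;
}
static int nearChild(const Node& n, const Ray& r) {
    return r.d[n.axis] >= 0.0f ? n.left : n.right;
}
static int sibling(const std::vector<Node>& t, int i) {
    const Node& p = t[t[i].parent];
    return p.left == i ? p.right : p.left;
}

// Returns the primitive ids of all leaves whose box the ray hits, in visit
// order. Node 0 is the root.
std::vector<int> traverse(const std::vector<Node>& t, const Ray& r) {
    std::vector<int> hits;
    const int root = 0;
    if (t.empty() || !hitBox(r, t[root])) return hits;
    if (t[root].prim >= 0) { hits.push_back(t[root].prim); return hits; }
    int cur = nearChild(t[root], r);
    State s = FromParent;
    for (;;) {
        if (s == FromChild) {  // coming back up out of a finished subtree
            if (cur == root) return hits;
            int par = t[cur].parent;
            if (cur == nearChild(t[par], r)) { cur = sibling(t, cur); s = FromSibling; }
            else { cur = par; }  // far child done: keep climbing
            continue;
        }
        // FromParent and FromSibling share the box test and the descent.
        bool hit = hitBox(r, t[cur]);
        if (hit && t[cur].prim < 0) { cur = nearChild(t[cur], r); s = FromParent; continue; }
        if (hit) hits.push_back(t[cur].prim);  // leaf: a real tracer intersects the primitive here
        if (s == FromParent) { cur = sibling(t, cur); s = FromSibling; }
        else { cur = t[cur].parent; s = FromChild; }
    }
}
```

Note that the only per-ray state is the current node index and the three-valued `State`, and that the two "entering" states share one code path and differ only in where they go after a miss or a leaf.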

Torus scene - scene file, obj file - with BVH enabled:

| Torus colored | Torus BVH |
| --- | --- |
| Alt text | Alt text |

Cornell box scene - scene file - with BVH enabled:

| Cornell box colored | Cornell box BVH |
| --- | --- |
| Alt text | Alt text |
Alt text

Torus in a Cornell box scene - scene file, obj file - with BVH enabled:

Alt text

Even with BVH enabled, the scene still takes too long to render; in fact, my BVH traversal is currently slower than rendering without it. For the Mario model with 2835 vertices and 2979 normals:

| | With BVH | Without BVH |
| --- | --- | --- |
| Average | 320.757 ms/bounce | 284.9704822 ms/bounce |

My implementation of traverseBVH still uses too much branching in its state logic. When the threads of a warp diverge while traversing the BVH nodes, the warp has to execute every branch taken by any of its threads, so this is not an efficient implementation of stack-less BVH traversal. Some logic is common to several traversal states and could be shared between them, which would greatly reduce branch divergence. Additionally, BVHNodeDev currently stores quite a bit of extra information that could be packed more efficiently to save memory per thread.

This feature uses the BBox class for axis-aligned bounding volumes, the BVHNode class for BVH construction on the CPU, and the BVHNodeDev class for iterative BVH traversal on the GPU using CUDA.

Device spec

General information for CUDA device

  • Device name: GeForce GTX 980 Ti
  • Compute capability: 5.2
  • Compute mode: Default
  • Clock rate: 1076000 kHz
  • Integrated: 0
  • Device copy overlap: Enabled
  • Kernel execution timeout: Enabled

Memory information for CUDA device

  • Total global memory: 6442450944 bytes
  • Total constant memory: 65536 bytes
  • Multiprocessor count: 22
  • Shared memory per multiprocessor: 98304 bytes
  • Registers per multiprocessor: 65536
  • Max threads per multiprocessor: 2048
  • Max grid dimensions: [2147483647, 65535, 65535]
  • Max threads per block: 1024
  • Max registers per block: 65536
  • Max thread dimensions: [1024, 1024, 64]
  • Threads per block: 512


Motion blur (incomplete)

For motion blur, I store the previous frames and draw the current frames on top of them. However, this feature isn't fully completed. For it to work properly, I need to clear out the previous frames after a certain amount of time, and also only start saving previous frames when there are camera changes.

Alt text Alt text Alt text


Computer problems

Alt text Alt text Alt text


Third-party code

Obj files


Camera control

I implemented a custom Camera class to better handle the viewer and to compute the viewProj for the ShaderProgram.

  • Left mouse drag to pan left/right/up/down
  • Right mouse drag to zoom in/out
  • Middle mouse drag to change look at point

Obj file path

Ideally, the path to the .obj file should be given in the scene file or as a command-line argument, but I haven't implemented this yet. Please update the file path in Scene::Scene in scene.cpp to point to the desired .obj file in /src/scenes/obj/.

Shader file path

I implemented a custom ShaderProgram class to render BVH bounding boxes using OpenGL. If necessary, please update the shader file paths in init in preview.cpp so that they point to the correct vertex and fragment shaders in /src/glsl/.


The following files are added to CMakeLists.txt: