Final Report
Abstract
In this project, a proof-of-concept program was created to determine if it is possible to perform ray tracing in a shader program running on the GPU. Real-time performance was achieved for smaller scenes at the cost of visual quality.
Technical approach
Techniques used
I started my final project by trying to combine Project 4 (Cloth Simulation) and Project 3 (Ray Tracing). I wanted to target OpenGL 3.3 Core so that even older computers would be able to run the GPU ray tracer. Since Project 4 already uses OpenGL 3.3 with shaders, it seemed like a good idea to base my final project on it and add features from Project 3. I ended up doing the opposite, basing my code on Project 3 and adding code from Project 4, because that made it easier to get a successful build. I also had to figure out how the CMake build system worked through trial and error.
Once I had a successful build, I added a feature for reloading shaders dynamically so that I did not have to restart the program every time I changed the shader code. This sped up shader development.
My original plan was to convert the Bounding Volume Hierarchy (BVH) to a GPU-friendly format and upload it to the GPU so that BVH traversal could run inside the shader. Because of the risk of unforeseen problems, and because this part might take longer than estimated, I decided to first convert and transfer the primitives and lights and port the ray tracing code from Project 3 into the fragment shader. That way I could still get visual results even without the BVH.
To convert primitives to a GPU-friendly format, I added a method to the Triangle class that packs the vertex positions, vertex normals, and material information of the triangle into a vector of floats. Spheres use a similar method that packs the origin and radius of the sphere instead. To tell triangles and spheres apart, an additional struct member signifies which shape is packed. To upload all the primitives to the GPU, an uploading method iterates over the primitives, calls the convert method on each one, and appends the resulting floats to the end of one large vector of floats. Finally, the large vector is uploaded as a Texture Buffer Object (TBO) using some example code from GitHub [1]. The code allocates a new buffer and a new texture from OpenGL and calls glBufferData to place the vector of floats in the new buffer, which is managed by OpenGL [1]. The texture is then bound to the buffer by calling glTexBuffer [1]. Since the texture is bound to GL_TEXTURE0 using glActiveTexture, the shader uniform for the texture is set to 0 [1]. Inside the shader, the data is read with texelFetch, which returns a vector of 4 floats at a given offset in the texture [2]. I wrote a shader function that wraps several texelFetch calls so that I can retrieve an entire primitive from the texture given the offset of the primitive. Each primitive is ultimately used in intersection testing.
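The packing step can be illustrated with a minimal C++ sketch. The layout below (8 RGBA texels per primitive, a type tag in the first texel) and all names such as packTriangle, packSphere, and fetchTexel are assumptions made for illustration, not the format actually used in the project. fetchTexel is a CPU-side stand-in for what a texelFetch wrapper would do in the shader.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Texel { float r, g, b, a; };

// Hypothetical layout (an assumption, not the report's actual format):
// every primitive occupies kTexelsPerPrimitive RGBA float texels.
//   texel 0:  (shape type, 0, 0, 0)
//   texel 1:  (reflectance.rgb, 0)
//   triangle: texels 2-4 = vertex positions, texels 5-7 = vertex normals
//   sphere:   texel 2 = (center.xyz, radius), texels 3-7 = zero padding
constexpr float kTriangle = 0.0f;
constexpr float kSphere = 1.0f;
constexpr std::size_t kTexelsPerPrimitive = 8;

void appendTexel(std::vector<float>& out, float r, float g, float b, float a) {
    out.insert(out.end(), {r, g, b, a});
}

void appendVec3(std::vector<float>& out, const Vec3& v, float w = 0.0f) {
    appendTexel(out, v.x, v.y, v.z, w);
}

std::vector<float> packTriangle(const Vec3 (&positions)[3],
                                const Vec3 (&normals)[3],
                                const Vec3& reflectance) {
    std::vector<float> out;
    appendTexel(out, kTriangle, 0.0f, 0.0f, 0.0f);
    appendVec3(out, reflectance);
    for (const Vec3& p : positions) appendVec3(out, p);
    for (const Vec3& n : normals) appendVec3(out, n);
    return out;
}

std::vector<float> packSphere(const Vec3& center, float radius,
                              const Vec3& reflectance) {
    std::vector<float> out;
    appendTexel(out, kSphere, 0.0f, 0.0f, 0.0f);
    appendVec3(out, reflectance);
    appendVec3(out, center, radius);
    // Pad so that every primitive has the same size and can be indexed directly.
    out.resize(kTexelsPerPrimitive * 4, 0.0f);
    return out;
}

// CPU stand-in for reading one RGBA texel of a given primitive,
// mirroring what texelFetch on a buffer texture would return.
Texel fetchTexel(const std::vector<float>& buffer,
                 std::size_t primitive, std::size_t texel) {
    const std::size_t i = (primitive * kTexelsPerPrimitive + texel) * 4;
    return {buffer[i], buffer[i + 1], buffer[i + 2], buffer[i + 3]};
}
```

A fixed size per primitive wastes some space for spheres, but it lets the shader compute the texel offset of primitive i as a simple multiplication. The concatenated vector would then be handed to glBufferData and attached to a texture with glTexBuffer, as described above.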
To convert lights to a GPU-friendly format, I used a process similar to the one for primitives. A method converts each light into a struct that is placed into one large array of light structs. Each struct contains the light’s type, position, direction, radiance, dimensions, and whether it is a point light or not. The array of light structs is uploaded as a Uniform Buffer Object (UBO). At first, I thought I would need raw OpenGL calls to set up the UBO upload manually, but by looking at the NanoGUI header file for its shader interface I discovered that it already has a class that can manage the buffer for the UBO. I instantiated that class, pushed each light struct into it, and passed it to the shader’s setUniform method. On the shader side, the UBO is accessed like a regular array (e.g. u_lights[i] for the i’th light).
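One detail that is easy to get wrong with UBOs is memory layout. Assuming the uniform block is declared with the std140 layout (the report does not say which layout was used), every vec3 member starts on a 16-byte boundary, so the CPU-side struct needs explicit padding. The sketch below is a hypothetical C++ mirror of such a light struct; the field names, the two dimension vectors, and makePointLight are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU mirror of a GLSL struct inside a std140 uniform block:
//   struct Light { int type; int is_point; vec3 position; vec3 direction;
//                  vec3 radiance; vec3 dim_x; vec3 dim_y; };
// Under std140 each vec3 is aligned to 16 bytes, so every vector is stored
// here as four floats with the last one unused.
struct alignas(16) GpuLight {
    std::int32_t type;     // offset 0
    std::int32_t isPoint;  // offset 4, booleans are sent as integers
    float pad[2];          // offset 8, fills up to the next 16-byte boundary
    float position[4];     // offset 16
    float direction[4];    // offset 32
    float radiance[4];     // offset 48
    float dimX[4];         // offset 64
    float dimY[4];         // offset 80
};

static_assert(sizeof(GpuLight) == 96, "array stride must be a multiple of 16");
static_assert(offsetof(GpuLight, position) == 16, "vec3 must start at 16");
static_assert(offsetof(GpuLight, dimY) == 80, "unexpected padding");

// Builds a white point light at the given position; all other fields are zero.
GpuLight makePointLight(float x, float y, float z) {
    GpuLight light{};
    light.isPoint = 1;
    light.position[0] = x;
    light.position[1] = y;
    light.position[2] = z;
    light.radiance[0] = light.radiance[1] = light.radiance[2] = 1.0f;
    return light;
}
```

The static_assert lines make a layout mismatch fail at compile time instead of showing up as wrong lighting on screen.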
I decided to use a Uniform Buffer Object (UBO) for lights and a Texture Buffer Object (TBO) for primitives because the two types of buffers have different properties [3]. UBOs are good for small amounts of sequential data [3]. Since there are only a few lights and the shader iterates through all of them in order to do importance sampling, a UBO is a good fit for lights. TBOs, on the other hand, offer a larger amount of storage and are better for random access [3]. Since the number of primitives can be very large and the shader does not necessarily visit them in order, especially with a BVH traversal implementation, a TBO is a good fit for primitives.
The next step was porting the ray tracer code from Project 3 into GLSL shader code. This was not too difficult since GLSL and C++ are syntactically very similar. I needed to rename types (e.g. Vector3D to vec3, Matrix3x3 to mat3x3) and convert the object-oriented code to a more procedural form since GLSL does not have classes. I did this by writing functions that take the struct they belong to as their first parameter, which is similar to the hidden this parameter of class methods in C++. One of the functions I was porting was a recursive ray tracing function, which I needed to convert to an iterative form since GLSL shaders do not allow recursion. I tried to convert the function on my own at first but found it too difficult. I ended up basing the new iterative function on a similar ray tracing function from the textbook Physically Based Rendering: From Theory to Implementation, Third Edition, Section 14.5.4 [4]. I modified the function slightly because the textbook version accounts for the radiance from the light hitting the camera when the ray depth is 0, whereas in my implementation that radiance is accounted for outside of the function. The textbook function also has some lines of code that account for reflection and refraction through glass, which I removed since those were not implemented in my shader ray tracer.
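The idea behind the recursive-to-iterative conversion can be shown with a toy model, sketched here in C++. It is not the project's shader code: the scene is reduced to a list of bounces, where each hypothetical Bounce holds the direct lighting gathered at that hit and the weight (BSDF times cosine over pdf) applied to everything arriving from deeper bounces. The iterative version carries a running throughput, which plays the same role as the beta variable in the textbook's path tracer.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for one surface interaction along a path (illustrative only).
struct Bounce {
    float direct;  // direct lighting estimated at this hit
    float weight;  // factor applied to radiance arriving from the next bounce
};

// Recursive form: L(depth) = direct + weight * L(depth + 1).
float radianceRecursive(const std::vector<Bounce>& path, std::size_t depth = 0) {
    if (depth >= path.size()) return 0.0f;
    return path[depth].direct +
           path[depth].weight * radianceRecursive(path, depth + 1);
}

// Iterative form: the product of all earlier weights is kept in `throughput`,
// so each bounce can add its contribution without returning to a caller.
float radianceIterative(const std::vector<Bounce>& path) {
    float radiance = 0.0f;
    float throughput = 1.0f;
    for (const Bounce& bounce : path) {
        radiance += throughput * bounce.direct;
        throughput *= bounce.weight;
    }
    return radiance;
}
```

Expanding the recursion gives direct0 + weight0 * direct1 + weight0 * weight1 * direct2 and so on, which is exactly the sum the loop accumulates. A loop of this shape needs only a fixed amount of state per pixel, which is why it fits in a GLSL shader.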
Finally, I needed a way to generate random numbers for the ray tracing process. I researched various ways to generate random numbers inside a shader and found a Stack Exchange answer that combines a linear congruential generator with a Tausworthe generator to produce pseudorandom numbers from a seed [5]. On the CPU side, I generated four random numbers for use as a seed, which is passed into the shader as a uniform. Since this seed is the same for all pixels, every pixel would generate the same random numbers. To fix this, a per-pixel seed is added to the passed-in seed as shown in another Stack Exchange answer [6]. However, this was not enough: the visual output contained vertical and horizontal lines because the per-pixel seeds were too similar to each other. After more research, I found a website explaining that the seed should be hashed to get better quality random numbers [7]. The website provided a hash function [7], which I used, resulting in much better visual output without the lines.
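The pieces described above can be sketched in C++ as follows. The hash is the Wang hash and the generator is the hybrid of three Tausworthe steps and one LCG step with the commonly published constants; how exactly the project combined the frame seed with the pixel index is not stated, so seedPixel and guardSeed are assumptions made for illustration.

```cpp
#include <cstdint>

// Wang hash: scrambles an integer so that neighboring inputs give unrelated outputs.
std::uint32_t wangHash(std::uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed = seed ^ (seed >> 4);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15);
    return seed;
}

struct RngState { std::uint32_t z1, z2, z3, z4; };

std::uint32_t tausStep(std::uint32_t& z, int s1, int s2, int s3, std::uint32_t mask) {
    const std::uint32_t b = ((z << s1) ^ z) >> s2;
    return z = ((z & mask) << s3) ^ b;
}

std::uint32_t lcgStep(std::uint32_t& z, std::uint32_t a, std::uint32_t c) {
    return z = a * z + c;  // wraps modulo 2^32
}

// Returns a pseudorandom number in [0, 1) and advances the state.
double nextRandom(RngState& s) {
    const std::uint32_t bits = tausStep(s.z1, 13, 19, 12, 4294967294u) ^
                               tausStep(s.z2, 2, 25, 4, 4294967288u) ^
                               tausStep(s.z3, 3, 11, 17, 4294967280u) ^
                               lcgStep(s.z4, 1664525u, 1013904223u);
    return bits * 2.3283064365386963e-10;  // multiply by 2^-32
}

// Tausworthe states should not be tiny values; this guard is an assumption
// of this sketch, following the usual advice to keep seeds above 128.
std::uint32_t guardSeed(std::uint32_t v) { return v < 129u ? v + 129u : v; }

// Hypothetical per-pixel seeding: hash the pixel index repeatedly and add
// each hash to one component of the seed shared by the whole frame.
RngState seedPixel(const std::uint32_t (&frameSeed)[4],
                   std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    const std::uint32_t h1 = wangHash(y * width + x);
    const std::uint32_t h2 = wangHash(h1);
    const std::uint32_t h3 = wangHash(h2);
    const std::uint32_t h4 = wangHash(h3);
    return {guardSeed(frameSeed[0] + h1), guardSeed(frameSeed[1] + h2),
            guardSeed(frameSeed[2] + h3), guardSeed(frameSeed[3] + h4)};
}
```

Without the hash, pixels next to each other start from seeds that differ by one, and the first few outputs of the generator stay correlated, which is consistent with the line artifacts described above. In GLSL the same code would use uint and return a float.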
Due to project time constraints, BVH conversion, uploading, and traversal were not implemented.
Problems Encountered and Solutions
- While merging Project 3 code into Project 4, I ended up with a lot of compilation errors that were hard to resolve. In hindsight, it was better to base the final project on Project 3 and add code from Project 4 to it, since I had to use most of the code from Project 3 anyway and only needed the OpenGL initialization and shader loading code from Project 4.
- I encountered some bugs where the fragment shader was outputting nothing, or only part of the scene, to the framebuffer. Since it is difficult to communicate data from a shader back to the CPU, I debugged by outputting colors that encoded the values of the shader variables I wanted to check. I then used a color picker program to grab the color on screen and decoded it to verify that the variable contained the value I expected. Later on, I discovered a program called RenderDoc [10] that can display what is being sent to the shader, which made it much easier to check that the right data was being sent to the GPU.
- Another problem I encountered was that unsigned integers I sent to the GPU arrived as all zeros for some unknown reason. I discovered the problem using RenderDoc [10], which showed zeros being uploaded to the GPU even though non-zero unsigned integers were being sent. I worked around the problem by sending the values as signed integers instead, which arrived at the GPU with no issues.
- The next problem I encountered was GPU timeouts bringing down the entire computer I was developing on. One time, I accidentally left an infinite loop in the shader code, which caused the GPU to get stuck. The GPU driver timed out and all the running programs that were using the GPU exited. I was left with a blank display, and the only way to recover was to restart the computer. The timeout also occurred when I tried ray tracing more complex scenes, which is understandable since they take more than a couple of seconds to render, especially without an acceleration structure. To avoid the timeouts, I stuck to rendering simple scenes with a lower number of samples per pixel.
- The final problem I encountered came up when I tried to take multiple samples for each pixel using a for loop. For some reason the loop corrupted the spheres scene, while the gems scene was fine. I tried many things to fix this, including reordering code, setting a fixed maximum iteration count for the loop, and moving some code outside the loop. The only thing that solved the problem was removing the for loop. I then tried a different computer with an Intel HD 620, where everything worked fine. This led me to conclude that either the ATI Radeon 5650m in my development computer or its driver has a bug. I ended up testing on the Intel HD 620 instead.
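The color-based debugging mentioned in the list above can be made systematic by agreeing on an encoding. The C++ sketch below shows one possible scheme, assumed here for illustration: the low 24 bits of an integer are split across the red, green, and blue channels, and the value read back with a color picker is decoded the same way. It assumes an 8-bit-per-channel framebuffer with no blending, sRGB conversion, or display color management altering the pixel. In the shader, each channel would be divided by 255.0 before being written as the fragment color.

```cpp
#include <array>
#include <cstdint>

// Splits the low 24 bits of `value` into 8-bit red, green, and blue channels.
std::array<std::uint8_t, 3> encodeToRgb(std::uint32_t value) {
    return {static_cast<std::uint8_t>((value >> 16) & 0xFFu),
            static_cast<std::uint8_t>((value >> 8) & 0xFFu),
            static_cast<std::uint8_t>(value & 0xFFu)};
}

// Rebuilds the integer from the channel values reported by a color picker.
std::uint32_t decodeFromRgb(const std::array<std::uint8_t, 3>& rgb) {
    return (static_cast<std::uint32_t>(rgb[0]) << 16) |
           (static_cast<std::uint32_t>(rgb[1]) << 8) |
           static_cast<std::uint32_t>(rgb[2]);
}
```

For example, a primitive count of 0x123456 would show up as the color (0x12, 0x34, 0x56). Values wider than 24 bits lose their upper bits with this scheme.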
Lessons learned
- A debugging tool like RenderDoc [10] gives a lot of insight into what is going on in the graphics pipeline when things are not displaying correctly.
- If a graphics program does not seem to run correctly, the cause might not be the code but a driver or GPU bug. It is worth trying a different GPU before spending a lot of time rewriting and debugging code.
- GPU drivers have built-in timeouts for shaders that take too long to complete (e.g. because they are stuck in an infinite loop) [8]. Ray tracing a large, complex scene with many samples per pixel and a high ray depth can make the driver think the GPU has stopped responding when in reality the GPU is just busy. To avoid GPU timeouts, it is a good idea to split the ray tracing into smaller chunks of work so that the GPU has a chance to report back to the driver that it is still responsive and not stuck [9].
- Good random number generation in a shader is challenging. Various pseudorandom number generators found on the internet can be used, but nearby pixels may still generate similar numbers if the per-pixel seed is not good enough. Hashing the seed before it is used to seed the random number generator produces better quality random numbers [7].
- Using the official GLSL specification was very helpful [11].
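The lesson about splitting ray tracing into smaller chunks was not implemented in this project, but one possible way to do it is to render the image tile by tile, issuing one draw call per tile (for example by restricting rendering with glScissor) so that no single call runs long enough to trigger the driver timeout. The C++ sketch below only computes the tile rectangles; the Tile struct, makeTiles, and the choice of tile size are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// One rectangular region of the image, in pixels.
struct Tile { int x, y, width, height; };

// Covers an image with tiles of at most tileSize x tileSize pixels.
// Tiles on the right and bottom edges are clipped to the image bounds.
std::vector<Tile> makeTiles(int imageWidth, int imageHeight, int tileSize) {
    std::vector<Tile> tiles;
    if (tileSize <= 0) return tiles;  // nothing sensible to produce
    for (int y = 0; y < imageHeight; y += tileSize) {
        for (int x = 0; x < imageWidth; x += tileSize) {
            tiles.push_back({x, y,
                             std::min(tileSize, imageWidth - x),
                             std::min(tileSize, imageHeight - y)});
        }
    }
    return tiles;
}
```

Smaller tiles lower the risk of a timeout at the cost of more draw calls. A complementary option is to take only one sample per pixel per frame and average the results over successive frames.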
Results
References
1: https://gist.github.com/roxlu/5090067
2: https://www.khronos.org/opengl/wiki/Buffer_Texture
3: http://rastergrid.com/blog/2010/01/uniform-buffers-vs-texture-buffers/
4: http://www.pbr-book.org/3ed-2018/Light_Transport_I_Surface_Reflection/Path_Tracing.html#Implementation
5: https://math.stackexchange.com/a/340028
6: https://gamedev.stackexchange.com/a/164659
7: http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/
8: https://community.khronos.org/t/how-to-crash-a-glsl-shader/62192/3
9: https://community.khronos.org/t/intensive-shaders-1-second-per-primitive/60537
10: https://renderdoc.org/
11: https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.3.30.pdf
Contributions From Each Team Member
I am the only member of the team and did all of the work for the final project.