Huh, how fitting: “Ray Tracing on a Weekend“, and I’m sitting here, Sunday morning, over a coffee, and writing about ray tracing on a weekend … on a weekend. And if that wasn’t recursive enough, I’m even writing about recursion in ….. uh-oh.
Aaaaanyway. For reference, I also just added a purely iterative variant of the “RTOW-in-OptiX” example that I wrote about in my previous two posts: The original code I published Friday night tried to stay as close as possible to Pete’s example, and therefore used “real” recursion, in the sense that the “closest hit” programs attached to the spheres did the full “Material::scatter” of its respective material (lambertian vs dielectric vs metal), plus doing a recursive “rtTrace()” to continue the path, thus doing some real recursive ray (actually: path) tracing.
Now if you read the previous section very closely you may have seen that I put “real” in quotes, for good reason: OptiX will internally re-factor that code to not really recurse in the way Pete’s CPU version did – with very deep stack and everything – but will likely do something more clever by re-factoring that code, which you can read more about in the original OptiX SIGGRAPH paper.
All that said, no matter what OptiX may or may not do with it, from a programmer’s standpoint it’s true recursion …. and though OptiX may do some refactoring to avoid the “gigantic stacks” problem – it’ll still have to do something to handle all the recursive state – and that, of course, is not cheap. Consequently, real recursion is generally something to be avoided (which, BTW, typically makes the renderer simpler to argue about, anyway).
Roger Allen’s CUDA-version already did this transformation, and used a recursive version: Since his example used CUDA directly, there was no way for any compiler framework to re-factor the code, so if he had used recursion the CUDA compiler would really have had to use enough stack space per pixel to store up to 50 recursive trace contexts, which would probably not have ended well.
In my original OptiX example, I didn’t have this problem, and could trust OptiX to handle that recursion for me in a reasonable way. Nevertheless, as said above real recursion is usually not the right choice to go about it (and BTW: on a CPU it usually isn’t, either!), so the downside of my staying close to Pete’s original solution was that this originally example might actually have led some readers to think that I wanted them to write such recursive code, which of course is not what I intended.
As such, for reference, I just added a iterative version to my example as well. The particular challenge in this example is that while the CPU and CUDA versions have real “Material” classes with real virtual functions, in OptiX it’s a bit tricky to attach real virtual classes to OptiX objects (yes, you can do it – after all, programs are written in general CUDA code – but let’s not go there right now). For my particular version, the way I went about this is to have the closest hit programs do one Material::scatter() operation for the material associated to that geometry, and return the resulting scattered ray and attenuation back to the ray generation program via the PRD. Of course, this approach works only because the Material in Pete’s code does only exactly one thing – scatter() – and wouldn’t have worked if we the ray generation program would have had to call multiple different material methods … but hey, this example is not about “how to write a complex path tracer in OptiX” – that may come at a later time, but for now, this is only about how to map Pete’s example, nothing more.
I do hope the reference code will be useful; and as usual: any feedback is welcome!
With that – back to …. work?
PS: For those interested in having a look: I already pushed the code to github (https://github.com/ingowald/RTOW-OptiX). I’ll be running some more extensive numbers when I’m back to a real machine (no, I don’t bring my turing to my sunday-morning coffee…), but at least on my “somewhat dated” Thinkpad P50 laptop, I get the following (both using 1200x800x128 samples):
- pete’s version (with -O3, and excluding image output), on a Core i7-6700HQ@2.6Ghz(running at 3.2Ghz turbo): 12m32s.
- optix version, on a Quadro M1000M: 18 sec.
Of course, this comparison is extremely flawed: Pete’s version doesn’t even use threads, let alone an acceleration structure, both of which my OptiX version does. Take this with a grain of salt – or an entire salt-trucks worth of it, for that matter! That said, the parallelism in the OptiX version comes for free, and the acceleration structure …. well, all that took was adding a single line of code (‘gg->setAcceleration(g_context->createAcceleration(“Bvh”))‘) …
PPS: First performance numbers on some more powerful card (driver 410.57, optix 5.1.1):
- 1070, recursive: 0.58s build, 6s render
- 1070, iterative: 0.66s build, 5.5s render
- Titan V, recursive: 0.57s build, 2.6s render
- Titan V, iterative: 0.63s build, 2.1s render
- Turing: to come…