TL/DR: After Matt’s original “Swallowing the Elephant” with PBRT, and my own “Digesting the Elephant” with OSPRay, I finally got some “first light” on my “Moana on OptiX” sandbox; not fully done yet, but good enough to at least show a glimpse of. I’ll briefly say something about the voyage I took to get there, and will give a brief overview of what’s currently implemented, and what’s still missing.
Moana on OptiX/RTX: Background
As already mentioned/hinted at last week, I had actually been working on “Moana on OptiX”, on and off, since the very day I joined NVidia… which by now is over two years ago. In fact, there’s more than one such sandbox: at some point I had an OptiX 6 based Moana viewer (no RTX), an OptiX 6.5 one (with RTX), an OptiX prime one (pre-7, obviously), an optix-7-alpha one (way before 7 got released); then another pure CUDA one (with my own CUDA ray tracing core), and even one that ran the same CUDA shading code (with a few #define’s and template magic) on both CUDA and on the CPU, w/ either embree or OptiX prime as backends … and probably a few more that I now forgot about. I also had a version that used NVLink to split the model over multiple GPUs; one that used managed memory to “page” to the host; and even one that could run MPI data-parallel (with full path tracing!) across different GPUs and/or several different GPU nodes….
Of course, most of those Moana viewers were rather “prototypical” in the sense that they all looked at different aspects of the problem. For example, the CUDA-only version was heavily optimized towards lowest possible memory consumption, etc. Moreover, several of these sandboxes tried to be useful for both Moana and other heavily instanced PBRT models like “landscape” and “ecosys” – but since these all use very different materials, textures, lights, etc the latter ended up being a giant distraction….
All these different sandboxes were nice and well …. but none of them ever went all the way, and not one ever rendered that model in the way you’d expect it to look: either the respective sandbox didn’t have a path tracer, or it didn’t include the water, or it didn’t use the lights, or the textures, or the curves, or … something that made it “un-showable”. That’s not to say that I didn’t ever have any of these individual components: in fact, I had some reservoir-sampling based direct lighting (from both quad lights and HDR envmap) over a year ago; I also at some point borrowed the Disney BRDF implementation from Will “TwinkleBear” Usher’s ChameleonRT ray tracer (https://github.com/Twinklebear/ChameleonRT); I had baking of the PTex textures in my github pbrtParser repo (https://github.com/ingowald/pbrt-parser) a long while back; Dave Hart gave me his curves tessellation code for the PBRT curves a loooong time ago, etc …. I just never had all of those particular pieces in the same viewer at the same time.
Anyway – I still don’t have all these pieces together right now, but triggered by Chris’ blog post last week I at least sat down and started pulling together all the pieces I still had, and trying to get my Moana back to rendering on a GPU. In particular, I finally sat down and extended my path tracer to also be able to handle water, by pretty much stealing Pete Shirley’s “Dielectric” material from his “Ray Tracing in One Weekend” series (because yes, I have no clue about things like Schlick, Fresnel, etc) …. and as of Saturday night, I finally have some “first light”, including something that’s at least looking like water:
BTW: The above renders quite interactively; currently at something like 25-ish fps on an RTX 8000 (at 2560×1080), and using about 32 out of the 48GBs of RAM. I (obviously) use progressive refinement, so while you move around the image is somewhat more noisy – but by the time you can even click on the screenshot app you pretty much get the above. (I’ll release the code at some point in time, once I have cleaned it up a bit).
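For readers who haven’t seen progressive refinement before, here is a minimal host-side sketch of the idea: keep a running per-pixel mean over all frames rendered since the camera last moved, and restart it whenever the camera changes. The `Accumulator` type and its member names are made up for this illustration; the viewer’s real accumulation runs on the GPU and may well look different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulation buffer: one float per pixel channel.
struct Accumulator {
    std::vector<float> mean;  // running mean of all frames since the last reset
    int frameCount = 0;       // number of frames accumulated so far

    explicit Accumulator(std::size_t numValues) : mean(numValues, 0.f) {}

    // Call whenever the camera moves: the next frame replaces the old image.
    void reset() { frameCount = 0; }

    // Blend one new noisy frame into the running mean.
    void addFrame(const std::vector<float>& frame) {
        ++frameCount;
        const float weight = 1.f / float(frameCount);
        for (std::size_t i = 0; i < mean.size() && i < frame.size(); ++i)
            mean[i] += (frame[i] - mean[i]) * weight;
    }
};
```

While the camera moves, every frame starts over from a single noisy sample; once it rests, the mean converges, which is why the image cleans up within a moment of letting go of the mouse.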
Moana on the GPU: Main Challenges
As several previous posts/articles have pointed out, there are a multitude of challenges in this model. I’d particularly point to Matt Pharr’s original “Swallowing the Elephant” series; to my own blog post (and accompanying paper) on “Digesting the Elephant”; and to Chris Hellmuth’s recent “GPU-Motonui” blog.
Most of these issues revolve around the sheer amount of data involved in this model, and in particular the data wrangling required to get it loaded into a form in which you can even start rendering it. For the GPU version, however, a few of these particularly stuck out:
- Textures: All the textures in Moana are in Disney’s PTex format …. but there’s no PTex on the GPU, yet (at least, none that I could find). My first versions rendered without textures, but without textures, this model looks really different, as you can see by playing with this fancy new wordpress “image compare” feature:
- Envmap: The envmap not only comes in two forms (EXR for HDR, and png for the default PBRT model), it’s also super-important to exactly match the orientation used in the pbrt file: in particular, the envmap is tilted to the back to account for the fact that the default camera points downwards; if you don’t do that you see the envmap’s lower “ground” half peeking through between the ocean and the clouds, which is really disturbing. I had to literally copy pieces of Matt’s PBRT code over to make it match.
- Instance count: The model has close to 40 million instances (even if you make sure to only create those you need ….) – but early versions of OptiX only allowed 16 million per instance BVH. Early versions of OptiX didn’t allow multi-level instancing, either, so at times I had to have three “root” BVHes to trace into serially. Later versions used two-level instancing, with one root IAS over smaller second-level IASes …. which of course raises the question of “how do you best partition 40 million instances into N groups of less than 16M each” …. but I digress – since 7.1 OptiX can do more than 40 million, so this problem is gone.
- Number of tiny meshes: Though the PBRT file contains some really big meshes, there are also a lot (!) of tiny ones. This was an issue mostly for OptiX 6, but even in OptiX 7 this would create a lot of different build inputs and SBT entries. Even worse, for some of the objects one ends up with a few really large meshes plus a ton of tiny ones, all in the same group/BVH…. which caused some other issues.
- Water: The water in this scene is actually particularly tricky: not only do you need a water shader at all, the water is also – in some parts of the model – modeled twice : there’s the main body of water in a giant box (with some low-frequency waves pattern on top), but there’s also a second surface with a higher-resolution tessellation of the waves within the default camera frustum. Now as long as you only use that default camera it’ll be OK, but once you move around, and some pixels refract the water twice, you get some really disturbing pictures.
- Memory: This model is big. Really big.
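To make the instance-count workaround a bit more concrete, here is the most naive way one could split a flat instance list into groups that each stay under a per-BVH limit: take the fewest groups that work, and make them as equal in size as possible. This is only a sketch; `partitionInstances`, `GroupRange`, and the exact limit value are assumptions for illustration, and a “best” partition would presumably also look at where the instances are in space, which this one ignores.

```cpp
#include <cstddef>
#include <vector>

// Assumed limit, standing in for the "16 million per instance BVH" figure;
// the exact number for a given OptiX version is not taken from the post.
constexpr std::size_t kMaxInstancesPerGroup = 16000000;

// Half-open range [begin, end) of instance indices that form one group.
struct GroupRange {
    std::size_t begin;
    std::size_t end;
};

// Split numInstances into the fewest contiguous, near-equal groups
// such that no group holds more than maxPerGroup instances.
std::vector<GroupRange> partitionInstances(
    std::size_t numInstances,
    std::size_t maxPerGroup = kMaxInstancesPerGroup)
{
    std::vector<GroupRange> groups;
    if (numInstances == 0 || maxPerGroup == 0) return groups;

    const std::size_t numGroups = (numInstances + maxPerGroup - 1) / maxPerGroup;
    const std::size_t base  = numInstances / numGroups;  // minimum group size
    const std::size_t extra = numInstances % numGroups;  // first 'extra' groups get one more

    std::size_t begin = 0;
    for (std::size_t g = 0; g < numGroups; ++g) {
        const std::size_t size = base + (g < extra ? 1 : 0);
        groups.push_back({begin, begin + size});
        begin += size;
    }
    return groups;
}
```

For 40 million instances and the assumed limit this yields three groups of roughly 13.3 million each, i.e. the three second-level IASes (or three “root” BVHes) mentioned above.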
Moana on RTX: Current State
In its current state, my sandbox does the following:
- Textures: To make the ptex-textures appear on the GPU I currently use some bake-out tool I wrote for exactly that purpose: For each of the input model’s polygon meshes I first re-construct the original quads, and bake out a tiny 16×16 texel “micro-texture” for each of those quads; these then get thrown into a larger texture atlas that gets uploaded as a 2k-x-whatever CUDA texture. During rendering, the material shader code then reconstructs the current triangle’s corresponding texture coordinates within its corresponding mesh’s atlas, and uses a cuda bi-linear texture lookup. Total memory for that – at 16×16 texels per quad – is about 3.5GB, which isn’t too bad. I’m sure that this baking out does create some lower texture quality than PTex, but right now I’m pretty happy with it – it also saves a ton of memory (3.5GB vs 40-ish in original PTex files), and is super-fast (’cause I can use texture hardware …).
- Materials: Though I did have some Disney BRDF at some point in time (borrowed from Will’s ChameleonRT) I could never make this work with the water. Currently, I use the material’s “specTrans” value to determine if the material is water or not, and use either Pete Shirley’s RTOW “Dielectric” material (for water), or his “Lambertian” (for everything else). For the water I’m tracking whether I’m already in/out of the water, and just pass straight through any second water surface the path may encounter.
- Lights: I currently use a plain forward path tracer that follows each path until it hits the environment map. All other light sources get ignored, and even for the envmap I currently use only the LDR version, since that’s what the original PBRT file uses.
- Curves: Curves currently get tessellated into triangle meshes during loading. Since this generates a ton of additional geometry I use a very low tessellation rate, but still can’t see any artifacts from that, likely because the input patches are already pretty small.
- Triangle Meshes: every other geometry in this model is a triangle mesh, anyway. To avoid the many small meshes I currently merge all triangle meshes within a PBRT “object” into a single, large triangle mesh, and store, for each triangle in this model, an index of a tiny struct describing the corresponding sub-mesh’s data. This triangle mesh then gets uploaded to OWL, by creating the proper OWLGeom and OWLTrianglesGeomGroup.
To save on memory I do not use the texcoord and primitiveID arrays from the PBRT file, and simply compute this information on the fly in the CH program.
- Instances: For older versions of OptiX I had to do some extra steps with multiple BVHes and/or multi-level instancing; in the latest version this is no longer required: I simply create a single list of all instances, throw those into an OWLInstanceGroup, and done.
- Model import: I use my github pbrtParser project for all model importing – this library lets you first convert from the ascii PBRT model to a binary “.pbf” version (with the exact same data), and loading from this format is a few orders of magnitude faster than from the ASCII version … so very useful. Some of the set-up stages then get done right away on the “scene graph” that this library loaded: ie, transforming into a strict single-level instancing model, tessellating the curves into triangle meshes, extracting (and then removing!) the light sources, extracting the default camera pose and screen resolution, etc.
- OptiX use: The actual OptiX usage in my latest viewer is all through OWL: OWL makes it (so!) much easier to deal with things like buffer uploads, launch params, building of data structures, constructing SBTs, etc, that I wouldn’t want to part with it (well, it got written largely for this very purpose…).
In particular, having all the low-level OptiX code “hidden” through owl is a great help in debugging: getting this model wrangled is enough of an issue in itself, so knowing one doesn’t even have to look for bugs in things like setting up build inputs or building SBTs is a huge help.
Other than that, the use of OptiX is pretty straightforward: I create the per-object BLASes and instance accel struct as described above; there’s one closest hit program for the triangle mesh that “deconstructs” the merged-mesh information, and stores primitive ID, instance ID, material ID, texture ID, etc, in the per-ray data. All textures, materials, etc, get first “serialized” on the host (ie, all textures, materials, etc, first all get collected into a single linear array each), then uploaded into an OWLBuffer of the respective OWL_USER_TYPE(DisneyMaterial), OWL_TEXTURE, etc; these buffers then get attached to the global LaunchParams from which they are accessible to both CH program and raygen program. All path tracing currently happens in the raygen program.
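To illustrate what “deconstructing” the merged mesh and reconstructing atlas coordinates could look like, here is a host-side C++ sketch of the arithmetic a closest-hit program might do. Everything specific in it is an assumption made for the example, not the viewer’s actual layout: the `SubMeshInfo` fields, the convention that quad q was split into triangles 2q = (v0,v1,v2) and 2q+1 = (v0,v2,v3), the row-major tile order in a 2048-texel-wide atlas, and the half-texel inset that keeps bi-linear filtering inside a tile.

```cpp
#include <cstdint>

// Hypothetical per-sub-mesh record that each merged triangle points to.
struct SubMeshInfo {
    std::uint32_t firstTriangle;  // index of the sub-mesh's first triangle in the merged mesh
    std::uint32_t materialID;     // material used by this sub-mesh
};

constexpr std::uint32_t kTileSize    = 16;    // 16x16 texels baked per quad
constexpr std::uint32_t kAtlasWidth  = 2048;  // assumed "2k" atlas width in texels
constexpr std::uint32_t kTilesPerRow = kAtlasWidth / kTileSize;

struct QuadUV {
    std::uint32_t quad;  // quad index within the sub-mesh
    float s, t;          // position inside the quad, each in [0,1]
};

struct AtlasCoord {
    float x, y;  // un-normalized texel coordinates in the sub-mesh's atlas
};

// Map a hit on merged triangle primID, with barycentric weights b1 (for the
// triangle's second vertex) and b2 (for its third), to a position in the quad.
QuadUV triangleToQuad(const SubMeshInfo& sm, std::uint32_t primID, float b1, float b2)
{
    const std::uint32_t localPrim = primID - sm.firstTriangle;
    QuadUV r;
    r.quad = localPrim / 2;
    if (localPrim % 2 == 0) {
        // corners (0,0), (1,0), (1,1)
        r.s = b1 + b2;
        r.t = b2;
    } else {
        // corners (0,0), (1,1), (0,1)
        r.s = b1;
        r.t = b1 + b2;
    }
    return r;
}

// Find the quad's tile and place (s,t) between the tile's outermost texel centers.
AtlasCoord atlasTexel(const QuadUV& q)
{
    const std::uint32_t tileX = q.quad % kTilesPerRow;
    const std::uint32_t tileY = q.quad / kTilesPerRow;
    return { tileX * kTileSize + 0.5f + q.s * (kTileSize - 1),
             tileY * kTileSize + 0.5f + q.t * (kTileSize - 1) };
}
```

The point of a scheme like this is that neither texcoords nor primitive IDs have to be stored per vertex or per triangle: the triangle index plus the barycentrics are enough to find the right texel.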
Moana RTX: What’s (still) Missing
OK, with all this implemented, what’s still missing? A lot, actually:
- Disney BRDF: The current material model I have is “either it’s Dielectric, or it’s Lambertian” – for this particular model that’s actually not too far off, since most of the materials are actually configured to look pretty much like that. However, it would still be useful to have the full Disney BRDF working.
- HDR Env-map: The env-map should be HDR, but in the code above isn’t – I did have that in the past, but then took the EXR loader out to avoid some windows issues…
- Quad Lights: The model contains a lot of “accent (quad-)lights”, in particular over the beaches; and these give the scene a nice reddish-warm “glow” that makes it look totally different. In the code above the light geometry is already separated from the surface geometry (else they show up as annoying white quads all over the place); but in the code above they’re not yet used. Given the large number of them one has to do some importance sampling, for which I have in the past used some reservoir sampling…. but that code got lost somewhere, so isn’t hooked up right now.
- Next-event Estimation and MIS: Right now the path tracer just traces a ray until it hits a light source; and for the LDR envmap that works just fine … but the moment I make that envmap HDR again that’ll get noisy… so I will have to (re-)add some NEE and MIS.
- “Real” Curves: I currently tessellate the curves (palm fronds, mostly) into triangle meshes; but since OptiX 7.2 can also do curves I should now be able to save that memory.
- Denoising: Currently not yet done – mostly because I’m still not sure whether I should do “the right thing” and first add denoising cleanly in owl, and then get it for free here … or rather do the “quick-hack” and get it done here, first …. we’ll see.
- Animated water: The one piece I’ve never done yet – but in theory, the water in this model is animated…
- More memory squeezing: I currently use about 32GBs of memory for this model – but since last week I got a shiny new 3090, so would obviously like to get that below 24GB. There’s several obvious ways for doing that; in particular the normal arrays can be encoded with way fewer bits, and right now I’m not even freeing the vertex and index arrays after building, which since OptiX 7.1 I could actually do (one can query the hit triangle’s vertices from OptiX).
- Model/material fixes: there’s several objects in the PBRT model that have obviously broken/missing material data. Most of those I’ve now found and eliminated, but for some reason I haven’t yet identified my beach and ocean floor surface are all gray – which may well be the most annoying visual artifact in the current viewer. This may of course be a bug in my parser/importer, but whatever it is, I haven’t found it yet.
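Since the reservoir-sampling code for the quad lights got lost, here is a minimal sketch of what single-sample weighted reservoir sampling looks like in general: one streaming pass over the lights, keeping exactly one candidate, such that light i ends up chosen with probability w_i / sum(w). This is the textbook scheme, not a reconstruction of the lost code; `pickLight`, `LightPick`, and the idea of passing in a `rand01` callable that returns uniform floats in [0,1) are all made up for the example, and how the per-light weights would be computed (power, distance, …) is left open.

```cpp
#include <cstddef>
#include <vector>

// Result of the selection: which light, and with what probability it was picked.
struct LightPick {
    int   index;        // -1 if no light had a positive weight
    float probability;  // weight of the chosen light divided by the sum of all weights
};

// One pass over 'weights'; each candidate replaces the current pick with
// probability (its weight) / (sum of weights seen so far).
template <typename Rng>
LightPick pickLight(const std::vector<float>& weights, Rng&& rand01)
{
    int   chosen       = -1;
    float chosenWeight = 0.f;
    float weightSum    = 0.f;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const float w = weights[i];
        if (w <= 0.f) continue;  // lights that cannot contribute are skipped
        weightSum += w;
        if (rand01() * weightSum < w) {
            chosen       = int(i);
            chosenWeight = w;
        }
    }
    return { chosen, chosen >= 0 ? chosenWeight / weightSum : 0.f };
}
```

The returned probability is what the path tracer would divide the light’s contribution by, so the estimate stays unbiased regardless of how the weights were chosen.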
Anyway, having worked on this model for so long, it was a really “high”-moment on Saturday night, finally seeing something that looks at least roughly as one’d expect.
I’ll be working on those missing features on and off going forward; will update once I get some of those things working.
6 thoughts on ““The Elephant on RTX” – First Light. (or: “Ray Tracing Disney’s Moana Island using RTX, OptiX, and OWL”)”
Have you tried streaming data to reduce memory? What are the BLAS and TLAS data sizes?
No; haven’t looked into streaming. I had one project a while ago where I was using NVlink to share some of the data (vertices, normals, textures), and when running that across 2xRTX8000 I could fit the model. BLAS and TLAS you have to replicate, but they certainly fit into 48GB (if done right).
I did not look into streaming, though – we did look into that with another paper that just got conditionally accepted to PGV (that paper is on visualization, not on Moana) … but haven’t looked at it in the Moana context – first because I can already fit it with NVlink (albeit across two GPUs that not everybody has), and second because I was rather concentrating on another paper that uses data-parallel rendering, and IMHO is the much more high-impact solution to this problem.