Float atomics in OpenCL …

OpenCL – the final frontier …. uh… not really, but still: Just stumbled over a situation where I needed some atomic min/max operations for floats (which OpenCL apparently doesn’t have yet?!)… Now I’m sure a lot of people will have way better solutions than what I went with, but since even googling didn’t produce an easy hit I wanted to at least put my solution out here.

Background

Some time last week I finally started doing some experimenting with building BVHs in OpenCL – not exactly my favorite language, but at least one that works across a broad spectrum of hardware architectures. I had written some OpenCL programs before (one of them – wait! – a ray tracer!), but so far had always done only the simple stuff (i.e., traversal and shading) on the OpenCL side, and left BVH builds to the host.

Now when playing with BVH builds, I quickly realized that OpenCL – at least without vendor-specific extensions – still doesn’t have floating-point atomics… in particular, not the floating-point min/max operations that I had wanted to use when different work groups were collectively computing the scene’s bounding box: yes, there are atomics for integer min/max, but not for floats. Really? In the 21st century?

My first attempt to work around that was to simply bit-cast the float to an int, do an integer min/max, and bit-cast back. Non-negative floats compare in the same order as their bit patterns, but negative floats compare in reverse order as signed ints, so this gets a bit tricky in the presence of special values (NaN, inf, etc. – which your BVH builder shouldn’t actually have to deal with), as well as for “negative zero” values (which unfortunately do happen in practice).
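To make the bit-cast problem concrete, here is a small host-side sketch in plain C (not OpenCL; all helper names are made up for illustration). `naive_maxf` compares raw bit patterns as signed ints, which is fine for non-negative floats but inverts the order of negative ones, including negative zero. `ordered_key` shows one commonly used remedy, which is not the approach taken in this post: flip the 31 magnitude bits of negative values so that signed integer order matches float order. With such keys one could keep keys (not floats) in an int buffer, use the integer `atomic_max`, and decode at the end. NaNs are still not handled, and -0.0 sorts just below +0.0.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a signed 32-bit int (like OpenCL's as_int). */
static int32_t float_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Naive approach: compare raw bit patterns as signed ints.
   Only correct when both inputs are non-negative. */
static float naive_maxf(float a, float b)
{
    return float_bits(a) > float_bits(b) ? a : b;
}

/* Order-preserving key: non-negative floats keep their bits; negative
   floats get their magnitude bits flipped, so larger magnitudes map to
   smaller ints. The sign bit is untouched, so keys of negatives stay < 0. */
static int32_t ordered_key(float f)
{
    int32_t i = float_bits(f);
    return i >= 0 ? i : i ^ INT32_C(0x7fffffff);
}

/* Inverse of ordered_key (the XOR is its own inverse). */
static float key_to_float(int32_t k)
{
    int32_t i = k >= 0 ? k : k ^ INT32_C(0x7fffffff);
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* Max via keys: agrees with float max for all non-NaN inputs. */
static float ordered_maxf(float a, float b)
{
    return ordered_key(a) > ordered_key(b) ? a : b;
}
```

For example, `naive_maxf(-1.0f, -2.0f)` picks -2.0f, and `naive_maxf(-1.0f, -0.0f)` picks -1.0f, while `ordered_maxf` gets both right.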

Solution

However, there’s also a very simple solution using just atomic exchange (atomic_xchg), which luckily is available for floats: simply read the stored value and, if our own value is larger, use atomic_xchg to write ours in. The exchange returns whatever was stored right before our write. If what we get back is no larger than what is now stored, we’re done; but if there was a race condition where another thread wrote a larger value after we read, the exchange hands that larger value back to us (we just overwrote it), so we simply write it back in turn. Rinse and repeat, done:

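inline void atomic_maxf(volatile __global float *g_val, float myValue)
{
  float cur;
  // Loop while the value we hold is larger than what is stored.
  while (myValue > (cur = *g_val))
    // Swap our value in (inside the loop, max(cur,myValue) == myValue) and
    // get back the previous contents. If another thread had meanwhile stored
    // something larger, we now hold it, and the next iteration puts it back.
    myValue = atomic_xchg(g_val,max(cur,myValue));
}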

Pretty simple, actually. Probably pretty expensive, but that’s kind of expected for atomics anyway, so this should only be used as a last resort.
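For playing with the idea on the CPU, here is a minimal host-side sketch in C11 (not OpenCL) of the same exchange loop, with C11 `atomic_exchange` standing in for OpenCL's `atomic_xchg`. The function names and the `SceneBounds` type are hypothetical, and the bounding-box part only mirrors the "work groups collectively computing the scene’s bounding box" use case. The reason the loop never loses the true maximum: an exchange only moves values between memory and a thread's hands, and a thread only drops the value it holds once it has seen a stored value at least as large. Note that a concurrent reader can briefly observe a smaller value while a displaced maximum is being written back; the result is only guaranteed once all updates have returned.

```c
#include <stdatomic.h>

/* Host-side analogue of atomic_maxf: atomic_exchange plays atomic_xchg. */
static void atomic_maxf_xchg(_Atomic float *g_val, float my_value)
{
    /* Loop while the value we hold is larger than what is stored. */
    while (my_value > atomic_load(g_val)) {
        /* Swap ours in and take back the previous contents; if that was
           larger (another thread's value), the next iteration restores it. */
        my_value = atomic_exchange(g_val, my_value);
    }
}

/* Mirror image for the lower bound. */
static void atomic_minf_xchg(_Atomic float *g_val, float my_value)
{
    while (my_value < atomic_load(g_val))
        my_value = atomic_exchange(g_val, my_value);
}

/* Hypothetical scene-bounds accumulator for the BVH-build use case. */
typedef struct {
    _Atomic float lo[3];
    _Atomic float hi[3];
} SceneBounds;

/* Start from an "empty" box so the first point always wins. */
static void bounds_reset(SceneBounds *b)
{
    for (int k = 0; k < 3; ++k) {
        atomic_store(&b->lo[k], 1e30f);
        atomic_store(&b->hi[k], -1e30f);
    }
}

/* Grow the box to include point p; may be called from several threads. */
static void bounds_grow(SceneBounds *b, const float p[3])
{
    for (int k = 0; k < 3; ++k) {
        atomic_minf_xchg(&b->lo[k], p[k]);
        atomic_maxf_xchg(&b->hi[k], p[k]);
    }
}
```

Usage: call `bounds_reset` once, then `bounds_grow` for each point (or each work group's partial box corners); after all calls return, `lo`/`hi` hold the scene bounds.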

If anybody has a better solution, feel free to comment! As said above, OpenCL is not exactly my favorite language, so I’m sure there are better/more elegant solutions out there …
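One frequently used alternative is a compare-and-swap loop: keep the float's bits in an integer slot and retry a compare-exchange until either the write succeeds or the stored value is already at least as large. In OpenCL 1.x this would presumably be written with `atomic_cmpxchg` on a `volatile __global uint *` together with `as_uint`/`as_float`; the sketch below is only a host-side C11 analogue with hypothetical helper names, not code from this post. Unlike the exchange loop, the stored value here only ever increases, so readers never see a transiently smaller maximum; the cost is that a failed compare-exchange has to retry.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers (the roles of as_uint / as_float in OpenCL C). */
static uint32_t to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* CAS-based float max; the stored float lives in a uint32_t slot. */
static void atomic_maxf_cas(_Atomic uint32_t *slot, float value)
{
    uint32_t expected = atomic_load(slot);
    /* Only attempt a write while our value beats what we last saw. */
    while (value > from_bits(expected)) {
        /* On success the slot holds `value` and we are done. On failure,
           `expected` is refreshed with the current contents and the loop
           condition re-checks whether a write is still needed. */
        if (atomic_compare_exchange_weak(slot, &expected, to_bits(value)))
            break;
    }
}
```

Because the comparison is done on floats, negative values and -0.0 behave as expected; a NaN already in the slot is never replaced, since `value > NaN` is always false.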

PS: Yes, atomic add/sub/mul is a bit trickier, but luckily I don’t need it for BVH builds, so I won’t go into that here. At some point I might – just for the fun of it – but not for now 🙂