Device/GPU-side sorting in OpenMP (i.e., with #pragma omp target)

I’ve been playing a bit with OpenMP recently – in particular, with the #pragma omp target based device offloading that OpenMP 5.0 and newer are offering. Overall I really like it, but one thing I found is that whereas classical GPU languages have lots of helper libraries for common things like sorting, this isn’t (yet?) the case for OpenMP target offloading. Of course, you can also just sort on the host by properly mapping the data, but for those who approach omp target offloading in the same way they would CUDA, this doesn’t feel exactly right.

So, bottom line: for an OpenMP BVH builder I was writing I realized I needed an OpenMP-based sorter, and not finding one, I just took an existing CUDA-based bitonic sorter that I had lying around and ported it over to OpenMP. Not the fastest way of device-side sorting, maybe, but super flexible: it works for any data type that is comparable via operator<(key_t,key_t), is trivial to extend to key/value sorts, handles any size and type of input data, and so on. And since that code might also be useful for others, I just put it into its own repo, too: https://github.com/ingowald/openmp_target_sort/
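To give a rough idea of what a bitonic sort looks like with target offloading, here is a minimal sketch. It is not the code from the repo: the name bitonic_sort_sketch is made up for illustration, it maps a host array rather than taking device pointers, and it assumes T is trivially copyable (so it can be mapped) and has an operator<. It uses the "flip" form of the bitonic network, in which every compare-exchange sorts in ascending order; that lets out-of-range partners simply be skipped, so the input size does not have to be a power of two.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only (not the repo's API).
// Sorts data[0..n) ascending using a bitonic network, one offloaded
// parallel loop per network step.
template <typename T>
void bitonic_sort_sketch(T *data, std::size_t n)
{
  if (n < 2) return;

  // Keep the array resident on the device across all steps.
  #pragma omp target data map(tofrom: data[0:n])
  {
    // k is the size of the blocks being merged in this stage.
    for (std::size_t k = 2; (k >> 1) < n; k <<= 1) {

      // First step of each merge: partner is the mirrored index
      // within the block of size k ("flip").
      #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
      for (std::size_t i = 0; i < n; ++i) {
        std::size_t partner = i ^ (k - 1);
        // Only the lower index of each pair does the work; partners
        // past the end act like +infinity and are skipped.
        if (partner > i && partner < n && data[partner] < data[i]) {
          T tmp = data[i];
          data[i] = data[partner];
          data[partner] = tmp;
        }
      }

      // Remaining steps: partner is at distance j, with j halving.
      for (std::size_t j = k >> 2; j > 0; j >>= 1) {
        #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
        for (std::size_t i = 0; i < n; ++i) {
          std::size_t partner = i ^ j;
          if (partner > i && partner < n && data[partner] < data[i]) {
            T tmp = data[i];
            data[i] = data[partner];
            data[partner] = tmp;
          }
        }
      }
    }
  }
}
```

Calling bitonic_sort_sketch(values, count) on a plain host array sorts it in place. Without an offload device (or without OpenMP enabled at all) the loops just run on the host. Each step launches its own target region, which is simple but not fast; a tuned version would handle the small-distance steps inside a single kernel.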

To make this look somewhat like the existing omp_target_alloc() etc., I called the function omp_target_sort() (mostly to make it clear to whoever calls it that the key and/or value array(s) have to be device-side data). The code is fully templated and header-only, so it should be easy to use in any codebase, and with any data type. I have not yet bothered adding a version that takes a custom comparator, but that should be easy to add if/when required (if so, let me know).

With that: enjoy….
