As mentioned previously I wanted to use this blog to share little nuggets on “how to do things” with tools such as ISPC – little things that are useful, but not easily publishable in any form other than a blog post.
In the first such article (on the ISPC front) I wrote a bit about the different aspects of addressing modes (and their caveats) in ISPC, which is something we constantly stumbled over / used in our OSPRay project. In this article, I’ll write a bit about something any non-trivial project will at some point stumble over: calling back-and-forth between ISPC and C++.
Typically, the reason we want to mix ISPC and C/C++ at all is that ISPC is good at expressing (low-level) parallel code for vectorization, but a little bit less good at building more complex software systems that nowadays use all the goodies of C++, such as templates, virtual functions, copious amounts of libraries (often heavily based on templates, too), etc… all things that ISPC – very unfortunately – lacks. As such, the typical set-up for most ISPC programs I’ve seen so far is that there’s a “main application” that does all the “big systems” stuff (I/O, parsing, data structure construction, scene graph, etcpp) in C++ – and smaller “kernels” of the performance-critical code written in ISPC. Obviously, those two parts have to communicate.
The obvious part: Calling ISPC from C/C++
Now the first, and most common, operation involved in that is having the C/C++ program call into ISPC, to launch and/or otherwise execute those kernels (I’ll say a bit more on “launching” vs “executing” below). Luckily, that part is easy, even has some language support (the “export” keyword), and is well documented: Basically, you declare an ISPC function as “export <function>”; have ISPC generate a header file with a C-callable function declaration of that (the “-h” flag in the ISPC invocation); include the thus-generated header file in your C/C++ code, and simply call that function.
Even with that simple way, there are a few caveats: In particular, though ISPC can export the function declaration to C/C++, it can’t easily export all the parameter types that this function may expect – in particular if those parameters include compound types (structs), varying types (ugh, it really doesn’t like that), an implicit active mask, function pointers with varying arguments, etc.
Example 1: Getting our feet wet
To start with, let’s look at a simple case, an ISPC kernel that operates on a float array (it doesn’t matter what it does). In this case, we’d simply declare the ISPC side (in foo.ispc) as
export void foo_simple(float *uniform A, uniform int N) { ... }
Once we compile this function with the “-h” argument, ISPC will emit a C-callable function (with “extern C” linkage, declared in foo_ispc.h), with the following signature:
/* in foo_ispc.h */
namespace ispc {
  extern "C" void foo_simple(float *A, int N);
}
Obviously, calling this function from C++ is trivial. Worst that could happen – and usually does happen in many programs – is that the data on the C++ side isn’t actually stored as a plain C-array, but rather in a std::array or std::vector, but even that can easily be fixed with a bit of typecasting:
/* in foo.cpp */
#include <vector>
#include "foo_ispc.h"

void fooOnVec(const std::vector<float> &vec)
{
  /* take the address of the first element, cast away the const */
  ispc::foo_simple((float *)&vec[0], (int)vec.size());
}
Of course, any C++ purist with a drop of blood left in his veins will cry out at this abuse of C++ : A std::vector should be opaque, we shouldn’t be assuming that it internally stores this as a float array, just getting the base pointer by taking the address of the first element is devil’s work, and don’t even get me started on casting a pointer to a const vector’s internal storage to a non-const float array. But hey, this is about performance, not cleanliness, so I’ve intentionally chosen this example just to make this point: In many cases you will have to type-cast from “better” C++ types to something that’s more easily digestible in ISPC. Of course, the cleaner way would have been to first copy from that std::vector to a temporary array, and call that kernel on that array – but calling into ISPC is all about speed, so there we are (and if speed is not the goal: just write all the code in C++ to start with!).
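For what it’s worth, the C++ standard does guarantee that a std::vector stores its elements contiguously, so the pointer part of the trick is on solid ground; what remains questionable is casting away the const. Below is a minimal sketch of a slightly tidier wrapper, assuming the kernel really does write to the array. The ispc::foo_simple in this snippet is a plain C++ stand-in (its body, doubling each element, is made up) so that the code compiles without ISPC; in a real build it would come from the generated foo_ispc.h. The helper fooOnVecDemo is hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for the function ISPC would declare in foo_ispc.h.
// The body is hypothetical: it simply doubles every element.
namespace ispc {
extern "C" void foo_simple(float *A, int N) {
  for (int i = 0; i < N; ++i) A[i] *= 2.0f;
}
}  // namespace ispc

// Wrapper taking a non-const vector: the kernel writes to the array,
// so the signature says so and no const has to be cast away.
void fooOnVec(std::vector<float> &vec) {
  if (vec.empty()) return;  // nothing to do
  // The ISPC side takes an int count, so make sure the size fits.
  assert(vec.size() <=
         static_cast<std::size_t>(std::numeric_limits<int>::max()));
  ispc::foo_simple(vec.data(), static_cast<int>(vec.size()));
}

// Usage helper: runs the wrapper on {1, 2, 3} and returns the sum.
float fooOnVecDemo() {
  std::vector<float> v = {1.0f, 2.0f, 3.0f};
  fooOnVec(v);
  float sum = 0.0f;
  for (float x : v) sum += x;
  return sum;
}
```

If the kernel only reads the data, the const cast from the example above is still needed on the C++ side, since the generated declaration takes a plain float pointer.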
Example 2: Structs as function arguments
The reason the previous example was so simple is that it used only built-in types like “float” and “<pointer>” (and those in purely uniform variability) that ISPC can map one-to-one into the generated header file. For compound types and/or varying ones it’s a little bit more tricky, because typically ISPC and C++ cannot include the same header files (except in trivial examples), and the compound types that ISPC exports are not always very useful on the C++ side. For example, consider making our example a little bit more tricky:
/* in foo.ispc */
/* declare a three-float vector type */
struct vec3f { float x, y, z; };

/* export a kernel that operates on one of those */
export void foo(uniform vec3f &v) { ... }
In this example, all parameters are still purely uniform; however, there’s already a struct type involved. Now ISPC can already export this struct (generating a “ispc::vec3f” struct with float members x,y,z as above), and the C++ code can already use this. But generally, this “pure C99” struct wouldn’t be much use in C++ (typically a C++ version of a vec3f would have all kinds of methods associated with it that we can’t add in ISPC), so more likely, we’ll also be using some “class vec3f” on the C++ side. Nevertheless, if we do design our ISPC and C++ classes such that they do have exactly the same memory layout (e.g., in this example, as “three floats, called x, y, z, right after each other”), then we can once again simply typecast:
/* in foo.cpp */
#include "foo_ispc.h"

namespace ours {
  class vec3f {
    /* all kinds of C++ goodies */
    float x, y, z;
  };

  void fooPlusPlus(const ours::vec3f &vec)
  {
    /* just typecast the reference type, and call ISPC */
    ispc::foo((ispc::vec3f &)vec);
  }
}
And lo and behold, everything works as expected. Again, a C++ purist might object – but that’s the kind of pattern we’re using in OSPRay all over the place, and so far it’s been very useful. Do note, though, that we do typecast the reference types – this basically tells the compiler that it doesn’t even have to know how to “convert” from one type to the other, and that it’s simply a pointer for which we guarantee the right underlying data layout.
The caveat, of course, is exactly what every C++ purist will cry out about: there are no type guarantees in any of this, so the compiler can’t even give you a warning. If you change the order of the members on the C++ side without also doing that on the ISPC side (or vice versa), or simply add a member on one side but not the other … or do anything else that changes the memory layout on one side but not the other … well, then you’ll get funny results. Be warned.
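One way to get at least some of that safety back is to let the C++ compiler compare the two layouts at compile time. The following is only a sketch under a few assumptions: the ispc::vec3f here is a hand-written stand-in for the struct ISPC would emit into foo_ispc.h, the members of ours::vec3f are public (offsetof needs to see them), and the dot method and layoutDemo helper are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for the struct ISPC would emit into foo_ispc.h.
namespace ispc {
struct vec3f { float x, y, z; };
}  // namespace ispc

namespace ours {
// C++-side vector type with some "goodies"; members are public here
// so that offsetof can be applied to them.
struct vec3f {
  float x, y, z;
  float dot(const vec3f &o) const { return x * o.x + y * o.y + z * o.z; }
};
}  // namespace ours

// Compile-time checks: these fail the build if one side changes its
// size, alignment, or member order without the other following.
static_assert(std::is_standard_layout<ours::vec3f>::value,
              "ours::vec3f must be standard-layout");
static_assert(sizeof(ours::vec3f) == sizeof(ispc::vec3f),
              "vec3f size differs between C++ and ISPC");
static_assert(alignof(ours::vec3f) == alignof(ispc::vec3f),
              "vec3f alignment differs between C++ and ISPC");
static_assert(offsetof(ours::vec3f, x) == offsetof(ispc::vec3f, x) &&
              offsetof(ours::vec3f, y) == offsetof(ispc::vec3f, y) &&
              offsetof(ours::vec3f, z) == offsetof(ispc::vec3f, z),
              "vec3f member offsets differ between C++ and ISPC");

// Usage helper: the C++-only methods keep working as usual.
float layoutDemo() {
  ours::vec3f a = {1.0f, 2.0f, 3.0f};
  ours::vec3f b = {1.0f, 1.0f, 1.0f};
  assert(sizeof(a) == 3 * sizeof(float));  // three floats, no padding
  return a.dot(b);
}
```

These checks do not catch everything (two members of the same type that swap meaning but keep their names would slip through), but they do turn the most common layout drift into a compile error instead of “funny results”.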
Example 3: What about varying argument types?
Oh-kay, in the previous examples we’ve passed only uniform types, but what about varying ones? Answer: generally speaking, you won’t need it. This sounds counter-intuitive (after all ISPC is all about varying types), but isn’t: typically it’s only the ISPC side that “creates” the varying types, so if anything you’d want to pass a varying type from ISPC to C++ (see below), but not the other way around.
Now if you still think you have to do that, you’ll have to go the way of arrays – ISPC will simply refuse to emit an export header for anything that contains an actual varying type. For the curious: There’s actually no reason it can’t do that – my own version of a SPMD compiler (called IVL, and at one point ISPC++) could actually do that, by exporting a struct of appropriate size … but it’s simply not implemented in ISPC, so don’t even try it. Unfortunately, the same goes even for “indirect” references to varying types. For example, in OSPRay we make heavy use of “poor man’s C++ in ISPC” with function pointers and stuff – and though the classes we use (and the function pointers therein) are perfectly uniform every time we would even want to export them, the uniform function pointers inside those uniform classes often have varying parameters, so currently ISPC can’t export them (IVL/ISPC in these cases wouldn’t even have emitted the varying types themselves, only forward declarations). Either way – don’t export varying types, or those with function pointers with varying arguments, and you’ll be fine – and as said above, in 99.9% of all cases, you probably don’t need this, anyway.
Now, how about the other way around? Calling C/C++ functions from ISPC?
While the above way of calling from C/C++ into ISPC is pretty well documented, significantly fewer people realize that the other way around works perfectly well, too: ISPC obviously can’t directly call any C++ functions – it would have to understand all the mangling (and in almost all cases, C++ types) to do so …. but what it can do perfectly well is call functions with extern C linkage, no matter what language was used to generate them. I.e., you absolutely can do the following:
/* in foo.cpp */
extern "C" void someCppMagic()
{
  std::ifstream .... /* whatever C++ code you want */
}
Then, once declared as “extern C” in ISPC, we can call this function:
/* in foo.ispc */
/* _declare_ the C++ function (with extern C linkage) */
extern "C" {
  unmasked void someCppMagic();
}

/* now call from another ispc function */
void myIspcKernel(varying ....)
{
  someCppMagic();
}
Now in this example, there’s two little details that aren’t immediately obvious, but important: First, on the ISPC side we have used a
extern "C" { ... }
rather than the simpler
extern "C" ...
… and though that looks mundane, it’s actually important, since the ISPC parser can digest the former, but – for whatever reason – can’t digest the latter. So the former is more cumbersome, but needed to make it work.
Second, you may or may not have noticed the “unmasked” keyword in the ISPC declaration of “someCppMagic()”. In this particular example this is actually superfluous, but in some more complex ones it isn’t, so let me explain: In ISPC, all the “varying” control flow is handled implicitly, meaning ISPC keeps track which “programs” (lanes) in a “program group” (aka warp, work group, SIMD vector, etc) are active or not. It does that fully implicitly, without the user seeing it – but it obviously does have to pass this information between (non-inlined) functions, which it does by always putting an additional, completely implicit, parameter on the stack of each function call (and of course, that parameter contains the active mask).
Now in our example the C++ side doesn’t expect any parameters, so even if the mask was on the stack it wouldn’t hurt, since the C++ side would simply ignore it. If the function did have any parameters, however, putting the “unmasked” keyword is actually pretty important, since the C++ compiler would assume one sort of data on the stack (i.e., no active mask), but ISPC would put both the “real” arguments and the active mask there … and that can lead to rather funny results (believe me, we did stumble over that in both OSPRay and Embree).
Now, what about passing stuff?
Now just like in the “C++ to ISPC” example, the real fun starts once we want to pass arguments from ISPC back to C. In its simplest form, we can do this by simply using foreach_active to serialize over all active programs, and calling a (scalar) C/C++ function with it:
/* foo.ispc */
extern "C" {
  unmasked void fooScalarCpp(uniform float f);
}

void myIspc(varying float f)
{
  /* serialize over the active lanes, one scalar call per lane */
  foreach_active (lane) {
    fooScalarCpp(extract(f, (uniform int)lane));
  }
}
And yes, that’ll work perfectly well.
More often, however, you actually do want to pass varying data to C/C++, so let’s talk about how to do that. Just as in the C-to-ISPC example, uniform basic types are trivial, as are pointers/references … and varying ones are not.
First, the simple stuff – varying built-in types
For built-in types (float, int, …) the easiest way of passing them in varying form is actually as arrays:
/* in foo.ispc */
extern "C" {
  unmasked void myCppFunc(float *uniform, uniform int);
}

void myIspcFunc(varying float &f)
{
  myCppFunc((float *uniform)&f, programCount);
}
… and everything works fine.
Now what about compound types?
Now where it does get tricky is varying compound types, because these get internally stored in struct-of-arrays (SoA) form, but must typically be converted to array-of-structs (AoS) before C++ can digest them. For example, the typical way to do so is the following:
extern "C" {
  unmasked void myCpp(vec3f *uniform, uniform int);
}

void myIspc(varying vec3f &v)
{
  uniform vec3f vInAos[programCount];
  vInAos[programIndex] = v;
  myCpp(&vInAos[0], programCount);
}
In this example, we first use ‘programIndex’ and ‘programCount’ to transform from SoA to AoS, then call the C++ side function with this array of (uniform) structs, just like above (note that we do not have to typecast when calling from ISPC to C++, because we did the declaration of the function with ISPC types … there is no “foo_cpp.h” to include).
Of course, just like we converted data on the way “out” to C++, so we’ll have to do once again if that C++ function modified that data. Also “of course”, this conversion is pretty horrible performance-wise, so “use it wisely” is all I can say.
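To make the round trip a bit more concrete, here is a rough C++-only sketch of what happens to the data: an SoA block (mimicking how a varying vec3f is laid out across the program instances) is converted to AoS, handed to the C++ function, and converted back. Everything except the myCpp signature is an assumption made for illustration: the Vec3fSoA type, the gang width of 4, the two conversion helpers, and the body of myCpp (which just stores x + y into z).

```cpp
#include <cassert>

struct vec3f { float x, y, z; };

// Hypothetical SoA block mimicking a "varying vec3f" for a gang of
// WIDTH program instances (WIDTH = 4 is an arbitrary choice here).
constexpr int WIDTH = 4;
struct Vec3fSoA { float x[WIDTH], y[WIDTH], z[WIDTH]; };

// SoA -> AoS: what "vInAos[programIndex] = v" does on the ISPC side.
void soaToAos(const Vec3fSoA &in, vec3f *out) {
  for (int lane = 0; lane < WIDTH; ++lane)
    out[lane] = vec3f{in.x[lane], in.y[lane], in.z[lane]};
}

// AoS -> SoA: the conversion needed on the way back, i.e. what
// "v = vInAos[programIndex]" would do on the ISPC side.
void aosToSoa(const vec3f *in, Vec3fSoA &out) {
  for (int lane = 0; lane < WIDTH; ++lane) {
    out.x[lane] = in[lane].x;
    out.y[lane] = in[lane].y;
    out.z[lane] = in[lane].z;
  }
}

// The C++ function that ISPC calls; it sees a plain array of structs.
// The body is made up: it writes x + y into z for every element.
extern "C" void myCpp(vec3f *v, int N) {
  assert(v != nullptr || N == 0);
  for (int i = 0; i < N; ++i) v[i].z = v[i].x + v[i].y;
}

// Usage helper: full round trip, returns the z value of lane 2.
float roundTripDemo() {
  Vec3fSoA soa = {{1.0f, 2.0f, 3.0f, 4.0f},
                  {10.0f, 20.0f, 30.0f, 40.0f},
                  {0.0f, 0.0f, 0.0f, 0.0f}};
  vec3f aos[WIDTH];
  soaToAos(soa, aos);
  myCpp(aos, WIDTH);
  aosToSoa(aos, soa);
  return soa.z[2];
}
```

Both conversions touch every member of every lane, which is where the performance cost mentioned above comes from.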
What about active/inactive lanes?
One thing all the previous examples had in common is that they all assumed that the C++ function would always want to process all the lanes, or at least, that it was oblivious of the fact that some of the program instances might not actually be active – in fact, I even harped on the importance of the “unmasked” keyword to intentionally not pass any active masks. In practice, however, we often do have to pass information which lanes are vs are not active… so how to do that?
The first thing that comes to mind is to not use the “unmasked” keyword on the ISPC side, and then, on the C++ side, simply declare this additional parameter explicitly – after all, ISPC will put this “varying bool” on the stack, so if we only declare it properly, the C++ side can use it. Only one answer: Don’t do it. Of course, we actually did at first, and it does in fact “kind of” work – for a while. However, ISPC doesn’t guarantee the form that this parameter takes on different ISAs, so code that may work perfectly fine on AVX may lead to “funny results” when run on AVX512 … which we had to learn the hard way, of course. As such, don’t do it.
Instead, the better way is to create an explicit array of active bits, and pass this the same way as any other array (ie, with “unmasked” enabled). Here’s an example:
/* foo.ispc */
extern "C" {
  unmasked void fooCpp(float *uniform v     /* the data */,
                       int *uniform active  /* explicit active mask */,
                       uniform int N);
}

void fooIspc(varying float f)
{
  /* the same stuff we did above: */
  uniform float fArray[programCount];
  fArray[programIndex] = f;
  /* now "construct" an active mask - mind the 'unmasked'!!!! */
  uniform int activeMask[programCount];
  unmasked { activeMask[programIndex] = 0; }
  activeMask[programIndex] = 1;
  fooCpp(.....);
}
Once again, there’s one hidden “important caveat” in this example that is easily overlooked, and which I’ll therefore call out explicitly here: the “unmasked {}” statement we use when constructing the active mask array (the second “unmasked” in the example). I.e., in particular those two lines:
unmasked { activeMask[programIndex] = 0; } activeMask[programIndex] = 1;
This may look particularly awkward, but is actually important: If we only used the second of those lines, the semantics of ISPC do not actually say what the values of the inactive lanes would be: Yes, the active lanes would be 1, but the inactive lanes may be something other than 0, which will, once again, give “funny results” on the C++ side. Even worse, in most cases these lanes will actually get initialized to 0, and work just fine – but as I said, there’s no guarantee for that, and I’ve seen cases where they weren’t … so this one is the “correct” way of doing it: The first line tells ISPC to set all lanes to zero, because the “unmasked {}” around the assignment forces all lanes to active, so all lanes will get set to 0. Then, the second statement will revert to only the active lanes, and set (only) those to 1, which is exactly what one wants.
One other observation in this: I’ve actually passed the active lanes as an array of ints rather than an array of bools – you can do either, of course, but ints are often easier to digest, so I always use ints.
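For completeness, the C++ side of such a call might look roughly like the sketch below. Only the fooCpp signature follows the ISPC declaration above; what the function does with the active lanes (squaring the value) and the fooCppDemo helper are made up for illustration.

```cpp
#include <cassert>

// C++ side of the callback; matches the ISPC declaration
//   unmasked void fooCpp(float *uniform v, int *uniform active,
//                        uniform int N);
// The per-lane work (squaring) is a hypothetical placeholder.
extern "C" void fooCpp(float *v, int *active, int N) {
  assert(v != nullptr && active != nullptr);
  for (int i = 0; i < N; ++i) {
    if (!active[i]) continue;  // leave inactive lanes untouched
    v[i] = v[i] * v[i];
  }
}

// Usage helper: lanes 0 and 2 are active, lanes 1 and 3 are not.
float fooCppDemo() {
  float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  int active[4] = {1, 0, 1, 0};
  fooCpp(v, active, 4);
  return v[0] + v[1] + v[2] + v[3];
}
```

Because the mask arrives as a plain int array, the C++ code needs no knowledge of the ISA-specific mask representation, which is the whole point of building it explicitly on the ISPC side.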
Last “Trick”: Calling C++ virtual functions from ISPC …
Taken to the extreme, you can even use the above things to call “real” virtual functions (on the C++ side) from the ISPC side. Imagine, for example, that you have a base class “Geometry” with a virtual function “commit”, and a class “TriangleMesh” derived from Geometry:
/* in geom.cpp */
class Geometry {
  ....
  virtual void commit() = 0;
};

class TriangleMesh : public Geometry {
  ...
  virtual void commit() override { ... }
};
Now further assume we have at some point created an instance of such a geometry (say, a triangle mesh) on the C++ side, and have passed a pointer to this geometry to the ISPC side, for example such as this:
void Scene::doSomethingWith(Geometry *geom)
{
  ispc::doSomethingInIspc((void *)geom);
}
(note how we typecast “geom” to a “void *”, because ISPC doesn’t know anything about these classes).
Now further, let’s assume “something” on the ISPC side has this “void *” that we know points to a geometry, and wants to call this geometry’s virtual commit() method. Directly calling this isn’t possible, because ISPC of course doesn’t understand any of this. However, we can create a simple “hook” for it – on the C++ side – that will allow that, say:
/* in geometry.cpp */
extern "C" void callGeometryCommit(void *ptr)
{
  Geometry *g = (Geometry *)ptr;
  g->commit();
}
All this function does is cast the void pointer back to a geometry, and call the respective method. And since it has extern C linkage and only plain C data types (void *), you can perfectly well call it from ISPC:
/* foo.ispc */
extern "C" {
  unmasked void callGeometryCommit(void *uniform);
}

export void doSomethingInIspc(void *uniform geometry)
{
  ....
  callGeometryCommit(geometry);
  ...
}
… et voilà, we’ve had ISPC call a virtual function. Not so hard, is it? (And if you do have a varying pointer to geometries, you can of course serialize this via foreach_unique, and call “callGeometryCommit” for each instance individually.)
Of course, the downside of this method is that there’s two function calls – first the ISPC-to-C function call (which can’t be inlined), and then the actual virtual function, which again is a function call. So, some overhead … but definitely useful to be able to do it at all, so I do hope this’ll be useful to those that always wanted to mix and match ISPC and C++ more than is demonstrated in the simpler examples.
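The C++ half of this pattern can be tried out without ISPC at all. The sketch below fills in the elided class bodies with made-up content (a commitCount counter in TriangleMesh) and adds a hypothetical commitDemo helper that plays the role of the ISPC caller; only the class names and the callGeometryCommit hook come from the text above.

```cpp
#include <cassert>

class Geometry {
public:
  virtual ~Geometry() {}
  virtual void commit() = 0;
};

class TriangleMesh : public Geometry {
public:
  int commitCount = 0;  // hypothetical state, just to observe the call
  void commit() override { ++commitCount; }
};

// The hook: extern "C" linkage and only plain C types in the signature,
// so it can be declared and called from ISPC.
extern "C" void callGeometryCommit(void *ptr) {
  assert(ptr != nullptr);
  Geometry *g = static_cast<Geometry *>(ptr);
  g->commit();  // regular virtual dispatch from here on
}

// Usage helper standing in for the ISPC caller, which would only ever
// see the opaque void pointer.
int commitDemo() {
  TriangleMesh mesh;
  Geometry *geom = &mesh;
  // Convert from the base-class pointer, so that the cast back to
  // Geometry * inside the hook undoes exactly this conversion.
  void *opaque = static_cast<void *>(geom);
  callGeometryCommit(opaque);
  callGeometryCommit(opaque);
  return mesh.commitCount;
}
```

One detail worth keeping in mind: the void pointer should be created from a Geometry pointer (as in Scene::doSomethingWith above), not from a pointer to the derived class, since with multiple inheritance the two addresses need not be the same.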
Anyway – this article turned out to be way longer than imagined – or intended – so for today that’ll be all. As said above I do hope it’ll be useful reading for all those that do set out to do some more non-trivial stuff with ISPC. Much of the above took some experimentation to figure out – but we do use all of those patterns throughout OSPRay, and so far they work fine. As such: Hope this has been helpful – and as always: feel free to comment if I’ve done something wrong, or unnecessarily hard …