Subject: Re: [boost] [compute] review
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-17 01:17:12
On Tue, Dec 16, 2014 at 11:20 AM, Sebastian Schaetz
> Here is my review of Boost.Compute:
Thanks for the review! I left my responses in-line below. Let me know
if I missed anything.
> 1. What is your evaluation of the design?
> The library is based upon OpenCL, a cross-platform cross-device open
> standard that abstracts access to and provides a programming model for
> many-core vector co-processors such as GPUs. These co-processors are usually
> referred to as "devices".
> The library provides a wrapper layer around the OpenCL C interface. It skips
> the standard OpenCL C++ wrapper which I don't consider a problem because
> except for destructors there is no added value in using this wrapper. In my
> opinion Khronos should adopt Boost.Compute as their C++ layer for OpenCL.
> Boost.Compute provides compatibility with the OpenCL C interface through
> conversion operators that decay Boost.Compute types to their OpenCL C
> equivalents. This can be quite useful.
> On top of this wrapper Boost.Compute exhibits 3 core components:
> * types to interact with and issue commands to devices:
> these follow OpenCL concepts but are not necessary if defaults are used
> * means of managing memory (allocate, copy) on devices:
> this component also contains asynchronous operations, which I consider
> essential in a library that deals with co-processors
> * a collection of parallel primitives and meta-functions with an STL interface:
> this component contains powerful iterators to combine containers and
> algorithms to implement more complex algorithms in an efficient manner
> One thing I'm not clear about is how asynchrony is handled. Command queues
> are exposed, and issuing commands to different queues is a way to express
> concurrency. At the same time copy_async returns a future which is another
> way of exposing concurrency.
> It is out of the scope of Boost.Compute to solve the challenges of
> asynchronous/concurrent operations because it is a different and difficult
> topic not yet solved for C++ in general either, but at least the
> documentation should be more explicit about which commands are executed
> when, which commands are synchronous, which are asynchronous and what is the
> role of the command_queue in this regard.
Yeah, I should have documented this better. As a general rule, the
Boost.Compute algorithms execute asynchronously with respect to the
host. The algorithms operate by queuing up operations (e.g. kernel
launches) to be executed on the device via the command queue (which is
handled by OpenCL). So, for example, executing the "transform()"
algorithm on a vector on the device will occur in parallel to any
further code run on the host (at least until making another OpenCL
call which leads to a synchronization point between the host and
device, e.g. "clFinish()"). The exception to this rule is that any
algorithm which reads/modifies host memory (such as the "copy()"
algorithm with host and device iterators) will block until the
operation is complete. I chose to implement Boost.Compute this way in
order to eliminate any potential race-conditions between the device
writing to host memory and the host code using that same memory
without synchronizing. This is the reason I introduced the
"copy_async()" algorithm which makes its asynchronous nature explicit
and requires the user to synchronize (via wait() on the returned
future or finish() on the command queue) themselves before attempting
to read the modified memory.
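
A minimal sketch of the synchronization rule described above. It uses only
the C++ standard library as a stand-in for the device and command queue, so
it is not Boost.Compute code: the names copy_async_model, copy_model and
sum_after_async_copy are made up for this illustration, and std::async merely
plays the role of work queued for the device.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a device-to-host copy_async(): the copy runs on
// another thread, and the returned future is the only way to know it finished.
std::future<void> copy_async_model(const std::vector<float>& device,
                                   std::vector<float>& host)
{
    assert(host.size() >= device.size());
    return std::async(std::launch::async, [&device, &host] {
        for (std::size_t i = 0; i < device.size(); ++i) {
            host[i] = device[i];
        }
    });
}

// Hypothetical stand-in for the blocking copy(): same work, but the wait
// happens inside the call, so `host` is safe to read as soon as it returns.
void copy_model(const std::vector<float>& device, std::vector<float>& host)
{
    copy_async_model(device, host).wait();
}

// Reads `device` back with the asynchronous variant and sums the result.
float sum_after_async_copy(const std::vector<float>& device)
{
    std::vector<float> host(device.size(), 0.0f);
    std::future<void> done = copy_async_model(device, host);

    // Unrelated host work may run here, but `host` must not be touched yet.

    done.wait(); // the caller's own synchronization point

    float sum = 0.0f;
    for (float x : host) {
        sum += x;
    }
    return sum;
}
```

In this model, reading `host` before `done.wait()` would be a data race,
which is the situation the blocking copy() is meant to rule out by default.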
I am still looking to improve Boost.Compute in this area and am also
paying close attention to new techniques (such as those in the
Concurrency TS). Any ideas/proposals/thoughts on this would be greatly
appreciated.
> 2. What is your evaluation of the implementation?
> I did not evaluate the implementation in detail but looked at a few of the
> tricks Boost.Compute uses to generate kernels. The implementation of this
> part of the library is good and instructive.
> 3. What is your evaluation of the documentation?
> Boost.Compute documentation is of excellent quality. The recent addition of
> performance data is helpful. I could not find any documentation about fancy
> iterators; this should probably be added. Also, it would be great if my
> questions regarding asynchrony/concurrency could be addressed in the
> documentation.
Addressed above. I'll work on updating the documentation to explain
this more thoroughly.
> 4. What is your evaluation of the potential usefulness of the library?
> Boost.Compute is extremely useful. With this library a developer familiar
> with the STL can utilize the processing power of GPUs without any knowledge
> of vector co-processor programming. The documentation shows that for large
> vector sizes, some Boost.Compute algorithms outperform the STL by an order
> of magnitude.
> 5. Did you try to use the library? With what compiler? Did you have any
> problems?
> I tried the unit tests on an 8x GeForce Titan system without any problems and
> on an ARM Mali GPU with some unit tests failing. I'll be working with the
> library author to fix the problems in these unit tests.
> I used gcc 4.8.2 for the tests on both GeForce and Mali.
Thanks for testing these! Hopefully it'll be quick to fix the issues
on the Mali GPU.
> 6. How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
> I reviewed the library a few months ago in-depth and reread the
> documentation for this review as well as ran some unit tests.
> 7. Are you knowledgeable about the problem domain?
> My job involves working with both CUDA and OpenCL. Furthermore I am the
> author of the Aura library, a similar, albeit lower-level, library for
> 8. Do you think the library should be accepted as a Boost library?
> I think the library should be accepted into Boost. The interface is simple
> and easy to understand for non-experts and the benefits of using this
> library can be significant.
> I'd like to add that Boost.Compute represents one level of abstraction for
> accelerator programming. I'd like the Boost community to keep an open mind
> when it comes to different levels of abstraction, either lower (i.e. my Aura
> library) or higher (i.e. VexCL). Libraries with different levels of
> abstraction can coexist, be compatible with one another or could even build
> upon one another.
Thanks! And I also look forward to having a larger ecosystem of
GPU/accelerator programming libraries.