
Subject: Re: [boost] Interest in a GPU computing library
From: Manjunath Kudlur (keveman_at_[hidden])
Date: 2012-09-18 14:15:47

On Tue, Sep 18, 2012 at 9:28 AM, Kyle Lutz <kyle.r.lutz_at_[hidden]> wrote:

> Thanks for all the comments and feedback so far! I’ve written up
> answers to your questions below (which should serve as a good start
> for a FAQ for the library). Please let me know if anything is not
> clear or if I forgot to answer your question.
> *** Where can I find the code and/or documentation? ***
> I have not yet made the code publicly available. I still want to clean
> up a few things and improve the documentation a fair bit before
> releasing it. This e-mail was just to gauge the interest of the Boost
> community in this type of library (and it seems to be positive :-)).
> As long as I find some free time it should only take a week or so to
> get the code online. I will notify the list when I do so.
> *** Why not write as a back-end for Thrust? ***
> It would not be possible to provide the same API that Thrust expects
> on top of OpenCL. The fundamental reason is that the functions/functors
> passed to Thrust algorithms are actual compiled C++ functions, whereas
> in Boost.Compute they are expression objects which are translated into
> C99 source code and then compiled at run time by OpenCL.
> *** Why not target CUDA and/or support multiple back-ends? ***
> CUDA and OpenCL are two very different technologies. OpenCL works by
> compiling C99 code at run-time to generate kernel objects which can
> then be executed on the GPU. CUDA, on the other hand, works by
> compiling its kernels using a special compiler (nvcc) which then
> produces binaries which can be executed on the GPU.
> OpenCL already has multiple implementations which allow it to be used
> on a variety of platforms (e.g. NVIDIA GPUs, Intel CPUs, etc.). I feel
> that adding another abstraction level within Boost.Compute would only
> complicate and bloat the library.
> *** Is it possible to use ordinary C++ functions/functors or C++11
> lambdas with Boost.Compute? ***
> Unfortunately no. OpenCL relies on having C99 source code available at
> run-time in order to execute code on the GPU. Thus compiled C++
> functions or C++11 lambdas cannot simply be passed to the OpenCL
> environment to be executed on the GPU.
Using a DSL to specify a function object, and transforming that into a C99
string to pass down to the OpenCL driver, is a nice idea. But it gets really
ugly when the function object is any more complex than _1+_2. I tried this
once before; look here:
I wouldn't want to write my function objects like that.

> This is the reason why I wrote the Boost.Compute lambda library.
> Basically it takes C++ lambda expressions (e.g. _1 * sqrt(_1) + 4) and
> transforms them into C99 source code fragments (e.g. "input[i] *
> sqrt(input[i]) + 4") which are then passed to the Boost.Compute
> STL-style algorithms for execution. While not perfect, it allows the
> user to write code closer to C++ that still can be executed through
> OpenCL.
> *** Does the API support data-streaming operations? ***
> Yes it does. Though, as a few people pointed out, the example I
> provided does not show this. Each line of code in the example will be
> executed in serial and thus will not take advantage of the GPU’s
> ability to transfer data and perform computations simultaneously. The
> Boost.Compute STL API does support this but it requires a bit more
> setup from the user. All of the algorithms take an optional
> command_queue parameter that serves as a place for them to issue their
> instructions. The default case (when no command_queue is specified) is
> for the algorithm to create a command_queue for itself, issue its
> instructions, and then wait for completion (i.e. a synchronous
> operation).
> The example can be made more efficient (though slightly more complex)
> as follows:
> // create command queue
> command_queue queue(context, device);
> // copy to device, sort, and copy back to host
> copy(host_vector.begin(), host_vector.end(), device_vector.begin(), queue);
> sort(device_vector.begin(), device_vector.end(), queue);
> copy(device_vector.begin(), device_vector.end(), host_vector.begin(),
> queue);
> // wait for all above operations to complete
> queue.finish();
> *** Does the Boost.Compute API inter-operate with the OpenCL C API? ***
> Yes. I have designed the C++ wrapper API to be as unobtrusive as
> possible. All the functionality available in the OpenCL C API will
> also be available via the Boost.Compute C++ API. In fact, the C++
> wrapped classes all have conversion operators to their underlying
> OpenCL types so that they can be passed directly to OpenCL functions:
> // create context object
> boost::compute::context ctx = boost::compute::default_context();
> // query number of devices using the OpenCL C API
> cl_uint num_devices;
> clGetContextInfo(ctx, CL_CONTEXT_NUM_DEVICES, sizeof(cl_uint),
> &num_devices, 0);
> std::cout << "num_devices: " << num_devices << std::endl;
> *** How is the performance? ***
> As of now many of the Boost.Compute algorithms are not ready for
> production code (at least performance-wise). I have focused the
> majority of my time on getting the API stable and functional as well as
> implementing a comprehensive test-suite. In fact, a few of the
> algorithms are still implemented serially. Over time these will be
> improved and the library will become competitive with other GPGPU
> libraries. On that note, if anyone has OpenCL/CUDA code that
> implements any of the STL algorithms and can be released under the
> Boost Software License I'd love to hear from you.
> Thanks,
> Kyle
