Subject: Re: [boost] [compute] Some questions
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-23 11:29:22
On Tue, Dec 23, 2014 at 1:20 AM, Andrey Semashev
> I have no experience with OpenCL or GPU computing in general, so bear
> with me if my questions sound silly. I have a few questions regarding
> 1. When you define a kernel (e.g. with the BOOST_COMPUTE_FUNCTION
> macro), is this kernel supposed to be in C? Can it reference global
> (namespace scope) objects and other functions? Other kernels?
Yes, the source code for OpenCL kernels and functions is specified in
OpenCL C which is a dialect of C99 with extensions for vectorized
There are a few ways to specify kernel functions which reference
global C++ values. One is the BOOST_COMPUTE_CLOSURE() macro, which
works similarly to BOOST_COMPUTE_FUNCTION() but also allows a
lambda-like capture list of C++ values.
Another option is to specify your function with extra arguments for
the global objects and then bind them to the function with
> 2. When is the kernel compiled and uploaded to the device? Is it
> possible to cache and reuse the compiled kernel?
If writing a custom kernel, the kernel is built when the
"program::build()" method is called. Internally, the higher-level
algorithms compile programs when they're needed and store them in a
global program cache.
And yes, compiled program and kernel objects can be stored and re-used
(this is strongly recommended). Boost.Compute provides the
program_cache class, which stores frequently used programs
as compiled objects.
> 3. Why is the library not thread-safe by default? I'd say, we're long
> past single-threaded systems now, and having to always define the
> config macro is a nuisance.
I would very much like to have it thread-safe by default. The
problem, however, is keeping the library header-only and usable with
C++03 compilers. The BOOST_COMPUTE_THREAD_SAFE macro basically just
instructs Boost.Compute to use the C++11 "thread_local" specifier for
global objects instead of "static". With C++03 compilers, this will
use boost::thread_specific_ptr<> which then requires users to also
link to Boost.Thread.
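The difference between the two storage specifiers can be seen without
any OpenCL at all. The following is only an illustration in plain
C++11 (the counter functions are hypothetical stand-ins, not
Boost.Compute internals): a "static" object is shared by every thread,
while a "thread_local" one gets a fresh instance per thread.

```cpp
#include <thread>

// Hypothetical stand-in for a global object declared "static".
int& shared_counter()
{
    static int value = 0;        // one instance for the whole process
    return value;
}

// The same object declared "thread_local" instead.
int& per_thread_counter()
{
    thread_local int value = 0;  // one instance per thread
    return value;
}

struct observed
{
    int shared_value;
    int per_thread_value;
};

// Sets both counters on the calling thread, then reads them from a new thread.
observed observe_from_new_thread()
{
    shared_counter() = 3;
    per_thread_counter() = 3;

    observed result = {0, 0};
    std::thread worker([&result] {
        result.shared_value = shared_counter();          // sees the caller's write
        result.per_thread_value = per_thread_counter();  // sees its own fresh copy
    });
    worker.join();
    return result;
}
```

The new thread observes the value written to the static object, but
its thread_local counter still holds its initial value.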
That said, I still don't think it's ideal and I am very open to
ideas/patches which improve this.
> 4. Is it possible to upload the data to process to the device's local
> memory from a user-provided buffer, without copying it to
> boost::compute::vector? Same for downloading. What I'd like to do is
> move some of data processing to the GPU while the rest is performed on
> the CPU (possibly with other libraries), and avoid excessive copying.
Yes, that is what the mapped_view class is for. It maps a region
of host memory to device memory and provides a std::vector-like
interface on top of it so it may be used with Boost.Compute algorithms
or custom kernels.
> 5. Is it possible to pass buffers in the device-local memory between
> different processes (on the CPU) without downloading/uploading data
> to/from the CPU memory?
This is not supported by OpenCL (at least not in any standard or
portable way). Memory buffers belong to OpenCL contexts, and contexts
are created per-process without any mechanisms to share them with
If anyone has any experience/ideas with sharing OpenCL contexts
between processes I'd be very interested in trying to get this to work.
> 6. Is it possible to discover device capabilities? E.g. the amount of
> local memory (total/used/free), execution units, vendor and device
Yes, the device class provides a number of methods for returning
information about the device including the generic get_info()
Specifically for those cases you listed you could use:
* Local memory: device.local_memory_size()
* Execution units: device.compute_units()
* Vendor name: device.vendor()
* Device name: device.name()
Thanks for the questions. Let me know if I can explain anything better.