
Subject: Re: [boost] [compute] Some questions
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-23 16:21:56


On Tue, Dec 23, 2014 at 12:55 PM, Andrey Semashev
<andrey.semashev_at_[hidden]> wrote:
> On Tue, Dec 23, 2014 at 7:29 PM, Kyle Lutz <kyle.r.lutz_at_[hidden]> wrote:
>> On Tue, Dec 23, 2014 at 1:20 AM, Andrey Semashev
>> <andrey.semashev_at_[hidden]> wrote:
>>>
>>> 1. When you define a kernel (e.g. with the BOOST_COMPUTE_FUNCTION
>>> macro), is this kernel supposed to be in C? Can it reference global
>>> (namespace scope) objects and other functions? Other kernels?
>>
>> Yes, the source code for OpenCL kernels and functions is specified in
>> OpenCL C which is a dialect of C99 with extensions for vectorized
>> operations.
>
> Does this mean that the compiler has to support OpenCL in order to be
> able to use Boost.Compute? Or its specific features? If yes, can this
> be mentioned in the docs (with the list of the affected features, if
> possible)?

No, Boost.Compute does not require any special compiler or compiler
extensions. It will work with all standards-conforming C++03 and later
compilers.

> Also, I don't quite understand how the kernel source code which I
> supply to BOOST_COMPUTE_FUNCTION is then compiled into a kernel. Is this
> source code just stringized and not actually compiled when the
> application is built?

Yes, the source argument for BOOST_COMPUTE_FUNCTION() is stringized
and then inserted into an OpenCL program when "invoked" by an
algorithm. And you're right, the function source is not compiled by
the host compiler, though the function signature itself is, which
gives us some degree of type-safety.

>> There are a few ways to specify kernel functions which reference
>> global C++ values. One is the BOOST_COMPUTE_CLOSURE() macro [1] which
>> works similarly to BOOST_COMPUTE_FUNCTION(), but also allows a
>> lambda-like capture list of C++ values.
>>
>>> 2. When is the kernel compiled and uploaded to the device? Is it
>>> possible to cache and reuse the compiled kernel?
>>
>> If writing a custom kernel, the kernel is built when the
>> "program::build()" method is called. Internally, the higher-level
>> algorithms compile programs when they're needed and store them in a
>> global program cache.
>>
>> And yes, compiled program and kernel objects can be stored and re-used
>> (this is strongly recommended). Boost.Compute provides the
>> program_cache class [3] which is used to store frequently used
>> programs as compiled objects.
>
> So, e.g. a kernel defined with BOOST_COMPUTE_FUNCTION will be compiled
> when first used, and then saved in some global program_cache, is that
> correct? Also, captured arguments of BOOST_COMPUTE_CLOSURE will be
> evaluated only once, when the kernel is built?

Yeah, the algorithms in Boost.Compute will create a program with the
function's source and then store it in the global program cache for
later use.

And captured values with BOOST_COMPUTE_CLOSURE() are stored by
reference and are updated if the corresponding C++ values change.
Currently changing captured values will cause a kernel re-compilation.
I'm working on improving this to avoid the re-compilation and simply
pass the new values to the kernel.

>>> 3. Why is the library not thread-safe by default? I'd say, we're long
>>> past single-threaded systems now, and having to always define the
>>> config macro is a nuisance.
>>
>> I would very much like to have it thread-safe by default. This is a
>> problem however with keeping the library header-only and useable with
>> C++03 compilers. The BOOST_COMPUTE_THREAD_SAFE macro basically just
>> instructs Boost.Compute to use the C++11 "thread_local" specifier for
>> global objects instead of "static". With C++03 compilers, this will
>> use boost::thread_specific_ptr<> which then requires users to also
>> link to Boost.Thread.
>>
>> That said, I still don't think it's ideal and I am very open to
>> ideas/patches which improve this.
>
> Personally, I see no big problem with a dependency on Boost.Thread in
> C++03. However, it is quite possible to use system APIs to implement
> TLS in a header-only library.
>
> On POSIX systems it is quite trivial with pthread_once and
> pthread_key* API. On Windows you can use Interlocked* functions or
> Boost.Atomic to implement something similar to pthread_once and Tls*
> functions for the TLS itself. The tricky part is the TLS cleanup,
> which can be done with help of the Windows thread pool. You can use
> RegisterWaitForSingleObject to schedule a wait operation on the handle
> of the thread that sets the thread-local value. When the thread exits,
> the pool will invoke the callback you passed to
> RegisterWaitForSingleObject, where you can clean the TLS value. The
> important difference from thread_local and Boost.Thread is that the
> callback is called in a thread different from the one that initialized
> the TLS value, but for various cleanup routines this should not
> matter.
>
> You can see how it's done in Boost.Sync:
>
> https://github.com/boostorg/sync/blob/develop/include/boost/sync/detail/waitable_timer.hpp

I personally don't see an issue with depending on Boost.Thread either,
but this does prevent the library from being header-only. I'll take a
look at your example and see if that can be worked into Boost.Compute.
Thanks!

-kyle


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk