Subject: Re: [boost] [compute] kernels as strings impairs readability and maintainability
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-23 21:57:32
On Tue, Dec 23, 2014 at 5:46 PM, Mathias Gaunard wrote:
> On 23/12/2014 20:21, Kyle Lutz wrote:
>> While yes, it does make developing Boost.Compute itself a bit more
>> complex, it also gives us much greater flexibility.
>> For instance, we can dynamically build programs at run-time by
>> combining algorithmic skeletons (such as reduce or scan) with custom
>> user-defined reduction functions and produce optimized kernels for the
>> actual platform that executes the code (which in fact can be
>> dramatically different hardware than where Boost.Compute itself was
>> compiled). It also allows us to automatically tune algorithm
>> parameters for the actual hardware present at run-time (and also
>> allows us to execute current algorithms as efficiently as possible
>> on future hardware platforms by re-tuning and scaling up parameters,
>> all without any recompilation). It also allows us to generate fully
>> specialized kernels at run-time based on
>> dynamic-input/user-configuration (imagine user-created filter
>> pipelines in Photoshop or custom database queries in PGSQL).
>> I think this added complexity is well worth the cost and this fits
>> naturally with OpenCL's JIT-like programming model.
> I could see that from the code, yes.
> But nothing should prevent doing that while still writing the original
> OpenCL source code (or skeletons/templates) in separate files rather than
> C++ string literals.
>>> Has separate compilation been considered?
>>> Put the OpenCL code into .cl files, and let the build system do whatever
>>> needed to transform them into a form that can be executed.
>> Compiling programs to binaries and then later loading them from disk
>> is supported by Boost.Compute (and is in fact used to implement the
>> offline kernel caching infrastructure). However, for the reasons I
>> mentioned before, this mode is not used exclusively in Boost.Compute
>> and the algorithms are mainly implemented in terms of the run-time
>> program creation and compilation model.
> I didn't necessarily mean compiling OpenCL to SPIR (if that's indeed what
> you mean by binary).
Well, to be clear, OpenCL provides two mechanisms for creating
programs, one from source strings with clCreateProgramWithSource() and
one from binary blobs with clCreateProgramWithBinary(). Binaries are
either in a vendor-specific format, or in the SPIR form for platforms
that support it (which essentially attempts to be a "vendor-neutral"
intermediate representation for OpenCL programs).
> You could just make the build system automatically generate the C string
> from a .cl file, for example.
Boost.Compute has no "build system"; it is merely a set of header
files. If, in the future, we move away from a header-only
implementation, we could certainly do something like this.
>> Another concern is that Boost.Compute is a header-only library and
>> doesn't control the build system or how the library will be loaded.
>> This limits our ability to pre-compile certain programs and "install"
>> them for later use by the library.
> As it is, you're probably getting some bloat for the sole reason that you're
> getting a copy of all your strings in every TU, in particular the radix
> sort sources.
> It makes more sense for it to be a library IMHO.
> There is a tendency for people to prefer header-only designs because it
> facilitates deployment due to not having to build a library with compatible
> settings separately, but I do not think someone should go for header-only
> just for that reason.
These are good points. In the future we may move away from a
header-only implementation if it proves to be too big of a hindrance.
>> That said, I am very interested in exploring methods for integrating
>> OpenCL source files built by the build tool-chain and make loading and
>> executing them seamless with the rest of Boost.Compute. One approach I
>> have for this is an "extern_function<>" class which works like
>> "boost::compute::function<>", but instead of being specified with a
>> string at run-time, its object code is loaded from a pre-compiled
>> OpenCL binary on disk. I've also been exploring a clang-plugin-based
>> approach to simplify embedding OpenCL code in C++ and using it
>> together with the Boost.Compute algorithms.
> I do not know what you have in mind with your clang development, but I
> assumed your library was sticking to oldish standard OpenCL for
> compatibility with a wide variety of devices and older toolchains.
> There are already some compiler projects that can generate hybrid CPU and
> GPU code from a single source, turning functions into GPU kernels as needed:
> C++AMP does it, CUDA does it too somewhat, and now there is SYCL, a recent
> addition to the OpenCL standards that was presented at SC14, which should
> become the best solution for this.
Yes, I am well aware of these projects. However, one of my goals for
Boost.Compute was to provide a GPGPU library which required no special
compiler or compiler extensions (as CUDA, C++AMP, SYCL, OpenACC,
etc... all do). My aim is to provide a portable parallel programming
library in C++ which supports the widest range of available platforms
and compilers and I feel OpenCL fills this role very well (also see
the "Why OpenCL?" section in the documentation for more on my
rationale for this choice).