Subject: Re: [boost] [compute] kernels as strings impairs readability and maintainability
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2014-12-23 20:46:00


On 23/12/2014 20:21, Kyle Lutz wrote:

> While yes, it does make developing Boost.Compute itself a bit more
> complex, it also gives us much greater flexibility.
>
> For instance, we can dynamically build programs at run-time by
> combining algorithmic skeletons (such as reduce or scan) with custom
> user-defined reduction functions and produce optimized kernels for the
> actual platform that executes the code (which in fact can be
> dramatically different hardware than where Boost.Compute itself was
> compiled). It also allows us to automatically tune algorithm
> parameters for the actual hardware present at run-time (and also
> allows us to execute current algorithms as efficiently as possible
> on future hardware platforms by re-tuning and scaling up parameters,
> all without any recompilation). It also allows us to generate fully
> specialized kernels at run-time based on
> dynamic-input/user-configuration (imagine user-created filter
> pipelines in Photoshop or custom database queries in PGSQL).
>
> I think this added complexity is well worth the cost and this fits
> naturally with OpenCL's JIT-like programming model.

I could see that from the code, yes.
But nothing should prevent doing that while still writing the original
OpenCL source code (or skeletons/templates) in separate files rather
than C strings.
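
A minimal sketch of that idea, assuming a skeleton kept in its own .cl file with a textual placeholder that is filled in at run-time. The helper names, the file name and the @OP@ placeholder syntax are all made up for illustration; none of this is Boost.Compute API.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole text file, e.g. "reduce_skeleton.cl".
inline std::string read_text_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: replace every occurrence of `placeholder` in
// `skeleton` with `text`. The "@OP@" syntax used below is an assumption,
// not a convention taken from any existing library.
inline std::string instantiate_skeleton(std::string skeleton,
                                        const std::string& placeholder,
                                        const std::string& text)
{
    assert(!placeholder.empty() && "placeholder must not be empty");
    std::string::size_type pos = 0;
    while ((pos = skeleton.find(placeholder, pos)) != std::string::npos) {
        skeleton.replace(pos, placeholder.size(), text);
        pos += text.size(); // skip the inserted text so it is not rescanned
    }
    return skeleton;
}
```

The skeleton would then be loaded once with read_text_file("reduce_skeleton.cl") and specialized with something like instantiate_skeleton(source, "@OP@", "max") before being handed to the OpenCL compiler, so the run-time flexibility is kept while the kernel text itself lives in a normal source file.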

>> Has separate compilation been considered?
>> Put the OpenCL code into .cl files, and let the build system do whatever is
>> needed to transform them into a form that can be executed.
>
> Compiling programs to binaries and then later loading them from disk
> is supported by Boost.Compute (and is in fact used to implement the
> offline kernel caching infrastructure). However, for the reasons I
> mentioned before, this mode is not used exclusively in Boost.Compute
> and the algorithms are mainly implemented in terms of the run-time
> program creation and compilation model.

I didn't necessarily mean compiling OpenCL to SPIR (if that's indeed
what you mean by binary).

You could just make the build system automatically generate the C string
from a .cl file, for example.
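
As a rough sketch of what such a build step could do, here is the core of a small generator that turns the text of a .cl file into a header defining a C string. The function names and the layout of the generated header are assumptions for illustration only; a real build rule would read the .cl file, call this, and write the result next to the other headers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escape one line of OpenCL source so it can sit inside a C string literal.
// Only backslashes and double quotes are handled; trigraphs and other
// corner cases are deliberately ignored in this sketch.
inline std::string escape_c_line(const std::string& line)
{
    std::string out;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\r') {
            continue; // drop carriage returns from CRLF files
        }
        if (c == '\\' || c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Hypothetical generator: produce the text of a header that defines
// `variable` as a C string holding `cl_source`, one literal per line.
inline std::string make_kernel_header(const std::string& variable,
                                      const std::string& cl_source)
{
    assert(!variable.empty() && "variable name must not be empty");
    std::istringstream in(cl_source);
    std::ostringstream out;
    out << "// Generated from a .cl file; do not edit.\n";
    out << "static const char " << variable << "[] =\n";
    std::string line;
    while (std::getline(in, line)) {
        out << "    \"" << escape_c_line(line) << "\\n\"\n";
    }
    out << "    \"\";\n"; // keeps the definition valid for empty input
    return out.str();
}
```

The library code keeps using an ordinary C string exactly as it does today; only the place where the kernel is written and maintained changes.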

> Another concern is that Boost.Compute is a header-only library and
> doesn't control the build system or how the library will be loaded.
> This limits our ability to pre-compile certain programs and "install"
> them for later use by the library.

As it is, you're probably getting some bloat simply because every TU
that includes the headers ends up with its own copy of all your strings,
in particular the radix sort kernel.
It makes more sense for it to be a compiled library IMHO.

There is a tendency for people to prefer header-only designs because it
facilitates deployment due to not having to build a library with
compatible settings separately, but I do not think someone should go for
header-only just for that reason.

> That said, I am very interested in exploring methods for integrating
> OpenCL source files built by the build tool-chain and making loading and
> executing them seamless with the rest of Boost.Compute. One approach I
> have for this is an "extern_function<>" class which works like
> "boost::compute::function<>", but instead of being specified with a
> string at run-time, its object code is loaded from a pre-compiled
> OpenCL binary on disk. I've also been exploring a clang-plugin-based
> approach to simplify embedding OpenCL code in C++ and using it
> together with the Boost.Compute algorithms.

I do not know what you have in mind with your clang development, but I
assumed your library was sticking to oldish standard OpenCL for
compatibility with a wide variety of devices and older toolchains.

There are already some compiler projects that can generate hybrid CPU
and GPU code from a single source, turning functions into GPU kernels as
needed: C++ AMP does it, CUDA does it too to some extent, and now there
is SYCL, a recent addition to the OpenCL standards that was presented at
SC14, which should become the best solution for this.

