
Subject: Re: [boost] [compute] kernels as strings impairs readability and maintainability
From: Pavan Yalamanchili (pavan_at_[hidden])
Date: 2014-12-24 14:15:10

I am a bit late to the party, but we faced the same problem with our
library, ArrayFire.

The solution we came up with is the following.

- The kernels are written as .cl files and are part of the repository.
- During the build process, the kernels in .cl files are converted to
strings in *new* .hpp files.
- The auto-generated kernel headers are the files that get included when
compiling the kernel in question.

This allowed us to iterate quickly when writing OpenCL code. The same
approach could work for Boost.Compute while keeping it a header-only library.
The only downside is that users will not be able to point to the source
repo directly; they will have to run a "make install" step which converts
the .cl kernels to strings in .hpp files.
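Such a conversion step might look roughly like this (a minimal Python
sketch of the idea; the helper names and header layout are invented, not
ArrayFire's actual build tooling):

```python
# Sketch of a .cl -> .hpp converter (hypothetical names, not ArrayFire's
# actual build script). Reads an OpenCL kernel source file and emits a
# header exposing the source as a C++ string constant.
import re
import sys

def cl_to_hpp(cl_path, hpp_path):
    with open(cl_path) as f:
        source = f.read()
    # Derive an identifier from the file name, e.g. "radix_sort.cl"
    # becomes "radix_sort_cl".
    name = re.sub(r"\W", "_", cl_path.rsplit("/", 1)[-1])
    # Emit each source line as an adjacent string literal so the
    # generated header stays readable and diff-able.
    literals = "\n".join(
        '    "%s\\n"' % line.replace("\\", "\\\\").replace('"', '\\"')
        for line in source.splitlines()
    )
    with open(hpp_path, "w") as f:
        f.write("// Auto-generated from %s; do not edit.\n" % cl_path)
        f.write("static const char %s[] =\n%s;\n" % (name, literals))

if __name__ == "__main__" and len(sys.argv) == 3:
    cl_to_hpp(sys.argv[1], sys.argv[2])
```

A build rule (Makefile or CMake custom command) would run this once per
.cl file, so the checked-in sources stay plain OpenCL.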

On Tue, Dec 23, 2014 at 9:57 PM, Kyle Lutz <kyle.r.lutz_at_[hidden]> wrote:

> On Tue, Dec 23, 2014 at 5:46 PM, Mathias Gaunard
> <mathias.gaunard_at_[hidden]> wrote:
> > On 23/12/2014 20:21, Kyle Lutz wrote:
> >
> >> While yes, it does make developing Boost.Compute itself a bit more
> >> complex, it also gives us much greater flexibility.
> >>
> >> For instance, we can dynamically build programs at run-time by
> >> combining algorithmic skeletons (such as reduce or scan) with custom
> >> user-defined reduction functions and produce optimized kernels for the
> >> actual platform that executes the code (which in fact can be
> >> dramatically different hardware than where Boost.Compute itself was
> >> compiled). It also allows us to automatically tune algorithm
> >> parameters for the actual hardware present at run-time (and also
> >> allows us to execute current algorithms as efficiently as possible
> >> on future hardware platforms by re-tuning and scaling up parameters,
> >> all without any recompilation). It also allows us to generate fully
> >> specialized kernels at run-time based on
> >> dynamic-input/user-configuration (imagine user-created filter
> >> pipelines in Photoshop or custom database queries in PGSQL).
> >>
> >> I think this added complexity is well worth the cost and this fits
> >> naturally with OpenCL's JIT-like programming model.
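The skeleton-plus-user-function combination described here can be
sketched as follows (a minimal Python illustration of the technique; the
skeleton text and names are invented, and a real implementation such as
Boost.Compute's uses a work-group reduction tree rather than this naive
serial loop):

```python
# Sketch of run-time kernel specialization: an algorithmic skeleton
# (reduce) is combined with a user-supplied binary function to produce
# complete OpenCL kernel source, ready for clCreateProgramWithSource().
# This illustrates the technique only; it is not Boost.Compute's
# actual code generator.

REDUCE_SKELETON = """
%(function_source)s

__kernel void reduce(__global const %(T)s *input,
                     __global %(T)s *output,
                     const uint n)
{
    /* Naive serial reduction, for brevity of the sketch. */
    %(T)s result = input[0];
    for (uint i = 1; i < n; i++)
        result = %(function_name)s(result, input[i]);
    output[0] = result;
}
"""

def make_reduce_kernel(type_name, function_name, function_source):
    """Return OpenCL source specialized for a type and reduction function."""
    return REDUCE_SKELETON % {
        "T": type_name,
        "function_name": function_name,
        "function_source": function_source,
    }

# A user-defined reduction function, supplied only at run time:
source = make_reduce_kernel(
    "float", "my_max",
    "float my_max(float a, float b) { return a > b ? a : b; }",
)
```

Because the final source only exists at run time, it can be specialized
for the device actually present, which is the flexibility argued for
above.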
> >
> >
> > I could see that from the code, yes.
> > But nothing should prevent doing that while still writing the original
> > OpenCL source code (or skeletons/templates) in separate files rather
> than C
> > strings.
> >
> >
> >>> Has separate compilation been considered?
> >>> Put the OpenCL code into .cl files, and let the build system do
> whatever
> >>> is
> >>> needed to transform them into a form that can be executed.
> >>
> >>
> >> Compiling programs to binaries and then later loading them from disk
> >> is supported by Boost.Compute (and is in fact used to implement the
> >> offline kernel caching infrastructure). However, for the reasons I
> >> mentioned before, this mode is not used exclusively in Boost.Compute
> >> and the algorithms are mainly implemented in terms of the run-time
> >> program creation and compilation model.
> >
> >
> > I didn't necessarily mean compiling OpenCL to SPIR (if that's indeed what
> > you mean by binary).
> Well, to be clear, OpenCL provides two mechanisms for creating
> programs, one from source strings with clCreateProgramWithSource() and
> one from binary blobs with clCreateProgramWithBinary(). Binaries are
> either in a vendor-specific format, or in the SPIR form for platforms
> that support it (which essentially attempts to be a "vendor-neutral"
> binary representation).
> > You could just make the build system automatically generate the C string
> > from a .cl file, for example.
> Boost.Compute has no "build system"; it is merely a set of header
> files. If, in the future, we move away from a header-only
> implementation, we could certainly do something like this.
> >> Another concern is that Boost.Compute is a header-only library and
> >> doesn't control the build system or how the library will be loaded.
> >> This limits our ability to pre-compile certain programs and "install"
> >> them for later use by the library.
> >
> >
> > As it is, you're probably getting some bloat for the sole reason that
> you're
> > getting a copy of all your strings in every TU, in particular the radix
> sort
> > kernel.
> > It makes more sense for it to be a library IMHO.
> >
> > There is a tendency for people to prefer header-only designs because it
> > facilitates deployment due to not having to build a library with
> compatible
> > settings separately, but I do not think someone should go for header-only
> > just for that reason.
> These are good points. In the future we may move away from a
> header-only implementation if it proves to be too big of a hindrance.
> >> That said, I am very interested in exploring methods for integrating
> >> OpenCL source files built by the build tool-chain and make loading and
> >> executing them seamless with the rest of Boost.Compute. One approach I
> >> have for this is an "extern_function<>" class which works like
> >> "boost::compute::function<>", but instead of being specified with a
> >> string at run-time, its object code is loaded from a pre-compiled
> >> OpenCL binary on disk. I've also been exploring a clang-plugin-based
> >> approach to simplify embedding OpenCL code in C++ and using it
> >> together with the Boost.Compute algorithms.
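The offline kernel caching mentioned here can be sketched like this (a
Python illustration of the general technique; the class and its on-disk
layout are hypothetical, not Boost.Compute's implementation):

```python
# Sketch of an offline kernel cache: compiled program binaries are
# stored on disk, keyed by a hash of the kernel source and the target
# device, so later runs can skip run-time compilation. Illustrative
# only; not Boost.Compute's actual caching code.
import hashlib
import os

class KernelBinaryCache:
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _key(self, source, device_name):
        # A binary is only valid for the device (and driver) that
        # produced it, so the device identity is part of the key.
        h = hashlib.sha256()
        h.update(device_name.encode())
        h.update(source.encode())
        return h.hexdigest()

    def load(self, source, device_name):
        """Return the cached binary, or None on a cache miss."""
        path = os.path.join(self.cache_dir, self._key(source, device_name))
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            return f.read()

    def store(self, source, device_name, binary):
        path = os.path.join(self.cache_dir, self._key(source, device_name))
        with open(path, "wb") as f:
            f.write(binary)
```

On a hit, the blob would be handed to clCreateProgramWithBinary()
instead of recompiling from source.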
> >
> >
> > I do not know what you have in mind with your clang development, but I
> > assumed your library was sticking to oldish standard OpenCL for
> > compatibility with a wide variety of devices and older toolchains.
> >
> > There are already some compiler projects that can generate hybrid CPU and
> > GPU code from a single source, turning functions into GPU kernels as
> needed:
> > C++AMP does it, CUDA does it too somewhat, and now there is SYCL, a
> recent
> > addition to the OpenCL standards that was presented at SC14, which should
> > become the best solution for this.
> Yes, I am well aware of these projects. However, one of my goals for
> Boost.Compute was to provide a GPGPU library which required no special
> compiler or compiler extensions (as CUDA, C++AMP, SYCL, OpenACC,
> etc... all do). My aim is to provide a portable parallel programming
> library in C++ which supports the widest range of available platforms
> and compilers and I feel OpenCL fills this role very well (also see
> the "Why OpenCL?" section [1] in the documentation for more on my
> rationale for this choice).
> -kyle
> [1]
