
Subject: Re: [boost] [compute] Review
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-31 14:09:30


On Wed, Dec 31, 2014 at 9:57 AM, Ioannis Papadopoulos
<ipapadop_at_[hidden]> wrote:
> On 12/30/2014 11:57 PM, Kyle Lutz wrote:
>> On Tue, Dec 30, 2014 at 8:14 PM, Yiannis Papadopoulos
>> <ipapadop_at_[hidden]> wrote:
>>> Hi,
>>>
>>> This is my review of Boost.Compute:
>>>
>>> 2. What is your evaluation of the implementation?
>>>
>>> There is some code duplication (e.g. type traits) and various other bits and
>>> pieces that can be moved to existing Boost components. I think there should
>>> be some effort spent towards that.
>>
>> Could you let me know which type-traits you think are duplicated or
>> should be moved elsewhere?
>
> For example, the is_fundamental<T> is already implemented in
> Boost.TypeTraits. Or type_traits/type_name.hpp may be able to leverage
> Boost.TypeIndex?

True, there is a boost::is_fundamental<T> (and a
std::is_fundamental<T> in C++11), but these have different semantics
from boost::compute::is_fundamental<T>. For Boost.Compute, the
is_fundamental<T> trait is true if the type T is fundamental on
the device (i.e. an OpenCL built-in type). For example, the float4_
type is an aggregate type on the host (i.e.
std::is_fundamental<float4_>::value == false) but is a built-in type
in OpenCL (i.e. boost::compute::is_fundamental<float4_>::value == true).
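
To make the distinction concrete, here is a minimal, standard-library-only
sketch. The names float4_like and device_is_fundamental are hypothetical
stand-ins for illustration; this is not how Boost.Compute implements its
trait, and only a few types are mapped.

```cpp
#include <type_traits>

// Hypothetical host-side stand-in for a vector type like float4_:
// to the host compiler it is an ordinary aggregate of four floats.
struct float4_like { float x, y, z, w; };

// Hypothetical device-side trait (illustration only): true for types
// that the device language treats as built-in. Only a few types are
// listed here to keep the sketch short.
template <class T> struct device_is_fundamental : std::false_type {};
template <> struct device_is_fundamental<int> : std::true_type {};
template <> struct device_is_fundamental<float> : std::true_type {};
template <> struct device_is_fundamental<float4_like> : std::true_type {};

// The host trait and the device trait disagree about the vector type...
static_assert(!std::is_fundamental<float4_like>::value, "aggregate on the host");
static_assert(device_is_fundamental<float4_like>::value, "built-in on the device");
// ...but agree about scalar float.
static_assert(std::is_fundamental<float>::value, "fundamental on the host");
static_assert(device_is_fundamental<float>::value, "built-in on the device");
```

Because the two traits answer different questions, one cannot simply be
replaced by the other.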

As for type_name<T>(), it returns a string with the OpenCL type name
for the C++ type and can actually be very different from the C++ type
name (e.g. type_name<Eigen::Vector2f>() == "float2").
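
A rough sketch of why such a mapping is needed when kernel source is
generated as text. Vec2f, device_type_name, and kernel_param are
hypothetical names invented for this example, and the sketch assumes
nothing about how Boost.Compute's type_name<T>() is written.

```cpp
#include <cassert>
#include <string>

// Hypothetical host type standing in for something like Eigen::Vector2f.
struct Vec2f { float x, y; };

// Hypothetical mapping from a C++ type to the name used in kernel source.
// There is no primary definition, so an unmapped type fails to compile.
template <class T> struct device_type_name;
template <> struct device_type_name<float> {
    static std::string get() { return "float"; }
};
template <> struct device_type_name<Vec2f> {
    static std::string get() { return "float2"; }
};

// Builds a kernel parameter declaration for generated kernel source.
template <class T>
std::string kernel_param(const std::string& name) {
    assert(!name.empty());  // a parameter needs a name
    return "__global " + device_type_name<T>::get() + "* " + name;
}
```

Here kernel_param<Vec2f>("input") yields "__global float2* input", a name
that the host-side C++ type name could not have supplied.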

>>> 8. Do you think the library should be accepted as a Boost library?
>>>
>>> This will be a maybe. It is a well-written library with a few minor issues
>>> that can be resolved.
>>>
>>> However, why would someone use Boost.Compute against what is out there?
>>> Average users can resort to Bolt or Thrust. Power users will probably always
>>> try to hand-tune their OpenCL or CUDA algorithm. How can we test it and
>>> prove its performance?
>>
>> Yes, Thrust and Bolt are alternatives. The problem is that each is
>> incompatible with the other. Thrust works on NVIDIA GPUs while Bolt
>> only works on AMD GPUs. Choosing one will preclude your code from
>> working on devices from the other.
>>
>> On the other hand, code written with Boost.Compute will work on any
>> device with an OpenCL implementation. This includes NVIDIA GPUs, AMD
>> GPUs/CPUs, Intel GPUs/CPUs as well as other more exotic architectures
>> (Xeon Phi, FPGAs, Parallella Epiphany, etc.). Furthermore, unlike
>> CUDA/Thrust, Boost.Compute requires no special compiler or
>> compiler extensions in order to execute code on GPUs; it is a pure
>> library-level solution which is compatible with any standard C++
>> compiler.
>>
>> Also, Boost.Compute does allow for users to access the low-level APIs
>> and execute their own hand-rolled kernels (and even interleave their
>> custom operations with the high-level algorithms available in
>> Boost.Compute). I think using Boost.Compute in this way allows for
>> both rapid development and the ability to fully optimize kernels for
>> specific operations where necessary.
>>
>> Thanks for the review. Let me know if I can explain anything more clearly.
>>
>> -kyle
>>
>> [1] https://github.com/kylelutz/compute/tree/master/perf
>
>
> I realize that, but what is the advantage of
> Boost.Compute vs doing something like:
>
> template<class InputIterator, class EqualityComparable>
> auto count(InputIterator first, InputIterator last,
>            const EqualityComparable& value)
> {
> #if defined(THRUST)
>     return thrust::count(first, last, value);
> #elif defined(BOLT)
>     return bolt::cl::count(first, last, value);
> #elif defined(STL)
>     return std::count(first, last, value);
> #else
> #error "define THRUST, BOLT, or STL to select a backend"
> #endif
> }
>
> where first and last are iterators on some vector<> that is ifdefed
> similarly (or just use some template magic to invoke the right algorithm
> based on the container type). I have this concern, and IMO users might
> ask themselves the same question while shopping for GPU libraries.

Well, if we took this approach, the library would have to be compiled
separately for each different compute device rather than being
portable to any system with an OpenCL implementation. And while this
is a trivial example for count(), implementing this for more
complicated algorithms which take user-defined operators or work with
higher-level iterators (e.g. transform_iterator or zip_iterator) would
be much more difficult. I think an approach like this would ultimately
be more complex, harder to maintain, and less flexible in the
interfaces/functionality we could offer.
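
To illustrate the maintenance cost, here is a small sketch of the
"template magic" variant that dispatches on the container type. All names
(host_backend, other_backend, tagged_vector, wrapped_count) are
hypothetical, and both backends are simulated with the standard library so
that the example stands alone; a real wrapper would call Thrust, Bolt, or
the STL in the corresponding overload.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical backend tags; a real wrapper would have one per library.
struct host_backend {};
struct other_backend {};

// Hypothetical container that records which backend owns its data.
template <class T, class Backend>
struct tagged_vector {
    using backend = Backend;
    std::vector<T> data;
};

// Every algorithm needs one overload per backend.
template <class T, class B, class V>
std::ptrdiff_t count_impl(host_backend, const tagged_vector<T, B>& v,
                          const V& value) {
    return std::count(v.data.begin(), v.data.end(), value);
}

template <class T, class B, class V>
std::ptrdiff_t count_impl(other_backend, const tagged_vector<T, B>& v,
                          const V& value) {
    // Stand-in loop: a real backend would call its own library here.
    std::ptrdiff_t n = 0;
    for (const T& x : v.data) {
        if (x == value) ++n;
    }
    return n;
}

// Public entry point: selects the overload from the container's tag.
template <class T, class B, class V>
std::ptrdiff_t wrapped_count(const tagged_vector<T, B>& v, const V& value) {
    return count_impl(B{}, v, value);
}
```

Even for count(), each backend needs its own overload, and the set of
overloads grows with every algorithm and every backend added. Anything
that takes a user-defined operator would additionally need that operator
in a form each backend accepts.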

> Just to be clear, I am not dissing your work: I really like it and your
> positive attitude for addressing issues.

Not at all, I appreciate your feedback. Thanks!

-kyle

