Boost logo

Boost :

Subject: Re: [boost] [compute] review
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-30 14:08:30


On Tue, Dec 30, 2014 at 7:15 AM, Vicente J. Botet Escriba
<vicente.botet_at_[hidden]> wrote:
> Hi,
>
> I have no time to do a complete review. Anyway here it is my minimal review
>
>
> *1. What is your evaluation of the design?**
> ***
> I would prefer that the algorithms take as first parameter the queue (which
> could be abstracted to an asynchronous executor).

Any particular reason to prefer the first argument rather than the
last? Currently, by using the last argument to specify the command
queue, we can use a default argument of "system::default_queue()".
For the common case of only wanting to use the default compute device
on the system, all algorithms will work correctly and the user never
has to bother with finding a device, setting up a context, creating a
command queue and passing it to the algorithm function.

> The performances let think that there is no interest in using the library
> for containers having less than 10^4/10^5 elements.

In general yes, GPUs/Accelerators are ill-suited for small problem
sizes (unless the computation performed on each element is quite
expensive). Sorting 1000 integers will nearly always be faster on the
host rather than the device.

> Even if this is considered a low level library, the documentation should
> show how a library writer could make use of the provided interface to
> provide a higher one.

I agree, there should be more documentation for this. Could you let me
know what sort of higher-level interface you'd like to see
demonstrated?

> I would prefer that the library make use of std::future or boost::future
> class instead of adding an additional one.
> Do the author tried to make use of the existing futures? if yes, which
> problems were found?

The implementation of the boost::compute::future type is fairly
different than that of the std::/boost::future types. Essentially,
there is no operation akin to "promise.set_value()" for
boost::compute::future. Instead, it is a simple wrapper around an
OpenCL event object which can be used to wait for an operation on the
device to complete (via calling the blocking clWaitForEvents()
function from the OpenCL API). Please let me know if there are ways to
better integrate with boost::future.

> I'm missing a range interface to most of the algorithms.

The algorithms API is meant to mimic the STL algorithms API. Adding a
range-based interface is definitely possible, but is a non-trivial
amount of work. Would you be interested in contributing this?

> I suspect that as this is a C++03 library it doesn't take advantages of some
> interesting C++11+/C++14 features.

It does take advantage of some C++11 features (e.g. variadic
functions) where possible. I would be very open to integrating more
C++11/14 features where appropriate.

> It is not clear which operations are synchronous, which ones could block,
> ...

Could you be more specific about what operations you are referring to?
I know there is already some documentation regarding this for the
command_queue class as well as for the copy()/copy_async() functions.
I'll work on adding more.

> In order to get the native OpenCl handle I suggest to name the function
> get_native_handle instead of get.
>
> *2. What is your evaluation of the implementation?*
> I've not see at the implementation.
>
> *3. What is your evaluation of the documentation?*
>
> The documentation lacks of some core tutorials. I was unable to have an idea
> of what a context is, what can be done with a command_queue, what is a
> Kernel, a pipe, ...? Does it mean that the user must know OpenCL?

The low-level API is a wrapper around OpenCL and so having some
background knowledge there helps. I'll work on adding more
documentation for these classes and how they fit together.

> The reference documentation should follow he C++ standard way, including
> pre-conditions, effects, synchronization, throws clauses.

Fully agree, I'll continue to work on this.

> I would appreciate a description of the algorithms used, how the parallelism
> is used? is this completely managed by OpenCL or do the library something
> else?

I'm not sure if I fully understand what you're asking here but I'll
try to answer it.

OpenCL provides an API for executing code on compute devices (e.g.
GPUs), Boost.Compute provides algorithms which, when called, use the
OpenCL API to create programs, compile them for the device, and
execute them.

Let me know if I can better explain this.

> *4. What is your evaluation of the potential usefulness of the library?**
> ***
> This is essentially an optimization library. It should be quite useful, but
> the performances results don't probe it.

Could you clarify what you mean by "the performances results don't
probe it"? Boost.Compute provides a number of benchmark programs which
can be used to measure and evaluate the performance of the library on
a particular system. Results from this are also published on the
"Performance" page [1] in the documentation.

> *5. Did you try to use the library? With what compiler? Did you have any
> problems?**
> *
> No.*
> ***
> *6. How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?**
> ***
> A glance.
>
> *7. Are you knowledgeable about the problem domain?*
>
> Yes and not. I don't know about OpenCL, but I'm aware of the current
> parallel proposals.
>
> *8. Do you think the library should be accepted as a Boost library? **
> *
> As I have not taken too much time I'm unable to say that this library should
> not be included in Boost. So I vote to include it conditionally, subject to:
> * I would need some performances that probe the library is better than
> using standard STL containers in most of the cases.

Boost.Compute will never be universally better than the STL, and it
was never intended to be. CPUs and GPUs make fundamentally different
trade-offs which lead to very different performance characteristics.
GPUs support performing computation on large amounts of data with very
high throughput (in some cases 10x-100x faster than a CPU). The
down-side to this is a larger overhead to get the GPU running as well
as the cost of transferring data to device memory. Boost.Compute and
the STL are different tools which each suit their own different
use-cases. I don't think Boost.Compute should be required to supplant
the STL in order to be an acceptable library.

> * The documentation should be completed with more tutorials and a better
> reference documentation.

Fully agree, I'll definitely continue to work on improving the documentation.

Thanks for the review. Let me know if I can explain anything more clearly.

-kyle

[1] http://kylelutz.github.io/compute/boost_compute/performance.html


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk