Boost logo

Boost :

Subject: Re: [boost] [compute] review
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2015-01-01 09:59:54


Le 30/12/14 20:08, Kyle Lutz a écrit :
> On Tue, Dec 30, 2014 at 7:15 AM, Vicente J. Botet Escriba
> <vicente.botet_at_[hidden]> wrote:
>> Hi,
>>
>> I have no time to do a complete review. Anyway here it is my minimal review
>>
>>
>> *1. What is your evaluation of the design?**
>> ***
>> I would prefer that the algorithms take as first parameter the queue (which
>> could be abstracted to an asynchronous executor).
> Any particular reason to prefer the first argument rather than the
> last? Currently, by using the last argument to specify the command
> queue, we can use a default argument of "system::default_queue()".
> For the common case of only wanting to use the default compute device
> on the system, all algorithms will work correctly and the user never
> has to bother with finding a device, setting up a context, creating a
> command queue and passing it to the algorithm function.
One of the reasons is to allow for algorithms that take a variadic
number of arguments. I don't know if you have one on your library, but I
find this a uniform pattern.
Second, I consider that the user must be explicit about which queue
(devices) is used as AFAIK this is an optimization library.
>
>> The performances let think that there is no interest in using the library
>> for containers having less than 10^4/10^5 elements.
> In general yes, GPUs/Accelerators are ill-suited for small problem
> sizes (unless the computation performed on each element is quite
> expensive). Sorting 1000 integers will nearly always be faster on the
> host rather than the device.
>
>> Even if this is considered a low level library, the documentation should
>> show how a library writer could make use of the provided interface to
>> provide a higher one.
> I agree, there should be more documentation for this. Could you let me
> know what sort of higher-level interface you'd like to see
> demonstrated?
The one of the standard Parallel TS :), or a pipeline framework.
>
>> I would prefer that the library make use of std::future or boost::future
>> class instead of adding an additional one.
>> Do the author tried to make use of the existing futures? if yes, which
>> problems were found?
> The implementation of the boost::compute::future type is fairly
> different than that of the std::/boost::future types. Essentially,
> there is no operation akin to "promise.set_value()" for
> boost::compute::future. Instead, it is a simple wrapper around an
> OpenCL event object which can be used to wait for an operation on the
> device to complete (via calling the blocking clWaitForEvents()
> function from the OpenCL API). Please let me know if there are ways to
> better integrate with boost::future.

I understand this seems to don't match with the std/future/promise
design. Note that the futures returned by std::async,
std::make_ready_future, future<>::then, when_all, when_any, don't show
neither an associated promise (as it is hidden).
I don't know yet how this could be done, but I find it an important use
case to be solved.

I'm not (by principle) against a multiplication of future types, but
this means that we would need to duplicate the whole future interface
(then continuations, when_all, when _any, ...).

>
>> I'm missing a range interface to most of the algorithms.
> The algorithms API is meant to mimic the STL algorithms API. Adding a
> range-based interface is definitely possible, but is a non-trivial
> amount of work. Would you be interested in contributing this?
Agreed this is a lot of work. No I have not time to contribute to this.
I was just wondering if an range interface, couldn't make it possible to
compose the programs executed by the whole pipeline and make the cost of
copies less significant.
>
>> I suspect that as this is a C++03 library it doesn't take advantages of some
>> interesting C++11+/C++14 features.
> It does take advantage of some C++11 features (e.g. variadic
> functions) where possible. I would be very open to integrating more
> C++11/14 features where appropriate.
>
>> It is not clear which operations are synchronous, which ones could block,
>> ...
> Could you be more specific about what operations you are referring to?
All the functions of the core classes must be explicit about
thread-safety (synchronization).
I suspect that almost all the algorithm block, and no parallelism
(pipeline) is possible. Could you confirm?
> I know there is already some documentation regarding this for the
> command_queue class as well as for the copy()/copy_async() functions.
> I'll work on adding more.
Great.
>
>> In order to get the native OpenCl handle I suggest to name the function
>> get_native_handle instead of get.
>>
>> *2. What is your evaluation of the implementation?*
>> I've not see at the implementation.
>>
>> *3. What is your evaluation of the documentation?*
>>
>> The documentation lacks of some core tutorials. I was unable to have an idea
>> of what a context is, what can be done with a command_queue, what is a
>> Kernel, a pipe, ...? Does it mean that the user must know OpenCL?
> The low-level API is a wrapper around OpenCL and so having some
> background knowledge there helps. I'll work on adding more
> documentation for these classes and how they fit together.
Thanks.
>
>> The reference documentation should follow he C++ standard way, including
>> pre-conditions, effects, synchronization, throws clauses.
> Fully agree, I'll continue to work on this.
Great.
>
>> I would appreciate a description of the algorithms used, how the parallelism
>> is used? is this completely managed by OpenCL or do the library something
>> else?
> I'm not sure if I fully understand what you're asking here but I'll
> try to answer it.
>
> OpenCL provides an API for executing code on compute devices (e.g.
> GPUs), Boost.Compute provides algorithms which, when called, use the
> OpenCL API to create programs, compile them for the device, and
> execute them.
>
> Let me know if I can better explain this.
I have take a look at the OpenCL documentation and I understand it better.
Could it be possible to have a roughly pseudo code of the program that
each one of the algorithms executes on the device?

>
>> *4. What is your evaluation of the potential usefulness of the library?**
>> ***
>> This is essentially an optimization library. It should be quite useful, but
>> the performances results don't probe it.
> Could you clarify what you mean by "the performances results don't
> probe it"? Boost.Compute provides a number of benchmark programs which
> can be used to measure and evaluate the performance of the library on
> a particular system. Results from this are also published on the
> "Performance" page [1] in the documentation.
What a meant is the performances are often worst except when the size of
the container is high enough. The scope of the library could be to be
better only for these big containers, but I'm missing an interface that
makes a choice between the Compute or the STL one.
>
>>
>> *8. Do you think the library should be accepted as a Boost library? **
>> *
>> As I have not taken too much time I'm unable to say that this library should
>> not be included in Boost. So I vote to include it conditionally, subject to:
>> * I would need some performances that probe the library is better than
>> using standard STL containers in most of the cases.
> Boost.Compute will never be universally better than the STL, and it
> was never intended to be. CPUs and GPUs make fundamentally different
> trade-offs which lead to very different performance characteristics.
> GPUs support performing computation on large amounts of data with very
> high throughput (in some cases 10x-100x faster than a CPU). The
> down-side to this is a larger overhead to get the GPU running as well
> as the cost of transferring data to device memory. Boost.Compute and
> the STL are different tools which each suit their own different
> use-cases. I don't think Boost.Compute should be required to supplant
> the STL in order to be an acceptable library.
I suggest you to make more evident the goal and scope of the library in
the introduction of the library. Please, replace my requirement by this
one.
>
>> * The documentation should be completed with more tutorials and a better
>> reference documentation.
> Fully agree, I'll definitely continue to work on improving the documentation.
>
> Thanks for the review. Let me know if I can explain anything more clearly.
>
>
I will comeback if I find some time to read the documentation carefully.

Good luck,
Vicente


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk