
Subject: Re: [boost] [compute] Review period starts today December 15, 2014, ends on December 24, 2014
From: Asbjørn (lordcrc_at_[hidden])
Date: 2014-12-21 17:24:41

In reply to: Kyle Lutz: "Re: [boost] [compute] Review period starts today December 15, 2014, ends on December 24, 2014"

On 21.12.2014 20:39, Kyle Lutz wrote:
> On Sun, Dec 21, 2014 at 3:44 AM, Thomas M <firespot71_at_[hidden]> wrote:
>> Studying your library docs I find very
>> little information, what makes them different etc.; specifically nowhere
>> does it say that enqueue_read_buffer _is_ a blocking operation, it only says
>> that it _enqueues_ a read command. Both functions then simply refer to
>> clEnqueueReadBuffer() which does not help matters at all given the different
>> signature.
>
> Yes, I've split the blocking and non-blocking memory copy operations
> into separate functions. Personally, I've never been fond of APIs
> which drastically change behavior based on a single boolean flag.
> Also, this is more in line with the API provided by other libraries
> like Boost.ASIO (e.g. boost::asio::read() vs.
> boost::asio::async_read()).

As a library user I agree with the more explicit split between sync and async
routines, à la ASIO, and I think Boost.Compute should follow this convention.
However, I think in this case you should deviate from the OpenCL API names to
make it clearer that things are different. Specifically, drop the "enqueue" word.
Simply have "read_buffer" and "read_buffer_async". For me the "enqueue" word
just makes things more confusing.
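
A minimal sketch of how the two names could read side by side. Everything here is hypothetical: device_buffer, host_vector and command_queue are toy stand-ins rather than Boost.Compute types, and the queue simply stores commands until finish() is called.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-ins; these are not Boost.Compute types.
using host_vector = std::vector<float>;
struct device_buffer { std::vector<float> data; };

// A toy in-order queue: commands are stored until finish() runs them.
struct command_queue {
    std::vector<std::function<void()>> pending;

    void finish()
    {
        for (auto& cmd : pending) cmd();
        pending.clear();
    }
};

// Non-blocking: only records the copy. `src` and `out` must outlive the
// queued command, and `out` is not valid until the queue has been finished.
inline void read_buffer_async(command_queue& q, const device_buffer& src,
                              host_vector& out)
{
    q.pending.push_back([&src, &out] { out = src.data; });
}

// Blocking: `out` holds the data when the call returns.
inline void read_buffer(command_queue& q, const device_buffer& src,
                        host_vector& out)
{
    read_buffer_async(q, src, out);
    q.finish();
}
```

With names like these the call site alone tells the reader whether the host data is usable on the next line, without having to know which of two "enqueue" functions blocks.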

>> c) error handling: I'd much prefer some policy setting which specifies if an
>> exception is thrown on error (the usual custom in C++) or an error code is
>> returned by the function (the usual OpenCL behaviour).

FWIW, again as a library user, I quite like ASIO's approach where each operation
is overloaded to either fill an error_code or throw.
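
For reference, the overload pattern looks roughly like this. It is a sketch using std::error_code from the standard library; the read_buffer signature and its bounds check are made up for illustration and are not taken from Boost.Compute or ASIO.

```cpp
#include <cstddef>
#include <system_error>
#include <vector>

// Hypothetical operation: copy the first `count` floats of `src` into `out`.

// Non-throwing overload: failure is reported through `ec`.
inline void read_buffer(const std::vector<float>& src, std::size_t count,
                        std::vector<float>& out, std::error_code& ec)
{
    if (count > src.size()) {
        ec = std::make_error_code(std::errc::invalid_argument);
        return;
    }
    ec.clear();
    out.assign(src.begin(), src.begin() + count);
}

// Throwing overload: forwards to the overload above and converts a
// failure into std::system_error.
inline void read_buffer(const std::vector<float>& src, std::size_t count,
                        std::vector<float>& out)
{
    std::error_code ec;
    read_buffer(src, count, out, ec);
    if (ec) throw std::system_error(ec, "read_buffer");
}
```

The caller picks the error style per call by passing or omitting the error_code argument, so no global policy setting is needed.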

> This is also something I have played around with. Basically I'd like
> to have any API which allows users to define "pipelines" or
> "task-graphs" which hook up several different
> kernels/algorithms/memory-copies and produce an efficient set of
> operations to stream data through and extract the results.
>
> Any ideas you have on a potential API you'd like to see for this would
> be great. There is potentially some prior art in the C++ pipelines
> proposal [3] which may be interesting.

In my "just for fun" Delphi.Compute library (written in Delphi, inspired by
Boost.Compute) I made Copy() and Transform() return futures of the output
buffers, as well as accept futures as parameters. Note, Delphi doesn't have
iterators like C++ so my routines operate directly on buffers.

So when Transform(), say, got a Future<Buffer> instead of a Buffer as a
parameter, it would add the future's associated event to the wait list passed to
clEnqueueNDRangeKernel (technically, the Buffer type has an implicit conversion
operator to an "immediate" Future<Buffer>).

This made it pretty seamless to queue up everything and then just wait for the
final read (the default "copy device buffer to host array and return it" call is
blocking). The code looks sequential but would only block on that last read.

I'm sure there are better ways, just thought I'd share.
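
The future-chaining idea described above can be modelled in C++ without OpenCL. This is only a sketch under stated assumptions: every name (buffer_future, event, command_queue, transform, read) is hypothetical, std::function stands in for a kernel, and a toy in-order queue stands in for the device queue, so the wait list is checked rather than actually scheduled on.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Every name below is hypothetical; this models the idea, not a real API.
using buffer = std::vector<float>;

struct event { bool complete = false; };
using event_ptr = std::shared_ptr<event>;

// A buffer plus the event that marks its contents as valid.
struct buffer_future {
    std::shared_ptr<buffer> data;
    event_ptr ready;

    // Implicit conversion: a plain buffer is an "immediate" future.
    buffer_future(buffer b)
        : data(std::make_shared<buffer>(std::move(b))),
          ready(std::make_shared<event>())
    {
        ready->complete = true;
    }

    buffer_future(std::shared_ptr<buffer> d, event_ptr e)
        : data(std::move(d)), ready(std::move(e)) {}
};

struct command {
    std::vector<event_ptr> wait_list;  // events that must complete first
    std::function<void()> run;
    event_ptr done;                    // signalled when this command finishes
};

struct command_queue {
    std::vector<command> pending;

    // Runs queued commands in order, checking each wait list.
    void finish()
    {
        for (auto& cmd : pending) {
            for (const auto& e : cmd.wait_list) assert(e->complete);
            cmd.run();
            cmd.done->complete = true;
        }
        pending.clear();
    }
};

// Queues an element-wise operation and returns immediately.
inline buffer_future transform(command_queue& q, buffer_future in,
                               std::function<float(float)> op)
{
    auto src = in.data;
    auto dst = std::make_shared<buffer>();
    command cmd;
    cmd.wait_list.push_back(in.ready);  // depend on the input's event
    cmd.run = [src, dst, op] {
        dst->resize(src->size());
        for (std::size_t i = 0; i < src->size(); ++i) (*dst)[i] = op((*src)[i]);
    };
    cmd.done = std::make_shared<event>();
    buffer_future out(dst, cmd.done);
    q.pending.push_back(std::move(cmd));
    return out;
}

// The only blocking call: drains the queue, then copies the result out.
inline buffer read(command_queue& q, const buffer_future& f)
{
    q.finish();
    assert(f.ready->complete);
    return *f.data;
}
```

Calling transform() twice, with the second call taking the future returned by the first, queues both operations without running either; nothing executes until read() is called on the final future. In a real OpenCL backend the wait list would be handed to clEnqueueNDRangeKernel instead of being asserted on.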

> Strongly disagree, the floating-point operations on the device are
> well defined and their output should be identical to the host results
> (barring optimizations like "-cl-fast-relaxed-math").

While I agree, I've found Intel's OpenCL CPU device to return results which make
me think it uses some relaxed math regardless. With NVIDIA and AMD I can get
(essentially) the same results as reference CPU calculations, but with Intel I
sometimes get quite large discrepancies. Of course, it's possible I'm just doing
it wrong...

Cheers
- Asbjørn
