Subject: Re: [boost] Synchronization (RE: [compute] review)
From: Kyle Lutz (kyle.r.lutz_at_[hidden])
Date: 2014-12-28 20:35:37
On Sun, Dec 28, 2014 at 4:46 PM, Gruenke,Matt <mgruenke_at_[hidden]> wrote:
> -----Original Message-----
> From: Boost [mailto:boost-bounces_at_[hidden]] On Behalf Of Kyle Lutz
> Sent: Sunday, December 28, 2014 14:42
> To: boost_at_[hidden] List
> Subject: Re: [boost] [compute] review
>>On Sun, Dec 28, 2014 at 1:54 AM, Gruenke,Matt wrote:
>>> I agree with other comments made about synchronization. The design should
>>> be more explicit about what's asynchronous,
>> Like I mentioned before, there is only one method for asynchrony in Boost.Compute,
>> the command queue abstraction provided by OpenCL. Operations are enqueued to be
>> executed on the compute device and this occurs asynchronously with respect to code
>> executing on the host. The exceptions to this rule are functions which interact
>> directly with host memory, which by default are blocking and offer explicit
>> "_async()" versions (in order to prevent potential race conditions on host memory).
>>> Regarding synchronization, I'm also a bit concerned about the performance impact
>>> of synchronizing on all copies to host memory. Overuse of synchronization can
>>> easily result in performance deterioration. On this point, I think it might be
>>> worth limiting host memory usable with algorithms, to containers that perform
>>> implicit synchronization to actual use (or destruction) of results. Give users
>>> the choice between that or performing explicit copies to raw types.
>> To be clear, not all copies are synchronized with host memory.
>> Boost.Compute allows both synchronous and asynchronous memory transfers between the
>> host and device.
> My understanding, based on comments you've made to other reviewers, is that functions like boost::compute::transform() are asynchronous when the result is on the device, but block when the result is on the host. This is what I'm concerned about. Is it true?
Yes, this is correct. In general, algorithms like transform() are
asynchronous when the input/output ranges are both on the device and
synchronous when one of the ranges is on the host. I'll work on better
ways to allow asynchrony in the latter case. One of my current ideas
is to add asynchronous memory-mapping support to the mapped_view class
 which can then be used with any of the algorithms in an
>>> Also, I agree with Thomas M that it'd be useful for operations to return events.
>> All asynchronous operations in the command queue class do return events. One of his
>> comments was to also return events from the synchronous methods for consistency and
>> I am working on adding this.
> Well, what I had in mind was events for higher-order operations, like boost::compute::transform().
Yes, I would also like to have higher-level support for chaining
together algorithms asynchronously. However, designing a generic and
generally useful API for this is a complex task and may take some time
(I've shied away from just adding an extra "_async()" function for
all of the algorithm APIs as I think it could be done
better/more-extensibly). Any ideas/proposals for this would be great.
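
As a rough illustration of the kind of chaining discussed above, here is a
toy model in standard C++ only. It is not Boost.Compute or OpenCL code: the
name transform_async, its wait_for parameter, and square_then_increment are
all hypothetical, and worker threads stand in for a compute device. Each call
returns an event-like handle right away, and a later step can take the handle
of an earlier step as a dependency, so the host only blocks when it actually
needs the final result.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical event-returning algorithm (a toy model, not a Boost.Compute API).
// Runs std::transform on a worker thread and returns immediately with a handle.
// If wait_for is valid, the work starts only after that earlier step has finished.
// The caller must keep `in` and `out` alive until the returned handle is ready.
template <class T, class UnaryOp>
std::shared_future<void> transform_async(
    const std::vector<T>& in,
    std::vector<T>& out,
    UnaryOp op,
    std::shared_future<void> wait_for = std::shared_future<void>())
{
    return std::async(std::launch::async, [&in, &out, op, wait_for] {
        if (wait_for.valid()) {
            wait_for.wait();  // dependency: do not start before the earlier step is done
        }
        std::transform(in.begin(), in.end(), out.begin(), op);
    }).share();
}

// Chains two steps without blocking the host in between.
std::vector<int> square_then_increment(const std::vector<int>& input)
{
    std::vector<int> squared(input.size());
    std::vector<int> result(input.size());

    // Step 1 is enqueued and returns at once.
    std::shared_future<void> first =
        transform_async(input, squared, [](int x) { return x * x; });

    // Step 2 is enqueued at once as well; it waits on step 1 in its own thread.
    std::shared_future<void> second =
        transform_async(squared, result, [](int x) { return x + 1; }, first);

    // The host is free to do unrelated work here.

    second.wait();  // the only point where the host blocks
    return result;
}
```

Calling square_then_increment({1, 2, 3}) yields {2, 5, 10}. A blocking variant
would simply call wait() on the returned handle before returning, which is the
behavior described above for algorithms whose result lives in host memory.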