Subject: Re: [boost] [compute] Review
From: John Bytheway (jbytheway+boost_at_[hidden])
Date: 2014-12-30 21:23:18
On 2014-12-29 02:01, Kyle Lutz wrote:
> On Sun, Dec 28, 2014 at 7:26 PM, John Bytheway
> <jbytheway+boost_at_[hidden]> wrote:
>> It would be very nice to have range versions of all the algorithms. I
>> find myself using the boost::range algorithms much more than the std::
>> ones these days.
> I fully agree and also I'm very much looking forward to having range
> algorithms standardized in C++17.
Do you mean that you are planning to add them to Boost.Compute?
>> 3. What is your evaluation of the documentation?
>> Overall, not too bad.
>> More tutorial would be good. In particular, something using some of the
>> standard algorithms.
> I'll definitely work on this more. Any ideas of specific
> tutorials/examples you'd like to see would be very helpful.
Something using an algorithm which moves elements around might be nice.
Maybe sort or partition...
>> Some concerns:
>> - All the header links off
>> <http://kylelutz.github.io/compute/boost_compute/reference.html> are broken.
>> - All or most of the things in "See also" sections look like they ought
>> to be links but aren't.
>> - For all your function parameter documentation, the list of parameters
>> seems to be sorted alphabetically by parameter name rather than in the
>> order they are passed to the function. This is really weird...
> These are all, to the best of my knowledge, bugs in the Boost
> documentation toolchain. I reported these issues a while back on the
> boost-doc mailing list here  but never found a resolution. I would
> really like to get these to work, but am not knowledgable enough about
> the Boost documentation pipeline (specifically the doxygen->quickbook
> infrastructure) to fix it. If anyone has any ideas or could help with
> this I would be very appreciative.
Shame :(. Hope you get some responses here.
>> - <http://kylelutz.github.io/compute/boost/compute/copy.html> mentions
>> copying from e.g. a std::list to a compute::vector. It would be nice if
>> it gave some kind of indication as to how efficient we can expect such
>> an operation to be. In particular, I guess it ought to do some kind of
>> buffering on the host?
> Currently it will just buffer the entire range into a std::vector<T>
> and then copy that to the device. If this is a bottleneck it could
> also be improved to copy the data in smaller batches. Let me know if
> you encounter issues with this.
Does this also work with copy_async? I presume not, because there would
be troubles with the lifetime of the temporary vector...
>> - The way I normally copy into vectors is call v.reserve(...) and then
>> copy to std::back_inserter(v). I tried that with a compute::vector and
>> unsurprisingly it was very inefficient. I see you already warned about
>> this in the push_back docs, but perhaps it should also be highlighted in
>> the documentation of compute::copy.
> Yeah, this idiom is not well suited to copying into device memory. You
> should prefer to do "bulky" operations using a single call to
> boost::compute::copy() (which internally will call the OpenCL function
> for copying large chunks of memory).
>> - <http://kylelutz.github.io/compute/boost/compute/includes.html> it's
>> unclear whether the subrange has to be contiguous.
> It should have the same semantics of std::includes, which IIRC, does
> not require the subrange to be contiguous.
Then perhaps rephrasing would make it clearer. The wording on
<http://en.cppreference.com/w/cpp/algorithm/includes> seems clear.
>> - What source of randomness does random_shuffle use?
> Currently it simply calls std::random_shuffle to generate the shuffled
> index sequence then uses boost::compute::scatter() to actually
> re-arrange the values on the device. However I am planning on
> deprecating it and providing an implementation of the C++11 shuffle()
> algorithm instead which will use the random number generator provided
> by the user.
>> - Why does reduce return its result through an OutputIterator?
> In order to allow the algorithm to store the result in device memory
> and avoid a host-device synchronization point (which would be
> necessary if it simply returned the reduced result).
That makes sense, but seems inconsistent with accumulate, and to a
lesser extent the algorithms that return bool (binary_search, equal,
etc.) Are they all introducing a synchronization point? What about the
ones returning iterators?
>> - The boost/compute/types headers docs seem to be broken or missing. In
>> particular I couldn't find reference documentation for float4_ et al.
> Yeah, I wasn't sure the best way to document all of these (IIRC, 10
> different built in scalar types times four different vector version of
> each). I'll add some information with an overview of the provided
> fundamental types.
At least something documenting the supported means of initialization and
element access would be good :).
>> - <http://kylelutz.github.io/compute/boost/compute/function.html> I'm
>> puzzled by this interface. What's the name for? Would one ever use
>> this constructor rather than e.g. make_function_from_source?
> There are a few legitimate use-cases. One is for specifying the names
> of pre-existing or built-in functions (e.g.
> function<float(float)>("sqrt") would call the built-in sqrt()
So constructing a function by name points it at some existing named
function. That makes sense, but might be unexpected, so should probably
be mentioned in the constructor documentation.
>> - <http://kylelutz.github.io/compute/boost/compute/field.html> My
>> reading of this is that when you get the type wrong you are guaranteed
>> some kind of meaningful error report, and not just undefined behaviour;
>> is that right? Perhaps you should clarify.
Any comment here?
>> - The iterators should state what iterator category they have.
> Currently Boost.Compute doesn't really have different iterator
> categories, all iterators are random-access. I'll add this information
> to the documentation.
Now that you mention this I realize I was confused by what
function_input_iterator does. I had assumed it was an InputIterator
which supported stateful functions, like
But I guess in fact it has no guarantees about order of evaluation.
You might want to clarify this for folks accustomed to Boost.Iterator.
>> - What does default_device do if there are no devices? Do the various
>> environment variables mentioned have any kind of precedence? (Probably
>> linking to the corresponding OpenCL docs is sufficient for such
>> questions). Do they have to be exact matches? Are they case-sensitive?
> There is a precedence but it currently is not explained in the
> documentation. For most cases only one of the default device variables
> is meant to be used. They are currently will match substrings and are
> case-sensitive. I'll work on improving the documentation for these and
> give some example usages.
>> - What does find_device do if the requested device does not exist?
> Currently it will return a null (default-constructed) device object. I
> have considered updating this (and the default_device() function) to
> throw an exception when no valid device is found so that returned
> device objects are always valid.
I think that would be better. In particular, I experienced what happens
if the default device does not initialize properly (in my case because
of a missing kernel module) and that was an exception. Missing device
should probably be handled similarly.
>> - Congratulations for getting compile-time detection of internal padding
>> into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the
>> docs that this will be detected, so people don't worry too much.
> Will do. I also hope in the future to transparently support padded
> structs as well and to be able to correctly copy them to/from the
> device without issue.
That would be nice.
>> - It looks like the performance tests don't include the time for copying
>> data to (and, where applicable, from) the device. You should mention
>> this on the Performance page.
> I'll add this, though it is very device/system specific. Also, there
> is a performance benchmark available in Boost.Compute
> ("perf_copy_to_device") which could be used to measure this on your
> own system.
I think I was not clear. I mean that, for example, the sort benchmark
doesn't include the time taken to copy data before and after the sort.
The text on the performance page should explain this, so people can
interpret the times properly.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk