Boost logo

Boost :

Subject: Re: [boost] Boost SIMD beta release
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-23 21:54:01


on Sun Dec 23 2012, Mathias Gaunard <mathias.gaunard-AT-ens-lyon.org> wrote:

> On 21/12/12 19:48, Dave Abrahams wrote:
>
>> Don't worry, I hear you loud and clear saying that it doesn't work. You
>> haven't, though, done anything to help me understand why you're saying
>> that.
>
> I have done so repeatedly (this is not the first time we're discussing
> this), yet you don't seem to understand what we're actually doing and
> what is needed.

Perhaps not.

> There are both semantical and technical problems with trying to use
> segmented iterators, both of which I already explained to you.
>
> Let me repeat everything I've already told you in one single
> message. This will be my last attempt to explain it to you.
> It is already quite ironic that I spent more time explaining to you
> why segmented iterators do not bring anything to help with this
> problem than the time it took me to figure it out.

You seem to think that's because I'm being purposefully obtuse. I'm
really just trying to learn something here and explore this design
space. No matter how frustrating it may be to explain these things to
me, I don't think I deserve this level of hostility.

> SIMD are special instructions that allow to perform the same operation
> on a vector of N values.

Yes.

> For simplicity's sake, we'll ignore alignment concerns in what
> follows, since it's only something to need to take care of with if you
> want to do additional optimizations.

Oh, I was under the impression that in general you could only use SIMD
instructions on aligned sequences. That actually seems to make
segmented iterators more practical, because you don't (necessarily) need
to solve the "misaligned segments" problem.

> If you want to perform a function like plus, on n elements, and that n
> is not necessarily a multiple of N, you can do a certain number of
> operations with vectors of N elements and then you need to take care
> of the trailing data that doesn't fully fill a vector differently.
> If you want to add two sequences of 10 floats each, storing the result
> in another third sequence, and you have 4xfloat vectors, you do 4
> 4xfloat loads, 2 4xfloat additions, 2 4xfloat stores and then 4 normal
> 1xfloat loads, 2 1xfloat additions and 2 1xfloat stores (which are all
> different functions than their 4xfloat counterparts).

That is already consistent with what I already understand about SIMD.

>
> Question: would iterating the sequence with a segmented iterator allow
> us to do this?

It would seem so.

> Semantical problem: an iterator, be it segmented or otherwise, must
> provide the same element type for all its elements (i.e. sequences are
> homogeneous).

Well, that is one view. Segmented iterators, conceptually, provide a
"flat" view of the elements that works with standard algorithms, and a
"segmented" view that is useful for optimization.

I think I now see the problem to which you're referring. The
hierarchical algorithms in exactly the form presented by Austern would
certainly not work, because ultimately they present no type information
to nested algorithm calls that would allow compile-time dispatching to a
different implementation (one that used SIMD instructions). However, I
think something *very* similar would work perfectly well.

If you look at the Austern paper's "fill" example (and imagine there's a
SIMD "fill" instruction), the key lines are marked below with "****".

#+begin_src c++
  template <class SegIter, class T>
  void fill(SegIter first, SegIter last, const T& x, true_type)
  {
    typedef segmented_iterator_traits<SegIter> traits;
    typename traits::segment_iterator sfirst
      = traits::segment(first);
    typename traits::segment_iterator slast
      = traits::segment(last);
    if (sfirst == slast)
      fill(traits::local(first), traits::local(last), x);
    else {
      fill(traits::local(first), traits::end(sfirst), x);
      for (++sfirst ; sfirst != slast ; ++sfirst) // ****
        fill(traits::begin(sfirst), traits::end(sfirst), x); // ****
      fill(traits::begin(sfirst), traits::local(last), x);
    }
  }

  template <class ForwardIter, class T>
  void fill(ForwardIter first, ForwardIter last, const T& x, false_type)
  {
    for ( ; first != last; ++first)
       *first = x;
  }

  template <class Iter, class T>
  inline void fill(Iter first, Iter last, const T& x)
  {
    typedef segmented_iterator_traits<Iter> Traits;
    fill(first, last, x, typename Traits::is_segmented_iterator());
  }
#+end_src

A simple change to that loop could allow a SIMD-specialized version of
fill to kick in. For example, s/begin/seg_begin/ and s/end/seg_end/.
OK, the names are not great, but the idea is that seg_begin would return
a begin iterator that carries in its type the fact that it is on a
segment boundary, and also, when the segments are contiguous and of SIMD
vector size, that additional information. Then one could add an overload
of fill that takes advantage of this information to use SIMD fill
instructions rather than regular assignments.

> That means we cannot have a 4xfloat body and a trailing 1xfloat
> epilogue. We need to have 4xfloat everywhere, which, while possible in
> some cases, would be limiting in others, needing a case-by-case
> workaround depending on the operation being done (like conditional
> stores or filling with neutral data).

I'm not sure what kinds of workarounds you have in mind. An example
might help.

> Technical problem: for each segment, the type of the iterator needs to
> be the same.

Only if you take the Austern paper *very* literally.

> The code for incrementing or dereferencing the iterator is therefore
> the same.
> This gives us two solutions:
> - Do a branch in the dereferencing function of the iterator to load
> normal data and trailing data differently (at every iteration, not ok
> --
> this approach also works with normal iterators)
> - Pre-emptively build the last vector of the data and copy it into
> the iterator (which implies unnecessary loads, stores and cache misses
> and makes the iterator expensive to copy), and consider that as an
> additional segment of contiguous memory.

Eeew. I would never suggest that.

> Answer: Not in a very satisfying manner. The added generality of those
> concepts does not justify the runtime overhead and the fact that each
> user would still need to deal with how to reduce trailing data to a
> vector.
>
>> I don't know what you mean.
>
> Since you don't know, why do you insist on saying that it is "obvious"?

I feel my words are being twisted here. The only time I used that word
in this thread was in

  "It just seems like an obvious fit to me, but I'm hardly a SIMD
  expert."

This was nothing more than an explanation of why I was pursuing this
line of discussion. I certainly was not saying that your meaning was
"obvious," as you seem to be implying.

>>> Polymorphic algorithms ended up being a significantly more interesting
>>> approach.
>>
>> Nor here.
>
> In this instance, polymorphic algorithms refer to algorithms that call
> a polymorphic function object with different argument types depending
> on which part of the sequence is being processed.

I think that is exactly what I am proposing.

>> I don't have time to write the code out ATM. It seems to me that once
>> you impose a hierarchical "SIMD" view over the sequence, you can
>> specialize the algorithm for the aligned part, to use SIMD instructions.
>> Maybe I'm overlooking something, but so far you haven't been very
>> helpful in the understanding-what-I'm-missing department.
>
> SIMD algorithms and segmented iterator algorithms were done for
> different purposes.
> Lifting them to the same general algorithm is just a bad idea.

Maybe. I'm still not convinced. It seems like a way to write generic
algorithms that allows significant incremental optimization and code
reuse.

-- 
Dave Abrahams
BoostPro Computing                  Software Development        Training
http://www.boostpro.com             Clang/LLVM/EDG Compilers  C++  Boost

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk