Boost logo

Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: David Abrahams (dave_at_[hidden])
Date: 2010-10-18 20:15:24


At Mon, 18 Oct 2010 22:43:25 +0200,
Sebastian Redl wrote:
>
>
> On 18.10.2010, at 16:34, David Abrahams wrote:
>
> >
> > No, you need different code for iterating the end bits. Look
> > carefully at the hierarchical_fill implementation near the end of the
> > paper.
>
> Segmented iterators as described in the paper have two limitations, as I see it:
> 1) The value type of the flat iterator and the local iterator have
> to be the same.

Which is totally fine in this case

> 2) Segments must be uniform.
> (2) means that you can't have one segment have one type with one
> kind of iterator, and the next another type with another kind of
> iterator. Case in point, joint_iterator over a fusion sequence of
> containers.

Also totally fine in this case. If you want to operate on a
heterogeneous sequence, you'd need fusion-y segmented iterators of
course, so you'd make the obvious corresponding extensions of fusion
iterator concepts. But you don't need it for SIMD, AFAICT

> You can't represent those with a segmented iterator, because the
> segments are different container types and thus you cannot have a
> segment iterator, since it wouldn't have a single value type. It
> occurs to me that there probably would be a way to bring
> compile-time fixed-size heterogenous iterators (well, fusion
> sequences) into a reusable concept world here too,

They're already in that world, IIUC.

> but that's a lot of additional work on top of segmented iterators.
>
> (2) also means that the local iterators for all segments are the
> same, and therefore that the final value type of every segment has
> to be the same.

Also not a problem.

> (1) is not immediately obvious. The segmented_iterator_traits do not
> actually make this requirement. However, it is a consequence of the
> intended use of segmented iterators: transparent efficiency increase
> for the standard algorithms. fill() is a case in point. The flat
> version looks like this:
>
> template <typename ForwardIterator, typename T>
> void fill(ForwardIterator first, ForwardIterator last, const T &v) {
> for (; first != last; ++first) {
> *first = v;
> }
> }
>
> Requirements: ForwardIterator is a forward iterator, and T is
> assignable to ForwardIterator::reference.
>
> If ForwardIterator was additionally a segmented iterator, the
> complicated hierarchical fill should be transparently invoked. This
> comes down to this (implementation switching omitted):
>
> template <typename SegIterator, typename T>
> void fill(SegIterator first, SegIterator last, const T &v) {
> typedef segmented_iterator_traits<SegIterator> Seg;
> if (Seg::segment(first) == Seg::segment(last)) {
> fill(Seg::local(first), Seg::local(last), v);
> } else {
> // omitted
> }
> }
>
> But that nested fill has a different ForwardIterator, and thus
> introduces a new requirement that bubbles outward: T must be
> assignable to LocalIterator::reference.
>
> So the only way that segmented iterators stay a transparent
> optimization (that is, can be used in the standard algorithms
> without introducing additional requirements on the caller) is if the
> LocalIterator's reference guarantees that anything assignable to
> ForwardIterator's reference is also assignable to itself, which is
> pretty much the same as saying that the two have to be the same. Not
> quite, but almost.

I don't understand what you're talking about in that section above,
but I think you're making things too complicated because you're
fixated on the idea of a heterogeneous sequence. Here's a simple
extension to the hierarchical algorithm formulation that I think could
accomodate SIMD. The key change is the segment_fill function, which
gives us an additional customization point with which to capture SIMD
segments

  template<class SegIter,class T>
  void fill(SegIter first, SegIter last, const T& x, true_type)
  {

      typedef segmented_iterator_traits<SegIter> traits;

      typename traits::segment_iterator sfirst = traits::segment(first);
      typename traits::segment_iterator slast = traits::segment(last);

      if(sfirst==slast)
          fill(traits::local(first),traits::local(last), x);
      else
      {
          fill(traits::local(first), traits::end(sfirst), x);

          for(++sfirst; sfirst!=slast; ++sfirst)
              segment_fill(sfirst, x, traits());

          fill(traits::begin(sfirst), traits::local(last), x);
      }
  }

  template <class SegmentForwardIter, class T, class Traits>
  void segment_fill(
      SegmentForwardIterator s, T const& x, Traits traits)
  {
      fill(traits.begin(s), traits.end(s), x);
  }

  template <class ForwardIter,class T>
  void
  fill(ForwardIter first,ForwardIter last,const T&x, false_type)
  {
      for(; first!=last; ++first)
          *first = x;
  }

  template <class Iter,class T>
  inline void
  fill(Iter first,Iter last,const T&x)
  {
      typedef segmented_iterator_traits<Iter> Traits;

      fill(first,last,x,
           typename Traits::is_segmented_iterator());
  }

  template <class U, class T, class Traits>
  void segment_fill(
      my_SIMD_iterator<U> s, T const& x, Traits traits)
  {
      // Do SIMD stuff on an entire segment
  }
                  
Naturally, there are some things you'd want to rework about this. For
example, you'd probably want to rethink the way the segmented traits
are formulated and how dispatching to segment_fill worked. But
Austern's paper gives the bones of a solution, I think.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk