Boost logo

Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: David Abrahams (dave_at_[hidden])
Date: 2010-10-18 20:15:24

At Mon, 18 Oct 2010 22:43:25 +0200,
Sebastian Redl wrote:
> On 18.10.2010, at 16:34, David Abrahams wrote:
> >
> > No, you need different code for iterating the end bits. Look
> > carefully at the hierarchical_fill implementation near the end of the
> > paper.
> Segmented iterators as described in the paper have two limitations, as I see it:
> 1) The value type of the flat iterator and the local iterator have
> to be the same.

Which is totally fine in this case

> 2) Segments must be uniform.
> (2) means that you can't have one segment have one type with one
> kind of iterator, and the next another type with another kind of
> iterator. Case in point, joint_iterator over a fusion sequence of
> containers.

Also totally fine in this case. If you want to operate on a
heterogeneous sequence, you'd need fusion-y segmented iterators of
course, so you'd make the obvious corresponding extensions of fusion
iterator concepts. But you don't need it for SIMD, AFAICT

> You can't represent those with a segmented iterator, because the
> segments are different container types and thus you cannot have a
> segment iterator, since it wouldn't have a single value type. It
> occurs to me that there probably would be a way to bring
> compile-time fixed-size heterogenous iterators (well, fusion
> sequences) into a reusable concept world here too,

They're already in that world, IIUC.

> but that's a lot of additional work on top of segmented iterators.
> (2) also means that the local iterators for all segments are the
> same, and therefore that the final value type of every segment has
> to be the same.

Also not a problem.

> (1) is not immediately obvious. The segmented_iterator_traits do not
> actually make this requirement. However, it is a consequence of the
> intended use of segmented iterators: transparent efficiency increase
> for the standard algorithms. fill() is a case in point. The flat
> version looks like this:
> template <typename ForwardIterator, typename T>
> void fill(ForwardIterator first, ForwardIterator last, const T &v) {
> for (; first != last; ++first) {
> *first = v;
> }
> }
> Requirements: ForwardIterator is a forward iterator, and T is
> assignable to ForwardIterator::reference.
> If ForwardIterator was additionally a segmented iterator, the
> complicated hierarchical fill should be transparently invoked. This
> comes down to this (implementation switching omitted):
> template <typename SegIterator, typename T>
> void fill(SegIterator first, SegIterator last, const T &v) {
> typedef segmented_iterator_traits<SegIterator> Seg;
> if (Seg::segment(first) == Seg::segment(last)) {
> fill(Seg::local(first), Seg::local(last), v);
> } else {
> // omitted
> }
> }
> But that nested fill has a different ForwardIterator, and thus
> introduces a new requirement that bubbles outward: T must be
> assignable to LocalIterator::reference.
> So the only way that segmented iterators stay a transparent
> optimization (that is, can be used in the standard algorithms
> without introducing additional requirements on the caller) is if the
> LocalIterator's reference guarantees that anything assignable to
> ForwardIterator's reference is also assignable to itself, which is
> pretty much the same as saying that the two have to be the same. Not
> quite, but almost.

I don't understand what you're talking about in that section above,
but I think you're making things too complicated because you're
fixated on the idea of a heterogeneous sequence. Here's a simple
extension to the hierarchical algorithm formulation that I think could
accomodate SIMD. The key change is the segment_fill function, which
gives us an additional customization point with which to capture SIMD

  template<class SegIter,class T>
  void fill(SegIter first, SegIter last, const T& x, true_type)

      typedef segmented_iterator_traits<SegIter> traits;

      typename traits::segment_iterator sfirst = traits::segment(first);
      typename traits::segment_iterator slast = traits::segment(last);

          fill(traits::local(first),traits::local(last), x);
          fill(traits::local(first), traits::end(sfirst), x);

          for(++sfirst; sfirst!=slast; ++sfirst)
              segment_fill(sfirst, x, traits());

          fill(traits::begin(sfirst), traits::local(last), x);

  template <class SegmentForwardIter, class T, class Traits>
  void segment_fill(
      SegmentForwardIterator s, T const& x, Traits traits)
      fill(traits.begin(s), traits.end(s), x);

  template <class ForwardIter,class T>
  fill(ForwardIter first,ForwardIter last,const T&x, false_type)
      for(; first!=last; ++first)
          *first = x;

  template <class Iter,class T>
  inline void
  fill(Iter first,Iter last,const T&x)
      typedef segmented_iterator_traits<Iter> Traits;

           typename Traits::is_segmented_iterator());

  template <class U, class T, class Traits>
  void segment_fill(
      my_SIMD_iterator<U> s, T const& x, Traits traits)
      // Do SIMD stuff on an entire segment
Naturally, there are some things you'd want to rework about this. For
example, you'd probably want to rethink the way the segmented traits
are formulated and how dispatching to segment_fill worked. But
Austern's paper gives the bones of a solution, I think.

Dave Abrahams
BoostPro Computing

Boost list run by bdawes at, gregod at, cpdaniel at, john at