Boost logo

Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2010-10-18 16:43:25


On 18.10.2010, at 16:34, David Abrahams wrote:

>
> No, you need different code for iterating the end bits. Look
> carefully at the hierarchical_fill implementation near the end of the
> paper.

Segmented iterators as described in the paper have two limitations, as I see it:
1) The value type of the flat iterator and the local iterator have to be the same.
2) Segments must be uniform.

(2) means that you can't have one segment have one type with one kind of iterator, and the next another type with another kind of iterator. Case in point, joint_iterator over a fusion sequence of containers. You can't represent those with a segmented iterator, because the segments are different container types and thus you cannot have a segment iterator, since it wouldn't have a single value type.
It occurs to me that there probably would be a way to bring compile-time fixed-size heterogenous iterators (well, fusion sequences) into a reusable concept world here too, but that's a lot of additional work on top of segmented iterators.
(2) also means that the local iterators for all segments are the same, and therefore that the final value type of every segment has to be the same.

(1) is not immediately obvious. The segmented_iterator_traits do not actually make this requirement. However, it is a consequence of the intended use of segmented iterators: transparent efficiency increase for the standard algorithms. fill() is a case in point. The flat version looks like this:

template <typename ForwardIterator, typename T>
void fill(ForwardIterator first, ForwardIterator last, const T &v) {
  for (; first != last; ++first) {
    *first = v;
  }
}

Requirements: ForwardIterator is a forward iterator, and T is assignable to ForwardIterator::reference.

If ForwardIterator was additionally a segmented iterator, the complicated hierarchical fill should be transparently invoked. This comes down to this (implementation switching omitted):

template <typename SegIterator, typename T>
void fill(SegIterator first, SegIterator last, const T &v) {
  typedef segmented_iterator_traits<SegIterator> Seg;
  if (Seg::segment(first) == Seg::segment(last)) {
    fill(Seg::local(first), Seg::local(last), v);
  } else {
    // omitted
  }
}

But that nested fill has a different ForwardIterator, and thus introduces a new requirement that bubbles outward: T must be assignable to LocalIterator::reference.
So the only way that segmented iterators stay a transparent optimization (that is, can be used in the standard algorithms without introducing additional requirements on the caller) is if the LocalIterator's reference guarantees that anything assignable to ForwardIterator's reference is also assignable to itself, which is pretty much the same as saying that the two have to be the same. Not quite, but almost.

Here's the problem when trying to use the segmented iterator idea to present a sequence of scalars as a sequence of SIMD packs. You can try to divide the sequence into three segments: leading unaligned values, aligned packs, and trailing unaligned. However, this runs afoul of (2): the segments are heterogenous. The first and third iterate over single values, the center iterates over packs. The abstraction doesn't support it.
Alternatively, you could divide the sequence into N segments. The first are the leading unaligned values, segments two through N-1 are fixed-size segments of as many values as fit in a SIMD register, and segment N are the trailing unaligned values. But then you are not actually dealing in SIMD packs. Instead, you're dealing in very short loops that a compiler might find easier to auto-vectorize or at least unroll, but which will completely destroy your performance if it doesn't.
Finally, you can transform the entire sequence into a sequence of SIMD packs, but that's not segmented anymore. (Also, the implementation is incredibly ugly.) If you try to misuse the segmented iterator concept for this, (1) will bite you.

Sebastian


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk