# Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: David Abrahams (dave_at_[hidden])
Date: 2010-10-13 13:50:48

At Wed, 13 Oct 2010 17:38:56 +0100,
Mathias Gaunard wrote:
>
> On 13/10/10 16:07, David Abrahams wrote:
> > At Wed, 13 Oct 2010 15:52:47 +0100,
> > Mathias Gaunard wrote:
> >>
> >> On 13/10/10 12:18, Mathias Gaunard wrote:
> >>
> >>> Since those could be different segments, you wouldn't need to check
> >>> which segment you are in at each iteration.
> >>> Actually, all local iterators for a a segmented iterator must be the
> >>> same for all segments, but since a local iterator can be a segmented
> >>> iterator itself, it can also have a local iterator which can be
> >>> different. By recursion, you can therefore use different iterator types
> >>> for each segment, but that's somewhat convoluted.
> >>
> >> That's wrong actually. Since all the local iterators are the same,
> >> then their local iterators must be the same too.
> >>
> >> So segmented iterators wouldn't help to only check for alignment once
> >> it has been reached I think. I would have to properly try to implement
> >> one, but it's not particularly practical to do.
> >
> > I think the exercise would be worth your time
>
> Well, let me rephrase the problem.
>
> Given the contiguous sequence of numbers 1 2 3 4 5 6 7 8 9 10 11, with
> the 3rd element (3) lying on a satisfying alignment boundary to deploy
> SIMD instructions of cardinal 4, we want to adapt the range into this:
> | 0 0 1 2 | 3 4 5 6 | 7 8 9 10 | 11 0 0 0 |

I think you mean

1 2 | 3 4 5 6 | 7 8 9 10 | 11

right? Are you really intending to manufacture zeros for padding?

>
> The concern is that we want the iteration of the middle part | 3 4 5 6
> | 7 8 9 10 | (which in practice would contain much more elements than
> this), to be as fast as possible, i.e.
> - increment is a ptr += 4
> - dereference is load ptr into a SIMD register
>
> This is not possible using normal iterators, since I will need to have
> a different behaviour (and thus a branch), for at least one of those
> two functions, depending on what state the iteration is at.
> As a result a few bytes at the beginning and at the end of a long
> sequence slow down the whole iteration.

Yes, this is exactly the same problem as std::deque has, which
segmented iterators were designed for.

> I thought segmented iterators would solve this, since after all I've
> got three segments, and you've been repeatedly saying on this list
> that it is the solution to many iterator problems, but they
> don't.

Of course they do.

> Segmented iterators are only useful if the iteration of all my
> segments is done with the exact same logic but at different locations,
> since they all must have the same type. They don't allow to circumvent
> the problem I exposed with normal iterators at all.

I think they do. However, you might find that you need some dynamic
dispatching to get it to work. In other words,

if (operating on complete segment)
SIMD algorithm
else
elementwise algorithm

> My alternative not only does, but it could also allows not to have to
> pad the ranges with zeroes, which is potentially more interesting to
> write SIMD-aware algorithms.

You don't need to pad with zeroes for segmented iterators.

> >> Anyway, my belief is that my alternative to segmented iterators is
> >> more interesting.
> >
> > Not to me, until you can begin to address something like std::merge().
>
> Yet you are convinced about segmented iterator without even knowing if
> it can address something like std::merge (well sure it can if you use
> the fallback flat iterator, but that's not what I mean).

I do know that it can address std::merge. It's not necessarily
pretty, though. ;-)

```--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
```