# Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-10-13 12:38:56

On 13/10/10 16:07, David Abrahams wrote:
> At Wed, 13 Oct 2010 15:52:47 +0100,
> Mathias Gaunard wrote:
>>
>> On 13/10/10 12:18, Mathias Gaunard wrote:
>>
>>> Since those could be different segments, you wouldn't need to check
>>> which segment you are in at each iteration.
>>> Actually, all local iterators for a a segmented iterator must be the
>>> same for all segments, but since a local iterator can be a segmented
>>> iterator itself, it can also have a local iterator which can be
>>> different. By recursion, you can therefore use different iterator types
>>> for each segment, but that's somewhat convoluted.
>>
>> That's wrong actually. Since all the local iterators are the same,
>> then their local iterators must be the same too.
>>
>> So segmented iterators wouldn't help to only check for alignment once
>> it has been reached I think. I would have to properly try to implement
>> one, but it's not particularly practical to do.
>
> I think the exercise would be worth your time

Well, let me rephrase the problem.

Given the contiguous sequence of numbers 1 2 3 4 5 6 7 8 9 10 11, with
the 3rd element (3) lying on a satisfying alignment boundary to deploy
SIMD instructions of cardinal 4, we want to adapt the range into this:
| 0 0 1 2 | 3 4 5 6 | 7 8 9 10 | 11 0 0 0 |

The concern is that we want the iteration of the middle part | 3 4 5 6 |
7 8 9 10 | (which in practice would contain much more elements than
this), to be as fast as possible, i.e.
- increment is a ptr += 4
- dereference is load ptr into a SIMD register

This is not possible using normal iterators, since I will need to have a
different behaviour (and thus a branch), for at least one of those two
functions, depending on what state the iteration is at.
As a result a few bytes at the beginning and at the end of a long
sequence slow down the whole iteration.

I thought segmented iterators would solve this, since after all I've got
three segments, and you've been repeatedly saying on this list that it
is the solution to many iterator problems, but they don't. Segmented
iterators are only useful if the iteration of all my segments is done
with the exact same logic but at different locations, since they all
must have the same type. They don't allow to circumvent the problem I
exposed with normal iterators at all.

My alternative not only does, but it could also allows not to have to
pad the ranges with zeroes, which is potentially more interesting to
write SIMD-aware algorithms.

>
>> Anyway, my belief is that my alternative to segmented iterators is
>> more interesting.
>
> Not to me, until you can begin to address something like std::merge().

Yet you are convinced about segmented iterator without even knowing if
it can address something like std::merge (well sure it can if you use
the fallback flat iterator, but that's not what I mean).