Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iterators and alternatives
From: David Abrahams (dave_at_[hidden])
Date: 2010-10-13 13:50:48
At Wed, 13 Oct 2010 17:38:56 +0100,
Mathias Gaunard wrote:
> On 13/10/10 16:07, David Abrahams wrote:
> > At Wed, 13 Oct 2010 15:52:47 +0100,
> > Mathias Gaunard wrote:
> >> On 13/10/10 12:18, Mathias Gaunard wrote:
> >>> Since those could be different segments, you wouldn't need to check
> >>> which segment you are in at each iteration.
> >>> Actually, all local iterators for a a segmented iterator must be the
> >>> same for all segments, but since a local iterator can be a segmented
> >>> iterator itself, it can also have a local iterator which can be
> >>> different. By recursion, you can therefore use different iterator types
> >>> for each segment, but that's somewhat convoluted.
> >> That's wrong actually. Since all the local iterators are the same,
> >> then their local iterators must be the same too.
> >> So segmented iterators wouldn't help to only check for alignment once
> >> it has been reached I think. I would have to properly try to implement
> >> one, but it's not particularly practical to do.
> > I think the exercise would be worth your time
> Well, let me rephrase the problem.
> Given the contiguous sequence of numbers 1 2 3 4 5 6 7 8 9 10 11, with
> the 3rd element (3) lying on a satisfying alignment boundary to deploy
> SIMD instructions of cardinal 4, we want to adapt the range into this:
> | 0 0 1 2 | 3 4 5 6 | 7 8 9 10 | 11 0 0 0 |
I think you mean
1 2 | 3 4 5 6 | 7 8 9 10 | 11
right? Are you really intending to manufacture zeros for padding?
> The concern is that we want the iteration of the middle part | 3 4 5 6
> | 7 8 9 10 | (which in practice would contain much more elements than
> this), to be as fast as possible, i.e.
> - increment is a ptr += 4
> - dereference is load ptr into a SIMD register
> This is not possible using normal iterators, since I will need to have
> a different behaviour (and thus a branch), for at least one of those
> two functions, depending on what state the iteration is at.
> As a result a few bytes at the beginning and at the end of a long
> sequence slow down the whole iteration.
Yes, this is exactly the same problem as std::deque has, which
segmented iterators were designed for.
> I thought segmented iterators would solve this, since after all I've
> got three segments, and you've been repeatedly saying on this list
> that it is the solution to many iterator problems, but they
Of course they do.
> Segmented iterators are only useful if the iteration of all my
> segments is done with the exact same logic but at different locations,
> since they all must have the same type. They don't allow to circumvent
> the problem I exposed with normal iterators at all.
I think they do. However, you might find that you need some dynamic
dispatching to get it to work. In other words,
if (operating on complete segment)
> My alternative not only does, but it could also allows not to have to
> pad the ranges with zeroes, which is potentially more interesting to
> write SIMD-aware algorithms.
You don't need to pad with zeroes for segmented iterators.
> >> Anyway, my belief is that my alternative to segmented iterators is
> >> more interesting.
> > Not to me, until you can begin to address something like std::merge().
> Yet you are convinced about segmented iterator without even knowing if
> it can address something like std::merge (well sure it can if you use
> the fallback flat iterator, but that's not what I mean).
I do know that it can address std::merge. It's not necessarily
pretty, though. ;-)
-- Dave Abrahams BoostPro Computing http://www.boostpro.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk