# Boost :

Subject: Re: [boost] Boost SIMD beta release
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2012-12-23 07:41:18

On 21/12/12 19:48, Dave Abrahams wrote:

> Don't worry, I hear you loud and clear saying that it doesn't work. You
> haven't, though, done anything to help me understand why you're saying
> that.

I have done so repeatedly (this is not the first time we're discussing
this), yet you don't seem to understand what we're actually doing and
what is needed.

There are both semantical and technical problems with trying to use
segmented iterators, both of which I already explained to you.

Let me repeat everything I've already told you in one single message.
This will be my last attempt to explain it to you.
It is already quite ironic that I spent more time explaining to you why
segmented iterators do not bring anything to help with this problem than
the time it took me to figure it out.

SIMD are special instructions that allow to perform the same operation
on a vector of N values. For simplicity's sake, we'll ignore alignment
concerns in what follows, since it's only something to need to take care
of with if you want to do additional optimizations.

If you want to perform a function like plus, on n elements, and that n
is not necessarily a multiple of N, you can do a certain number of
operations with vectors of N elements and then you need to take care of
the trailing data that doesn't fully fill a vector differently.
If you want to add two sequences of 10 floats each, storing the result
in another third sequence, and you have 4xfloat vectors, you do 4
4xfloat loads, 2 4xfloat additions, 2 4xfloat stores and then 4 normal
1xfloat loads, 2 1xfloat additions and 2 1xfloat stores (which are all
different functions than their 4xfloat counterparts).

Question: would iterating the sequence with a segmented iterator allow
us to do this?

Semantical problem: an iterator, be it segmented or otherwise, must
provide the same element type for all its elements (i.e. sequences are
homogeneous). That means we cannot have a 4xfloat body and a trailing
1xfloat epilogue. We need to have 4xfloat everywhere, which, while
possible in some cases, would be limiting in others, needing a
case-by-case workaround depending on the operation being done (like
conditional stores or filling with neutral data).

Technical problem: for each segment, the type of the iterator needs to
be the same. The code for incrementing or dereferencing the iterator is
therefore the same.
This gives us two solutions:
- Do a branch in the dereferencing function of the iterator to load
normal data and trailing data differently (at every iteration, not ok --
this approach also works with normal iterators)
- Pre-emptively build the last vector of the data and copy it into the
iterator (which implies unnecessary loads, stores and cache misses and
makes the iterator expensive to copy), and consider that as an

Answer: Not in a very satisfying manner. The added generality of those
concepts does not justify the runtime overhead and the fact that each
user would still need to deal with how to reduce trailing data to a vector.

> I don't know what you mean.

Since you don't know, why do you insist on saying that it is "obvious"?

>> Polymorphic algorithms ended up being a significantly more interesting
>> approach.
>
> Nor here.

In this instance, polymorphic algorithms refer to algorithms that call a
polymorphic function object with different argument types depending on
which part of the sequence is being processed.

> I don't have time to write the code out ATM. It seems to me that once
> you impose a hierarchical "SIMD" view over the sequence, you can
> specialize the algorithm for the aligned part, to use SIMD instructions.
> Maybe I'm overlooking something, but so far you haven't been very
> helpful in the understanding-what-I'm-missing department.

SIMD algorithms and segmented iterator algorithms were done for
different purposes.
Lifting them to the same general algorithm is just a bad idea.