Subject: Re: [boost] Boost SIMD beta release
From: Peter Dimov (lists_at_[hidden])
Date: 2012-12-20 13:49:15


Joel Falcou wrote:
> Le 20/12/2012 18:34, Peter Dimov a écrit :
> > What is the recommended Boost.SIMD way to write a function like
> >
> > void add_n( float const * s, float const * s2, float * d, size_t n );
> > // d[i] = s[i] + s2[i]
> >
> > where none of s, s2, d are guaranteed to be aligned?
>
> You should align them ;)
>
> More seriously, you can run a for loop using pack and
> unaligned_load/store:
>
> void add_n( float const * s, float const * s2, float * d, size_t n )
> {
>   size_t c  = pack<float>::static_size; // elements per SIMD pack
>   size_t vn = n / c * c;                // elements handled by the vector loop
>   size_t sn = n % c;                    // leftover elements for the scalar loop
>
>   for(std::size_t i=0; i<vn; i+=c, d+=c, s+=c, s2+=c)
>     store(unaligned_load<pack<float> >(s) + unaligned_load<pack<float> >(s2), d );
>
>   for(std::size_t i=0; i<sn; i++, d++, s++, s2++)
>     *d = *s + *s2;
> }
...
> Note that on any pre-Nehalem CPU, the unaligned load will be horrendously
> slow.

Yes, and the right thing to do is to first check whether s and s2 are
equally unaligned, and if so, have a prefix scalar loop that aligns them; if
not, check whether s2 and d are equally unaligned, and align them; and
finally, if neither of these is true, align s. (Although I'm not quite
certain whether unaligned stores weren't costlier, in which case the order
changes a bit.) Then proceed with the rest of your code above.
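
For concreteness, the selection logic described above might be sketched
roughly as follows. This is only an illustration in plain C++, not
Boost.SIMD code: the names misalignment and prefix_length, and the
assumption of 16-byte vectors holding 4 floats, are hypothetical, and the
inner fixed-width loop merely stands in for the pack-based body from the
quoted code.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: SSE-sized vectors, 16 bytes = 4 floats.
constexpr std::size_t kVecBytes  = 16;
constexpr std::size_t kVecFloats = kVecBytes / sizeof(float);

// Byte offset of p past the previous 16-byte boundary.
inline std::size_t misalignment(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kVecBytes;
}

// Number of leading elements to handle in scalar code so that the chosen
// pointer(s) become 16-byte aligned. Order of preference as in the text:
// (s, s2) if equally misaligned, else (s2, d) if equally misaligned, else s.
inline std::size_t prefix_length(const float* s, const float* s2,
                                 const float* d, std::size_t n) {
    const float* target = s;
    if (misalignment(s) != misalignment(s2) &&
        misalignment(s2) == misalignment(d))
        target = s2;

    const std::size_t m = misalignment(target);
    if (m % sizeof(float) != 0)
        return n; // can never reach a 16-byte boundary: stay scalar
    const std::size_t k = (m == 0) ? 0 : (kVecBytes - m) / sizeof(float);
    return k < n ? k : n;
}

void add_n(const float* s, const float* s2, float* d, std::size_t n) {
    std::size_t i = 0;

    // Scalar prefix: runs until the chosen pointer is aligned.
    for (const std::size_t pre = prefix_length(s, s2, d, n); i < pre; ++i)
        d[i] = s[i] + s2[i];

    // Main body, one vector-width block at a time. A real version would
    // use aligned loads/stores for the pointers known to be aligned here
    // and unaligned ones for the rest.
    for (; i + kVecFloats <= n; i += kVecFloats)
        for (std::size_t j = 0; j < kVecFloats; ++j)
            d[i + j] = s[i + j] + s2[i + j];

    // Scalar tail for whatever is left.
    for (; i < n; ++i)
        d[i] = s[i] + s2[i];
}
```

If unaligned stores do turn out to be the costlier case, only the choice
of target in prefix_length would need to change.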

This is tedious boilerplate, so I wondered whether you had already provided a
solution. simd::transform seems the logical place to put it.