Subject: Re: [boost] Boost SIMD beta release
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2012-12-25 08:38:15


On 24/12/12 17:16, Domagoj Saric wrote:
> "Peter Dimov" wrote in message
> news:EC9066EE5B5448A2B4BAF64570BE6BAD_at_pdimov5...
>> Yes, and the right thing to do is...
>
> Actually there is a "significant"[1] portion of cases where alignment
> problems can be fixed more elegantly. If we realize that protected
> memory systems have page-size-granularity which is much larger than any
> conceivable SIMD vector size it immediately follows that "overread"
> (reading outside of the specified range) of a maximum size of
> "SIMD-cardinal - 1" is always safe.

It is safe indeed. But the value you will read might not make sense for
what you are doing.
Consider the simple example of summing all values between two pointers.
Values beyond the last pointer should be zero, not whatever lies there
in memory.
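
A minimal scalar sketch of that point, not Boost.SIMD code: the lane count `kLanes` and the function `sum_padded` are hypothetical names chosen for illustration. Instead of overreading, the last partial chunk is copied into a zero-filled buffer, which is roughly what the padding lanes would have to look like for the sum to be right.

```cpp
#include <cstddef>

// Hypothetical lane count standing in for the SIMD cardinal.
static const std::size_t kLanes = 4;

// Sums [begin, end) in chunks of kLanes elements. The final partial
// chunk is copied into a zero-filled buffer, so the padding lanes
// contribute nothing to the result.
float sum_padded(float const* begin, float const* end)
{
    float acc[kLanes] = {0, 0, 0, 0};

    // Whole chunks: one accumulator per lane.
    for (; static_cast<std::size_t>(end - begin) >= kLanes; begin += kLanes)
        for (std::size_t i = 0; i < kLanes; ++i)
            acc[i] += begin[i];

    // Partial chunk: lanes past the end stay zero.
    float tail[kLanes] = {0, 0, 0, 0};
    for (std::size_t i = 0; begin != end; ++begin, ++i)
        tail[i] = *begin;
    for (std::size_t i = 0; i < kLanes; ++i)
        acc[i] += tail[i];

    // Horizontal reduction of the lane accumulators.
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

With a real overread the `tail` buffer would hold whatever follows the range in memory, and the result would be wrong unless those lanes were masked to zero first.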

For writing, this is IMO way too hacky, since getting the ordering of the
code wrong could easily leave the wrong value in memory.

> @NT2 devs, I might have missed it but I didn't see you mention your
> shifted-iterator functionality in this context...

The shifted iterator and the shifted load let you do aligned loads if
you statically know the misalignment of the memory.
To use the shifted load with arbitrary alignment, you need to
generate all variants and select the right one at runtime.
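
As a rough illustration of "generate all variants and select one at runtime", here is a sketch in plain C++. The names `sum4_from` and `sum4_dispatch` are made up for this example and are not part of Boost.SIMD; the template parameter merely plays the role of the static shift.

```cpp
#include <cstddef>

// Hypothetical kernel: the offset is a compile-time constant, in the
// same spirit as the shift argument of a shifted load.
template<std::size_t Shift>
int sum4_from(int const* base)
{
    int s = 0;
    for (std::size_t i = 0; i < 4; ++i)
        s += base[Shift + i];
    return s;
}

// Every variant is instantiated; a runtime value selects one of them.
int sum4_dispatch(int const* base, std::size_t shift)
{
    switch (shift)
    {
        case 0: return sum4_from<0>(base);
        case 1: return sum4_from<1>(base);
        case 2: return sum4_from<2>(base);
        case 3: return sum4_from<3>(base);
        default: return -1; // shift not supported by this sketch
    }
}
```

The cost is code size: each case is a separate instantiation of the kernel, which is why the number of variants is best kept to the number of possible misalignments.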

In case I haven't been clear about that, the code for a unary transform
could look like this (untested):

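template<class T, class F>
T* transform(T const* begin, T const* end, T* out, F f)
{
   typedef native<T, BOOST_SIMD_DEFAULT_EXTENSION> vT;
   static const size_t N = meta::cardinal_of<vT>::value;

   // scalar prologue: advance until the output pointer is aligned
   T const* out_bnd = align_on(out, N);
   for(; begin != end && out != out_bnd; ++begin, ++out)
     *out = f(*begin);

   // end of the part of the input that can be processed in whole vectors
   T const* end_bnd = begin+(end-begin)/N*N;
   // distance from begin to the next aligned address
   size_t misalign = align_on(begin, N)-begin;
   if(begin != end_bnd)
     begin += misalign;
   // one case per misalignment, each using a statically shifted load
   switch(misalign)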
   {
     #define M0(z,n,t) \
     case n: \
       for(; begin != end_bnd; begin += N, out += N) \
         store(f(load<vT, -n>(begin)), out); \
       break; \
     /**/
     BOOST_PP_REPEAT(BOOST_SIMD_BYTES, M0, ~)
     #undef M0
     default:
       BOOST_ASSERT_MSG(0, "unsupported alignment in transform");
   }

   for(; begin != end; ++begin, ++out)
     *out = f(*begin);

   return out;
}
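
The misalignment used to pick a variant can be pictured with plain pointer arithmetic. The sketch below is an assumption-laden illustration, not the `align_on` implementation: `elements_to_alignment` is a hypothetical helper, and it assumes the pointer is already aligned on `sizeof(T)` so that the byte distance divides evenly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of elements to step forward until p
// sits on a multiple of `bytes`. Assumes p is aligned on sizeof(T).
template<class T>
std::size_t elements_to_alignment(T const* p, std::size_t bytes)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    // Round the address up to the next multiple of `bytes`.
    std::uintptr_t up = (addr + bytes - 1) / bytes * bytes;
    return static_cast<std::size_t>(up - addr) / sizeof(T);
}
```

For 4-byte elements and 16-byte vectors the result is always in the range 0 to 3, so four variants of the loop body would cover every case.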

A binary transform would surely be considerably more complicated.

