Subject: Re: [boost] interest in structure of arrays container?
From: Michael Marcin (mike.marcin_at_[hidden])
Date: 2016-10-21 02:07:59
On 10/21/2016 12:48 AM, Michael Marcin wrote:
> On 10/20/2016 10:02 PM, Larry Evans wrote:
>> The modification added soa_emitter_block_t which uses soa_block.
>> Unfortunately, this soa_emitter_block_t takes about twice as long as
>> your soa_emitter_static_t.
>> I've no idea why. Any guesses?
> 2x is quite an abstraction penalty.
> I can only assume your compiler is failing to optimize away some part of
> the abstraction.
> FWIW on vs2015 I'm not seeing nearly as much of a difference.
> AoS in 6.34667 seconds
> SoA in 4.26384 seconds
> SoA flat in 4.16572 seconds
> SoA Static in 5.4037 seconds
> SoA block in 5.5588 seconds
I'm still trying to work out how to fit overaligned subarrays into your
soa_block approach.
The issue is that many SIMD instructions require more than just
the natural alignment of the element type:
subarrays of float/double/int/short/char or carefully crafted UDTs might
need to be aligned to as much as 64 bytes in the worst case.
On the MIC architecture, vector load/store operations
must be called on 64-byte aligned memory addresses.
On the Xeon architecture with AVX/AVX2 instruction sets
(Sandy Bridge, Ivy Bridge or Haswell), alignment does not matter.
In earlier architectures (Nehalem, Westmere) alignment did matter,
but a 32-byte alignment was necessary.
At the very least, support for the basic SSE 16-byte alignment of
subarrays is crucial.
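To make the requirement concrete, here is a minimal sketch of how the subarray offsets inside one block could be computed so that the float subarray starts on a 16-byte boundary. The names round_up, block_layout, make_layout and is_aligned are hypothetical and not part of either implementation in this thread; the field set (3 floats per position, one float, one bool) is only assumed to mirror the emitter example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical byte offsets of each subarray inside one contiguous block.
struct block_layout {
    std::size_t positions_offset; // count * 3 floats, natural alignment
    std::size_t ages_offset;      // count floats, forced to 16 bytes for SSE
    std::size_t alive_offset;     // count bools, natural alignment
    std::size_t total_size;       // bytes needed for the whole block
};

// Lay the subarrays out back to back, padding before each one as needed.
inline block_layout make_layout(std::size_t count) {
    block_layout l{};
    l.positions_offset = 0;
    std::size_t end = 3 * sizeof(float) * count;
    l.ages_offset = round_up(end, 16);
    end = l.ages_offset + sizeof(float) * count;
    l.alive_offset = round_up(end, alignof(bool));
    l.total_size = l.alive_offset + sizeof(bool) * count;
    return l;
}

// True if p is a multiple of align bytes.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

The offsets are only meaningful if the block itself starts at an address aligned to the largest requested alignment, so the allocation would also have to be overaligned.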
My best idea so far is some magic wrapper type that gets special
treatment, for example:
using data_t = soa_block< float3, soa_align<float,16>, bool >;
This may also open the door for other magic types, like:
using data_t = soa_block< float3, soa_align<float,16>, soa_bit >;
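One way the wrapper could get its special treatment is through a traits class that the container consults for each field type. The following is only a sketch under that assumption: soa_field_traits is a hypothetical name, float3 is a stand-in definition, and soa_bit is left out because its storage scheme is not specified here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the float3 type used in the thread (assumed layout).
struct float3 { float x, y, z; };

// Hypothetical tag: "store T, but start its subarray on an Align-byte boundary".
template <class T, std::size_t Align>
struct soa_align {};

// Default case: a plain field type is stored as-is with its natural alignment.
template <class T>
struct soa_field_traits {
    using value_type = T;
    static constexpr std::size_t alignment = alignof(T);
};

// Special treatment: unwrap soa_align and report the requested alignment.
template <class T, std::size_t Align>
struct soa_field_traits<soa_align<T, Align>> {
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");
    static_assert(Align >= alignof(T), "Align must not weaken T's alignment");
    using value_type = T;
    static constexpr std::size_t alignment = Align;
};

// Aliases so the checks below stay free of commas inside macros.
using plain_float   = soa_field_traits<float>;
using aligned_float = soa_field_traits<soa_align<float, 16>>;
using plain_float3  = soa_field_traits<float3>;

static_assert(aligned_float::alignment == 16, "wrapper overrides alignment");
static_assert(plain_float::alignment == alignof(float), "default is natural");
```

With this, the container would store aligned_float::value_type (a plain float) and only use the alignment member when placing the subarray, so user code reading the field would not see the wrapper.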