Subject: Re: [boost] interest in structure of arrays container?
From: Michael Marcin (mike.marcin_at_[hidden])
Date: 2016-10-18 23:24:17


On 10/18/2016 10:55 AM, Andreas Schäfer wrote:
> On 10:29 Tue 18 Oct , Larry Evans wrote:
>> The purpose of item:
>>
>> * sizeof...(Ts) allocations could be a single large block
>>
>> is to just require 1 heap allocation instead of N, where N
>> is the number of vectors in soa<T1,T2,...,TN>?
>
> One benefit of this would be that transferring such a container to
> another address space (think MPI or CUDA) would become much simpler.
>

It also reduces the size of your handle structure (the structure that
holds the pointer to the data). Otherwise every additional member adds
~24 bytes for a simple tuple< vector<Ts>... >, or sizeof(T*) bytes at a
minimum for separately allocated blocks.

It can also reduce the size of an iterator to a view of the data, or
remove an indirection from it, depending on the implementation.

A nice side benefit is that the size of the handle plus body (the
allocated data block) for a soa_vector would be identical to that of a
normal AoS vector containing the same data.

If a solution ticked all the other boxes and dropped this one, I'd be
fine with that. There's quite a bit of complexity involved in
calculating the offsets with a dynamic capacity and potentially
arbitrary alignment requirements on the internal subarrays.

I'm also not sure how to reconcile it with another frequent SoA
optimization: replacing bools or small enums/ints with an array of
bit-packed data.

For example, the alive bool in the particle_t elsewhere in this thread
could be stored as a bit_vector, which can be a win in both size and speed.

As seen in the soa_emitter_sse_opt_t addition here:
http://codepad.org/eol6auRN

AoS in 5.8485 seconds
SoA in 4.06838 seconds
SoA flat in 3.99157 seconds
SoA Static in 5.26953 seconds
SoA SSE in 3.53028 seconds
SoA SSE opt in 2.98845 seconds

P.S. This also shows an improved soa_emitter_t, which generates much
better code (VS2015) when vector.data() is cached for each member before
the loop. It reduces the SoA update from 191 instructions to 52 and gives
roughly a 2x speedup for small (~25k) datasets, but that is still more
than the 36 instructions required for the AoS update.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk