Subject: Re: [boost] interest in structure of arrays container?
From: Michael Marcin (mike.marcin_at_[hidden])
Date: 2016-10-26 03:27:22


On 10/26/2016 12:58 AM, Larry Evans wrote:
> On 10/25/2016 11:07 PM, Michael Marcin wrote:
>> On 10/25/2016 8:23 PM, Larry Evans wrote:
>>>>
>>>> At the very least support for the basic SSE 16 byte alignment of
>>>> subarrays is crucial.
>>>>
>>>>
>>>> My best idea so far is some magic wrapper type that gets special
>>>> treatment. Like:
>>>> using data_t = soa_block< float3, soa_align<float,16>, bool >;
>>>
>>> Something like:
>>>
>>> template<typename T, std::size_t Alignment>
>>> struct alignas(Alignment) soa_align {
>>> T data;
>>> };
>>>
>>> Have you tried that yet? If not, I might try.
>>>
>>
>> The issue is you don't want to overalign all elements of the array, just
>> the first element.
>>
>
> But aligning the first soa_align<T,A> is all that's needed because
> sizeof(soa_align<T,A>)%A == 0, hence, all subsequent elements would
> be aligned. At least that's my understanding. Am I missing something?
>

Perhaps I'm misunderstanding.
Using your struct above:
std::array< soa_align<float, 16>, 4 > data;
std::cout << "align array: " << alignof(decltype(data)) << '\n'
     << "size element: " << sizeof( data[0] ) << '\n'
     << "size array: " << sizeof( data ) << '\n'
     << "offset[1]: " << (char*)&(data[1]) - (char*)data.data() << '\n';

align array: 16
size element: 16
size array: 64
offset[1]: 16

For data to work with aligned SSE loads (e.g. _mm_load_ps), this needs to report:

align array: 16
size element: 4
size array: 16
offset[1]: 4

i.e. 4 floats have to be contiguous in memory, and the *first* float has
to be aligned to 16 bytes.
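
A minimal sketch of the difference, assuming a 4-byte float. The name
aligned_subarray is hypothetical and not part of any proposed library: it
puts alignas on the sub-array as a whole rather than on each element, so
the floats stay packed and only the first one is forced onto a 16-byte
boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The wrapper from the quoted message: alignas on the element type pads
// every element out to Alignment bytes.
template<typename T, std::size_t Alignment>
struct alignas(Alignment) soa_align {
    T data;
};

// Hypothetical alternative: align the sub-array as a whole, not its elements.
template<typename T, std::size_t N, std::size_t Alignment>
struct alignas(Alignment) aligned_subarray {
    T data[N];
};

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(sizeof(soa_align<float, 16>) == 16,
              "per-element alignas pads each float to 16 bytes");
static_assert(alignof(aligned_subarray<float, 4, 16>) == 16,
              "the block starts on a 16-byte boundary");
static_assert(sizeof(aligned_subarray<float, 4, 16>) == 16,
              "the four floats stay contiguous");

// Byte distance between element 0 and element 1 of the sub-array.
inline std::ptrdiff_t stride_bytes(const aligned_subarray<float, 4, 16>& a) {
    return reinterpret_cast<const char*>(&a.data[1]) -
           reinterpret_cast<const char*>(&a.data[0]);
}

// True if p is a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With an aligned_subarray<float, 4, 16> object, stride_bytes gives 4 and
is_aligned(a.data, 16) holds, which is the layout listed above.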

>> I have a working solution (using Peter Dimov's mp11 library as I'm not
>> well-versed in post cpp03 metaprogramming).
>>
>> I'm just trying to play around with implementation ideas at the moment.
>>
>> Basically it'd be nice to store only a single pointer and still get
>> cheap constant time member sub-array access.
>>
>> But with alignment concerns all I've managed so far are two
>> implementations.
>>
>> 1. 1 pointer with linear time member array access
>> 2. n-pointers with constant time member array access
>>
>> I feel like there should exist an implementation that trades a bit of
>> dynamic allocation size for a single pointer and constant time member
>> array access.
>>
>
> I intended soa_block to fill that need (after all the tasks
> shown in the **TODO** comments were done). If you see some
> flaw in the code, of course, I'd love to hear about it.
> **TODO**
>

IIRC it implemented roughly strategy #2, storing a pointer + an
array of n+1 offsets to access n members in constant time.
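
For the single-pointer, constant-time variant, one option is to move the
offset table into the front of the allocation itself. The sketch below is
only an illustration under my own assumptions, not the soa_block code: the
names member_desc, soa_allocate, soa_member, soa_free and the 64-byte
block_align are all hypothetical, and it relies on C++17 aligned
operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical description of one member sub-array.
struct member_desc {
    std::size_t elem_size;  // size of one element in bytes
    std::size_t align;      // alignment of the first element (power of two)
};

// Assumed alignment of the whole block; 64 covers SSE (16) and AVX (32).
constexpr std::size_t block_align = 64;

inline std::size_t round_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Allocates one block laid out as
//   [offset table: one size_t per member][pad][member 0][pad][member 1]...
// and returns the single pointer the container would need to store.
inline void* soa_allocate(const std::vector<member_desc>& members,
                          std::size_t count) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = members.size() * sizeof(std::size_t);
    for (const member_desc& m : members) {
        assert(m.align != 0 && m.align <= block_align &&
               (m.align & (m.align - 1)) == 0);
        cursor = round_up(cursor, m.align);  // align only the first element
        offsets.push_back(cursor);
        cursor += m.elem_size * count;       // elements stay tightly packed
    }
    void* block = ::operator new(cursor, std::align_val_t(block_align));
    std::size_t* table = static_cast<std::size_t*>(block);
    for (std::size_t i = 0; i < offsets.size(); ++i)
        ::new (static_cast<void*>(table + i)) std::size_t(offsets[i]);
    return block;
}

// Constant-time access: one load from the table plus one pointer add.
inline void* soa_member(void* block, std::size_t i) {
    return static_cast<char*>(block) +
           static_cast<const std::size_t*>(block)[i];
}

inline void soa_free(void* block) {
    ::operator delete(block, std::align_val_t(block_align));
}
```

For a float3 / float-aligned-to-16 / bool layout this would be called with
{{12, 4}, {4, 16}, {1, 1}}. The cost is n extra size_t values plus padding
per allocation, and every member access goes through one extra load.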

