Boost logo

Boost :

Subject: Re: [boost] interest in structure of arrays container?
From: Larry Evans (cppljevans_at_[hidden])
Date: 2016-11-06 11:36:01


On 10/31/2016 12:00 PM, Michael Marcin wrote:
> On 10/31/2016 9:14 AM, Larry Evans wrote:
>>
>> However, I was still getting the 'double free' error message; hence,
>> I tried val_grind. It showed a problem in the alive update loop.
>> When the code was changed to:
>>
>> uint64_t *block_ptr = alive.data();
>> auto e_ptr = energy.data();
>> for ( size_t i = 0; i < n; ) {
>> #define REVISED_CODE
>> #ifdef REVISED_CODE
>> auto e_i = e_ptr + i;
>> #endif
>> uint64_t block = 0;
>> do {
>> #ifndef REVISED_CODE
>> //this code causes valgrind to show errors.
>> auto e_i = e_ptr + i;
>> #endif
>> _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));
>> block |=
>> uint64_t
>> ( _mm_movemask_ps( _mm_cmple_ps( _mm_load_ps( e_i ),
>> zero )))
>> << (i % bits_per_uint64_t)
>> ;
>> i += 4;
>> } while ( i % bits_per_uint64_t != 0 );
>> *block_ptr++ = block;
>> }
>>
>> valgrind reported no errors; however, when !defined(REVISED_CODE),
>> valgrind reported:
>>
>> valgrind --tool=memcheck
>> /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>>
>>
>> ==7937== Memcheck, a memory error detector
>> ==7937== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>> ==7937== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
>> info
>> ==7937== Command:
>> /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>>
>>
>> ==7937==
>> COMPILE_OPTIM=0
>> particle_count=1,000
>
>
> particle_count=1,000 is not a multiple of 64, the optimized energe/alive
> loop processes 64 particles at a time. I haven't bothered to analyze
> what the code will do in this case but memory corruption is likely
>
> The code to handle a tail (if particle_count % 64 != 0) isn't difficult
> to add but it is explicitly left out. One of the things you'll often do
> in a system such as this is fit the data to optimize the algorithm. In
> the case of a particle system plus or minus 0 to 63 particles is
> generally unnoticeable.
>
> You can address the problem however you like but the simplest solution
> would be to change your small particle count to 16 * 64 = 1024.
>
IIUC, the 64 constraint is because bit_vector stores the bit's in 64
increments (or bitsof(uint_64_t) where bitsof is defined here:

   https://github.com/cppljevans/soa/blob/master/bitsof.hpp

However, with the following SSE_opt alive update loop:

 
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L961

I think the constraint could be relaxed to a multiple of sse_width(=4).
That's because the additional test:

    i_array < n_array

here:

 
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L978

prevents:

    auto e_i = e_ptr + i_array;

here:

 
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L985

from accessing outside of the array.
Does that make sense?

-regards,
Larry


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk