Subject: Re: [boost] interest in structure of arrays container?
From: Larry Evans (cppljevans_at_[hidden])
Date: 2016-10-31 13:51:33


On 10/31/2016 12:00 PM, Michael Marcin wrote:
> On 10/31/2016 9:14 AM, Larry Evans wrote:
>>
>> However, I was still getting the 'double free' error message; hence,
>> I tried valgrind. It showed a problem in the alive update loop.
>> When the code was changed to:
>>
>>   uint64_t *block_ptr = alive.data();
>>   auto e_ptr = energy.data();
>>   for ( size_t i = 0; i < n; ) {
>> #define REVISED_CODE
>> #ifdef REVISED_CODE
>>     auto e_i = e_ptr + i;
>> #endif
>>     uint64_t block = 0;
>>     do {
>> #ifndef REVISED_CODE
>>       //this code causes valgrind to show errors.
>>       auto e_i = e_ptr + i;
>> #endif
>>       _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));
>>       block |=
>>         uint64_t
>>         ( _mm_movemask_ps( _mm_cmple_ps( _mm_load_ps( e_i ), zero )))
>>         << (i % bits_per_uint64_t)
>>         ;
>>       i += 4;
>>     } while ( i % bits_per_uint64_t != 0 );
>>     *block_ptr++ = block;
>>   }
>>
>> valgrind reported no errors; however, when !defined(REVISED_CODE),
>> valgrind reported:
>>
>> valgrind --tool=memcheck /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>>
>> ==7937== Memcheck, a memory error detector
>> ==7937== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>> ==7937== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
>> ==7937== Command: /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>> ==7937==
>> COMPILE_OPTIM=0
>> particle_count=1,000
>
>
> particle_count=1,000 is not a multiple of 64, and the optimized energy/alive
> loop processes 64 particles at a time. I haven't bothered to analyze
> what the code will do in this case, but memory corruption is likely.

I see. Still, the calls to the _mm_* functions in the previous loop
are all made with i%4==0 (because i is incremented by 4) here:

https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L934

so shouldn't the same hold for the alive loop call here:

https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L956

and doesn't putting the e_i assignment outside the alive loop here:

https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L953

assure that?
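
To check which float indices the quoted alive loop touches, here is a
scalar sketch that replays only the index arithmetic (no SIMD, no memory
access). The helper name end_index_touched and its bool parameter are
hypothetical, and the sketch assumes bits_per_uint64_t is 64 and that
energy holds exactly n floats.

```cpp
#include <cstddef>

// Hypothetical helper, not part of soa_compare.benchmark.cpp.
// Replays the index arithmetic of the quoted alive loop and returns one
// past the highest float index that a 4-wide load/store at e_ptr + e_i
// would touch. No memory is accessed.
//   recompute_per_step == true  models the !defined(REVISED_CODE) variant
//   recompute_per_step == false models the REVISED_CODE variant
std::size_t end_index_touched(std::size_t n, bool recompute_per_step)
{
    const std::size_t bits_per_uint64_t = 64;  // assumed value
    std::size_t end = 0;
    for (std::size_t i = 0; i < n; ) {
        std::size_t e_i = i;  // offset computed once per 64-bit block
        do {
            if (recompute_per_step) {
                e_i = i;      // offset recomputed on every 4-wide step
            }
            if (e_i + 4 > end) {
                end = e_i + 4;
            }
            i += 4;
        } while (i % bits_per_uint64_t != 0);
    }
    return end;
}
```

Under those assumptions, end_index_touched(1000, true) is 1024, so the
per-step variant runs 24 floats past a 1000-element array, while
end_index_touched(1000, false) is 964. The hoisted variant stays in
bounds only because e_i no longer advances inside the do loop, so in the
snippet as quoted it appears to revisit the first four floats of each
64-particle block. With n = 1024 the per-step variant ends exactly at
1024.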

>
> The code to handle a tail (if particle_count % 64 != 0) isn't difficult
> to add but it is explicitly left out. One of the things you'll often do
> in a system such as this is fit the data to optimize the algorithm. In
> the case of a particle system plus or minus 0 to 63 particles is
> generally unnoticeable.
>
> You can address the problem however you like but the simplest solution
> would be to change your small particle count to 16 * 64 = 1024.

OK. I've done that.
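
For reference, a minimal sketch of rounding a requested particle count
up to the next multiple of the 64-particle block, which is one way to
arrive at 1024 from 1,000. The name round_up_to_multiple is hypothetical
and is not something in the benchmark source.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of `block` that is >= `count`.
// Assumes block > 0 and that count + block - 1 does not overflow.
constexpr std::size_t round_up_to_multiple(std::size_t count,
                                           std::size_t block = 64)
{
    return (count + block - 1) / block * block;
}

// 1,000 particles need 16 blocks of 64, i.e. 1024 slots.
static_assert(round_up_to_multiple(1000) == 1024, "16 * 64");
// A count that is already a multiple of 64 is left unchanged.
static_assert(round_up_to_multiple(1024) == 1024, "already aligned");
```

Rounding the count up at setup time keeps the hot loop free of tail
handling, at the cost of up to 63 extra particles.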