Boost logo

Boost :

Subject: Re: [boost] interest in structure of arrays container?
From: Michael Marcin (mike.marcin_at_[hidden])
Date: 2016-10-31 13:00:07


On 10/31/2016 9:14 AM, Larry Evans wrote:
>
> However, I was still getting the 'double free' error message; hence,
> I tried val_grind. It showed a problem in the alive update loop.
> When the code was changed to:
>
> uint64_t *block_ptr = alive.data();
> auto e_ptr = energy.data();
> for ( size_t i = 0; i < n; ) {
> #define REVISED_CODE
> #ifdef REVISED_CODE
> auto e_i = e_ptr + i;
> #endif
> uint64_t block = 0;
> do {
> #ifndef REVISED_CODE
> //this code causes valgrind to show errors.
> auto e_i = e_ptr + i;
> #endif
> _mm_store_ps( e_i, _mm_sub_ps( _mm_load_ps( e_i ), t ));
> block |=
> uint64_t
> ( _mm_movemask_ps( _mm_cmple_ps( _mm_load_ps( e_i ),
> zero )))
> << (i % bits_per_uint64_t)
> ;
> i += 4;
> } while ( i % bits_per_uint64_t != 0 );
> *block_ptr++ = block;
> }
>
> valgrind reported no errors; however, when !defined(REVISED_CODE),
> valgrind reported:
>
> valgrind --tool=memcheck
> /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>
> ==7937== Memcheck, a memory error detector
> ==7937== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==7937== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==7937== Command:
> /tmp/build/clangxx3_8_pkg/clang/struct_of_arrays/work/soa_compare.benchmark.optim0.exe
>
> ==7937==
> COMPILE_OPTIM=0
> particle_count=1,000

particle_count=1,000 is not a multiple of 64, the optimized energe/alive
loop processes 64 particles at a time. I haven't bothered to analyze
what the code will do in this case but memory corruption is likely

The code to handle a tail (if particle_count % 64 != 0) isn't difficult
to add but it is explicitly left out. One of the things you'll often do
in a system such as this is fit the data to optimize the algorithm. In
the case of a particle system plus or minus 0 to 63 particles is
generally unnoticeable.

You can address the problem however you like but the simplest solution
would be to change your small particle count to 16 * 64 = 1024.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk