Subject: Re: [boost] interest in structure of arrays container?
From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2016-10-26 04:23:02
On 22:13 Tue 25 Oct, Michael Marcin wrote:
> On 10/25/2016 12:22 PM, Larry Evans wrote:
> >
> > Hmmm. I didn't realize you'd have to run the benchmark
> > several times to get stable results. I guess that reflects
> > my ignorance of how benchmarks should be run.
>
> The code was just a quick example hacked up to show the large
> difference between different techniques.
>
> If you want to compare similar techniques you'll need a more robust
> benchmark.
>
> It would be easy to convert it to use:
> https://github.com/google/benchmark
>
> Which is quite good.
When doing performance measurements you have to take into account the
most common sources of noise:
1. Other processes might eat up CPU time or memory bandwidth.
2. The OS might decide to move your benchmark from one core to
   another, so you're losing all L1+L2 cache entries. (Solution:
   thread pinning, see the sketch after this list.)
3. Thermal conditions and thermal inertia may affect if/when the CPU
   increases its clock speed. (Solution: either disable turbo mode or
   run the benchmark long enough to even out the thermal
   fluctuations.)
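On Linux, pinning is only a couple of lines. A minimal sketch,
assuming glibc (g++ defines _GNU_SOURCE by default, which
pthread_setaffinity_np needs); the function name pin_to_core is just
for illustration:

    #include <pthread.h>
    #include <sched.h>

    // Pin the calling thread to the given core so it keeps its
    // L1+L2 cache entries for the whole run.
    bool pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(),
                                      sizeof(set), &set) == 0;
    }

Call it once at the start of the benchmark, before the first
measurement is taken.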
AFAIK Google Benchmark doesn't do thread pinning and cannot affect the
turbo mode. LIKWID ( https://github.com/RRZE-HPC/likwid ) can be used
to set clock frequencies and pin threads, and it can read the
performance counters of the CPU. It might be a good idea to use
Google Benchmark and LIKWID together.
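To give an idea of what the conversion would look like, here is a
minimal Google Benchmark sketch. The kernel is a placeholder standing
in for the particle update from soa_compare, not the actual code:

    #include <benchmark/benchmark.h>
    #include <vector>

    static void BM_update_particles(benchmark::State& state)
    {
        std::vector<float> x(state.range(0), 1.0f);
        while (state.KeepRunning()) {      // timed region
            for (float& xi : x) {
                xi += 0.5f;                // stand-in for the kernel
            }
            // Keep the compiler from optimizing the work away.
            benchmark::DoNotOptimize(x.data());
        }
    }
    BENCHMARK(BM_update_particles)->Range(1 << 10, 1 << 20);

    BENCHMARK_MAIN();

Build with "g++ bench.cpp -lbenchmark -lpthread". The library repeats
the timed region until the timings are statistically stable, which
takes care of short-term jitter; you could then run the binary under
likwid-pin (e.g. "likwid-pin -c 0 ./bench", binary name just for
illustration) to address point 2 above.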
> > Could you explain how running a couple of times achieves
> > stable results (actually, on some occasions, I've run the
> > benchmark and got results completely unexpected, I suspect
> > it was because some application daemon was stealing cycles
> > from the benchmark, leading to the unexpected results).
> >
> >> Interestingly your SSE code is ~13% faster than the
> >> LibFlatArray code for large particle counts.
> >
> > Actually, the SSE code was the OP's.
> >
>
> Actually it originates from:
>
> https://software.intel.com/en-us/articles/creating-a-particle-system-with-streaming-simd-extensions
Ah, thanks for the info.
> > From the above, the LibFlatArray and SSE methods are the
> > fastest. I'd guess that a new "SoA block SSE" method, which
> > uses the _mm_* methods, would narrow the difference. I'll
> > try to figure out how to do that. I notice:
> >
> > #include <mmintrin.h>
> >
> > doesn't produce a compile error; however, that #include
> > doesn't have the _mm_add_ps used here:
> >
> > https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L621
> >
> >
> > Do you know of some package I could install on my Ubuntu OS
> > that makes those SSE functions, such as _mm_add_ps,
> > available?
> >
> > [snip]
>
> If you're using gcc I think the header <xmmintrin.h> has them.
The choice of header should not depend on the compiler, but on the
CPU model, or rather on the vector ISA supported by the CPU:
http://stackoverflow.com/questions/11228855/header-files-for-simd-intrinsics
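Concretely, _mm_add_ps lives in <xmmintrin.h> (SSE); <mmintrin.h>
only declares the old MMX integer intrinsics, which is why your
#include compiled without providing _mm_add_ps. A minimal sketch:

    #include <xmmintrin.h>  // SSE: __m128, _mm_add_ps, ...
    #include <cstdio>

    int main()
    {
        alignas(16) float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
        alignas(16) float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
        alignas(16) float c[4];

        __m128 va = _mm_load_ps(a);  // aligned loads
        __m128 vb = _mm_load_ps(b);
        _mm_store_ps(c, _mm_add_ps(va, vb));

        std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
        return 0;
    }

No extra Ubuntu package is needed: the intrinsics headers ship with
the compiler. Build with "g++ -msse" (on x86-64, SSE is enabled by
default anyway).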
Cheers
-Andreas
--
==========================================================
Andreas Schäfer
HPC and Supercomputing
Institute for Multiscale Simulation
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-20866
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your signature to help him
gain world domination!