Subject: Re: [boost] interest in structure of arrays container?
From: Chris Glover (c.d.glover_at_[hidden])
Date: 2016-10-26 14:33:48
>
> I guess some optimisation from way yonder (something modern compilers do
> routinely, even on a Monday morning!)... but more than probably irrelevant
> nowadays...
>
> degski
>
I might be pessimistic, but I never trust the compiler and generally check
what's being output. In this case, FWIW, on MSVC2015, the bit-twiddling
version generates faster code than the mod version: it takes about 25% less
time per run (153 ns vs. 204 ns). I didn't test gcc or clang.
Using Google Benchmark:
Code:
#include <benchmark/benchmark.h>

// Alignment test using the signed remainder operator.
static void AlignedMod(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        for (int i = state.range_x(); i < 128; i += state.range_y())
        {
            bool aligned = (i % 16) == 0;
            benchmark::DoNotOptimize(aligned);
        }
    }
}
BENCHMARK(AlignedMod)->ArgPair(1, 1);

// Alignment test using a mask on the low four bits. Note that this tests
// (i - 1), so it is true when i - 1 is a multiple of 16.
static void AlignedAnd(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        for (int i = state.range_x(); i < 128; i += state.range_y())
        {
            bool aligned = ((i - 1) & 15) == 0;
            benchmark::DoNotOptimize(aligned);
        }
    }
}
BENCHMARK(AlignedAnd)->ArgPair(1, 1);

BENCHMARK_MAIN();
Generated code of the inner loop:

Mod version:
    mov eax,ebx
    and eax,8000000Fh
    jge AlignedMod+50h
    dec eax
    or eax,0FFFFFFF0h
    inc eax
    test eax,eax
    lea rcx,[aligned]
    sete byte ptr [aligned]
    call 07FF73B84A180h
    add ebx,dword ptr [rdi+1Ch]
    cmp ebx,80h
    jl AlignedMod+40h

And version:
    lea eax,[rbx-1]
    test al,0Fh
    lea rcx,[aligned]
    sete byte ptr [aligned]
    call 07FF73B84A180h
    add ebx,dword ptr [rdi+1Ch]
    cmp ebx,80h
    jl AlignedAnd+40h
Result:

Benchmark            Time       CPU   Iterations
------------------------------------------------
AlignedMod/1/1     204 ns    203 ns      4072727
AlignedAnd/1/1     153 ns    154 ns      4977778
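
A note on where the extra instructions in the Mod listing likely come from: `i` is a signed int, so `i % 16` must yield a negative remainder for negative `i`, and the jge / dec / or / inc sequence looks like the fix-up for that case. For the `== 0` test alone, a plain mask gives the same answer. The following is a minimal sketch of that equivalence, assuming two's complement ints; the function names are hypothetical and not part of the benchmark above.

```cpp
#include <cassert>

// Alignment test via signed remainder, as in AlignedMod.
inline bool aligned_mod(int i) { return (i % 16) == 0; }

// Alignment test via bit mask: (i & 15) keeps only the low four bits.
// Assumes two's complement representation for negative values.
inline bool aligned_and(int i) { return (i & 15) == 0; }

// Returns true if both tests agree for every i in [lo, hi].
inline bool alignment_tests_agree(int lo, int hi)
{
    for (int i = lo; i <= hi; ++i)
    {
        if (aligned_mod(i) != aligned_and(i))
            return false;
    }
    return true;
}

// The remainder values themselves differ for negative inputs, which is
// why the compiler cannot replace a general signed % 16 with a bare mask:
//   -1 % 16  evaluates to -1 (C++11 and later truncate toward zero)
//   -1 & 15  evaluates to 15 (two's complement)
```

Calling `alignment_tests_agree(-1024, 1024)` should return true, which suggests a compiler could in principle reduce `(i % 16) == 0` to a single mask test even for signed `i`; whether it does is worth checking in the generated code for each toolchain.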
-- chris
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk