Subject: Re: [boost] interest in structure of arrays container?
From: Chris Glover (c.d.glover_at_[hidden])
Date: 2016-10-26 14:33:48


>
> I guess some optimisation from way yonder (something modern compilers do
> routinely, even on a Monday morning!)... but more than probable irrelevant
> nowadays...
>
> degski
>

I might be pessimistic, but I never trust the compiler and generally check
what's being output. In this case, FWIW, on MSVC2015, the bit-twiddling
version generates faster code than the mod version -- about 25% faster. I
didn't test gcc or clang.

Using Google Benchmark:

Code:

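#include <benchmark/benchmark.h>

// Alignment test via signed modulo.
static void AlignedMod(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        // With ArgPair(1, 1): i runs from 1 to 127 in steps of 1.
        for(int i = state.range_x(); i < 128; i += state.range_y())
        {
            bool aligned = (i % 16) == 0;
            // Keep the compiler from discarding the result.
            benchmark::DoNotOptimize(aligned);
        }
    }
}
BENCHMARK(AlignedMod)->ArgPair(1, 1);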

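// Alignment test via bit mask.
// Note: this masks (i - 1), so it is true for i == 1, 17, 33, ...
// rather than for multiples of 16 as in the mod version.
static void AlignedAnd(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        // With ArgPair(1, 1): i runs from 1 to 127 in steps of 1.
        for(int i = state.range_x(); i < 128; i += state.range_y())
        {
            bool aligned = ((i - 1) & 15) == 0;
            // Keep the compiler from discarding the result.
            benchmark::DoNotOptimize(aligned);
        }
    }
}
BENCHMARK(AlignedAnd)->ArgPair(1, 1);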

Generated code of the inner loop:

Mod version:
mov eax,ebx
and eax,8000000Fh
jge AlignedMod+50h
dec eax
or eax,0FFFFFFF0h
inc eax
test eax,eax
lea rcx,[aligned]
sete byte ptr [aligned]
call 07FF73B84A180h
add ebx,dword ptr [rdi+1Ch]
cmp ebx,80h
jl AlignedMod+40h

And version:
lea eax,[rbx-1]
test al,0Fh
lea rcx,[aligned]
sete byte ptr [aligned]
call 07FF73B84A180h
add ebx,dword ptr [rdi+1Ch]
cmp ebx,80h
jl AlignedAnd+40h

Result:
Benchmark              Time          CPU   Iterations
-----------------------------------------------------
AlignedMod/1/1       204 ns       203 ns      4072727
AlignedAnd/1/1       153 ns       154 ns      4977778
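
A note on why the two listings differ: `i` is a signed `int`, and signed `%` in C++11 and later truncates toward zero, so `i % 16` can be negative and is not the same value as `i & 15`. The extra and/dec/or/inc sequence in the Mod listing looks like the compiler computing that full signed remainder before testing it against zero. The zero test itself gives the same answer either way. A minimal sketch of that, assuming two's complement integers; the helper names are made up for illustration and are not part of the benchmark above:

```cpp
// Hypothetical helpers for illustration only.

// Signed remainder: the result takes the sign of the dividend.
constexpr bool aligned_mod(int i) { return (i % 16) == 0; }

// Mask of the low four bits. Assuming two's complement, these bits
// are all zero exactly when i is a multiple of 16, negative or not.
constexpr bool aligned_and(int i) { return (i & 15) == 0; }

// Unsigned operands: here the remainder and the mask are the same value,
// so no sign fix-up is needed at all.
constexpr unsigned low_bits_mod(unsigned u) { return u % 16u; }
constexpr unsigned low_bits_and(unsigned u) { return u & 15u; }

// The raw values differ for negative inputs...
static_assert(-1 % 16 == -1, "signed remainder keeps the sign");
static_assert((-1 & 15) == 15, "mask is always in 0..15");

// ...but the comparison against zero agrees.
static_assert(aligned_mod(-32) && aligned_and(-32), "multiple of 16");
static_assert(!aligned_mod(-1) && !aligned_and(-1), "not a multiple of 16");
static_assert(aligned_mod(0) && aligned_and(0), "zero is aligned");

// For unsigned values the two forms are interchangeable.
static_assert(low_bits_mod(17u) == low_bits_and(17u), "same value");
```

If the value being tested can be made unsigned (a size or an address converted to an integer, for example), the `%` form gives the compiler less to do, though checking the generated code is still the only way to know what a given compiler emits.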

-- chris

