Subject: Re: [boost] [review] The review of Boost.DoubleEnded starts today: September 21 - September 30
From: Thorsten Ottosen (tottosen_at_[hidden])
Date: 2017-09-27 16:23:00


On 26-09-2017 at 23:43, Joaquin M López Muñoz via Boost wrote:

Thanks for the thorough review.

> 3. The question arises of whether segment access can gain us some speed. I've
> written a small test to measure the performance of a plain std::for_each loop
> over a batch_deque vs. an equivalent sequence of segment-level loops
> (attached, batch_deque_for_each.cpp), and this is what I got for Visual C++ 2015
> 32-bit (x86) release mode in a Windows 7 64-bit box with an Intel Core i5-2520M
> @2.5GHz:
>
>   [](int x){return x;}
>   segment size: 32
>   n       plain   segmented
>   10E3    25.5472 23.6305
>   10E4    24.5778 23.6907
>   10E5    24.5821 22.8076
>   10E6    25.5007 23.1037
>   10E7    27.1452 24.0339
>   segment size: 512
>   n       plain   segmented
>   10E3    23.8384 23.6638
>   10E4    23.0284 23.8705
>   10E5    22.8449 22.8187
>   10E6    23.8485 23.7454
>   10E7    24.1711 23.5404
>
>   [](int x){return x%4?x:-x;}
>   segment size: 32
>   n       plain   segmented
>   10E3    33.9795 23.6662
>   10E4    32.4817 24.023
>   10E5    32.8731 23.3803
>   10E6    33.5396 22.9298
>   10E7    33.1034 23.0206
>   segment size: 512
>   n       plain   segmented
>   10E3    25.0623 23.3205
>   10E4    25.1048 23.5812
>   10E5    25.3343 21.7686
>   10E6    25.6961 22.4639
>   10E7    25.8664 22.9964

For 32-bit release mode on Windows 7 64-bit with an Intel Core i7-2700K:

  [](int x){return x;}
  segment size: 32
  n       plain   segmented
  10E3    21.4589 21.8351
  10E4    19.9545 20.5133
  10E5    19.4889 20.6197
  10E6    19.2552 19.6976
  10E7    19.2919 19.5425
  segment size: 512
  n       plain   segmented
  10E3    20.2503 20.6372
  10E4    19.0234 19.3367
  10E5    18.5394 18.6171
  10E6    18.555  18.5816
  10E7    19.0918 19.1833

  [](int x){return x%4?x:-x;}
  segment size: 32
  n       plain   segmented
  10E3    28.743  19.7501
  10E4    26.8371 19.0719
  10E5    27.0304 18.7624
  10E6    26.9561 18.2357
  10E7    27.2985 18.6425
  segment size: 512
  n       plain   segmented
  10E3    22.1073 20.0347
  10E4    20.7825 19.5639
  10E5    20.6122 18.0773
  10E6    20.6039 18.4895
  10E7    21.7964 19.1822

So basically the same as your results. The case for segment size 32 and
a non-trivial lambda does show some speedup, doesn't it?
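
One plausible explanation (an assumption, not verified against the generated code) is that the plain loop pays for a segment/offset translation on every element, while the segmented loop runs a raw pointer loop over each contiguous segment that the optimizer can treat like an ordinary array loop. The sketch below shows the two loop shapes on a hypothetical toy container, `toy_segmented_seq`; it is not the Boost.DoubleEnded interface, and `segment_begin`/`segment_end` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy container (NOT the Boost.DoubleEnded API): elements are
// stored in fixed-size heap-allocated segments, like a deque.
template <class T, std::size_t SegmentSize>
class toy_segmented_seq {
public:
    void push_back(const T& x) {
        if (size_ % SegmentSize == 0)  // current segment full (or none yet)
            segments_.push_back(std::make_unique<std::array<T, SegmentSize>>());
        (*segments_.back())[size_ % SegmentSize] = x;
        ++size_;
    }
    std::size_t size() const { return size_; }
    std::size_t segment_count() const { return segments_.size(); }

    // Contiguous range [segment_begin(k), segment_end(k)) of segment k.
    const T* segment_begin(std::size_t k) const { return segments_[k]->data(); }
    const T* segment_end(std::size_t k) const {
        const bool last = (k + 1 == segments_.size());
        const std::size_t used =
            (last && size_ % SegmentSize != 0) ? size_ % SegmentSize : SegmentSize;
        return segments_[k]->data() + used;
    }

    // Element access the way a plain iterator sees it: every access
    // recomputes which segment holds element i and where inside it.
    const T& operator[](std::size_t i) const {
        return (*segments_[i / SegmentSize])[i % SegmentSize];
    }

private:
    std::vector<std::unique_ptr<std::array<T, SegmentSize>>> segments_;
    std::size_t size_ = 0;
};

// Plain traversal: one (segment, offset) translation per element.
template <class T, std::size_t S, class F>
void plain_for_each(const toy_segmented_seq<T, S>& s, F f) {
    for (std::size_t i = 0; i < s.size(); ++i) f(s[i]);
}

// Segmented traversal: outer loop over segments, inner loop over a raw
// pointer range with no per-element segment bookkeeping.
template <class T, std::size_t S, class F>
void segmented_for_each(const toy_segmented_seq<T, S>& s, F f) {
    for (std::size_t k = 0; k < s.segment_count(); ++k) {
        const T* last = s.segment_end(k);
        for (const T* p = s.segment_begin(k); p != last; ++p) f(*p);
    }
}
```

Both functions visit the same elements in the same order; only the loop structure differs, so any speed difference comes from how well the compiler can optimize each shape.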

For 64-bit release mode on Windows 7 64-bit with an Intel Core i7-2700K:

  [](int x){return x;}
  segment size: 32
  n       plain   segmented
  10E3    34.748  21.1357
  10E4    32.8879 19.8592
  10E5    32.6779 18.955
  10E6    32.6255 19.3307
  10E7    33.2282 19.3158
  segment size: 512
  n       plain   segmented
  10E3    28.442  20.0265
  10E4    26.5783 18.5851
  10E5    26.4857 18.6023
  10E6    26.4884 18.6571
  10E7    27.0076 19.1338

  [](int x){return x%4?x:-x;}
  segment size: 32
  n       plain   segmented
  10E3    43.0149 18.8431
  10E4    42.2736 18.5071
  10E5    42.4035 18.7087
  10E6    42.1964 18.3355
  10E7    42.8113 18.7723
  segment size: 512
  n       plain   segmented
  10E3    40.3695 19.0028
  10E4    38.5371 18.2029
  10E5    38.2163 17.85
  10E6    38.2952 17.9199
  10E7    38.7489 18.6342

I don't know why the plain loop is slower in a 64-bit build, but the gap
between plain and segmented traversal is clearly larger here.

I wonder what the results would be on 32- and 64-bit ARM.

Also, I expect a serialization benchmark to show a much larger benefit. I
don't think one can serialize optimally without access to the segments.
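
A minimal sketch of why segment access matters for serialization, assuming trivially copyable elements and modeling the container's segments as `std::vector<std::vector<T>>`. The names `segments_t`, `save_elementwise` and `save_segmentwise` are hypothetical and not part of Boost.Serialization or Boost.DoubleEnded. Through ordinary iterators an archive must handle each element separately; with segment access it can do one bulk copy per segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Assumption: the container exposes its segments as contiguous ranges.
// Modeled here as a vector of vectors; a stand-in, not the batch_deque interface.
template <class T>
using segments_t = std::vector<std::vector<T>>;

// Element-wise save: one small copy per element, which is what an
// archive is limited to when it only sees ordinary iterators.
template <class T>
void save_elementwise(const segments_t<T>& segs, std::vector<unsigned char>& out) {
    static_assert(std::is_trivially_copyable<T>::value, "sketch assumes trivially copyable T");
    for (const auto& seg : segs)
        for (const T& x : seg) {
            const std::size_t pos = out.size();
            out.resize(pos + sizeof(T));
            std::memcpy(out.data() + pos, &x, sizeof(T));
        }
}

// Segment-wise save: a single bulk copy per contiguous segment.
template <class T>
void save_segmentwise(const segments_t<T>& segs, std::vector<unsigned char>& out) {
    static_assert(std::is_trivially_copyable<T>::value, "sketch assumes trivially copyable T");
    for (const auto& seg : segs) {
        if (seg.empty()) continue;  // avoid memcpy from a possibly null data()
        const std::size_t pos = out.size();
        const std::size_t bytes = seg.size() * sizeof(T);
        out.resize(pos + bytes);
        std::memcpy(out.data() + pos, seg.data(), bytes);
    }
}
```

Both produce the same bytes; for non-trivial types such as std::string, per-element work is unavoidable and the advantage would shrink.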

Benedek, could you please write a test of the serialization performance of
devector/batch_deque vs. boost::vector/boost::deque (release mode, full
speed optimization), perhaps using the same measuring technique as Joaquin?
Then post the results and the code so people can run it on their favorite
system. Please use the element types char, int, and something bigger,
e.g. string or array<int,32>.
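
A rough sketch of such a harness, with std::vector and std::deque standing in for the containers to be compared, and only trivially copyable element types (char, int, std::array<int,32>); string would need a length-prefixed format and is left out. The helper names are hypothetical, and the simple timing loop is not necessarily the measuring technique Joaquin used.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <deque>
#include <type_traits>
#include <vector>

// Hypothetical helper: save any sequence of trivially copyable elements
// into a byte buffer, one element at a time (works for std::deque).
template <class Container>
std::vector<unsigned char> save_by_element(const Container& c) {
    using T = typename Container::value_type;
    static_assert(std::is_trivially_copyable<T>::value, "sketch assumes trivially copyable T");
    std::vector<unsigned char> out(c.size() * sizeof(T));
    unsigned char* p = out.data();
    for (const T& x : c) {
        std::memcpy(p, &x, sizeof(T));
        p += sizeof(T);
    }
    return out;
}

// Hypothetical helper: save a contiguous std::vector with one bulk copy.
template <class T>
std::vector<unsigned char> save_contiguous(const std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value, "sketch assumes trivially copyable T");
    std::vector<unsigned char> out(v.size() * sizeof(T));
    if (!v.empty()) std::memcpy(out.data(), v.data(), out.size());
    return out;
}

// Average wall-clock nanoseconds per element over `reps` calls of `save`.
// n and reps are expected to be positive.
template <class F>
double ns_per_element(F save, std::size_t n, int reps) {
    using clock = std::chrono::steady_clock;
    std::size_t total = 0;  // consume each result so the calls are not dead code
    const auto t0 = clock::now();
    for (int r = 0; r < reps; ++r) total += save().size();
    const auto t1 = clock::now();
    static volatile std::size_t sink;
    sink = total;
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / (static_cast<double>(n) * reps);
}
```

For example, `ns_per_element([&] { return save_by_element(d); }, d.size(), 100)` times the element-wise save of a `std::deque<int> d`; the same call with `save_contiguous(v)` gives the bulk-copy baseline.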

kind regards

Thorsten

