Subject: Re: [boost] [review] The review of Boost.DoubleEnded starts today: September 21 - September 30
From: Thorsten Ottosen (tottosen_at_[hidden])
Date: 2017-09-27 16:23:00

On 26-09-2017 at 23:43, Joaquin M López Muñoz via Boost wrote:

Thanks for the thorough review.

> 3. The question arises of whether segment access can gain us some speed. I've
> written a small test to measure the performance of a plain std::for_each loop
> over a batch_deque vs. an equivalent sequence of segment-level loops
> (attached, batch_deque_for_each.cpp), and this is what I got for Visual C++ 2015
> 32-bit (x86) release mode in a Windows 7 64-bit box with an Intel Core i5-2520M
> @2.5GHz:
>
> [](int x){return x;}
> segment size: 32
> n      plain   segmented
> 10E3   25.5472 23.6305
> 10E4   24.5778 23.6907
> 10E5   24.5821 22.8076
> 10E6   25.5007 23.1037
> 10E7   27.1452 24.0339
> segment size: 512
> n      plain   segmented
> 10E3   23.8384 23.6638
> 10E4   23.0284 23.8705
> 10E5   22.8449 22.8187
> 10E6   23.8485 23.7454
> 10E7   24.1711 23.5404
>
> [](int x){return x%4?x:-x;}
> segment size: 32
> n      plain   segmented
> 10E3   33.9795 23.6662
> 10E4   32.4817 24.023
> 10E5   32.8731 23.3803
> 10E6   33.5396 22.9298
> 10E7   33.1034 23.0206
> segment size: 512
> n      plain   segmented
> 10E3   25.0623 23.3205
> 10E4   25.1048 23.5812
> 10E5   25.3343 21.7686
> 10E6   25.6961 22.4639
> 10E7   25.8664 22.9964

For 32-bit release mode on Windows 7 64-bit with an Intel i7-2700K:

[](int x){return x;}
segment size: 32
n plain segmented
10E3 21.4589 21.8351
10E4 19.9545 20.5133
10E5 19.4889 20.6197
10E6 19.2552 19.6976
10E7 19.2919 19.5425
segment size: 512
n plain segmented
10E3 20.2503 20.6372
10E4 19.0234 19.3367
10E5 18.5394 18.6171
10E6 18.555 18.5816
10E7 19.0918 19.1833

[](int x){return x%4?x:-x;}
segment size: 32
n plain segmented
10E3 28.743 19.7501
10E4 26.8371 19.0719
10E5 27.0304 18.7624
10E6 26.9561 18.2357
10E7 27.2985 18.6425
segment size: 512
n plain segmented
10E3 22.1073 20.0347
10E4 20.7825 19.5639
10E5 20.6122 18.0773
10E6 20.6039 18.4895
10E7 21.7964 19.1822

So basically the same as your results. The case for segment size 32 and
a non-trivial lambda does show some speedup, doesn't it?
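
To make the two traversal styles concrete, here is a minimal sketch of what "plain" and "segmented" loops look like. It does not use batch_deque or Joaquin's attached test: toy_segmented, sum_plain and sum_segmented are hypothetical names invented for this illustration, and the container is just a vector of fixed-capacity vectors. The point is only that the plain loop pays for a segment-boundary check on every element, while the segmented loop runs a tight pointer loop per segment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a segmented sequence; NOT the batch_deque interface.
// Elements live in segments of at most SegmentSize contiguous elements.
template <typename T, std::size_t SegmentSize>
class toy_segmented {
public:
    void push_back(const T& v) {
        // Open a new segment when there is none yet or the last one is full.
        if (segments_.empty() || segments_.back().size() == SegmentSize) {
            segments_.emplace_back();
            segments_.back().reserve(SegmentSize);
        }
        segments_.back().push_back(v);
        ++size_;
    }
    std::size_t size() const { return size_; }
    const std::vector<std::vector<T>>& segments() const { return segments_; }

private:
    std::vector<std::vector<T>> segments_;
    std::size_t size_ = 0;
};

// "Plain" traversal: one flat loop that tests for a segment boundary after
// every element, roughly what a deque-style iterator does in operator++.
template <typename T, std::size_t N, typename F>
long long sum_plain(const toy_segmented<T, N>& c, F f) {
    long long sum = 0;
    std::size_t seg = 0, off = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        sum += f(c.segments()[seg][off]);
        if (++off == N) { off = 0; ++seg; }  // per-element boundary check
    }
    return sum;
}

// "Segmented" traversal: an outer loop over segments and an inner loop over
// raw pointers, so the inner loop has no boundary bookkeeping at all.
template <typename T, std::size_t N, typename F>
long long sum_segmented(const toy_segmented<T, N>& c, F f) {
    long long sum = 0;
    for (const auto& s : c.segments()) {
        const T* first = s.data();
        const T* last = first + s.size();
        for (; first != last; ++first) sum += f(*first);
    }
    return sum;
}
```

Both functions compute the same result for any callable, e.g. sum_segmented(c, [](int x){ return x % 4 ? x : -x; }). Whether the segmented form is actually faster depends on the compiler and on the callable, as the numbers above suggest.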

For 64-bit release mode on Windows 7 64-bit with an Intel i7-2700K:

[](int x){return x;}
segment size: 32
n plain segmented
10E3 34.748 21.1357
10E4 32.8879 19.8592
10E5 32.6779 18.955
10E6 32.6255 19.3307
10E7 33.2282 19.3158
segment size: 512
n plain segmented
10E3 28.442 20.0265
10E4 26.5783 18.5851
10E5 26.4857 18.6023
10E6 26.4884 18.6571
10E7 27.0076 19.1338

[](int x){return x%4?x:-x;}
segment size: 32
n plain segmented
10E3 43.0149 18.8431
10E4 42.2736 18.5071
10E5 42.4035 18.7087
10E6 42.1964 18.3355
10E7 42.8113 18.7723
segment size: 512
n plain segmented
10E3 40.3695 19.0028
10E4 38.5371 18.2029
10E5 38.2163 17.85
10E6 38.2952 17.9199
10E7 38.7489 18.6342

I don't know why a 64-bit program would be slower, but there seems to be
a larger difference here.

I'm wondering how the results would be on 32/64 bit ARM.

Also, I do expect a benchmark of serialization to be much better. I
don't think one can do that optimally without access to the segments.

Benedek, could you please make a test of the performance of
serialization for both devector/batch_deque vs
boost::vector/boost::deque (release mode, full speed optimization),
perhaps using the same measuring technique as employed by Joaquin. And
then post results and code so people can run it on their favorite
system. You should use types char, int, and something bigger, e.g.
string or array<int,32>.

kind regards

Thorsten
