Subject: Re: [boost] [Endian] Performance
From: Adder (adder.thief_at_[hidden])
Date: 2011-09-05 23:05:41
Dear All,
I have humbly taken a look at the "conversion.hpp" section
of Beman Dawes' library and I was able to learn a lot... Thank you...
I was able to learn a lot from Tymofey's and Phil Endecott's code too,
so thank you too !
I am just a programmer-wannabe... I do hope that my message
is not understood as disrespect, for it is not meant as such, not at all.
I have conducted a small and non-representative benchmark,
the ugly source code of which I have uploaded here:
http://preview.tinyurl.com/4yhcc8t
It compares several versions of code meant to "bswap"
16-bit, 32-bit and 64-bit integer values.
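For readers who have not met the term, "bswap" means reversing the order of the bytes in an integer. Below is a minimal, portable sketch using only shifts and masks. It is not the code from any of the libraries being compared, and the names bswap16, bswap32, bswap64 and bswap_self_check are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverse the two bytes of a 16-bit value.
inline std::uint16_t bswap16(std::uint16_t x) {
    // x is promoted to int before shifting; the cast drops the extra bits.
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Hypothetical helper: reverse the four bytes of a 32-bit value.
inline std::uint32_t bswap32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

// Hypothetical helper: reverse the eight bytes of a 64-bit value by
// byte-swapping each 32-bit half and exchanging the halves.
inline std::uint64_t bswap64(std::uint64_t x) {
    std::uint64_t hi = bswap32(static_cast<std::uint32_t>(x >> 32));
    std::uint64_t lo = bswap32(static_cast<std::uint32_t>(x));
    return (lo << 32) | hi;
}

// Small sanity check: swapping twice must give back the original value.
inline void bswap_self_check() {
    assert(bswap16(bswap16(0x1234u)) == 0x1234u);
    assert(bswap32(bswap32(0x12345678u)) == 0x12345678u);
    assert(bswap64(bswap64(0x0102030405060708ull)) == 0x0102030405060708ull);
}
```

Whether a compiler turns such code into a single instruction depends on the compiler and its switches, which is what the benchmark below tries to measure.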
In the table below, I have used the following notations:
RC-1:
This is endian::revert from the release-candidate
(non-final, I think) version of Beman Dawes' library.
Tymofey:
A combination of my own unworthy hack for uint16_t,
Phil Endecott's version for uint32_t (for USE_TYMOFEY)
and Tymofey's suggestion for uint64_t.
Imbecil-0:
Boost-wannabe.RawMemory compiled with BOOST_RAW_MEMORY_TRICKS
#define'd as 0.
Imbecil-1:
Boost-wannabe.RawMemory compiled with BOOST_RAW_MEMORY_TRICKS
#define'd as 1 (the default).
Here are the approximate results,
on the same computer that is described here:
http://adder.iworks.ro/Boost/RawMemory/#Benchmarks
(Ctrl+F: "And the name of the computer is").
(Much to my shame, I have not employed Windows' "high-performance counter".
Also, I did not run too many repetitions in order to avoid overheating
the lovely computer or the benchmarks turning against me.)
(For the 64-bit integers -- what a devious choice ! --,
I have also noted the approximate number of machine code bytes
that were generated.)
Borland C++Builder 5.5:
uint16_t
RC-1 1765
Tymofey 265
Imbecil-0 250
Imbecil-1 250
uint32_t
RC-1 1922
Tymofey 453
Imbecil-0 469
Imbecil-1 453
uint64_t
RC-1 2360 ( 72 bytes of code)
Tymofey 3375 (225 bytes of code)
Imbecil-0 2234 (169 bytes of code)
Imbecil-1 797 (7 bytes for the caller, 15 bytes for the callee)
Digital Mars C++:
uint16_t
RC-1 360
Tymofey 250
Imbecil-0 265
Imbecil-1 250
uint32_t
RC-1 1828
Tymofey 391
Imbecil-0 1203
Imbecil-1 453 <-- I am such a noob !
uint64_t
RC-1 2797 ( 84 bytes of code)
Tymofey 2875 (292 bytes of code)
Imbecil-0 2453 (202 bytes of code)
Imbecil-1 609 (7 bytes for the caller, 15 bytes for the callee)
GCC 4.3.4 (20090804):
uint16_t
RC-1 1750
Tymofey 188
Imbecil-0 187
Imbecil-1 187 <-- Of course, I chose the run that favours me !
uint32_t
RC-1 1328
Tymofey 453
Imbecil-0 438
Imbecil-1 188
uint64_t
RC-1 2781 ( 67 bytes of code)
Tymofey 1578 ( 96 bytes of code)
Imbecil-0 578 ( 46 bytes of code)
Imbecil-1 250 ( 8 bytes of code)
Visual C++ 2003:
uint16_t
RC-1 984
Tymofey 187
Imbecil-0 125
Imbecil-1 203 <-- I have to investigate this !
uint32_t
RC-1 1563
Tymofey 375
Imbecil-0 515
Imbecil-1 125
uint64_t
RC-1 2328 ( 63 bytes of code)
Tymofey 3047 (154 bytes of code)
Imbecil-0 1110 (122 bytes of code)
Imbecil-1 438 ( 12 bytes of code,
but I had to "WorkAround.cpp"
an Internal Compiler Error
by preventing the function from being inlined;
I have to investigate this !)
Visual C++ 2005:
uint16_t
RC-1 906
Tymofey 188
Imbecil-0 187
Imbecil-1 203
uint32_t
RC-1 1906
Tymofey 391
Imbecil-0 328
Imbecil-1 125
uint64_t
RC-1 3437 ( 65 bytes of code)
Tymofey 3047 (151 bytes of code)
Imbecil-0 641 ( 61 bytes of code)
Imbecil-1 188 ( 8 bytes of code)
Visual C++ 2005 for x64:
uint16_t
RC-1 1687
Tymofey 203
Imbecil-0 188
Imbecil-1 203
uint32_t
RC-1 1390
Tymofey 328
Imbecil-0 329
Imbecil-1 188
uint64_t
RC-1 2406 ( 72 bytes of code)
Tymofey 703 (210 bytes of code; the loop was unrolled 1:2)
Imbecil-0 531 (162 bytes of code; the loop was unrolled 1:2)
Imbecil-1 187 ( 3 bytes of code)
The "Tymofey" optimizations are included in the previous emails
from Tymofey and Phil Endecott.
The "Imbecil" optimizations (and pessimizations) are described and
included here:
http://adder.iworks.ro/Boost/RawMemory
If Assembler, compiler intrinsics, __fastcall,
compiler-specific tuning, etc. sound interesting, you are welcome to have
a closer look.
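As a rough illustration of what "compiler intrinsics" can look like, here is a sketch that selects a byte-swap intrinsic per compiler and falls back to shifts elsewhere. It is an assumption about the general technique, not the contents of Boost-wannabe.RawMemory. The names fast_bswap32, fast_bswap64 and fast_bswap_self_check are hypothetical. The intrinsics themselves (_byteswap_ulong and _byteswap_uint64 in Visual C++, __builtin_bswap32 and __builtin_bswap64 in GCC 4.3 and later) are real, but older compilers in the table above may not provide them.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong, _byteswap_uint64
#endif

// Hypothetical wrapper: 32-bit byte swap via an intrinsic when available.
inline std::uint32_t fast_bswap32(std::uint32_t x) {
#if defined(_MSC_VER)
    return _byteswap_ulong(x);      // unsigned long is 32 bits on Windows
#elif defined(__GNUC__)
    return __builtin_bswap32(x);    // available since GCC 4.3
#else
    // Portable fallback for compilers without a known intrinsic.
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
#endif
}

// Hypothetical wrapper: 64-bit byte swap via an intrinsic when available.
inline std::uint64_t fast_bswap64(std::uint64_t x) {
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__)
    return __builtin_bswap64(x);    // available since GCC 4.3
#else
    // Fallback: swap each 32-bit half, then exchange the halves.
    std::uint64_t hi = fast_bswap32(static_cast<std::uint32_t>(x >> 32));
    std::uint64_t lo = fast_bswap32(static_cast<std::uint32_t>(x));
    return (lo << 32) | hi;
#endif
}

// Small sanity check against known byte-reversed values.
inline void fast_bswap_self_check() {
    assert(fast_bswap32(0x12345678u) == 0x78563412u);
    assert(fast_bswap64(0x0102030405060708ull) == 0x0807060504030201ull);
}
```

The #if chain keeps every compiler-specific call in one place, so a new compiler only needs a new branch, and the fallback keeps the code correct where no intrinsic is known.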
Thank you for your time and for your work...
-- Yours truly, Adder

On 9/6/11, Beman Dawes <bdawes_at_[hidden]> wrote:
> On Mon, Sep 5, 2011 at 6:38 PM, Phil Endecott
> <spam_from_boost_dev_at_[hidden]> wrote:
>> I've just done some quick benchmarks of Beman's proposed byte-swapping
>> code...
>>
>> What do people see on other platforms?
>
> Twenty plus years ago I put a lot of effort into finding optimum code
> for a C language endian library, but real-world application tests
> showed that what was optimum for one compiler was a dog on another
> compiler, that compiler switches could change what was optimum code,
> and then for the next release of the compiler we had to do it all over
> again.
>
> That said, if we can come up with a benchmark representative of
> real-world uses cases, and can come up with robust optimizations that
> have some staying power, I'll gladly include them in the code.
>
> --Beman