Boost logo

Boost :

Subject: [boost] [Endian] Review / Comments
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-09-08 06:33:02


hi,

documentation
the documentation in general is pretty short, but sufficient: the concepts of
the library are easy, so there is no need to go into more details.
however i would prefer if the documentation uses quickbook/boostbook. most of
the recent boost libraries have a very similar layout of the documentation,
but the boost.endian docs do not follow these conventions.

also the docs for `conversion functions' and for `integer types' are separate.
personally i'd prefer if they could go into the same document, maybe having a
structure like: introduction, integer types, conversion functions, examples,
reference (preferably doxygen-generated).

section `experience': this section gives no insights for people who use or
read the code. it mainly tells: "we are not the first and the domain of the
library is important.". imo this section can be removed (maybe the part that
it is not related to any c library can go to the `design considerations'

section "motivating use cases": this is more a marketing blurb/testimonial.
maybe this could be changed to a section about possible use cases, listing
`communicating between different devices' and `reading/writing of binary file
formats'.

provided files
i suppose, the msvc project file will go away when the library is merged into
trunk. besides, i'd suggest to move "boost/endian/support/timer.hpp" to
"lib/endian/support", because it only seems to be used by the benchmarks.

testsuite
scoped_enum_emulation_test.cpp ... not related to boost.endian, please remove

endian_operations_test.cpp and endian_in_union_test.cpp ... maybe rename from
_test.cpp to _compile_test.cpp? they don't seem to do any run-time tests. they
also should not include <cassert> since no assertion statement is needed, this
might speed up the compilation time of the testsuite by something like 50ms ;)

conversion functions
personally, i find the interface for the conversion functions rather odd, as
they don't actually return a value, but take a reference to the return value.
so they cannot really be used in a functional style.

if the reorder functions would have a signature like:
int16_t reorder(int16_t x);
it would be possible to use it as unary operator e.g. in with std::transform
and the like. this would also avoid the need for both in-place and out-of-
place functions.

iac, i don't see a reason for the in-place functions. for the reason, see
below.

performance considerations
the current implementation seems to perform horribly on x86_64 hardware. i ran
phil's nice benchmark program inside the `perf' profiler on my core i7 920
("g++-4.6 -O3 -std=gnu++0x -march=native"), where the algorithm takes MUCH
more time than htonl, gcc's builtins or a mask/shift implementation.

the main reason seems to be a huge amount of byte-wise load/stores, which lead
to a huge number of stalls. a mask/shift implementation will be computed
completely in registers, making use of instruction-level parallelism on out-
of-order cpus and requiring less memory transfers.

as a side effect, boost.endian will compose much better, when using the
reorder functions on values, which are already in registers, or which need to
be placed in registers later in the algorithm.

modification for bitmasks:
   return (src<<24)
        | ((src<<8) & 0x00ff0000)
        | ((src>>8) & 0x0000ff00)
        | (src>>24);

modification for boost.endian types (assuming little-endian hardware):
   uint32_t n;
   boost::detail::store_big_endian<uint32_t, sizeof(uint32_t)>(&n, src);
   return n;

USE_GCC_BUILTIN:
 Performance counter stats for './a.out':

        986.666145 task-clock # 1.000 CPUs utilized
                 0 context-switches # 0.000 M/sec
                 7 CPU-migrations # 0.000 M/sec
               307 page-faults # 0.000 M/sec
     2,637,245,698 cycles # 2.673 GHz
       633,384,188 stalled-cycles-frontend # 24.02% frontend cycles idle
       281,512,771 stalled-cycles-backend # 10.67% backend cycles idle
     6,007,671,727 instructions # 2.28 insns per cycle
                                             # 0.11 stalled cycles per
insn
     1,001,451,092 branches # 1014.985 M/sec
            31,112 branch-misses # 0.00% of all branches

       0.987133571 seconds time elapsed

USE_HTONL
 Performance counter stats for './a.out':

        938.226124 task-clock # 1.000 CPUs utilized
                 0 context-switches # 0.000 M/sec
                 3 CPU-migrations # 0.000 M/sec
               307 page-faults # 0.000 M/sec
     2,507,753,527 cycles # 2.673 GHz
       504,694,743 stalled-cycles-frontend # 20.13% frontend cycles idle
        53,006,181 stalled-cycles-backend # 2.11% backend cycles idle
     6,006,376,475 instructions # 2.40 insns per cycle
                                             # 0.08 stalled cycles per
insn
     1,001,180,324 branches # 1067.099 M/sec
            23,212 branch-misses # 0.00% of all branches

       0.938684989 seconds time elapsed

USE_BEMAN
 Performance counter stats for './a.out':

       7494.906291 task-clock # 1.000 CPUs utilized
                 0 context-switches # 0.000 M/sec
                12 CPU-migrations # 0.000 M/sec
               307 page-faults # 0.000 M/sec
    20,033,609,533 cycles # 2.673 GHz
    15,018,356,865 stalled-cycles-frontend # 74.97% frontend cycles idle
    11,372,397,445 stalled-cycles-backend # 56.77% backend cycles idle
    15,030,059,048 instructions # 0.75 insns per cycle
                                             # 1.00 stalled cycles per
insn
     1,005,600,935 branches # 134.171 M/sec
            88,848 branch-misses # 0.01% of all branches

       7.497572144 seconds time elapsed

USE_BITMASK
 Performance counter stats for './a.out':

        986.085196 task-clock # 0.999 CPUs utilized
                 0 context-switches # 0.000 M/sec
                 0 CPU-migrations # 0.000 M/sec
               307 page-faults # 0.000 M/sec
     2,635,658,193 cycles # 2.673 GHz
       631,672,783 stalled-cycles-frontend # 23.97% frontend cycles idle
       269,536,819 stalled-cycles-backend # 10.23% backend cycles idle
     6,008,101,435 instructions # 2.28 insns per cycle
                                             # 0.11 stalled cycles per
insn
     1,001,523,773 branches # 1015.656 M/sec
            31,020 branch-misses # 0.00% of all branches

       0.986601856 seconds time elapsed

USE_TYPES
 Performance counter stats for './a.out':

       7121.796026 task-clock # 1.000 CPUs utilized
                 0 context-switches # 0.000 M/sec
                12 CPU-migrations # 0.000 M/sec
               307 page-faults # 0.000 M/sec
    19,035,784,897 cycles # 2.673 GHz
    15,019,771,466 stalled-cycles-frontend # 78.90% frontend cycles idle
     1,182,185,735 stalled-cycles-backend # 6.21% backend cycles idle
    14,032,012,033 instructions # 0.74 insns per cycle
                                             # 1.07 stalled cycles per
insn
     1,005,916,931 branches # 141.245 M/sec
            84,751 branch-misses # 0.01% of all branches

       7.124340206 seconds time elapsed

What is your evaluation of the potential usefulness of the library?
very important, very useful

How much effort did you put into your evaluation? A glance? A quick reading?
In-depth study?

roughly 3 hours, reading code, documentation, running some benchmarks.
besides, i'm using an older version of the library for about 2 years without
having any problems.

Are you knowledgeable about the problem domain?
yes

Do you think the library should be accepted as a Boost library?
the library should be accepted, if

(a) the interface of the conversion functions is changed
(b) the performance can be improved
(c) the documentation integrates better with the rest of the boost
    documentation.

cheers, tim


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk