Subject: [boost] [Endian] library proposal
From: David Stone (david_at_[hidden])
Date: 2012-04-03 11:36:12
A while back, I had the somewhat common problem of networking in C++. Libraries
like Boost.Bind and Boost.Asio proved immensely helpful, but there wasn't
anything in Boost that would take care of the byte ordering. Fortunately, I
could use the hton family of functions (htonl, htons, ntohl, ntohs) because all
of my networking code was 4 bytes or smaller.
Later on, I was writing some code that created hashes (specifically MD5 and
SHA-2). In the quest for more performance, I tried to maximize the amount of
memory I accessed at once (grabbing 4, 8, or even sizeof (uintmax_t) bytes at a
time, instead of 1). This also meant that I had to make sure I manipulated those
bytes in the proper order, so I went to my networking code and tried to reuse
that. However, I quickly ran into some problems, the largest of which is that
there is no htonll or similar for 8-byte integers. So I did what any self-
respecting programmer would do: I wrote my own using bit shifting.
My work paid off, and I was rewarded with a few % increase in performance in my
library. I naturally wondered if I could do better, and I found out that yes, I
could do better than dynamic endian determination and manual bit shifting.
This led me to make a complete library, with the goal of submitting it to Boost.
Imagine my surprise to find out that just a few months ago, Beman Dawes
submitted his own version of such a library!
However, I feel my version offers a few improvements over parts of Dawes's
library. My library does not provide any extra integer types. All mine does is
convert the representation. For this reason, I feel it would probably be best
for the final Boost.Endian library to combine my library with Dawes's. Here is
an overview of the features of my library, with emphasis given on the relative
strengths of what I have written (in no particular order)
* My library uses compiler intrinsics where they exist. Currently, it uses
intrinsics with GCC, Clang, MSVC, and (according to the documentation, I haven't
tested) ICC. On GCC, where I have done most of my testing, using intrinsics gave
a small (but measurable) performance boost. I believe my timings suggested about
a 3.5% increase in speed using the intrinsic version vs. using a manual bit-
shifting version with compiler options:
-Ofast -march=native -funsafe-loop-optimizations
which I believe is close to the strongest set of optimizations gcc supports.
* My library uses static determination of endianness where possible (as defined
in boost/detail/endian.hpp), and dynamic determination of endianness only when
that fails. Static determination gave me about a 1.25% increase in speed over
dynamic determination (and has slightly smaller memory requirements, but they
are probably on the order of one word or two).
* My library uses templates to determine which byte reordering function to call.
This genericity was essential to support my SHA-2 solution, for instance,
because the SHA-2 hashing algorithm is actually a family of 4 functions, half of
which use a 32-bit word size, and half of which use a 64-bit word size. I had
already minimized code duplication by use of templates, and a template-based
solution works much better with generic code (while not inconveniencing non-
template code) than functions like htonl / htons.
* My library returns results via return value, rather than by reference. This
is more natural and likely leads to greater performance (due to not having to
reference / dereference built-in types). It also allows const-correct
* My library includes thorough unit testing that doubles as a performance test.
Compiling the .cpp file and calling it normally is a simple unit test that fully
exercises its capabilities (it calls multiple versions of conversion functions
and verifies dynamic byte order determination is working as expected). It also
accepts an optional command-line argument as an integer that states how many
iterations of the unit tests to run. The program then reports how long each test
* Like Dawes's library, my library does not (yet) include support for floating
point numbers. The reason for this is that floating point endianness is not well
documented or as consistent. It's possible for there to be little endian
floating point numbers and big endian integers on the same machine. Rather than
suggest that the problem is solved by adding in support for floating point
numbers, attempting to use them will result in a compile-time error. However,
thanks to my library's use of type traits, changing around most of the functions
is as simple as changing is_integral with is_numeric. I'm not even sure if
traditional endianness can describe floating point numbers as a whole, or if
(for example) the mantissa and the exponent are each stored in a particular
(hopefully identical) endianness. This could lead to numbers with byte ordering
like DCBA HGEF (where the first part represents the mantissa and the second the
exponent, or perhaps it's the other way around). Basic byte-swapping functions
will not cope with this, and as such, I'll need to do more research.
* My library supports big-endian, little-endian, and PDP-endian.
* I have tested the library with gcc 4.6.2 on Fedora 16 64-bit and Fedora 16
32-bit. I have tested with clang 3.0 on Fedora 16 64-bit. I have tested with
MSVC 9 and 11 (I believe those were the versions I used) on Windows 7 64-bit. My
code compiles with no warnings in GCC, even when set to the paranoid warning
levels described in this post: http://stackoverflow.com/a/9862800 . It nearly
compiles cleanly in Visual Studio at /W4, except that it complains about a
conditional expression being constant in a template (it's always true for some
instantiations, and always false for others). Even with /Wall, it compiled
cleanly (other than library code), with the exception of the above mentioned
warning about constant conditional expressions, and a few confusing signed /
unsigned mismatch warnings. Unfortunately, I do not have a big-endian or pdp-
endian machine to test on. However, my MD5 and SHA-2 tests still validate (which
they would fail to do if my big-endian routines were wrong).
* I'm a little uncertain about my dynamic byte ordering. I currently determine
it once and save the result to a global const bool. I'm not sure if that's the
best way (but it did give a few % increase in performance over computing it
For these reasons, I hope you consider my library for inclusion in Boost.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk