
Subject: [boost] [endian] Use in Hash algorithms
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2010-05-28 11:39:06


Popular hash algorithms (MD5, SHA, ...) involve a preprocessing stage
to turn bytes (or individual bits) into (32- or 64-bit) words, using a
certain byte (and bit) order.

For my Hash library, I ended up writing a pack function that takes an
endianness and the number of bits in the input and output values, and
combines or splits the input as needed.

A simple example:

    {
    array<uint8_t, 8> in = {{0x67, 0x45, 0x23, 0x01, 0xEF, 0xCD, 0xAB, 0x89}};
    array<uint32_t, 2> out;
    pack<little_octet_big_bit, 8, 32>(in, out);
    array<uint32_t, 2> eout = {{0x01234567, 0x89ABCDEF}};
    assert(out == eout);
    }
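
To spell out what that call computes, here is a minimal sketch of the little-octet, 8-to-32-bit case only. It is not the library's pack<>, which is generic over endianness and bit widths; pack_little_octets and check_first_example are hypothetical names made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the library's pack<>: combines each group of
// four octets into one 32-bit word, least significant octet first.
template <std::size_t N>
std::array<std::uint32_t, N / 4>
pack_little_octets(const std::array<std::uint8_t, N>& in) {
    static_assert(N % 4 == 0, "input must be a whole number of words");
    std::array<std::uint32_t, N / 4> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Octet i lands in word i/4, shifted up 8 bits per position.
        out[i / 4] |= static_cast<std::uint32_t>(in[i]) << (8 * (i % 4));
    }
    return out;
}

// Reproduces the first example from the post with the sketch above.
void check_first_example() {
    std::array<std::uint8_t, 8> in =
        {{0x67, 0x45, 0x23, 0x01, 0xEF, 0xCD, 0xAB, 0x89}};
    std::array<std::uint32_t, 2> out = pack_little_octets(in);
    std::array<std::uint32_t, 2> eout = {{0x01234567, 0x89ABCDEF}};
    assert(out == eout);
}
```

Because the sketch works on values rather than on memory, it gives the same result on any host byte order; the memcpy shortcut is only valid where the host layout already matches.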

As a bonus, it also handles non-byte units:

    {
    array<uint8_t, 3> in = {{31, 17, 4}};
    array<uint16_t, 1> out;
    pack<big_bit, 5, 15>(in, out);
    array<uint16_t, 1> eout = {{(31 << 10) | (17 << 5) | (4 << 0)}};
    assert(out == eout);
    }
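
A rough sketch of how the big-bit case could be implemented for widths where the output is a whole multiple of the input. Again, this is an illustration under stated assumptions, not the library's code: pack_big_bit_sketch and check_second_example are hypothetical names, and the real pack<> also splits values and handles other orderings.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical sketch, not the library's pack<>: concatenates InBits-wide
// values into OutBits-wide values, first input in the most significant bits.
template <unsigned InBits, unsigned OutBits, typename OutT,
          typename InT, std::size_t N>
std::array<OutT, (N * InBits) / OutBits>
pack_big_bit_sketch(const std::array<InT, N>& in) {
    static_assert(InBits > 0 && InBits < 32, "sketch assumes narrow inputs");
    static_assert(OutBits % InBits == 0, "sketch only combines whole inputs");
    static_assert((N * InBits) % OutBits == 0, "input must fill the output");
    static_assert(OutBits <= static_cast<unsigned>(
                      std::numeric_limits<OutT>::digits),
                  "output type too narrow");

    constexpr std::size_t per_output = OutBits / InBits;
    constexpr std::uint32_t mask = (std::uint32_t(1) << InBits) - 1;

    std::array<OutT, (N * InBits) / OutBits> out{};
    for (std::size_t i = 0; i < N; ++i) {
        OutT& word = out[i / per_output];
        // Make room for the next value, then append its low InBits bits.
        word = static_cast<OutT>((word << InBits) | (in[i] & mask));
    }
    return out;
}

// Reproduces the 5-bit example from the post with the sketch above.
void check_second_example() {
    std::array<std::uint8_t, 3> in = {{31, 17, 4}};
    std::array<std::uint16_t, 1> out =
        pack_big_bit_sketch<5, 15, std::uint16_t>(in);
    std::array<std::uint16_t, 1> eout = {{(31 << 10) | (17 << 5) | (4 << 0)}};
    assert(out == eout);
}
```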

An extensive set of examples can be found here:
<http://svn.boost.org/svn/boost/sandbox/hash/libs/hash/test/pack.cpp>

This is used to turn the input into words, to turn the length into
words for padding, for figuring out where in the word the "1" padding
bit goes, and for turning the state back into octets for display.
Enough optimizations are SFINAEed in that the first example compiles
down to a memcpy on x86, yet pack doesn't require contiguous input.
(It's perfectly happy with single-pass input, though it's usually
somewhat slower in that case.)
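
For context on the padding step, here is a minimal sketch of SHA-1/SHA-256 style padding for octet-aligned messages, written by hand rather than with pack. It is not code from the Hash library, and pad_sha_style and check_padding are hypothetical names. The "1" bit becomes the top bit of an 0x80 octet, and the 64-bit message length in bits is appended most significant octet first (MD5 appends it least significant octet first instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: pads an octet-aligned message to a multiple of
// 64 octets in the style used by SHA-1 and SHA-256.
std::vector<std::uint8_t> pad_sha_style(std::vector<std::uint8_t> msg) {
    const std::uint64_t bit_length =
        static_cast<std::uint64_t>(msg.size()) * 8;

    // The single "1" padding bit, in the most significant bit of an octet.
    msg.push_back(0x80);

    // Zero-fill until exactly 8 octets remain in the final 64-octet block.
    while (msg.size() % 64 != 56) {
        msg.push_back(0x00);
    }

    // Append the length in bits, most significant octet first.
    for (int shift = 56; shift >= 0; shift -= 8) {
        msg.push_back(static_cast<std::uint8_t>(bit_length >> shift));
    }
    return msg;
}

// Pads the three-octet message "abc" and checks the block layout.
void check_padding() {
    std::vector<std::uint8_t> padded = pad_sha_style({'a', 'b', 'c'});
    assert(padded.size() == 64);
    assert(padded[3] == 0x80);   // the "1" bit follows the message
    assert(padded[63] == 24);    // 3 octets = 24 bits
}
```

A generic pack covers the same ground in one place: the word order, the position of the "1" bit, and the length encoding all follow from the chosen endianness instead of being hard-coded per algorithm.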

I'm not sure how widely applicable this form of the solution would be,
but I think it's a case where both the byte-swapping version and
Beman's swap-on-load approach are awkward.

~ Scott McMurray

