Subject: [boost] Message Hashing Interface (SHA-1/256/384/512, MD4/5)
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2010-04-04 12:09:09


I wrote up some SHA message hashers recently, which came out rather
nicely using templates to handle the commonalities between
SHA-1/256/384/512.

The one issue that came up is how to handle retrieving the hash from
the accumulator. When working with just the raw SHA compressor it's
not a problem, but when hashing messages, getting the hash of the
message requires appending padding and a length. That means that SHA
and MD[45] hashing cannot use the same simple interface as Boost.CRC,
which, in essence, consists of calling process_byte() and inspecting
the cheap, const member function hash() as needed.

Here are a couple of options:

1) The hash() function is non-const, and it appends the padding and
length, calculates the hash, and resets the accumulator to be ready
for the next message. This is perhaps the most efficient interface.

2) The hash() function is const, and copies the accumulator, then pads
and length-appends the copied version, getting the hash from there
instead. That gives perhaps the nicest interface, but the extra
copying would be somewhat expensive. That said, it's overhead per
message, not dependent on the length of the message, so it may be
acceptable. Also, the hash() function would still be quite expensive,
contrary to usual expectations for const member functions.

3) The hash() function is just a cheap const function that returns the
current state of the compressor. This would require an additional
end_of_message() function that would apply the padding and probably
set things up such that the next process_byte() call would start a new
message by resetting the compressor. That would allow the hash()
function to be cheaply called as many times as desired between calling
end_of_message() and providing the data for the next message. One big
problem with this one is that hash() can be called after providing
data but before calling end_of_message(), at which point its return
value is essentially useless.
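
To make the sequencing hazard in (3) concrete, here is a minimal sketch.
Everything in it is made up for illustration: toy_staged_hasher, its
8-byte block size, its mixing step, and the feed() helper. The
"compressor" is a stand-in with the same pad-and-append-length shape,
not SHA or MD5.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical toy accumulator (NOT SHA or MD5), 8-byte blocks.
class toy_staged_hasher {
public:
    typedef std::uint64_t digest_type;

    toy_staged_hasher() : finished_(false) { reset(); }

    void process_byte(unsigned char b) {
        // The first byte after end_of_message() starts a new message.
        if (finished_) { reset(); finished_ = false; }
        put(b);
        ++length_;
    }

    // Applies the padding and the message length, then marks the message done.
    void end_of_message() {
        std::uint64_t const bits = length_ * 8;   // length before padding
        put(0x80);
        while (used_ != 0) put(0x00);             // fill out the current block
        for (int shift = 56; shift >= 0; shift -= 8)
            put(static_cast<unsigned char>(bits >> shift));
        finished_ = true;
    }

    // Cheap and const, but only meaningful after end_of_message().
    digest_type hash() const { return state_; }

private:
    void reset() {
        state_ = 0xcbf29ce484222325ULL;
        used_ = 0;
        length_ = 0;
    }

    void put(unsigned char b) {
        block_[used_++] = b;
        if (used_ == sizeof block_) compress();
    }

    // Stand-in compression step: fold one block into the state.
    void compress() {
        std::uint64_t word = 0;
        for (std::size_t i = 0; i != sizeof block_; ++i)
            word = (word << 8) | block_[i];
        state_ = (state_ ^ word) * 0x100000001b3ULL;
        used_ = 0;
    }

    std::uint64_t state_;
    std::uint64_t length_;      // message bytes seen so far
    unsigned char block_[8];
    std::size_t used_;          // bytes currently buffered in block_
    bool finished_;
};

// Hypothetical helper: feed a whole string one byte at a time.
template <typename Hasher>
void feed(Hasher& h, std::string const& s) {
    for (std::size_t i = 0; i != s.size(); ++i)
        h.process_byte(static_cast<unsigned char>(s[i]));
}
```

With this sketch, feeding "abc" to one hasher and "xyz" to another and
calling hash() on both before end_of_message() returns the same value,
because neither has compressed a block yet. Only after end_of_message()
do the two results differ.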

I think I like (1) best, because it allows the user to copy it and get
(2) if they'd rather, and doesn't have the potential call sequencing
problem of (3).
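
As a rough sketch of that preference, here is a toy accumulator with a
non-const, resetting hash() as in (1), plus a one-line helper that
recovers the behaviour of (2) by copying. The names toy_hasher,
peek_hash() and feed() are hypothetical, and the mixing step is a
stand-in rather than a real SHA or MD5 compressor.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical toy accumulator (NOT SHA or MD5), 8-byte blocks.
class toy_hasher {
public:
    typedef std::uint64_t digest_type;

    toy_hasher() { reset(); }

    void process_byte(unsigned char b) {
        put(b);
        ++length_;
    }

    // Option (1): pad, append the length, return the digest, and reset
    // so the same object is ready for the next message.
    digest_type hash() {
        std::uint64_t const bits = length_ * 8;   // length before padding
        put(0x80);
        while (used_ != 0) put(0x00);             // fill out the current block
        for (int shift = 56; shift >= 0; shift -= 8)
            put(static_cast<unsigned char>(bits >> shift));
        digest_type const result = state_;
        reset();
        return result;
    }

private:
    void reset() {
        state_ = 0xcbf29ce484222325ULL;
        used_ = 0;
        length_ = 0;
    }

    void put(unsigned char b) {
        block_[used_++] = b;
        if (used_ == sizeof block_) compress();
    }

    // Stand-in compression step: fold one block into the state.
    void compress() {
        std::uint64_t word = 0;
        for (std::size_t i = 0; i != sizeof block_; ++i)
            word = (word << 8) | block_[i];
        state_ = (state_ ^ word) * 0x100000001b3ULL;
        used_ = 0;
    }

    std::uint64_t state_;
    std::uint64_t length_;      // message bytes seen so far
    unsigned char block_[8];
    std::size_t used_;          // bytes currently buffered in block_
};

// Option (2) built on top of (1): the by-value parameter is the copy,
// so the caller's accumulator is left untouched.
inline toy_hasher::digest_type peek_hash(toy_hasher copy) {
    return copy.hash();
}

// Hypothetical helper: feed a whole string one byte at a time.
template <typename Hasher>
void feed(Hasher& h, std::string const& s) {
    for (std::size_t i = 0; i != s.size(); ++i)
        h.process_byte(static_cast<unsigned char>(s[i]));
}
```

Calling peek_hash(h) part-way through a message leaves h as it was, so
more data can still be fed before the final h.hash(). The cost of the
copy is per call, not per byte of message.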

Any opinions, suggestions, alternatives, or comments?

~ Scott McMurray

P.S. Yes, the point here is to create Boost.MD and Boost.SHA
libraries. The first client would be Boost.UUID to allow it to offer
the MD5 name-based UUIDs it currently cannot. It needs a hashing
concept so that it can parametrise name_generator similarly to how it
currently parametrises basic_random_generator [1].

[1] http://www.boost.org/doc/libs/1_42_0/libs/uuid/uuid.html#Random%20Generator

