
Subject: Re: [boost] Message Hashing Interface (SHA-1/256/384/512, MD4/5)
From: Daniel Trebbien (dtrebbien_at_[hidden])
Date: 2010-04-04 11:40:02


Hi Scott,

I, too, think that option 1 is best. Several other streaming message digest libraries use this pattern: the hash calculation is updated with a string of bytes as many times as required, and a finalization routine then produces the hash. Java's `MessageDigest` class (http://java.sun.com/javase/6/docs/api/java/security/MessageDigest.html), for example, calls its "update" routines `update` and its finalization routines `digest`.

Using option 1 would probably make the library more intuitive for those who are used to this scheme, and you can still accommodate users who would like to compute hashes at several points along the stream by ensuring that the copy constructor clones the state.
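
To make the copy idea concrete, here is a rough sketch of how option 1 could look from the user's side. Everything in it is an assumption for illustration: `toy_hasher`, `process_bytes` and `peek_hash` are made-up names, and the mixing step is FNV-1a with the length folded in at the end, standing in for a real SHA or MD compressor with its padding. It is not meant as the proposed library's interface.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a 64-bit constants. This is a toy stand-in, NOT SHA or MD5.
const std::uint64_t fnv_offset_basis = 14695981039346656037ULL;
const std::uint64_t fnv_prime = 1099511628211ULL;

// Hypothetical option-1 style accumulator.
class toy_hasher {
public:
    toy_hasher() : state_(fnv_offset_basis), length_(0) {}

    // "Update" step: may be called any number of times per message.
    void process_bytes(const void* data, std::size_t n) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < n; ++i) {
            state_ = (state_ ^ p[i]) * fnv_prime;
        }
        length_ += n;
    }

    // Option 1: non-const. Folds in the message length (mimicking the
    // "pad and append length" step), returns the digest, and resets the
    // accumulator so it is ready for the next message.
    std::uint64_t hash() {
        std::uint64_t len = length_;
        for (int i = 0; i < 8; ++i) {
            state_ = (state_ ^ (len & 0xff)) * fnv_prime;
            len >>= 8;
        }
        const std::uint64_t result = state_;
        *this = toy_hasher();
        return result;
    }

private:
    std::uint64_t state_;
    std::uint64_t length_;
};

// Option 2 behaviour built on top of option 1: the implicit copy
// constructor clones the state, so finalizing the copy leaves the
// original accumulator untouched.
std::uint64_t peek_hash(const toy_hasher& h) {
    toy_hasher copy(h);
    return copy.hash();
}
```

A caller who wants the digest of a prefix would call `peek_hash(h)` after feeding the prefix, keep feeding `h`, and call `h.hash()` at the end of the message. The cost of the copy is only paid by those who ask for it.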

Daniel

On Sun, 04 Apr 2010 10:09:09 -0600, Scott McMurray <me22.ca+boost_at_[hidden]> wrote:

> I wrote up some SHA message hashers recently, which came out rather
> nicely using templates to handle the commonalities between
> SHA-1/256/384/512.
>
> The one issue that came up is how to handle retrieving the hash from
> the accumulator. When working with just the raw SHA compressor it's
> not a problem, but when hashing messages, getting the hash of the
> message requires appending padding and a length. That means that SHA
> and MD[45] hashing cannot use the same simple interface of Boost.CRC
> that, in essence, consists of calling process_byte() and inspecting
> the cheap, const member function hash() as needed.
>
> Here are a couple of options:
>
> 1) The hash() function is non-const, and it appends the padding and
> length, calculates the hash, and resets the accumulator to be ready
> for the next message. This is perhaps the most efficient interface.
>
> 2) The hash() function is const, and copies the accumulator, then pads
> and length-appends the copied version, getting the hash from there
> instead. That gives perhaps the nicest interface, but the extra
> copying would be somewhat expensive. That said, it's overhead per
> message, not dependent on the length of the message, so it may be
> acceptable. Also, the hash function would still be quite expensive,
> contrary to usual expectations for const member functions.
>
> 3) The hash() function is just a cheap const that returns the current
> state of the compressor. This would require an additional
> end_of_message() function that would apply the padding and probably
> set things up such that the next process_byte function would start a new
> message by resetting the compressor. That would allow the hash()
> function to be cheaply called as many times as desired between calling
> end_of_message() and providing the data for the next message. One big
> problem with this one is that hash() can be called after providing
> data but before calling end_of_message(), at which point its return
> value is essentially useless.
>
> I think I like (1) best, because it allows the user to copy it and get
> (2) if they'd rather, and doesn't have the potential call sequencing
> problem of (3).
>
> Any opinions, suggestions, alternatives, or comments?
>
> ~ Scott McMurray
>
> P.S. Yes, the point here is to create Boost.MD and Boost.SHA
> libraries. The first client would be Boost.UUID to allow it to offer
> the MD5 name-based UUIDs it currently cannot. It needs a hashing
> concept so that it can parametrise name_generator similarly to how it
> currently parametrises basic_random_generator [1].
>
> [1] http://www.boost.org/doc/libs/1_42_0/libs/uuid/uuid.html#Random%20Generator