Boost logo

Boost :

Subject: Re: [boost] [rfc] Preliminary Hash Library
From: Stewart, Robert (Robert.Stewart_at_[hidden])
Date: 2010-05-06 07:54:25


Scott McMurray wrote:
>
> I've boostified, cleaned up, and documented the Hash Library I
> mentioned here about a month ago. Comments, complaints, suggestions,
> and questions appreciated.
>
> Source code is in the sandbox at
> http://svn.boost.org/svn/boost/sandbox/hash/
>
> Documentation can be read online at
> http://www.cs.mcgill.ca/~smcmur/hash/

I won't point out the various typos I noticed during my reading, but will focus on bigger issues at this time.

Links from one section to the next (and the reverse) would be helpful.

The first page of the documentation refers to "message digests" without defining the term. That's off-putting. Furthermore, the association between "hash," which is in the library name, and "message digest" is weak. Not everyone considering your library knows about cryptographic hashes and the terminology relating to them.

__________
Objectives

This section is, apparently, a Motivation section, but doesn't serve that role well. You need to focus on the person just considering your library. Why should they use it? What makes it better, easier, faster than something else?

Listing reasons for having created the library should be part of a Rationale section, or similar.

Thus, you might include something more like the following:

"Computing hashes (message digests?) is useful for verifying the integrity of data. The creator of the data can compute the hash and the user of the data can, upon receipt of the data, compute a new hash and compare it with the creator's hash, to ensure the data received is the data created originally.

"Hashes can be computed using various algorithms of differing resistance to compromise from random data permutations and malicious alterations. The increased resistance usually comes at a runtime cost of computation or of hash size. A common alternative for this sort of checking is a cyclic redundancy check (CRC), such as provided by Boost.CRC. The hashes in MaybeBoost.Hash, however, are stronger and, thus, more secure.

"The Boost.UUID library needed a random number generator based upon a strong hashing function to ensure ID uniqueness. While Boost.UUID provided one hash that served well, it didn't include support for an important alternative. MaybeBoost.Hash provides the original hash, the one Boost.UUID was missing, and more, all in a reusable and extensible framework.

"MaybeBoost.Hash uses generic programming to reuse common algorithms with hash-specific policies to account for differences in input sizes, hashing algorithms, etc. The library provides flexible access to the hashing process and provides convenient output options."

___________
Quickstart

In the bullet 2 table, the SHA-256 policy should be sha2<256>, right?

Why include CubeHash when it is merely a candidate and you recommend not using it? Including some of the others is useful merely because there are existing demands for their use. Is the same true of CubeHash?

___________
Functions

To what bits does, "All the bits in the provided values are used," refer? Did you mean something like, "All bits in the input range values are used?"

Bullet 1: "As a pair on input iterators"

Bullet 2: "As an input iterator and a length"

Do you support Boost.Range/RangeEx?

I don't understand to what the Portability Note applies. Where is the uint_t<N>::exact to be used? Do you mean the data type in the input range?

The Alternative may make more sense when you clarify the "all bits" part I raised above. However, if I'm right, then this text could be clearer as, "If you need to use a subset of bits from the input range's values, then use StreamHash. The equivalent, albeit verbose, expression would be HashPolicy::stream_hash<N>::type().update(b, e).end_message()." Also, this should be paired with the previous "all bits" sentence by making it a footnote.

__________
Concepts

HashPolicy Concept

Is "HashPolicy" a good name? Is "Hash" not enough? That is, the Hash Concept? You don't have "MessageDigestPolicy" or "StreamHashPolicy," so why "HashPolicy?"

T::stream_hash<N>::type
   - A model of the StreamHash concept (link to that concept section)
   - ...

T::digest_type
   - A model of the MessageDigest concept (link to that concept section)
   - The same type as T::stream_hash<N>::type::digest_type for all allowed values of N

StreamHash Concept

Notice that I removed the "<ValueBits>" part. I don't think that should be part of the concept name.

T::digest_type
   - A model of the MessageDigest concept (link to that concept section)

Rather than spelling out the operations that must be supported, why not list the concepts that must be supported, when reasonable: DefaultConstructible, CopyConstructible, CopyAssignable?

What does, "Provided models are accessible through HashPolicy models," mean?

_____
Rob Stewart robert.stewart_at_[hidden]
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk