Boost logo

Boost :

From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2024-12-09 20:38:31


On Sat, Dec 7, 2024 at 7:47 AM Matt Borland via Boost
<boost_at_[hidden]> wrote:
> The review of Hash2 by Peter Dimov and Christian Mazakas begins today Saturday, December 7th through Sunday December 15th, 2024.
> Please provide feedback on the following general topics:
>
> - What is your evaluation of the design?

Overall, I like it. First, some general notes.

I know that the authors want robust C++11 support, but I think the
library suffers from having a C++11-only implementation. I think if
you want to support C++11, then by all means do that. But the rest of
use would like interfaces that use: std::byte instead of sequences of
unsigned char and void *; std::span instead of a pointer and size;
language concepts instead of documentation-as-concepts, etc. If you
macro guard your C++11 interfaces, you can have your cake and eat it
too. I for one have not used C++11 since sometime in 2014. While 11
users exist, they are a small minority now. Requiring them to define
a macro to get 11-compatible interfaces does not seem onerous to me,
and the rest of us should be able to use more modern stuff.

One example is digest<>. The docs say, "digest<N> is a constexpr-friendly
class template similar to std::array<unsigned char, N>. It is used to store
the resulting message digest of hash algorithms such as SHA2-256 or
RIPEMD-160.". So, exactly like C++17 std::array then? :) I'd prefer that
digest<> be an alias for std::array<> in later C++ build modes.

In short, we don't need more backward-compat types that replace vocabulary
types -- the backward-compat type should only exist as a polyfill, and the
normal vocabulary type should be used wherever possible.

Now, more specific notes:

Why the discrepancy between unsigned char const * in the hash algorithm ctor,
and void const * in update()?

Why define your own endian? You already have an external dependency it seems,
on boost::container_hash, so why not use the logic and constants from
Boost.Endian?

I'm sympathetic to the argument raised on the list earlier that, since
Flavor is known statically and should remain fixed across calls to
hash_append(), it would be nice if it could be captured somehow, so I
don't have to repeat it over and over, possibly fat-fingering it and
introducing a bug. Once possibility would be to make a struct
hash_appender (or whatever name), and give it a member function:

template<class Flavor>
struct hash_appender {
  template<class Hash, class T>
  constexpr void operator()(Hash& h, T const& v) {
    hash2::hash_append(h, Flavor{}, v);
  }
};

Then the user could have the option of writing:

auto hash_append = boost::hash2::hash_appender<Flav>{};
hash_append(h, x);
hash_append(h, y);
hash_append(h, z);

To me, that's much nicer as a user, and having both interfaces would be a
benefit. If I needed to call hash_append() over and over, I'd use the binary
API; for a single call, I'd probably just use the ternary one.

You might need an appender for each has_append operation, like
hash_append_unordered_range() or whatever. That brings me to my next point.

Why are there so many overloads of has_append*()? Here they all are:

template<class Hash, class Flavor, class T>
constexpr void hash_append( Hash& h, Flavor const& f, T const& v );

template<class Hash, class Flavor, class It>
constexpr void hash_append_range( Hash& h, Flavor const& f, It first, It last );

template<class Hash, class Flavor, class T>
constexpr void hash_append_size( Hash& h, Flavor const& f, T const& v );

template<class Hash, class Flavor, class It>
constexpr void hash_append_sized_range( Hash& h, Flavor const& f, It
first, It last );

template<class Hash, class Flavor, class It>
constexpr void hash_append_unordered_range( Hash& h, Flavor const& f,
It first, It last );

Naively, I would have reduced them to:

struct unordered_tag {};

template<class Hash, class Flavor, class T>
constexpr void hash_append( Hash& h, Flavor const& f, T const& v );

template<class Hash, class Flavor, class It>
constexpr void hash_append( Hash& h, Flavor const& f, It first, It last );

template<class Hash, class Flavor, class It>
constexpr void hash_append( Hash& h, Flavor const& f, It first, It
last, unordered_tag );

Is there a reason that could not work? In particular, It's not clear to me
why a value used as a size must be cast to Flavor::size_type. What happens
when I call hash_append() (that is, I forget to say hash_append_size())? As a
data point, detail::do_hash_append() uses:

hash2::hash_append( h, f, '\x00' );

... for empty fixed-size types like std::arrays, but the overload for
variable-sized containers like std::vector uses:

hash2::hash_append_sized_range( h, f, v.begin(), v.end() );

... which in turn calls:

hash2::hash_append_size( h, f, last - first );

Also, even if you don't change any of the names above, please don't use the
phrase "sized_range", since we now have a std::ranges concept with that name
that means something substantially different (std::ranges::sized_range
requires std::ranges::size(x), for instance, where yours does not).

Speaking of which, it would be nice to have direct support for
std::range::sized_ranges and std::contiguous_iterators in C++20 mode. Right
now, you only detect a small, fixed subset of all possible sized ranges, and
treat T* as the only possible contiguous iterator. These could be supported
within the existing overloads, with "if constexpr
(std::ranges::sized_range<T>)", etc.

> - What is your evaluation of the implementation?

Looks good -- easy to read; no red flags. There are lots of low-level
shenanigans, and they are done without UB, as far as I can tell.

> - What is your evaluation of the documentation?

I think the intro could really benefit from a small snippet showing a simple
use of the library. In particular, I cannot tell at the beginning whether I
always need to write something like HashAlgorithm or not. I'm sure it will
become clear later, but earlier is better; providing simple use like hashing a
string using a reasonable choice of algorithm, for use in a
std::unordered_map<std::string, std::string> would be great. It would give me
the context for the rest of the docs.

Speaking of which, the docs go straight from a minimal intro into how to write
my own hash algorithm. I think most users will want to use one of the
provided ones, and has_append(). I think that should come first, or maybe you
could just have a link to a quick-start/cheat sheet-type section that the user
could jump to right away.

In the section Contiguously Hashable Types, I think you should define
"contiguously hashable" when introducing it, or at least link to a definition.
I think I can guess what it means from context, but I'm not certain. As it is
now, being certain would mean going to the implementation of
is_contiguously_hashable.

In the section Ranges, please provide more details on how you determine the
order in which to add elements from an unordered range. What you have now is
(I think intentionally) vague, but I'm concerned about performance -- I expect
hashing operations to be pretty damn fast, and the currently-thin description
makes me want to go start reading code. Looking at the code, it's a linear time
algorithm, and that's the main thing I was worried about -- maybe just mention
that.

In the section about adding tag_invoke overloads, s/the member c is not part
of the object state/the member c is not a salient part of the object state, to
match the comment in the code just above.

For the reasons I mentioned before about typical users, I think the example
about how to use the library with unordered containers should be the first
example. The code for that example is very nice, and I think if you just add
a comment like this to the line that calls hash_append():

    std::size_t operator()( T const& v ) const
    {
        H h;
        boost::hash2::hash_append( h, {}, v ); // hash values of any
type here, including user-defined types
        return boost::hash2::get_integral_result<std::size_t>( h.result() );
    }

... that would be the 90% use case code example the intro is missing.

> - What is your evaluation of the potential usefulness of the library? Do you already use it in industry?

I think it will be very useful. I look forward to not having to write
hash_append over and over again in different projects. :) I do not use
the library currently.

> - Did you try to use the library? With which compiler(s)? Did you have any problems?

I did not try to use it.

> - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I read the documentation, and read the implementation (though not the
hash algorithms themselves). I spent about 4 hours doing this review.

> - Are you knowledgeable about the problem domain?

As a user of hashing containers, yes. As for anything cryptographic,
not at all.

> Ensure to explicitly include with your review: ACCEPT, REJECT, or CONDITIONAL ACCEPT (with acceptance conditions).

I vote to ACCEPT this library.

Zach

Zach


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk