|
Boost : |
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2024-12-08 01:03:44
Hi,
This is my review of this library
-----------------------------------
- What is your evaluation of the design?
-----------------------------------
I think the design is good, I like the clear separation between the
algorithm and user-defined types, and I was sympathetic with the
original "Types donât know #" proposal.
I think the included hash algorithms are varied and practical.
Some minor comments (opinions, don't take them as "bad design decisions"):
- "HashAlgorithm::block_size" is an int, IMHO it should std::size_t. A
negative value has no sense (and the documentation does not say anything
if an algorithm defines it as a negative number, I suppose it's UB/bug).
Maybe we should document some valid range that the library supports.
- No move constructor/assigment for algorithms. I don't know if those
make any sense, but I imagine user implementations with those members
are perfectly valid.
- The semantics for copy constructor/assignment might not be clear. The
library internally uses the copy constructor (e.g. when implementing the
unordered range hash), so it certainly expects something more concrete
than "equivalent to the original". I imagine a copy/assigned algorithm
is required to lead to the same output when updated with the same data.
Maybe this should be explained in the documentation with an example.
- "result" sounds like a getter/const function returning the already
calculated value, but it's a required "final" step that might even
return different values if repeatedly called. Naming it to something
like "produce_result" might be more self-explaining.
- Should the HashAlgorithm support calls to "update" and "result"
several times, interleaving both calls? Does the library call "result"
only as the final step once or more times? I think required guaranteed
should be more explicitly stated in the design/documentation.
- I don't know if SHA-1/RIPEMD-128 should be in the library at all
(insecure and not sure if useful in any real context).
- The original proposal mentions a "variadic version of hash_append",
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#variadic)
maybe that would be a useful utility in the library.
- The original proposal uses a different syntax:
void
operator()(void const* key, std::size_t len) noexcept
{
state_.Update(key, len);
}
explicit
operator result_type() noexcept
{
std::uint64_t h1, h2;
state_.Final(&h1, &h2);
return h1;
}
Which are the advantages of the new approach (update/result) vs the
original proposal?
- The hash_append_unordered_range function is surprising for people like
me that are not experts on hashing, but it's an important concept for
users. The original proposal explains the problem
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#unordered),
I feel the problem should be also explained in the documentation of
Hash2, and how the library design solves it (using a copy of the
HashAlgorithm, as proposed in n3980).
N3980 also says that "And there are various other schemes. They are all
implementable. But they each have their advantages and disadvantages.
Therefore this proposal proposes none of them. Should the future expose
the ideal hash_append specification for unordered containers, it can
always be added at that time."
Why does Hash2 offer a single alternative? Would it be a good a idea to
offer different unordered hashing functions (each with different
trade-offs) so that a user can choose the best one?
-N3980 proposes a "type-erase hasher"
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#type_erased_hasher),
would it be interesting to include it in Hash2?
- I didn't know about the "tag_invoke" customization point trend. I
imagine that this avoids polluting all namespaces with the chosen
function name.
However, AFAIK library proposals as "std::execution" have abandoned the
tag_invoke approach and will use member functions as customization
points
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2855r1.html).
Are member functions a better customization point for user-defined
classes than these tag_invoke friend functions that kick ADL (and its
associated potential problems)?.
Say:
//declare a member function in your type that will be
//called by boost::hash2::hash_append()
template<class Hash, class Flavor>
void hash_append
(boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f) const
{
boost::hash2::hash_append(h, f, this->a);
boost::hash2::hash_append(h, f, this->b);
}
Is tag_invoke superior because we can declare customization points for
classes that we can't modify? But we need access to data members so we
need priviledge access to that type...
- The documentation shows (in the chapter "Use with Unordered
Containers") several "hash" classes that are needed to be able to use
Hash2 with std/boost unordered_xx classes (or
boost::intrusive::unordered_xxx types). I think it would be a good idea
to include such utilities in this Hash2 library (under names like
"hasher", "seeded_hasher".
- The design says "The size_type member type of the flavor affects how
container and range sizes (typically of type size_t) are serialized.
Since the size of size_t in bytes can vary, serializing the type
directly results in different hash values when the code is compiled for
64 bit or for 32 bit. Using a fixed width type avoids this".
I don't agree with this, between 32 and 64 bit compilation you will have
differences when hashing user-defined types that have pointers, stored
std::size_t/ptrdiff_t members, etc.. IMHO xxx_flavours defined in the
library should define size_type as "std::size_t". Those wanting portable
hashes between 32 and 64 bit need to do a bigger effort than configuring
Hash2, and probably defining size_type as std::uint64_t will reduce the
performance of 32 bit users...
-----------------------------------
- What is your evaluation of the implementation?
-----------------------------------
Very good IMHO.
- I like it supports C++11.
- Code is well-written and quite simple.
- Extensive test suite.
Some comments:
- One thing I don't like is that the library depends on
container_hash/describe and mp11. I would expect this library to be
dependency-free (other than Config/Assert). Specially I dislike the
"Describe" dependency. Is there a way to support Described classes
without depending on that library? (just like container-like classes are
supported without requiring an additional dependency). Dependency on
container_hash seems is probably in order to avoid duplicating code.
Note tha container_hash also depends on Describe (which I also find
unnecessary)
- There is an asymmetry between "writeNNle" and "writeNNbe"
implementations (little vs big endian). In little endian implementations
"detail::is_constant_evaluated() and endianess" is used to be able to
use memcpy in some cases
(https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/write.hpp#L26)
where in the big endian implementation this is not performed
(https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/write.hpp#L73).
Probably few people use big endian platforms (except those like me that
still need to compile for powerpc targets...), but I would expect an
equivalent implementation for big-endian functions.
-----------------------------------
- What is your evaluation of the documentation?
-----------------------------------
Good quality documentation. I'm not a fan of single page documentation,
though.
Some examples are not compilable in C++11:
- class X example is missing #include <string>.
- class Str example: N is missing the type, missing <algorithm> include,
operator== has the wrong type...
- All "Use with Unordered Containers" examples should be compilable,
including the needed headers.
- I think some benchmarks results are missing to show the quality of the
implementation (there is a benchmark folder in the library, but no
results are shown). I don't expect Hash2 implementations to be the
fasted on the planet, but certainly decent hash speeds are important for
users. Maybe we should have numbers comparing implemented hashes with
std and boost container_hash implementations? Against some other
reference implementations? The original N3980 proposal
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#testing)
has some numbers also on the quality of the hash functions.
-----------------------------------
- Did you try to use the library? With which compiler(s)? Did you have
any problems?
-----------------------------------
Yes. I've compiled and run under the debugger the documentation examples.
I've run the test-suite run successfully:
Windows: toolset=msvc-14.3,msvc-14.2,msvc-14.1 variant=debug,release
address-model=32,64
Linux (Ubuntu 24): toolset=gcc-10,gcc-11,gcc-12,gcc-13,gcc-14,
clang-14,clang-15,clang-16,clang-17,clang-18 variant=debug,release
No problems were detected.
-----------------------------------
- How much effort did you put into your evaluation?
-----------------------------------
A couple of hours, reading the documentation, reading the source code,
passing the boostdep tool to the library, and debugging examples.
-----------------------------------
- Are you knowledgeable about the problem domain?
-----------------------------------
Not an expert on hash algorithms, I've reviewed the library from the
point of view of a user that wants to be able to use different hash
algorithms without modifying the implementation of the user-defined types.
I've implemented hash-based containers (but using the separate chaining
strategy)
-----------------------------------
Ensure to explicitly include with your review: ACCEPT, REJECT, or
CONDITIONAL ACCEPT (with acceptance conditions).
-----------------------------------
I recommend to ACCEPT the library, I don't think my suggestions/things I
dislike should not condition the acceptance.
Best,
Ion
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk