From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2024-12-07 01:29:11


On Fri, Dec 6, 2024 at 4:24 PM Ivan Matek via Boost <boost_at_[hidden]>
wrote:

> Goal was to "encourage" users in 2 ways to call result() only once. First
> way is that move signals that value is "tainted", second is that clang-tidy
> can detect double moves sometimes.
>

I propose to change HashAlgorithm requirements as follows:

---
HashAlgorithm's result() is renamed to

    result_type finalize();

The current documentation for result() [1] is moved elsewhere and replaced
with the following text:

    This function shall return the final hash value of the input message,
    where the input message is defined by the ordered sequence of bytes
    provided in all prior calls to update(). The behavior of subsequent
    calls to finalize() is undefined unless specified by the HashAlgorithm.
---
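For illustration only, here is a minimal sketch of how a byte-oriented
algorithm and a generic caller might look under this wording. The class name
fnv1a_32, the helper hash_bytes, and the update( void const*, std::size_t )
signature are assumptions made for the example; they are not part of the
proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit FNV-1a written against the proposed requirements.
class fnv1a_32
{
    std::uint32_t state_ = 0x811c9dc5u; // FNV-1a 32-bit offset basis

public:
    using result_type = std::uint32_t;

    // Assumed signature: appends n bytes to the input message.
    void update( void const* data, std::size_t n )
    {
        assert( data != nullptr || n == 0 );
        auto const* p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[i];
            state_ *= 0x01000193u; // FNV-1a 32-bit prime
        }
    }

    // Returns the hash of all bytes passed to update() so far.
    // This class documents no extra guarantee, so under the proposed
    // wording a second call would be undefined for generic callers.
    result_type finalize()
    {
        return state_;
    }
};

// Hypothetical generic caller: update(), then exactly one finalize().
template< class HashAlgorithm >
typename HashAlgorithm::result_type hash_bytes( void const* data, std::size_t n )
{
    HashAlgorithm h;
    h.update( data, n );
    return h.finalize();
}
```

A generic function like hash_bytes relies only on the mandatory part of the
requirement: one finalize() after all the update() calls.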
Rationale:

The name "finalize" is closer to established practice and is more
suggestive of the typical state mutation:
https://github.com/openssl/openssl/blob/5fce85ec52a826d53665552b50e67f86c92dc394/include/openssl/sha.h#L76

The existing Hash2 documentation for result() is too specific and suggests
operations which may not be relevant, for example to FNV-1a or other
byte-oriented algorithms. It would be better to state only the mandatory
requirements and leave the rest to the implementation.

The existing Hash2 documentation suggests the possibility of calling
result() two or more times in a row, yet this can only be considered safe,
secure, or otherwise in line with best practices on a case-by-case basis,
depending on the HashAlgorithm.

The HashAlgorithm concept is first and foremost designed for use with the
Hash named requirement (std::hash and related). Making repeated calls to a
generic finalize() undefined does not hinder this use case.

There has been no research into whether HashAlgorithm should be held up as
the concept for how ALL hash algorithms should be modeled. Therefore the
named requirements for HashAlgorithm's finalize() function should only go
as far as needed to support the Hash use case, and we should leave the rest
of its behavior up to the implementer.

In particular I think there is danger here:

    template< typename HashAlgorithm >
    auto double_finalize( HashAlgorithm& h )
    {
        h.finalize();
        return h.finalize();
    }

This is dangerous because, in generic contexts, the quality of the value
returned by the second call is unpredictable. We should use the stricter
definition I provided above for now, and loosen it in the future only if
there is evidence that doing so yields a net benefit. It is always easier
to go from strict to loose; going from loose to strict after the fact is
difficult and often impossible without breaking things. It is better if
double_finalize is undefined. Users who want random numbers, or anything
similar, can get them from a specific implementation of HashAlgorithm which
offers additional guarantees for finalize().
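
One consequence of leaving the second call undefined is that an
implementation is free to diagnose it. As a rough sketch, assuming a
hypothetical adapter named finalize_once and a stand-in algorithm named
toy_hash (neither exists in Hash2), a checked wrapper could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in algorithm (a byte sum) so the sketch is self-contained.
struct toy_hash
{
    using result_type = std::uint32_t;

    std::uint32_t state = 0;

    void update( void const* data, std::size_t n )
    {
        assert( data != nullptr || n == 0 );
        auto const* p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i ) state += p[i];
    }

    result_type finalize() { return state; }
};

// Hypothetical adapter: forwards to H, but turns a second finalize()
// into an exception instead of silently returning some value.
template< class H >
class finalize_once
{
    H h_;
    bool done_ = false;

public:
    using result_type = typename H::result_type;

    void update( void const* data, std::size_t n )
    {
        h_.update( data, n );
    }

    result_type finalize()
    {
        if( done_ ) throw std::logic_error( "finalize() called twice" );
        done_ = true;
        return h_.finalize();
    }
};
```

Passing a finalize_once<toy_hash> to double_finalize would throw on the
second call. Throwing is just one choice that "undefined" permits; an
assertion or a sanitizer hook would be equally valid.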
[1] https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_bytes_result
