From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2024-12-09 16:48:01


On 12/9/24 19:22, Matt Borland via Boost wrote:
>
>>> Does this mean you're not going to support algorithms that require no more
>>> than one call to finalize()/result()?
>>
>> That's what the requirements say, yes. This is fully intentional and not incidental.
>
> I am copying discussion of this topic from Slack for greater visibility since I believe it is relevant to fundamental design of the library.
>
> Vinnie: For example, if a user implements a HashAlgorithm based on the code in rfc3174 (https://datatracker.ietf.org/doc/html/rfc3174#section-6) without making any changes, result will be idempotent and always return the same value regardless of the number of calls. And calls to update will generate an error
>
> Peter: sure, but that's not a correct implementation of the HashAlgorithm concept, so the user shouldn't do that. idempotent result is not a basis operation, because it's achievable by making a copy of the algorithm.
>
> Vinnie: Who are the authors of HashAlgorithm? I would argue they are ordinary users, not cryptographic experts, who are simply adapting an already-written algorithm to meet the named requirements. Expecting them to also become experts in hash algorithms is unreasonable and in my opinion beyond the scope of the library. expecting users to peer at someone else's implementation (e.g. rfc3174) and figure out whether or not it meets the HashAlgorithm requirements is unreasonable. no one is going to know anything about avalanches or even distribution or whatever. I am not suggesting that result should be idempotent. I am pointing out that the code in the RFC is idempotent, which breaks some principles of calling result twice. [snip extract from RFC3174 SHA1Result implementation] calling SHA1Result more than once just returns the same digest over and over. and calling SHA1Input after calling SHA1Result produces an error.
>
> Peter: Sure, but that's because the functions go out of their way to do that. If you remove the checks, you get what I require
>
> Vinnie: Users should not have to be experts in understanding the hash algorithm. You said yourself the library is not intended to be the repository for cryptographic algorithms. Obviously, users are going to want to adapt foreign code to the HashAlgorithm concept. Putting these additional requirements such as allowing update and result to be interleaved and called more than once is an unnecessary burden which forces authors of HashAlgorithm to become experts. I would ask, what is the motivating use-case for calling result twice? This is not explained in the docs and no examples are given. In fact, the one example given says "not to do this".
>
> Peter: on the other hand though, users of hash algorithms get useful functionality, which is otherwise withheld from them by accident or in some cases, even deliberately. the library can never provide assurances on the quality. it's entirely dependent on the hash algorithm. the quality of calling result once is also dependent on the hash algorithm
>
> Vinnie: That is true and by extension the library yields the same quality. Yet most hash algorithms have nothing to say about the quality of a second call to finalize. You are now forcing them to either say something, or for the user to guess
>
> Peter: yes, by imposing requirements on the hash algorithm I'm forcing them to say something. that's the point of imposing requirements.
>
> Matt
>
> RFC 3174: https://datatracker.ietf.org/doc/html/rfc3174#section-6

I was going to make it part of my review, but I'll post this now for
context. I have looked at several popular crypto libraries, and none of
them supports calling finalize more than once or interleaving it with
update:

- OpenSSL. See EVP_DigestFinal_ex description: "After calling
EVP_DigestFinal_ex() no additional calls to EVP_DigestUpdate() can be
made". (https://docs.openssl.org/3.1/man3/EVP_DigestInit/)
- gnutls. gnutls_hash_output description: "This function will output the
current hash value and reset the state of the hash.", note the last
part. (https://gnutls.org/manual/gnutls.html#gnutls_005fhash_005foutput)
- libsodium. "After crypto_generichash_final() returns, the state should
not be used any more" (https://doc.libsodium.org/hashing/generic_hashing)
- CryptoPP. The implementation of the IteratedHashBase<>::TruncatedFinal
function, which is used in the block hash algorithms, resets the state
at the end:
https://github.com/weidai11/cryptopp/blob/60f81a77e0c9a0e7ffc1ca1bc438ddfa2e43b78e/iterhash.cpp#L189
- Botan. See the HashFunction::final description: "After you call final,
the algorithm is reset to its initial state"
(https://botan.randombit.net/handbook/api_ref/hash.html#_CPPv4N12HashFunction5finalEP7uint8_t)
- gcrypt. See the gcry_md_final description: "After this has been done
no further updates (by means of gcry_md_write or gcry_md_putc) should be
done; [...] Only the first call to this function has an effect."
(https://gnupg.org/documentation/manuals/gcrypt/Working-with-hash-algorithms.html#Working-with-hash-algorithms)
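
For illustration, the "no further updates after finalize" variant of this
contract can be sketched as below. This is only a toy: one_shot_hasher
is a hypothetical name, the FNV-1a-style mixing is a non-cryptographic
stand-in, and nothing here is the API of any library listed above. It
loosely mirrors the RFC 3174 behaviour described in the quoted thread,
where repeated finalization returns the same digest and later input is
an error. Libraries that reset the state on finalize behave differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical finalize-once hasher (toy FNV-1a mixing, not cryptographic).
class one_shot_hasher {
public:
    // Feeds more input; rejected once the digest has been produced.
    void update(const void* data, std::size_t n) {
        if (finalized_) {
            throw std::logic_error("update after finalize");
        }
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < n; ++i) {
            state_ = (state_ ^ p[i]) * 16777619u;
        }
    }

    // Idempotent: every call returns the same digest and locks the state.
    std::uint32_t finalize() {
        finalized_ = true;
        return state_;
    }

private:
    std::uint32_t state_ = 2166136261u;
    bool finalized_ = false;
};
```

With this shape, calling finalize() twice yields the same value, and
calling update() afterwards throws, so it cannot satisfy a requirement
that update and result be freely interleaved.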

That's pretty much all of the libraries in wide use, I think. The one
notable omission is libnss, used by Mozilla: I didn't find docs for its
hashing algorithms and don't know whether it supports multiple calls to
finalize. I doubt it does, though, given the existing practice.

The requirements imposed by HashAlgorithm basically prevent reusing
existing implementations that were highly optimized and audited, and
instead force users to either rely on the implementations provided by
Boost.Hash2 or write new implementations from scratch. My point is not
that the Boost.Hash2 implementations are bad per se, but that the
requirements of HashAlgorithm are prohibitively incompatible with
existing implementations, which makes the usefulness of the library
questionable.
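
To make the requirement under discussion concrete, here is a minimal
sketch of a hasher whose result() can be called repeatedly and
interleaved with update(), together with Peter's point from the quoted
thread that an idempotent result is recoverable by copying. Only the
member names update and result come from the thread; the class name
multi_result_hasher, the helper peek, and the toy non-cryptographic
mixing are assumptions for illustration, not Boost.Hash2 code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hasher whose result() may be called repeatedly and
// interleaved with update(); toy mixing, not cryptographic.
class multi_result_hasher {
public:
    using result_type = std::uint32_t;

    void update(const void* data, std::size_t n) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < n; ++i) {
            state_ = (state_ ^ p[i]) * 16777619u;
        }
    }

    // Returns the current digest, then advances the state so that the
    // next call yields a different value and update() stays usable.
    result_type result() {
        result_type r = state_;
        state_ = state_ * 16777619u + 1u;
        return r;
    }

private:
    std::uint32_t state_ = 2166136261u;
};

// Idempotent "peek": finalize a copy, leaving the original untouched.
inline multi_result_hasher::result_type peek(multi_result_hasher h) {
    return h.result();
}
```

An adapter over a finalize-once implementation would have to emulate
the state-advancing step in result() itself, which is exactly the part
that existing audited implementations do not expose.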

