From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-12-08 11:52:44
Andrey Semashev wrote:
> However, I still don't see the benefit of extending through subsequent calls to
> finalize() when the internal state is not larger than the hash value. Any input
> data that produce a particular hash value v will also produce the same
> extended value xv.
This is a hash function requirement. Equal inputs must produce the same
hash value.
> My understanding is that the main property of hash functions that determine
> their quality is how uniquely each hash value identifies a given input data.
I don't know what this means. A hash value can never uniquely identify
the input data (in general) because of the pigeonhole principle. There
are more possible inputs than there are possible hash values.
If you are saying that an extended 128-bit value is exactly equivalent to the
non-extended 64-bit value when the hash algorithm has 64 bits of state,
that's not true, because the message size is (usually) incorporated in the final
value returned from result(). So for the original 64-bit hash function H you
could have H(i1) == H(i2), and yet the extended function H' can still give
H'(i1) != H'(i2) if the lengths of i1 and i2 differ.
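
To make the length argument concrete, here is a minimal sketch. It assumes a
hypothetical toy algorithm (toy_hash, mix and extended are illustrative names,
not part of any Boost library) with 64 bits of state, a deliberately weak
update(), and a result() that folds the message length into the state on every
call. Two inputs of different lengths collide on the first result() but diverge
on the second.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Bijective 64-bit mixer (the splitmix64 finalizer).
inline std::uint64_t mix(std::uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hypothetical toy algorithm with 64 bits of state; an assumption made for
// illustration only, not how any real hash algorithm is implemented.
struct toy_hash
{
    std::uint64_t state = 0;
    std::uint64_t length = 0;

    void update(const unsigned char* p, std::size_t n)
    {
        // Deliberately weak: a plain byte sum, so collisions are easy to build.
        for (std::size_t i = 0; i < n; ++i) state += p[i];
        length += n;
    }

    // Every call folds the message length into the state and returns it, so
    // repeated calls keep producing further output words.
    std::uint64_t result()
    {
        state = mix(state + length);
        return state;
    }
};

// The two 64-bit halves of the extended value H': first and second result().
inline std::pair<std::uint64_t, std::uint64_t>
extended(const unsigned char* p, std::size_t n)
{
    toy_hash h;
    h.update(p, n);
    std::uint64_t lo = h.result();
    std::uint64_t hi = h.result();
    return {lo, hi};
}

inline void demo()
{
    const unsigned char i1[] = {0x02};       // byte sum 2, length 1
    const unsigned char i2[] = {0x00, 0x01}; // byte sum 1, length 2

    auto a = extended(i1, sizeof(i1));
    auto b = extended(i2, sizeof(i2));

    assert(a.first == b.first);   // H(i1) == H(i2), because 2 + 1 == 1 + 2
    assert(a.second != b.second); // H'(i1) != H'(i2), because the lengths differ
}
```

The second assertion holds because mix is a bijection: the two second-round
inputs differ by exactly the difference in length, so their outputs must differ.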
And independent of all that, the avalanche property is still desirable, and all
good quality hash functions possess it. There's a test suite for hash functions,
SMHasher, which performs a battery of statistical tests, and a zero-extended
hash will fail those miserably.
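
As a rough illustration of why zero-extension fails an avalanche check, the
sketch below flips one input bit at a time and counts how many bits change in
the upper half of a 128-bit result. This is not SMHasher, only a tiny stand-in
for the idea; mix64, hash128, zero_extended, remixed and upper_bits_flipped are
hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>

// Bijective 64-bit mixer (the splitmix64 finalizer).
inline std::uint64_t mix64(std::uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

struct hash128 { std::uint64_t lo, hi; };

// 64-bit hash padded with zeros to 128 bits.
inline hash128 zero_extended(std::uint64_t input)
{
    return { mix64(input), 0 };
}

// 64-bit hash extended by running the mixer again on the first word.
inline hash128 remixed(std::uint64_t input)
{
    std::uint64_t lo = mix64(input);
    return { lo, mix64(lo + 0x9e3779b97f4a7c15ULL) };
}

inline int popcount64(std::uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; ++n; } // clear the lowest set bit each round
    return n;
}

// Number of upper-half output bits that change when input bit `bit` is flipped.
template<class F>
int upper_bits_flipped(F f, std::uint64_t input, int bit)
{
    std::uint64_t flipped = input ^ (std::uint64_t(1) << bit);
    return popcount64(f(input).hi ^ f(flipped).hi);
}
```

With zero_extended, upper_bits_flipped is 0 for every input bit, so half of the
output never reacts to the input at all. With remixed, it is always at least 1,
and a good mixer would be expected to flip about half of the 64 upper bits on
average.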