Boost :
From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-12-09 16:43:22
Vinnie Falco asked me the following on Slack:
> I would ask, what is the motivating use-case for calling result
> twice? This is not explained in the docs and no examples are
> given. In fact, the one example given says "not to do this"
Calling result() twice (or more times) provides result extension:
the ability to extract a variable number of bits from a hash
algorithm, instead of a fixed-size value (e.g. 64 bits).
This is in fact stated in the docs here:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_bytes_result
> Note that result is non-const, because it changes the internal
> state. It's allowed for result to be called more than once;
> subsequent calls perform the state finalization again and as a
> result produce a pseudorandom sequence of result_type values.
> This can be used to effectively extend the output of the hash
> function. For example, a 256 bit result can be obtained from a
> hash algorithm whose result_type is 64 bit, by calling result four
> times.
and there is an example of doing that here:
https://pdimov.github.io/hash2/doc/html/hash2.html#example_result_extension
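A minimal sketch of result extension, assuming a hypothetical toy algorithm `toy_hash_64`. It is not part of Boost.Hash2, not the example linked above, and not cryptographically strong. Its `result()` folds in the message length, scrambles the state with MurmurHash3's 64-bit finalizer (`fmix64`), and writes the scrambled state back, so four calls yield 256 bits. The helper `result_256` is likewise hypothetical and assumes a 64-bit `result_type`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// MurmurHash3's 64-bit finalizer, used here as a generic bit mixer.
inline std::uint64_t fmix64( std::uint64_t k )
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical toy algorithm (NOT from Boost.Hash2, NOT cryptographic)
// whose result() re-runs finalization on every call.
class toy_hash_64
{
public:
    using result_type = std::uint64_t;

    void update( void const* data, std::size_t n )
    {
        auto p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];           // FNV-1a style byte absorption
            state_ *= 0x100000001b3ULL; // 64-bit FNV prime
        }
        length_ += n;
    }

    result_type result()
    {
        // Finalization: fold in the length, scramble, and keep the
        // scrambled state so that the next call continues from it.
        state_ = fmix64( state_ ^ length_ );
        return fmix64( state_ + 0x9e3779b97f4a7c15ULL );
    }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ULL; // FNV offset basis
    std::uint64_t length_ = 0;
};

// Hypothetical helper: 256 bits from a hash with a 64-bit result_type,
// obtained by calling result() four times.
template<class Hash>
std::array<std::uint64_t, 4> result_256( Hash& h )
{
    std::array<std::uint64_t, 4> r{};
    for( auto& w: r ) w = h.result();
    return r;
}
```

The first word is exactly what a single `result()` call would have returned; the remaining three words extend it.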
All hash algorithms are required to support result extension
because (in my opinion) this is extremely useful functionality
that is easy, even trivial, to provide, but is often withheld,
either by accident or, in some cases, even deliberately.
Hash algorithms typically have a "finalization" phase that
pads the message, mixes the length, scrambles the internal
state in a more thorough manner than in `update`, and then
derives a hash value from that state. (The hash value is often
shorter than the total amount of state.)
If this "finalization" phase is performed more than once, one
naturally gets the mandated `result()` behavior.
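To illustrate why re-running finalization naturally gives this behavior, the hypothetical toy below has two finalizers. `finalize_copy()` (a name made up for this sketch) scrambles a copy of the state and leaves the object unchanged, so it returns the same value every time. `result()` writes the scrambled state back, so each call produces a fresh value. The toy is non-cryptographic and is not a Boost.Hash2 algorithm.

```cpp
#include <cstddef>
#include <cstdint>

// MurmurHash3's 64-bit finalizer, used here as a generic bit mixer.
inline std::uint64_t fmix64( std::uint64_t k )
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical toy, NOT from Boost.Hash2 and NOT cryptographic.
class toy_hash_64
{
public:
    using result_type = std::uint64_t;

    void update( void const* data, std::size_t n )
    {
        auto p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];           // FNV-1a style byte absorption
            state_ *= 0x100000001b3ULL; // 64-bit FNV prime
        }
        length_ += n;
    }

    // Finalization on a copy: the state is untouched, so repeated
    // calls always return the same value.
    result_type finalize_copy() const
    {
        std::uint64_t s = fmix64( state_ ^ length_ );
        return fmix64( s + 0x9e3779b97f4a7c15ULL );
    }

    // The same finalization, written back into the state: each call
    // scrambles the state again, giving a pseudorandom sequence.
    result_type result()
    {
        state_ = fmix64( state_ ^ length_ );
        return fmix64( state_ + 0x9e3779b97f4a7c15ULL );
    }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ULL; // FNV offset basis
    std::uint64_t length_ = 0;
};
```

The first `result()` call equals `finalize_copy()`; each later call continues from the state the previous one left behind.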
Falco continues:
> I pointed out in the post I already made that the quality of
> digest from calling result twice is dependent on the hash
> algorithm, and there is no way the library can provide
> assurances on the quality
That's of course correct, but the same applies to the quality of
the value produced by calling `result()` only once; it naturally
depends on the implementation of the hash algorithm.
What's important here is that it's not possible to provide
an extended result of better quality from the outside; the
hash algorithm is in the best place to provide it because it
has access to more bits of internal state than it lets out.
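A sketch of why an extension computed from the outside cannot do better, under an assumed toy `toy_hash_16` that keeps 64 bits of state but lets out only 16 bits per `result()` call. The hypothetical `find_collision` brute-forces two messages whose first 16-bit outputs collide. Any outside extension, such as `extend_outside` rehashing the visible word, is a function of that word alone and therefore collides too. `extend_inside` re-finalizes the full 64-bit state, so it will usually (not always) tell the two messages apart. None of these names come from Boost.Hash2.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// MurmurHash3's 64-bit finalizer, used here as a generic bit mixer.
inline std::uint64_t fmix64( std::uint64_t k )
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical toy (NOT Boost.Hash2, NOT cryptographic): 64 bits of
// internal state, but each result() call lets out only 16 of them.
class toy_hash_16
{
public:
    using result_type = std::uint16_t;

    void update( void const* data, std::size_t n )
    {
        auto p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];
            state_ *= 0x100000001b3ULL;
        }
        length_ += n;
    }

    result_type result()
    {
        // Re-finalize the full 64-bit state, then expose its top 16 bits.
        state_ = fmix64( state_ ^ length_ );
        return static_cast<result_type>( state_ >> 48 );
    }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ULL;
    std::uint64_t length_ = 0;
};

// Outside extension: only the visible word is available, so the
// "extra" word is a function of the first word alone.
std::array<std::uint16_t, 2> extend_outside( std::uint16_t first )
{
    toy_hash_16 h;
    h.update( &first, sizeof first );
    return { first, h.result() };
}

// Inside extension: the algorithm re-finalizes its whole state.
std::array<std::uint16_t, 2> extend_inside( toy_hash_16 h )
{
    std::uint16_t first = h.result();
    std::uint16_t second = h.result();
    return { first, second };
}

// Two distinct 8-byte messages with equal 16-bit digests; the
// pigeonhole principle guarantees one within 65537 tries.
std::pair<std::uint64_t, std::uint64_t> find_collision()
{
    std::unordered_map<std::uint16_t, std::uint64_t> seen;
    for( std::uint64_t i = 0;; ++i )
    {
        toy_hash_16 h;
        h.update( &i, sizeof i );
        auto [it, inserted] = seen.emplace( h.result(), i );
        if( !inserted ) return { it->second, i };
    }
}
```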
This requirement effectively mandates that all _hash
algorithms_ be _extendable-output hash functions_:
https://en.wikipedia.org/wiki/Extendable-output_function
Note that this is not the only innovation that the proposed
hash algorithm concept involves. All hash algorithms are
required to support seeding from `uint64_t` and from an
arbitrary sequence of bytes, which makes them effectively
_keyed hash functions_ (or _message authentication codes_).
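A minimal sketch of how `uint64_t` seeding makes a hash keyed, assuming a hypothetical toy whose seeding constructor scrambles the seed into the initial state. The approach is an assumption for illustration, not taken from any Boost.Hash2 algorithm, and the toy is keyed only in the structural sense: it offers no MAC security. The same message under different seeds gives different digests, while the same seed reproduces the same digest. Treating a seed of 0 as "no seed" is also a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// MurmurHash3's 64-bit finalizer, used here as a generic bit mixer.
inline std::uint64_t fmix64( std::uint64_t k )
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical toy, NOT from Boost.Hash2 and NOT cryptographic.
class toy_hash_64
{
public:
    using result_type = std::uint64_t;

    toy_hash_64() = default;

    // Hypothetical seeding: scramble the seed into the initial state.
    // A seed of 0 leaves the default state unchanged in this toy.
    explicit toy_hash_64( std::uint64_t seed )
    {
        if( seed != 0 )
        {
            state_ = fmix64( state_ ^ seed );
        }
    }

    void update( void const* data, std::size_t n )
    {
        auto p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];           // FNV-1a style byte absorption
            state_ *= 0x100000001b3ULL; // 64-bit FNV prime
        }
        length_ += n;
    }

    result_type result()
    {
        state_ = fmix64( state_ ^ length_ );
        return fmix64( state_ + 0x9e3779b97f4a7c15ULL );
    }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ULL; // FNV offset basis
    std::uint64_t length_ = 0;
};
```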
Also note that the requirement that one can interleave calls
to `update` and `result` arbitrarily makes it possible to
implement byte sequence seeding (for algorithms that don't
already support it) in the following manner:
    Hash::Hash( unsigned char const* p, std::size_t n ): Hash()
    {
        if( n != 0 )
        {
            update( p, n ); // absorb the seed bytes
            result();       // finalize, scrambling the seed into the state
        }
    }
Subsequent `update` calls now start from an initial internal
state that has incorporated the contents of [p, p+n), and that
has been "finalized" (scrambled thoroughly) such that the
result is not equivalent to just prepending the seed to the
message (as would have happened had the `result()` call been
omitted).
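The constructor above can be checked with a hypothetical toy algorithm (non-cryptographic, not from Boost.Hash2) that implements byte-sequence seeding exactly as shown: `update` followed by `result()`. The checks compare it with simply prepending the seed to the message. Without the `result()` call the two match; with it, they differ.

```cpp
#include <cstddef>
#include <cstdint>

// MurmurHash3's 64-bit finalizer, used here as a generic bit mixer.
inline std::uint64_t fmix64( std::uint64_t k )
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical toy, NOT from Boost.Hash2 and NOT cryptographic.
class toy_hash_64
{
public:
    using result_type = std::uint64_t;

    toy_hash_64() = default;

    // Byte-sequence seeding built only from update() and result().
    toy_hash_64( unsigned char const* p, std::size_t n ): toy_hash_64()
    {
        if( n != 0 )
        {
            update( p, n ); // absorb the seed bytes
            result();       // finalize: scramble the seed into the state
        }
    }

    void update( void const* data, std::size_t n )
    {
        auto p = static_cast<unsigned char const*>( data );
        for( std::size_t i = 0; i < n; ++i )
        {
            state_ ^= p[ i ];           // FNV-1a style byte absorption
            state_ *= 0x100000001b3ULL; // 64-bit FNV prime
        }
        length_ += n;
    }

    result_type result()
    {
        state_ = fmix64( state_ ^ length_ );
        return fmix64( state_ + 0x9e3779b97f4a7c15ULL );
    }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ULL; // FNV offset basis
    std::uint64_t length_ = 0;
};
```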
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk