From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2024-12-10 14:44:55
On 12/10/24 17:18, Peter Dimov via Boost wrote:
> Andrey Semashev wrote:
>> Above, I have intentionally omitted the constructor from uint64_t as I find it
>> confusing. In particular, it isn't clear what endianness it uses to seed the
>> algorithm. It looks like it duplicates hash_append.
>
> It doesn't use any endianness. It's just used as a seed from which to derive the
> initial state.
>
> Most, if not all, existing non-cryptographic algorithms take integral seeds.
Yes, but for the other algorithms you have to assume some byte order,
since you call `update(&seed, sizeof(seed))` in the constructor.
Otherwise, an algorithm seeded with the same integer would produce
different results on machines with different byte orders.
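To illustrate the difference, here is a sketch (the function names are
made up, and Hash stands for any algorithm with the usual update()
member):

    #include <cstddef>
    #include <cstdint>

    // Hashing the raw bytes of the seed depends on the host byte order:
    template<class Hash>
    Hash seed_via_update(std::uint64_t seed)
    {
        Hash h;
        h.update(&seed, sizeof(seed)); // byte sequence differs between LE and BE
        return h;
    }

    // Serializing the seed in a fixed (here little-endian) order first
    // produces the same byte sequence on every machine:
    template<class Hash>
    Hash seed_via_update_portable(std::uint64_t seed)
    {
        unsigned char bytes[8];
        for (int i = 0; i < 8; ++i)
            bytes[i] = static_cast<unsigned char>(seed >> (8 * i));

        Hash h;
        h.update(bytes, sizeof(bytes));
        return h;
    }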
I'd say that if a specific hash algorithm natively supports seeding from
an integer, which is different from salting (i.e. simply calling
update() on the seed), then that algorithm should provide the
constructor from an integer. If the constructor is described in
HashAlgorithm, it should at least be marked optional.
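Roughly what I have in mind, as a sketch of how I'd word the
requirements rather than a quote from the documentation:

    #include <cstddef>
    #include <cstdint>

    // Sketch of the shape of a HashAlgorithm; only the members relevant
    // to this point are shown, and the names are illustrative.
    struct some_hash
    {
        using result_type = std::uint64_t;

        some_hash(); // default-constructed, unseeded state

        // Optional: only for algorithms that natively define seeding
        // from an integer; others should simply not provide it rather
        // than emulating it by hashing the bytes of the seed.
        explicit some_hash(std::uint64_t seed);

        void update(void const* p, std::size_t n);
        result_type result();
    };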
>> Also on the topic of get_integral_result for "std::array-like" hash values, a few
>> comments:
>>
>> - It requires the static size of 8 regardless of the size of the requested integer,
>> so even if uint32_t is requested it will still require at least std::array<unsigned
>> char, 8>. This should not be necessary.
>
> I'm not yet clear on the practical utility of using a result_type that is an array
> of fewer than 8 bytes. We don't have any such examples yet, so it's hard to
> tell whether this makes any sense and whether it should be explicitly blessed,
> and to what extent.
It's not so much about blessing anything; it's about not requiring what
you don't actually need.
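For the slicing case, something along these lines would be enough,
without insisting on at least 8 bytes (a sketch, not the library code):

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <type_traits>

    // Sketch: slice an array-like digest into an integer, requiring only
    // as many bytes as the requested integer actually needs.
    template<class T, std::size_t N>
    T slice_digest(std::array<unsigned char, N> const& digest)
    {
        static_assert(std::is_integral<T>::value, "T must be an integer type");
        static_assert(N >= sizeof(T), "digest must be at least sizeof(T) bytes");

        T r;
        std::memcpy(&r, digest.data(), sizeof(T));
        return r;
    }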
>> - It ignores the data beyond the initial 8 bytes. It might be ok for the purpose
>> of slicing the hash value, but it is inconsistent with get_integral_result for
>> integers, which attempts to mix bits through multiplications with a constant.
>
> This is explicitly stated in the documentation of get_integral_result, in the
> Remarks.
>
> In practice it's a useful heuristic to assume that an array-like result implies
> a hash function of high quality (an avalanching one) so that picking any M
> bits would suffice.
If you're willing to rely on the array-like result being of high
quality, why not do the same for integer results? I just think the
behavior should be consistent, whether the digest is an integer, an
array or some user-defined type.
> Not making this assumption would make things a bit hairy; it's not quite
> clear how to turn an arbitrary number of not-high-quality bits into 32. The
> library tries its best for the integral types, but there the possibilities are at
> least a finite amount and multiplications are built-in.
>
>> It might be possible to reuse one of the hash algorithms implemented in
>> the library for this bit mixing.
>
> Interesting "meta" approach, but not necessarily better in practice than
> the current one, under any practical criteria.
Why? Yes, applying e.g. fnv1a_32 would be slower than a single multiply
or slicing an array, but from the perspective of the quality of the
reduced digest this approach sounds good to me.
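For example (assuming I have the header name and the update()/result()
interface of fnv1a_32 right):

    #include <boost/hash2/fnv1a.hpp>

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Sketch: reduce an array-like digest to 32 bits by feeding it
    // through fnv1a_32 instead of slicing off the first bytes.
    template<std::size_t N>
    std::uint32_t reduce_digest(std::array<unsigned char, N> const& digest)
    {
        boost::hash2::fnv1a_32 h;
        h.update(digest.data(), digest.size());
        return h.result();
    }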
>> The hmac_* type aliases are not provided for siphash and xxhash. But I would
>> actually prefer that those type aliases be removed for the rest of the hash
>> algorithms, so that the algorithms don't bring the dependency on hmac.hpp,
>> which is not always needed by the user. Typing hmac<sha2_256> instead of
>> hmac_sha2_256 is hardly any more tedious.
>
> hmac.hpp is a very light header, so...
This may be the case today, but typedefs are there to stay pretty much
forever. I say don't introduce them to begin with; they don't bring much
value anyway.
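With the explicit spelling this is all it takes (assuming the obvious
header names):

    #include <boost/hash2/hmac.hpp> // included only where HMAC is actually used
    #include <boost/hash2/sha2.hpp>

    using hmac_sha2_256_t = boost::hash2::hmac<boost::hash2::sha2_256>;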