Subject: Re: [Boost-users] Boost 1.68.0 - boost hashing changed ?
From: Miguel Ojeda (miguel.ojeda.sandonis_at_[hidden])
Date: 2018-10-23 09:25:10


On Tue, Oct 23, 2018 at 10:19 AM degski <degski_at_[hidden]> wrote:
>
> On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <boost-users_at_[hidden]> wrote:
>>
>> On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users
>> <boost-users_at_[hidden]> wrote:
>> >
>> > I was trying to upgrade from Boost 1.53.0 to Boost 1.68.0, but it looks like the hash computation has changed, since the following line gives two different hash codes for the same string input.
>>
>> Hm... why would you expect the hash to be always the same between
>> releases, compilers, etc.?
>
>
> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.

No, sorry, that is a completely different use case. Crypto hashes are
used, among other things, in network communications, persistent
storage, etc. They need to be "fixed" functions, and their standards
provide the exact definition. That is not the case at all with
std::hash or Boost.Hash.
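
To illustrate what a "fixed" function means here, below is a minimal
sketch of 64-bit FNV-1a, whose constants and steps are published and do
not change. It is not a cryptographic hash, and it is not what std::hash
or Boost.Hash are specified to use; the name fnv1a_64 is simply chosen
for this example. The point is only that such a function returns the
same value for the same bytes on every platform and release, so its
output can be stored or transmitted.

```cpp
#include <cstdint>
#include <string>

// Illustrative only: 64-bit FNV-1a, a non-cryptographic hash with a
// published, fixed definition. Not part of Boost or the standard library.
inline std::uint64_t fnv1a_64(const std::string& s)
{
    std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;                 // mix in one byte
        h *= 1099511628211ULL;  // FNV prime; wraps modulo 2^64
    }
    return h;
}
```

Because the definition is fixed, fnv1a_64("a") is 0xaf63dc4c8601ec8c
everywhere, which is exactly the kind of promise std::hash does not make.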

>
>> With a quick look at Boost.Hash's docs, I cannot find anything
>> regarding a guarantee of that. If it is
>> like std::hash, then it is only guaranteed to remain equal for the
>> duration of the program.
>
>
> Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement [with an intended [narrow] use case in mind, std::unordered_map's].

Not sure what you mean. That is what I said.

>
>>
>> In other words, you cannot rely on saving it
>> or on comparing it to hashes from other vendors, platforms,
>> architectures, compiler releases, etc.
>
>
> In my view this is an omission; the option to have exactly that should be [have been] available.
>

Not really. You could argue, for instance, that precisely because
std::hash (and Boost.Hash) is meant to be used in maps/hash
tables/..., you should not be able to guess the values of the hash in
advance, in order to prevent collision attacks. In other words, the
implementation even has the freedom to provide a different hash
function on every run of your program.
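
As a rough sketch of that freedom, a hasher could draw a seed once at
startup and fold it into the computation. Everything below
(PerRunStringHash, its seed() helper, the FNV-1a style loop) is
hypothetical and chosen only for illustration; it is not how Boost or
any particular standard library implements its hash, and it is not a
hardened defense (a keyed hash designed for that purpose, such as
SipHash, would be the usual choice).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical hasher for illustration; not Boost's or any standard
// library's actual implementation.
struct PerRunStringHash {
    // Seed drawn once per program run, so hash values may differ
    // from one run to the next.
    static std::uint64_t seed()
    {
        static const std::uint64_t s = [] {
            std::random_device rd;
            return (static_cast<std::uint64_t>(rd()) << 32) | rd();
        }();
        return s;
    }

    std::size_t operator()(const std::string& key) const
    {
        // FNV-1a style loop, starting from a state perturbed by the seed.
        std::uint64_t h = 14695981039346656037ULL ^ seed();
        for (unsigned char c : key) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};
```

A std::unordered_map<std::string, int, PerRunStringHash> works fine with
this, because the container only needs equal keys to hash equal within
the same run; anything that saved the values to disk would break.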

Not only that, but stating that the hash should remain constant across
C++/Boost releases is basically stating the hash function should be
fixed forever. That removes all the freedom for improvements when
future hash functions are discovered or implemented, with better
properties (which is what happened in the commits I linked).

In summary: the hashes provided by Boost or the standard are not
intended to be fixed functions; i.e. you shouldn't rely on the actual
values returned, only on the properties of the function. Namely, this
one: "For two different values t1 and t2, the probability that h(t1)
and h(t2) compare equal should be very small, approaching 1.0 /
numeric_limits<size_t>::max()."
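
A small sketch of what relying on the properties, rather than the
values, can look like. The helper names are made up for this example,
and std::hash is used so that it has no dependencies; I would expect
boost::hash<std::string> to be usable the same way. Note that the code
never prints, stores, or compares against a literal hash value.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical helper: the only value-level guarantee is that equal
// inputs hash equal within a single execution of the program.
inline bool consistent_in_this_run(const std::string& s)
{
    std::hash<std::string> h;
    return h(s) == h(s);
}

// Hypothetical helper: counts how many distinct hash values std::hash
// produces for the n distinct strings "key0", "key1", ...
inline std::size_t distinct_hashes(std::size_t n)
{
    std::hash<std::string> h;
    std::unordered_set<std::size_t> seen;
    for (std::size_t i = 0; i < n; ++i) {
        seen.insert(h("key" + std::to_string(i)));
    }
    return seen.size();
}
```

With a reasonable implementation, distinct_hashes(n) should come out at
or very close to n, but which values are produced is deliberately left
unspecified.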

Cheers,
Miguel

