From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-12-13 10:22:04


Samuel Neves wrote:
> One of the things that puzzles me about this library is its intended purpose.
> "Types don't know #" was clearly about hash types for use in hash tables,
> which explained why keying the hashes was mandatory. But it is unclear to me
> why MD5, SHA-1, and other cryptographic hashes are present here; they are
> way too inefficient to be of much use in hash tables, so there is little reason to
> include them. If, on the other hand, this is meant as a more general hashing
> library, I have strong objections as detailed below.

There exist intermediate use cases between "key in a hash table" and
"cryptographically secure hashing of a binary blob" that are also served by
the library. Some of them are:

- if you have a program with some program state S, represented by a C++ object,
  which the user can save to disk by means of e.g. Ctrl+S, you often want to know
  whether the current state has been saved or not (to display an indicator, or to
  decide whether to autosave on exit). This is implemented by keeping a hash of
  the state as of the last save, and comparing the hash of the current state
  against it.

- if you have something like Boost.Compute, which needs to compile shaders,
  you can keep a cache of already compiled shader binaries, along with the hashes
  of the source used to create them, and then skip the compilation if the source
  matches a hash of an already compiled binary.

https://github.com/boostorg/compute/blob/cf7907574d6159cd43e6cf687a7b656278c61dd0/include/boost/compute/detail/meta_kernel.hpp#L364

- if you need to send complex C++ objects over the network, you can first ask
  the remote endpoint "do you already have the object with this hash?" and if so,
  skip sending the object.

And there are many other cases isomorphic to the ones above.

In all of these cases, you want a very good quality hash function, one for which
you can ideally assume that collisions never happen, for some practical definition
of never (less likely than a cosmic ray hit or a random RAM bit flip, for
instance). And because of that, you want to be able to use a cryptographic, or
ex-cryptographic, hash function, even when there is no external attacker, because
non-cryptographic hash functions are generally optimized for speed and do not
produce a sufficiently long digest. (But if a non-cryptographic hash works for
you, you also want to be able to use _that_, because more speed is always better
than less speed, other things being equal.)
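
To put a rough number on "never", the standard birthday approximation can be
sketched as below. It assumes an ideal hash whose outputs are uniformly
distributed and no attacker choosing the inputs; the function name
collision_probability is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of n distinct inputs collide
// under an ideal hash with a `bits`-bit digest (birthday approximation):
//   p ~= 1 - exp(-n(n-1) / 2^(bits+1))
double collision_probability(double n, int bits) {
    assert(n >= 0 && bits > 0);
    double pairs = n * (n - 1.0) / 2.0;   // number of distinct input pairs
    double x = std::ldexp(pairs, -bits);  // pairs / 2^bits
    return -std::expm1(-x);               // 1 - exp(-x), accurate for tiny x
}
```

Under these assumptions, a billion objects hashed to a 64-bit digest collide
with a probability of a few percent, which is nowhere near "never"; with a
128-bit digest the same estimate is below 1e-20, and with 256 bits it is
smaller still. That is the sense in which digest length matters here.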

In short, having a reliable framework for hashing arbitrary C++ objects with any
user-selected hash algorithm opens up a lot of possibilities that one might not
realize are there, and "how do I implement std::hash for my type" is only a small
fraction of them.

