Boost logo

Boost :

From: Samuel Neves (samuel.c.p.neves_at_[hidden])
Date: 2024-12-09 12:50:04


On Sun, Dec 8, 2024 at 7:27 PM Ivan Matek via Boost
<boost_at_[hidden]> wrote:
>
> Hi everyone,
>
> Implementation uses uint64_t in which it sums into hashes of particular
> elements(each computed with "fresh" hasher so ordering does not matter), then
> it finally hashes that sum and range size into original passed hasher.
>
> I see no problem with logic this since it cleverly exploits the fact that
> uint64_t overflow is defined and that sum is commutative.
>
> But I wonder if this preserves the cryptographic quality of original hash.

It doesn't, even if you preserve the length of the hash. There are two
attack approaches here, depending on the size of the sets you can work
with.

On the low-data side, you can turn collision or second-preimage
finding into a lattice problem. Suppose you have some set whose
intermediate hashes are [a=13816517845079277968,
b=2675547450723878905, c=18133324598626471763,
d=10200581252073748329]. Build the lattice spanned by the rows

[K*2^64, 0, 0, 0, 0]
[K*a, 1, 0, 0, 0]
[K*b, 0, 1, 0, 0]
[K*c, 0, 0, 1, 0]
[K*d, 0, 0, 0, 1]

where K is a constant large enough to force the first coordinate of
vectors to have more weight and sum to 0. Short vectors of this
lattice will have the form [0, x, y, z, w], which means that x*a + y*b
+ z*c + w*d = 0 (mod 2^64), or in other words the set [a,b,c,d] and
the set [[a]*(x+1), [b]*(y+1), [c]*(z+1], [d]*(w+1)] have the same
hash.

In the above case we can use K=2^32 and use LLL to obtain the vector
[0, 2870, 27951, 8550, 46911] and thus a second-preimage for the above
hash. Shorter second-preimages are possible by increasing the
dimension. In particular, for 64-bit hashes it is possible to find
lattice vectors containing coefficients only in {-1,0,1},
corresponding to removals and additions of single elements from the
original set, at dimension ~48.

Note that while finding shortest vectors in a lattice is a very hard
problem as the dimension grows, here we don't really have to find
exceedingly short vectors; they must instead have either -1 (removal)
or positive coefficients. For collision-finding, where we control both
inputs, we can start with many repetitions of the same element to make
the problem easier and allow reasonably small negative coefficients as
well. See also [2, Appendix B].

For larger sets, the generalized birthday attacks from Wagner [1,
Section 4] also should apply here with minimal modifications, and
reduces the security of an n-bit sum to O(2^{2sqrt(n)}).

As far as I can tell the only successful multiset hashes either
require enormous moduli or use elliptic curves.

[1] https://www.iacr.org/archive/crypto2002/24420288/24420288.pdf
[2] https://arxiv.org/pdf/1601.06502


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk