Boost logo

Boost Users :

Subject: Re: [Boost-users] Boost.Functional/Hash
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2010-09-02 11:08:40


AMDG

Andrej van der Zee wrote:
> I have a question that I hope can be answered by using Boost. I have
> the following datastructure:
>
> struct ip_info
> {
> const u_int8_t _ifile_index;
> const unsigned long _src_ip;
> const unsigned long _dst_ip;
> const u_int16_t _id;
> const u_int16_t _length;
> const u_int16_t _src_port;
> const u_int16_t _dst_port;
> const u_int32_t _seq;
> const u_int32_t _ack_seq;
> };
>
> The object is identified by all its members. Now I want to count
> duplicates. But I have MANY of these objects, I mean billions. I have
> to reduce the memory footprint of my program, because it soon grows
> over 4GB and boom! So I rather not keep all the objects in memory and
> why would I, because I do not need them. I just need to count
> duplicates.
>
> My idea was to hash such objects to a smaller values and use that hash
> value as key_type in a map. The value_type of the map would be an
> integer. This something like std::map<int32_t, u_int16_t> or
> something.
>
> Then I though maybe use Boost.Functional/Hash, but are there any
> guarantees that the hash value is unique?

No. It's impossible to guarantee this, because there
are fewer possible hash values than input values.

> How unique are they? Are
> there better alternatives?
>

Assuming an ideal hash function that produces k-bit values,
the probability that n elements have no collisions is
\frac{2^k!}{(2^k - n)!2^{kn}}. With a 32 bit hash,
the probability of a collision is about 50% when there
are 77,000 elements.

In Christ,
Steven Watanabe


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net