
From: JOAQUIN LOPEZ MUÑOZ (joaquin_at_[hidden])
Date: 2006-05-30 17:58:13


----- Original Message -----
From: Maxim Yegorushkin <maxim.yegorushkin_at_[hidden]>
Date: Tuesday, May 30, 2006 10:25 pm
Subject: Re: [boost] [multi_index] benchmarking

> Darren Cook wrote:
> >> Here are results for cvs head version with your modification,
> >> which show that your analysis was correct:
> >
> > It would be interesting to see how your results compare to Google's
> > (open source) hash map:
> > http://goog-sparsehash.sourceforge.net/
> > http://goog-sparsehash.sourceforge.net/doc/performance.html
> >
> > I've not used them but they seem to give some good improvements
> > (memory or speed, but not both) over std_hash and std_map.
>
> Surprisingly google::dense_hash_set often outperforms all other
> containers in both speed and memory usage.
[...]
> 99997 elements; 10 iterations
[...]
> memory test (lower is better)
> gs_hash [ 21.64%] OOOOOOOO 432760 bytes
> gd_hash [ 52.43%] OOOOOOOOOOOOOOOOOOOO 1048576 bytes
> std_hash [ 61.58%] OOOOOOOOOOOOOOOOOOOOOOOO 1231568 bytes
> ext_hash [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 1586428 bytes
> mi_hash [ 79.32%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 1586440 bytes
> mi_set [ 80.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 1599968 bytes
> std_set [100.00%] OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 1999940 bytes

Hello Maxim. gd_hash is indeed expected to be faster than
other hash containers, but in general it also occupies more
memory; it just happens that your test is particularly
favorable to gd_hash. Let me explain:

For a regular, bucket-based singly-linked hashed container, the
memory consumption for n elements can be calculated as
(assuming a 32-bit architecture where pointers are 4 bytes):

  R(n)=n*(sizeof(element)+4*(1+1/lf))

where lf is the load factor, typically oscillating between
0.5 and 1. Setting n=99997, element=unsigned int (4 bytes),
lf=0.75, we get

  R = 99997*(4+4*(1+1/0.75)) = 1333293

which lies between the actual values for std_hash and
ext_hash/mi_hash.
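
Just to make the formula concrete, here is a small C++ sketch
(a hypothetical helper, not code from any of the libraries
involved) that computes the estimate:

  #include <cstddef>

  // Estimated memory for a node-based, singly-linked hashed
  // container: one node per element (the element itself plus
  // a next pointer), plus n/lf bucket pointers. Assumes 4-byte
  // pointers, as in the discussion above.
  std::size_t node_hash_memory(std::size_t n,
                               std::size_t element_size,
                               double lf)
  {
      return static_cast<std::size_t>(
          n * (element_size + 4 * (1.0 + 1.0 / lf)));
  }

  // node_hash_memory(99997, 4, 0.75) == 1333293
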
Now, for a Google dense hash set the memory consumption
is expressed by the formula

  D(n)=2^(ceil(1+logbin n)) * sizeof(element)

where ^ denotes exponentiation, ceil(x) is the smallest
integer not less than x, and logbin is the binary logarithm.
With the same settings as above we get

  D = 2^(ceil(1+logbin 99997)) * 4 =
    = 2^(ceil(1+16.61)) * 4 = 2^18 * 4 =
    = 1048576

That is exactly what your figures show. Why is your test
favorable to gd_hash? Because sizeof(element) is small; had
you used a moderately sized class with, say, four int-sized
members, i.e. sizeof(element)=4*4=16, the resulting numbers
would change a lot:

  R = 99997*(16+4*(1+1/0.75)) =
    = 2533257

  D = 2^(ceil(1+logbin 99997)) * 16 = 2^18 * 16 =
    = 4194304
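
The dense hash estimate can be coded the same way (again a
hypothetical sketch; reading the formula as a table resized to
stay at or below 50% occupancy is my assumption, not verified
against the sparsehash sources):

  #include <cstddef>

  // Estimated memory for google::dense_hash_set: a flat,
  // open-addressed array sized to the smallest power of two
  // holding n elements at no more than 50% occupancy, i.e.
  // 2^(ceil(1+logbin n)) slots.
  std::size_t dense_hash_memory(std::size_t n,
                                std::size_t element_size)
  {
      std::size_t slots = 1;
      while (slots < 2 * n)
          slots <<= 1;          // slots = 2^(ceil(1+logbin n))
      return slots * element_size;
  }

  // dense_hash_memory(99997, 4)  == 1048576
  // dense_hash_memory(99997, 16) == 4194304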

That is, R(n) grows as n*(C+sizeof(element)) with C a
constant, while D(n) has the form K*n*sizeof(element),
where K is always >= 2 (and up to 4, depending on where n
falls between two powers of two). Hence, when sizeof(element)
is sufficiently large, gd_hash takes roughly twice the space
of a regular hash container, or more.
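
Putting the two sketches above together in a quick driver
(assuming both helper functions are in scope) reproduces the
crossover, with n=99997 and lf=0.75 as in your test:

  #include <cstddef>
  #include <cstdio>

  int main()
  {
      const std::size_t n = 99997;
      const std::size_t sizes[] = {4, 16}; // the two cases above
      for (int i = 0; i < 2; ++i) {
          std::size_t r = node_hash_memory(n, sizes[i], 0.75);
          std::size_t d = dense_hash_memory(n, sizes[i]);
          std::printf("sizeof(element)=%u: R=%u D=%u D/R=%.2f\n",
                      (unsigned)sizes[i], (unsigned)r, (unsigned)d,
                      (double)d / (double)r);
      }
      return 0;
  }

  // prints:
  // sizeof(element)=4: R=1333293 D=1048576 D/R=0.79
  // sizeof(element)=16: R=2533257 D=4194304 D/R=1.66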

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

