
From: Hartmut Kaiser (hartmutkaiser_at_[hidden])
Date: 2005-01-17 06:28:52

Matthias Troyer wrote:

> > The initial overhead is tolerable, though, here are some timings on
> > my machine - Pentium M, 1.6GHz, 1GByte:
> >
> > 20 keys: 0.005 [s], 200 keys: 0.03 [s], 2000 keys: 0.15 [s],
> > 20000 keys: 4.45 [s]
> >
> > where the time rises slightly worse than linearly (actually it has a
> > linear and an exponential component, but the exponential component
> > kicks in for larger key sets only), depending on the number of keys.
> > The memory requirement for the MPH generator rises linearly with
> > the number of keys: roughly N/3 * sizeof(int), where N is the
> > number of keys.
> I'm very interested if this scales well for a larger number
> of keys, of the order of 10^6 - 10^8 keys. I guess that for
> those sizes the scaling will already be exponential. Can you
> tell us what the maximum number of keys is that can be
> efficiently treated?

Honestly, I've never tried it with such a large set of keys, but I expect the
times to be significantly larger. For a random key set of 200,000 keys I got
88.3 [s] on my machine...

I hadn't expected anybody to need such large hash tables yet, but I'm
willing to think about how to optimise it further. OTOH, I think the time to
generate an MPH for larger key sets could be improved significantly if
you're ready to accept a more complex hashing function (used at hashing
time) consisting of several chained calculations. But currently I'm not
able to give any prediction wrt this. One of the main goals for this
implementation was to get a very simple and efficient hashing function.

I've tried to get an account on the new Boost vault, but I'm still waiting
for the activation mail... As soon as I get it I'll upload the files, so
you'll be able to play around with it.

Regards, Hartmut

Boost list run by bdawes at, gregod at, cpdaniel at, john at