|
Boost : |
From: Matt Hurd (matt.hurd_at_[hidden])
Date: 2005-01-17 07:54:07
On Mon, 17 Jan 2005 13:19:34 +0100, Hartmut Kaiser
<hartmutkaiser_at_[hidden]> wrote:
>
> Christopher D. Russell wrote:
>
> > > The requirement, that the input keys should be strictly
> > positive isn't
> > > a big limitation, because it is always possible to accordingly
> > > transform key sets containing possibly negative numbers, ...
> >
> > ? By addition of a constant positive offset?
>
> Sure.
>
> > > or to derive the keys from strings to match.
> >
> > ? You lost me here. Would you explain this point in a little
> > more detail please.
>
> You're certainly right, that there is no generic and simple way to map
> strings to integers besides reinterpreting the bit pattern of the string as
> a (probably long) integer. What I've meant is, that sometimes you may use
> some simple mapping, for instance adding all the character values of the
> string given that these give no collisions for your input key set.
The code I posted as part of this thread does this in a statistically
optimal way, it may be argued, for most types, including strings and
is quite efficient. It works for contiguous types as well, more or
less, including doubles and such things as UUIDs.
Where it differs is that the MPH generator generates a wonderful
collision free set for a fixed universe which leads to a nice mapping
from a code to an index suitable for contiguous storage or
enumeration.
I will interested to have a look at the MPH as see how it can adapt to
arbitrarily long keys, especially strings, fixed and variable length.
I can also imagine an "optimal" implementation touching the minimum
number of bytes and doing the minimum number of operations to derive
an optimal hash for a given set of strings... I'd be able to thwack
such a clever thing into many systems I have running and gain very
tangible benefits if such a thing were possible.
$0.02
matt.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk