Boost logo

Boost :

From: JOAQUIN LOPEZ MU?Z (joaquin_at_[hidden])
Date: 2005-03-10 15:56:06


----- Mensaje original -----
De: Peter Dimov <pdimov_at_[hidden]>
Fecha: Jueves, Marzo 10, 2005 6:01 pm
Asunto: Re: [boost] Re: Re: Re: [review] hash functions

> Thorsten Ottosen wrote:
> >
> > so basically you're saying that any collision is much worse than a
> > slower hash function because we must use == to go through the
> bucket ?
>
> Something like that, yes; of course it depends on the level of
> slowness.
> Cryptographic hash functions are typically overkill. Increasing
> the number
> of collisions slightly is tolerable (but can easily negate the
> effect of the
> faster hash function). Hitting a pathological case where a
> significant
> percentage of the key set maps to the same bucket is a serious
> problem.
>

Maybe the following approach could be taken to
allow for alternative implementations to be plugged into
the framework:

struct std_impl{};

template<typename T,typename Impl=std_impl>
struct hash
{
  std::size_t operator()(const T& x)const
  {
    return hash_value(x,Impl());
  }
};

template<typename T,typename Impl>
std::size_t hash_value(const T& x,Impl impl)
{
  return hash_value(x);
}

// standard hash_value overloads for integral types and pointers
...
//

template<typename T,typename Impl>
void hash_combine(size_t& seed,const T& x,Impl impl)
{
  seed^=hash_value(x,impl)+(seed<<6)+(seed>>2);
}

template<typename A,typename B,typename Impl>
std::size_t hash_value(const std::pair<A,B>& x,Impl impl)
{
  std::size_t seed=0;
  hash_combine(seed,x.first,impl);
  hash_combine(seed,x.second,impl);
  return seed;
}

template<typename It,typename Impl>
std::size_t hash_range(It first,It last,Impl impl)
{
  std::size_t=0;
  for(;first!=last;++first){
    hash_combine(seed,*first,impl);
  }
  return seed;
}

template<typename E,typename T,typename A,typename Impl>
std::size_t hash_value(const std::basic_string<E,T,A>& x,Impl impl)
{
  return hash_range(x.begin(),x.end(),impl);
}

// same as previous for the rest of STL containers

This framework is transparent to the regular user, who
never specifies an alternative implementation and can
define hashing for her own types as before:

std::size_t hash_value(const my_type& x)
{
  //...
}

unordered_set<my_type,/* boost::hash<my_type> */> us;

But the "advanced" user can tweak one or more parts of
the hashing framework ad libitum:

struct crypto_impl{};

std::size_t hash_value(int x,crypto_impl)
{
  // smart hash for ints
}

// we modify also range hashing

template<typename It>
std::size_t hash_range(It first,It last,crypto_impl)
{
  std::size_t seed=0;
  std::size_t n=1;
  for(;first!=last;++first){
    hash_combine(seed,*first,crypto_impl);
    seed*=n; // for instance
    n*=2;
  }
  return seed;
}

unordered_set<my_type,boost::hash<my_type,crypto_impl> > us;

What do you think?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk