Boost logo

Boost Users :

From: Ovanes Markarian (om_boost_at_[hidden])
Date: 2007-04-26 08:48:20


On Thu, April 26, 2007 12:29, Joaquín Mª López Muñoz wrote:

>
> OK, what's happenning is the following: boost::hash<std::string>()(str) is not equivalent
> to boost<const char*>()(str.c_str()); the latter hashes the *pointer*, not the contents
> pointed to. So, when you write
>
> your_container.find(str.c_str())

Well, looking through the source of boost::hash I came up with a question which of the hash_value
overload functions get called, especially when there is are functions:

template <class T>
std::size_t hash_value(T* const&);

template <class Ch, class A>
std::size_t hash_value(std::basic_string<Ch, std::BOOST_HASH_CHAR_TRAITS<Ch>, A> const&);

I considered string the string ctor sring(const Ch*, allocator_type& = allocator_type()) is not
marked explicit and compiler could prefer this match.

Debugging through the boost::hash I found the following:
Case 1:
char const* p = "Sint8";
size_t val0 = boost::hash_value(p);

hash_value(T* const&) is called and it hashes the pointer

size_t val1 = boost::hash_value("Sint8");
hash_value(T (&array)[N]) is called and it results in hash_range function call

std::string s("Sint8");
size_t val2 = boost::hash_value(s);
hash_value(std::string const&) is called and it also results in hash_range function call

Despite the similar hash_range call hash_values are different.
hash_range iterates through the range of elements by applying a hash_combine function to the
previous result (seed) and the current element.

Here the source of hash_range and hash_combine function:

    template <class T>
    inline void hash_combine(std::size_t& seed, T const& v)
    {
        boost::hash<T> hasher;
        seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
    }

    template <class It>
    inline std::size_t hash_range(It first, It last)
    {
        std::size_t seed = 0;

        for(; first != last; ++first)
        {
            hash_combine(seed, *first);
        }

        return seed;
    }

Despite the fact, that hash_range is overloaded since I have different iterator types
(string::iterator and array pointer) type of parameter v in hash_combine is in both cases const
char&. And the debugger shows that hasher from hash_combine is in both cases of type
boost::hash<char>. Therefore I don't understand how hash_combine can produce different results at
this point. But I think I got it!!! The terminating null character is also hashed.

Sorry for the long explanation and then the short idea, hope this is of iterest to others...

>
> B.MI assumes that what you pass (a const char *) is equivalent hash- and equivalence-wise
> to the stored keys (std::strings), which is not true (because of the hash, equivalence is
> indeed interoperable as std::string provides the required overloads for operator== etc.)
>
> Returning to your original problem, you said that this works:
>
> inline some_type_ptr create_type(std::string const& name)const
> {
> types_map::index<hash>::type::const_iterator i =types_.get<hash>().find(name.c_str());
> //...
> }
>
> but this does not:
>
> inline some_type_ptr create_type(std::string const& name)const
> {
> types_map::index<hash>::type::const_iterator i =types_.get<hash>().find(name);
> //...
> }
>
> This puzzles me a lot: Given that your types_map container is indexed on a std::string,
> things should be the other way around: it is the "find(name)" version that should work
> AFAICS. Could you please double-check?
>
I double checked it and both give the correct hash value. I think this is string dependent issue,
where the string uses COW idiom to save performance, Herb Sutter wrote about it in his Guru of the
Week (http://www.gotw.ca/gotw/043.htm). Therefore if I use const char* to initialize the string
probably it is used in the string as long as possible until the string is not modified. But if
this is so, there is more or less no reliable way to hash strings since these can be implicitly
converted from const char* to std::string and afterwards used as a hash key and return a wrong
hash result. Unfortunately I cannot step inside of the string constructors in MSVC 8.0 to see how
these are really implemented. This is not a trivial issue to hash strings I think. Thanks for your
time.

With Kind Regards,

Ovanes Markarian


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net