Boost logo

Boost :

From: Alberto Barbati (abarbati_at_[hidden])
Date: 2005-03-12 18:01:31


David Abrahams wrote:
> Alberto Barbati <abarbati_at_[hidden]> writes:
>
>>Hi,
>>
>>I have a doubt about the proposed implementation of hash_value() for
>>pointers, which is:
>>
>> template <class T> inline std::size_t hash_value(T* v)
>> {
>> return static_cast<std::size_t>(v - (T*)0);
>> }
>>
>>this code calls for undefined behaviour according to §5.7/6, each time v
>>is a non-null pointer.
>
>
> Furthermore, you can't subtract addresses that aren't in the same
> array. 0 isn't in the same array as anything.
>

So we agree that this code has to be fixed in some way. I don't think
Boost could accept the code as it is.

>
>>Unfortunately, there's one more thing... even if p == q,
>>reinterpret_cast<uintmax_t>(p) is not guaranteed to be equal to
>>reinterpret_cast<uintmax_t>(q) on implementations that have multiple
>>internal representations for the same address. I don't think much can be
>>done (in a portable way) for such platforms.
>
>
> Once you're into reinterpret_cast you've left the land of strict
> portability anyway.
>

I agree with that literally, but I don't fully understand what do you
mean with such remark, so I will re-formulate my thought hoping to make
it clearer. Probably we are saying the same thing.

Apart from the very special case described above, using
reinterpret_cast<> as I describe will provide a viable hash function.
The fact that its values are implementation-defined is irrelevant, I
think (but... see below).

However, *in that special case above*, the basic requirement of hash
functions "equal arguments yield the same result" is not satisfied so
even reinterpret_cast<> does not provide a viable hash function. As the
standard does not provide any other portable mean to map generic
pointers to integers, there's *no way* we can obtain a hash for pointers
in this case.

Ah! And there's one more case in which we are helpless, although I'm
curious to see an implementation that exploits such latitude: the
wording used in the standard does not guarantee that the mapping from
pointers to integers always produce the same value over time. In other
words:

   char* p = new char;
   size_t i1 = reinterpret_cast<size_t>(p);
   size_t i2 = reinterpret_cast<size_t>(p);
   assert(i1 == i2); // ???
   char* q1 = reinterpret_cast<char*>(i1);
   char* q2 = reinterpret_cast<char*>(i2);
   assert(p == q1 && q1 == q2);

The standard guarantees that the second assert passes, but says nothing
about the first assert.

Coming back to the hash library, I think we have only three choices:

1) use reinterpret_cast<> as I proposed, but documenting that it will
produce a viable hash function only if the underlying implementation
satisfy certain requirements (each specific implementation /should/
document the behaviour of reinterpret_cast anyway, so the user may able
to check)

2) provide the hash function (with reintepret_cast<>) only for those
platforms that we are able to check in advance that will satisfy those
requirements and make any use of such function produce a static assert
for all other platforms

3) do not provide a hash function for pointers at all

I would vote for 1, but 2 and 3 also make sense. Is there a fourth choice?

Alberto


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk