

From: David Abrahams (dave_at_[hidden])
Date: 2005-06-27 06:06:59

Vladimir Prus <ghost_at_[hidden]> writes:

> David Abrahams wrote:
>>> I'm not getting you. The hash value is calculated as part of the
>>> saving process itself, so it has the very same complexity.
>>> I've got the hunch you might be meaning something else,
>>> could you please elaborate?
>> The question is, when encountering a pointer or reference member that
>> needs to be hashed, do you do what the serialization process does and
>> hash the thing that it refers to, or do you just hash its address?
>> The former is deep hashing; the latter is shallow hashing. In a graph
>> with tracking, you serialize an object X once, and all subsequent
>> times you encounter X in the graph you just serialize an id. If you
>> add hashing, the first time you encounter X you'll hash it and
>> serialize it. For this one time it doesn't matter if hashing is deep
>> or shallow from a big-O perspective. However, when you encounter X
>> again, if hashing is deep, you'll do a lot of extra work. Actually, a
>> potentially infinite amount if there are cycles, so hashing really
>> does have to be shallow. I guess I answered my own question.
> If hashing is shallow, then what does it check?

That the immediate members of an object serialized at a particular
address have the same values.

> The library already has mechanism to check that the same address is
> serialized, and the point of the proposal, as I understand it, is to
> detect cases like:
> A* a = new A();
> archive << a;
> // modify *a
> archive << a;
> In this case, deep hashing is needed. Yes, it can be costly -- all the
> more reason to make it optional, maybe even at runtime.

Deep hashing is impossible: in the case of a reference cycle it will
never terminate. You could do deep hashing with marking of visited
objects so you don't hash them twice, but that's equivalent to shallow
hashing.

Dave Abrahams
Boost Consulting

Boost list run by bdawes at, gregod at, cpdaniel at, john at