From: David Abrahams (dave_at_[hidden])
Date: 2004-01-04 19:21:04
"Wyss, Felix" <FelixW_at_[hidden]> writes:
>> > [Wyss, Felix] I would argue that resize() is semantically an
>> > operation on a hash table; it doesn't modify the content.
>> > It's an operation to adjust the memory consumption and lookup
>> > performance tradeoff at runtime.
>> Just like std::vector<T>::reserve? Should that guarantee that
>> iterators remain valid and not throw exceptions?
> [Wyss, Felix] The iterator validity guarantees of vector and map are very
> different. Map guarantees that iterators are only invalidated by
> deletion operations. Having resize() of hash_maps invalidate iterators
> violates, in my opinion, the "principle of least surprise".
hash_maps and maps are very different. Some people think it's a shame
that they share a common name element.
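
For concreteness, a small standard-library illustration of the two iterator
guarantees being contrasted above (a sketch, not code from the thread):

    #include <map>
    #include <vector>

    int main() {
        std::vector<int> v(1, 42);
        std::vector<int>::iterator vi = v.begin();
        v.reserve(v.capacity() + 1); // forces reallocation: vi is now invalid
        // *vi;                      // undefined behavior

        std::map<int, int> m;
        m[1] = 1;
        std::map<int, int>::iterator mi = m.begin();
        m[2] = 2;              // map insertions never invalidate iterators
        return mi->first;      // mi is still valid
    }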
> I didn't say that resize() shouldn't throw exceptions; it should provide
> the strong exception guarantee. It may well throw an exception when it
> tries to allocate a larger hash table, without violating the strong
> exception guarantee.

Why? Can it do so at zero cost?
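
A minimal sketch of how that can work for a chained table (hypothetical
names; int keys are assumed so the relink step cannot throw, and n > 0):

    #include <cstddef>
    #include <vector>

    struct Node {
        int key;
        std::size_t cached_hash;  // hash remembered at insert time
        Node* next;
    };

    struct Table {
        std::vector<Node*> buckets;

        void resize(std::size_t n) {
            std::vector<Node*> fresh(n, nullptr); // may throw; *this untouched
            for (Node*& head : buckets) {         // relink is nothrow: the
                while (head) {                    // cached hash means no user
                    Node* p = head;               // hash or compare code runs
                    head = head->next;
                    Node*& slot = fresh[p->cached_hash % n];
                    p->next = slot;
                    slot = p;
                }
            }
            buckets.swap(fresh);                  // nothrow commit
        }
    };

Without the cached hash, the relink loop would have to call the user's hash
function, which can throw mid-relink; that is where the strong guarantee
gets harder, and it is the connection to the caching proposal below.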
>> > [Wyss, Felix] Same here, resize() is a semantically invariant
>> > operation.
>> I don't see why that should have any bearing.
> [Wyss, Felix] The same applies here; it's confusing if a semantically
> invariant operation may leave the collection in a zombie state. In
> general, I am very much in favor of the strong exception guarantee for
> collections, in particular where it can easily be provided without undue
> cost overhead.
Me too, where "undue" is defined to be "any amount at all".
>> > It's easy to implement a hash table that provides the strong
>> > exception guarantee for resize() by caching the hash code. This
>> > even increases performance for types with expensive compares. I
>> > thus dare to ask: Why not?
>> It incurs a space cost that some people may not want to pay and
>> decreases performance for some types.
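
The fast path being claimed looks roughly like this (a sketch with
hypothetical names; std::string stands in for any expensively compared key):

    #include <cstddef>
    #include <string>

    struct Node {
        std::string key;          // a key type with a costly operator==
        std::size_t cached_hash;  // the extra word per node at issue
        Node* next;
    };

    // Search one bucket chain: the cached hash is a cheap pre-filter, so
    // the expensive key comparison only runs when the hashes match.
    Node* find_in_chain(Node* head, const std::string& key, std::size_t h) {
        for (Node* p = head; p; p = p->next)
            if (p->cached_hash == h && p->key == key)
                return p;
        return nullptr;
    }

For small keys such as ints, the pre-filter is the "one additional compare"
mentioned below, and the per-node word is the space cost mentioned above.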
> [Wyss, Felix] The only types I can think of where the comparison is
> more expensive are types of the same size as (or smaller than) the
> hash. That usually means integers or raw pointers. For those
> types, it would incur one additional compare, but they're already
> cheap. For types that are expensive to compare, the savings can be
> substantial, in particular as the worst case (lots of collisions) is
> not as bad. This also means lower quality hash functions can be
> used without excessive worst-case penalties.
The use of extra storage has a bad effect on cache locality, and thus on
performance.
> I have a hard time buying the memory argument, as doubling the
> maximum load factor will only increase the runtime cost by a single
> hash value compare but keep the memory consumption the same
> (assuming the hash value uses the same number of bytes as a pointer).
What does load factor have to do with anything?
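
For readers following the arithmetic, a rough accounting under assumed
parameters (illustrative numbers, not figures from the thread):

    #include <cstddef>
    #include <cstdio>

    // Chained table of n elements: overhead = bucket pointers + any
    // per-node cached hash. Bucket count = n / max_load_factor.
    int main() {
        const std::size_t n = 1000;
        const std::size_t ptr = sizeof(void*), hash = sizeof(std::size_t);

        // Plain table, max load factor 0.5: 2n bucket pointers.
        std::size_t plain = 2 * n * ptr;

        // Cached-hash table, max load factor 1.0 (doubled): n bucket
        // pointers plus one stored hash per element. Overhead is equal when
        // hash and pointer are the same size; the price is longer chains,
        // filtered cheaply by the cached-hash compare.
        std::size_t cached = n * ptr + n * hash;

        std::printf("plain: %zu bytes, cached: %zu bytes\n", plain, cached);
    }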
> On the contrary: assuming the table is not regularly filled to the
> maximum load factor, the average memory consumption is actually lower.
I'm lost here.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com