
From: Wyss, Felix (FelixW_at_[hidden])
Date: 2004-01-04 17:47:53


> > [Wyss, Felix] I would argue that resize() is semantically an
> > invariant operation on a hash table; it doesn't modify the content.
> > It's an operation to adjust the memory consumption and lookup
> > performance tradeoff at runtime.
>
> Just like std::vector<T>::reserve? Should that guarantee that
> iterators remain valid and not throw exceptions?

[Wyss, Felix] The iterator validity guarantees of vector and map are very
different. Map guarantees that iterators are only invalidated by
deletion operations. Having resize() of a hash_map invalidate iterators
violates, in my opinion, the "principle of least surprise".
I didn't say that resize() shouldn't throw exceptions; it should provide
the strong exception guarantee. It may well throw an exception when it
tries to allocate a larger hash table, without violating the strong
exception guarantee.
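
To make that concrete, here is the rough shape of the resize() I have in
mind (just a sketch with made-up names, not actual Boost or SGI code): with
the hash codes cached in the nodes, the only step that can throw is the
allocation of the new bucket array, and that happens before the table is
touched at all.

    #include <cstddef>
    #include <vector>

    struct node {
        std::size_t cached_hash;  // stored when the element was inserted
        node*       next;         // chaining within a bucket
        // ... key/value payload ...
    };

    struct table {
        std::vector<node*> buckets;  // heads of the per-bucket chains

        void resize(std::size_t new_bucket_count) {
            // The only step that can throw; if it does, *this is untouched.
            std::vector<node*> fresh(new_bucket_count, nullptr);

            // Relink the existing nodes. No allocation and no calls to the
            // hash function happen here, so nothing below can throw, and the
            // nodes themselves stay at the same addresses.
            for (std::size_t i = 0; i < buckets.size(); ++i) {
                node* head = buckets[i];
                while (head) {
                    node* rest = head->next;  // save before relinking
                    std::size_t b = head->cached_hash % new_bucket_count;
                    head->next = fresh[b];
                    fresh[b] = head;
                    head = rest;
                }
            }

            buckets.swap(fresh);  // commit; std::vector::swap does not throw
        }
    };

Because the nodes are never reallocated, anything that points at a node
survives the rehash; only the bucket membership changes.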

> > [Wyss, Felix] Same here, resize() is a semantically invariant
> > operation.
>
> I don't see why that should have any bearing.

[Wyss, Felix] The same applies here; it's confusing if a semantically
invariant operation may leave the collection in a zombie state. In
general, I am very much in favor of the strong exception guarantee for
collections, in particular where it can be provided easily and without
undue cost.

>
> > It's easy to implement a hash table that provides the strong
> > exception guarantee for resize() by caching the hash code.
> > This even increases performance for types with expensive compares.
> > I thus dare asking: Why not?
>
> It incurs a space cost that some people may not want to pay and
> decreases performance for some types.

[Wyss, Felix] The only types I can think of where caching makes the
comparison more expensive are types of the same (or smaller) size than the
hash value. That usually means integers or raw pointers. For those types,
caching incurs one additional compare, but they are already cheap to
compare. For types that are expensive to compare, the savings can be
substantial, in particular because the worst case (many collisions) is not
as bad. This also means lower-quality hash functions can be used without
excessive worst-case penalties.
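
To illustrate what I mean by caching (again only a sketch, with
illustrative names): each node stores the hash value computed at insertion,
and a lookup only calls the potentially expensive equality comparison when
the cached hash already matches.

    #include <cstddef>

    template <class Key>
    struct node {
        std::size_t cached_hash;  // computed once, when the element was inserted
        Key         key;
        node*       next;
    };

    // Search one bucket chain: a cheap word compare filters out almost all
    // non-matches before the full (possibly expensive) comparison runs.
    template <class Key, class Hash, class Eq>
    node<Key>* find_in_bucket(node<Key>* head, const Key& k, Hash hash, Eq eq) {
        const std::size_t h = hash(k);
        for (node<Key>* p = head; p != nullptr; p = p->next)
            if (p->cached_hash == h && eq(p->key, k))
                return p;
        return nullptr;
    }

For a key type like a long string, eq() is only ever reached for elements
that are almost certainly the one being looked for.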

I have a hard time buying the memory argument, as doubling the maximum
load factor will only increase the runtime cost by a single hash value
compare but keep the memory consumption the same (assuming the hash
value uses the same number of bytes as a pointer). On the contrary,
assuming the table is not regularly filled to the maximum load factor,
the average memory consumption is actually lower.
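
As a rough back-of-the-envelope example (my numbers, assuming a maximum
load factor of 0.5 doubled to 1.0, 8-byte pointers, 8-byte hash values,
and N elements in a chained table):

    without caching:  2N bucket pointers           = 16N bytes
    with caching:     N bucket pointers + N hashes = 16N bytes

And when the table holds fewer than N elements, the cached variant stores a
hash only per element actually present, while the plain variant still pays
for the full bucket array, so on average it comes out ahead.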

