Boost logo

Boost :

From: JOAQUIN LOPEZ MU?Z (joaquin_at_[hidden])
Date: 2005-02-20 12:57:25


Hi Howard, I'm glad you jumped into this discussion!

----- Mensaje original -----
De: Howard Hinnant <hinnant_at_[hidden]>
Fecha: Domingo, Febrero 20, 2005 5:31 pm
Asunto: Re: [boost] [multi_index][hash][rfc] Hashed indices
        preview +Boost.Hash

> On Feb 20, 2005, at 10:21 AM, JOAQUIN LOPEZ MU?Z wrote:
>
> >>> 1.4. Usual implementations use one pointer per bucket, whereas
> >>> Boost.MultiIndex has two pointers per bucket. This addresses
> >> potential> performance problems of insert() and erase() under low
> >> load conditions,
> >>> as discussed in
> >>> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf
> >>> issue 6.14. Is this convenient?
> >>
> >> Well it's not unreasonable, you need what you need.
> >
> > BTW, have you read the committee's resolution to this
> > issue? It doesn't make any sense to me (Bill Wade's concern,
> > OTOH, seems very rasonable.)
>
> Imho the committee's resolution is quite logical.
>
> The hash containers exist for one reason: As an optimized
> solution, in
> both memory consumption and speed, compared to the tree-based
> containers. The speed department especially concerns lookup,
> insert
> and erase (in that order), but not iteration (there are other
> containers which optimize iteration, vector being the extreme
> example).
> The coder who carefully tunes his hash function and loading will
> be
> rewarded with significant optimization. The coder who doesn't,
> may be
> presented with a solution that performs worse than the tree-based
> containers, and is better off sticking with the O(Log N) solution.
>
> Imho, the committee should do absolutely nothing to penalize the
> coder
> who has carefully tuned his hash function and loading, either in
> memory
> consumption nor speed. Of course if hash container writers would
> like
> to add safeguards which may also add such overhead, I see little
> rationale for the committee to object to that either.
>
> Doubling the per element storage overhead for a perfect
> distribution
> with a loading of 1 is a serious hit in the memory consumption
> department, and may well lead to a performance hit due to more L2
> cache
> misses. Adding bidirectional iteration, again at the cost of
> doubling
> the per element overhead in a well tuned application makes little
> sense
> to me. The container is unordered. When you iterate in the
> forward
> direction, you access the elements in a generally unpredictable,
> unordered order. What motivation is there (motivation high enough
> to
> double per element overhead), to iterate in the reverse of the
> generally unpredictable, unordered order?
>
> Again, I have no wish for the committee to outlaw such hash
> containers.
> But I strongly feel that neither should they outlaw the more
> traditional hash container design which favors uncompromising
> performance when everything is tuned just right.
>

What strikes about the resolution is that it doesn't seem to
address Bill Wade's concern: in a minimum overhead "classic"
hash table implementation (one pointer per element + one pointer
per bucket), iteration is not going to be amortized constant under
low load conditions.
This is in very strong disagreement with that
the std requires about forward iterators. I think this
cannot be just swept under the rug: if the committee is going
to accept it, at least it should be noted somewhere.

The implementation I adopted (2 pointers per node + 2 pointers
per bucket) has O(1) iteration, insert() and erase(), whatever
the load conditions (provided the hash function behaves, of
course.) I agree with you bidirectional iteration is of little
value, and my main motivation was to achieve O(1). So, I'll be
happy to go for a lighter implementation if these issues
can be solved in another way --or if there's consensum that
costly iteration under low load conditions is acceptable.

PS. The worst of all implementations seems to be Dinkum's,
where iteration is O(1) but insertion will degrade to
O(bucket_count()^2) when a previous rehashing is issued,
as Wade points out. It also puzzled me that the comittee
NADed the issue without at least a comment on this.

Looking fwd to your comments,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk