|
Boost : |
Subject: Re: [boost] [multi_index][ann] Boost.MultiIndex 1.56 preview: major update with new hashed index implementation
From: Joaquin M Lopez Munoz (joaquin_at_[hidden])
Date: 2013-12-25 18:46:55
Ion Gaztañaga <igaztanaga <at> gmail.com> writes:
>
> El 25/12/2013 18:12, JoaquÃn Mª López Muñoz escribió:
> > The most important change is a complete reimplementation of the
> > internals of hashed indices, so that erasure of elements is
> > constant-time regardless of the level of occupancy of the internal
> > bucket array [...]
>
> JoaquÃn, you blog posts are incredibly interesting. Congratulations.
Thank you!
> Just in case you want to check it in the future, Boost.Intrusive uses a
> policy-based data structure, but suffers from the same performance
> degradation as MultiIndex had in pre-1.56 versions. AFAIK, the good old
> bucket rooted singly linked list is the only one that can support
> "auto-unlink" hooks (nodes that delete themselves from the container
> without having a pointer to the container).
The data structure I propose can do auto unlinking. Check
https://github.com/boostorg/multi_index/blob/master/
include/boost/multi_index/detail/hash_index_node.hpp , functions
unlink @ line 280 (non-duplicate elements) and unlink @ line 438
(duplicate elements): in both cases unlinking does not need any
additional information from the container.
> - It does avoid the "iterator erase(...)" problem returning "void"
> instead of "iterator" (as Intrusive need not to be standard-conforming).
That would be my preferred soluton at any rate but unfortunately
the committee decided that LWG issue #579
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579
was a NAD, so I had to comply, even though I take it as proven that
complying incurs an extra pointer per element that the original
proponents of unordered associative containers didn't see coming.
> - If the user is willing to pay the price, it can offer "group" linking
> as Boost.Unordered does for duplicated elements but although this does
> not solve the "depleted bucket" syndrome. Your proxy-returning erase()
Unfinished sentence?
> - It can also store the hash value in the node and this is used for
> faster rehashing and lookups but it does not implement the
> libstdc++/libc++/Boost.Unordered data structure. This was planned for
> the future to improve erasure times.
>
> - Power of two buckets can be optionally enabled to speed up some
> operations if the user is sure the hash function is good enough.
>
> You novel data structure is really interesting, although I guess it's
> not easy to implement, specially when duplicates are around.
On the contrary, I think Boost.Intrusive can very easily implement
it as it has the auto-unlinking property. Drop me a line if you want
to port it to your lib.
I think the variant for duplicate elements can be improved, as currently
group linking kicks in only for #group>2. I decided to leave it like
that for the moment but will get back to it when time permits.
> It's interesting to see how well the Dinkumware implementation performs,
> as the memory cost is now lower than some years ago, now that the rest
> of implementations have added the hash value in the node (the overhead
> of the Dinkumware implementation is one pointer per bucket).
Dinkumware's excellent performance (despite its heavy overhead of
2 pointers per node plus 2 pointers per bucket entry) comes from
to factors:
* Bucket arrays are powers of two,
* the data structure has very good locality,
But is has a serious problem with erasure, as it's basically violating
the standard by invoking the hash function, which can throw --and
erasure can't throw. The only fix would be adding an additional
node field to cache the hash value. As it stands now, Dinkumware's
implementation of standard unordered associative containers is broken.
Best,
JoaquÃn M López Muñoz
Telefónica Digital
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk