Boost logo

Boost :

Subject: Re: [boost] [multi_index][ann] Boost.MultiIndex 1.56 preview: major update with new hashed index implementation
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2013-12-26 06:56:09


El 26/12/2013 0:46, Joaquin M Lopez Munoz escribió:

> The data structure I propose can do auto unlinking. Check
> https://github.com/boostorg/multi_index/blob/master/
> include/boost/multi_index/detail/hash_index_node.hpp , functions
> unlink @ line 280 (non-duplicate elements) and unlink @ line 438
> (duplicate elements): in both cases unlinking does not need any
> additional information from the container.

Very clever. As you store pointers you don't need to distinguish between
a node and a bucket and is_first_of_bucket/is_last_of_bucket can be
implemented just checking a chain of pointers. Cool.

> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579
>
> was a NAD, so I had to comply, even though I take it as proven that
> complying incurs an extra pointer per element that the original
> proponents of unordered associative containers didn't see coming.

I agree. Quoting
"http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html"

"my goal is neither to require nor to forbid stored hash codes. I don't
know of an implementation that currently stores hash codes, but I also
don't know of anything in this proposal that would forbid it."

and

"Singly linked lists have slower iterators, because an iterator first
steps within a bucket and then, upon reaching the end of a bucket, steps
to the next"

>> - If the user is willing to pay the price, it can offer "group" linking
>> as Boost.Unordered does for duplicated elements but although this does
>> not solve the "depleted bucket" syndrome. Your proxy-returning erase()
>
> Unfinished sentence?

Yes, sorry. I started thinking about your proxy-returning proposal and
how it could be implemented. I think that could be a good solution for
Boost.Intrusive using singly linked lists with no stored hash allowing the

while(it != itend)
   it = cont.erase(it)

pattern, which is what STL users are used to. But I guess conversions to
const_iterator and related would be a problem for the proxy.

> On the contrary, I think Boost.Intrusive can very easily implement
> it as it has the auto-unlinking property. Drop me a line if you want
> to port it to your lib.

Thanks! Currently boost::intrusive::hashtable code is very complicated
(it needs to deal with "store_hash", "cache_begin", "power_2_buckets",
"compare_hash" and "incremental" options) and adding another option to
implement your data structure without breaking everything else would not
be easy. I think it's better to have a separate implementation, even if
that could lead to some duplicated code.

I even wanted to add some kind of highly customizable (for a default
implementation Boost.Unordered is the way to go IMHO) experimental hash
table to Boost.Container based on Boost.Intrusive hash tables and
measure speed differences depending on different Boost.Intrusive options
(store hash or not, etc.).

> I think the variant for duplicate elements can be improved, as currently
> group linking kicks in only for #group>2. I decided to leave it like
> that for the moment but will get back to it when time permits.

Good.

> Dinkumware's excellent performance (despite its heavy overhead of
> 2 pointers per node plus 2 pointers per bucket entry) comes from
> to factors:
>
> * Bucket arrays are powers of two,
> * the data structure has very good locality,
>
> But is has a serious problem with erasure, as it's basically violating
> the standard by invoking the hash function, which can throw --and
> erasure can't throw. The only fix would be adding an additional
> node field to cache the hash value. As it stands now, Dinkumware's
> implementation of standard unordered associative containers is broken.

Dinkumware used to implement incremental hash in old versions (it was
the default in pre-unordered VC versions, and I think in VC10 it can be
activated through a macro, but that disappeared in recent versons) which
are also interesting to measure. No idea why they changed the
implementation.

Just another comment in case you'd like to improve your benchmarks: I
think "erase(iterator)" is much less used than "erase(key)", and in that
case I don't think Dinkumware data structure would be better (maybe
power of two bucket arrays can still help, but you need to traverse the
linked list to find the element)

Best,

Ion


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk