Subject: Re: [boost] Request for feedback on design of concurrent_unordered_map, plus notes on use of memory transactions
From: Gavin Lambert (gavinl_at_[hidden])
Date: 2014-07-27 21:26:19
On 28/07/2014 03:20, Niall Douglas wrote:
> * item_type is allocated via a rebind of the supplied allocator, and
> looks like:
> struct item_type
> value_type p;
> size_t hash;
I presume this will support partial construction to handle the case when
value_type does not have a default constructor, or is movable-only, etc?
> * To insert or find an item involves determining the bucket from the
> modulus of the hash by the number of buckets, locking the bucket and
> scanning the linear list for an empty item or the right item as
> appropriate. We know empty items from a magic hash value which is
> defined to some magic constant (advice on the best magic hash value
> least likely to turn up from std::hash is welcome).
Isn't it by definition (admittedly reality is less ideal) that all hash
values have equal probability? What happens if a real item does end up
with the magic "empty" hash value? I think it'd be safer to have a
separate flag for this case.
> * To erase an item is easy: iterators store an iterator to the bucket
> and an offset into the linear table. Erasure involves constructing a
> temporary value_type, locking the bucket and performing a *move*
> construction out of the table into the temporary value_type,
> resetting the hash to empty. Destruction can then occur without
> holding the bucket lock. If the item just erased is the last in the
> bucket, the bucket's size (not capacity) is reduced by one, and that
> is repeated until the bucket size is the minimal possible.
If the hash is reset to empty and the lock released before destruction
occurs, doesn't that mean that something else could try to concurrently
construct into the same storage? After all, the item slot will now look
empty to insert(). Similarly the bucket size could be reduced by one
just as insert() puts something into that bucket slot (thinking that it
was still within the valid range, so not trying to increment it itself).
> * size() under this design becomes O(bucket count) as we sum the
> count members of each bucket. empty() is worst case O(bucket count)
> when the map is empty, best case O(1) when the first bucket is not
> empty, average case O(bucket count/item count/2).
You could use a relaxed count at the top level to improve this,
possibly. size() and empty() become statistical rather than
prescriptive in the presence of concurrent writes anyway, so a stale
value is not harmful. Although I suppose conversely updating a shared
counter will cause cache contention performance loss for insert/erase,
which may not be worth it for the benefit of statistical functions that
are less likely to even be called in a concurrent design.
> Thoughts on this design before I write a final implementation are
> welcome. I don't plan to complete my implementation past the minimum
> needed for proposed Boost.AFIO, so if someone else wishes to finish
> it that and make it part of official Boost that would be great.
I presume you've looked at other common implementations of this sort of
thing? eg. libcds.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk