Boost logo

Boost :

Subject: Re: [boost] [Container] interest in a flat_unordered_map
From: Treb Connell (trebconnell_at_[hidden])
Date: 2015-09-18 20:00:03


>
> Definitely interested. These days I was trying to think about an
> unordered flat container design for Boost.Container: if it should be
> open addressed (like google hash implementations) or not. I was trying
> to use a modified separate chaining implementation specially due to the
> low load factors that open addressing implementations need.
>
Other implementations would likely be a good idea even if we do pursue
this. I originally built this for game code that didn't use exceptions and
didn't modify the container while iterating, so it was a very good fit. Of
course being in a standard like Boost the iteration stability, exception
guarantee, implementation difficulty are pretty big problems with this
container.

I definitely think c++ users need a faster standard hash map available to
them, but I imagine we'll have to think more about exactly which
implementation is best. Like Jeremy points out it might be best to give
this a very specific name and then include other types of flat unordered
maps as well. If you want to consider this I can do additional
bench-marking to see if it can make up in speed what it lacks in other ways.

Why bucket related methods don't apply? How do you handle equal values
> in a flat_unordered_multimap container?
>
Sorry, I guess bucket methods can be made to apply. I don't typically
think of open addressing schemes as using buckets, but they do enough to
implement those methods.

> Umm. Typical open addressing-based implementations guarantee iterator
> and stability until rehashing.
>
Yes, that's a downside of this implementation. Iterator's are invalidated
on every insert and erase. The benefit to this is more speed.

It also loses strong exception safety as the new element might need to
> move an old one.
>
Another downside with this implementation. In the end there's no way to
make it strong when you're moving existing values.

More details:
This has an overhead of one pointer per slot. In my implementation I do
not save the hash and recompute it, but saving it would add the hash per
slot as well. The pointer is for a singly-circularly linked list between
collided keys. It is circular so that we can compute previous by
traversing the list until we find ourselves again.

The container itself stores begin, end, and a free pointer. The free
pointer is an optimization to help us find free slots. It points to the
'last free spot' then we increment up from there to try to find a new free
spot. This packs elements close together. There are other ways we could
do this, such as have a free list between them using the pointer that's
already there. Then we'd have to add a bool to each slot to know if
they're free because the container iterator has to jump over free nodes.

When we erase we may also have to move a value, if we erase a node at the
main position, and it has a chain, then we must move a node in that chain
into the main position.

I try to insert values that collide near each other by checking the two
adjacent slots before going to the free pointer.

I also was able to reduce iterator size to a single pointer by have a dummy
pointer at the end of the container that acts as a guard. It looks like a
non-free slot, so the iterator stops on it and keeps us from having to
store or check an end pointer.

I think those are all the implementation details that matter.

-Treb


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk