Boost logo

Boost :

Subject: Re: [boost] [Container] interest in a flat_unordered_map
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2015-09-18 18:33:52


On 18/09/2015 22:09, Treb Connell wrote:
> I'm hoping to port a flat hash map that I've written to boost. Since boost
> has a flat_map I think it'd be good to have a flat_unordered_map as well,
> especially since there's an opportunity to be much more efficient. The one
> I would like to use is based on the hash map used in Lua, which is a mix of
> Coalesced and Robin Hood hashes. I don't think there's a name that
> describes it yet unfortunately, but I describe it near the end of this post.

Definitely interested. These days I was trying to think about an
unordered flat container design for Boost.Container: if it should be
open addressed (like google hash implementations) or not. I was trying
to use a modified separate chaining implementation specially due to the
low load factors that open addressing implementations need.

The easiest implementation would be to put all preallocated elements of
the array in a free node list. That would not help cache efficiency, but
at least it would be easy to implement, maintain and it would support
near 1.0f load factors. Typically this would require 2 additional
pointers plus the hash value stored per node (one for the bucket, one to
form the singly linked list of elements) plus alignment, but with
modifications some space could be saved. Surely cache efficiency could
be improved with additional tricks. It would support the standard
unordered interface including local buckets.

Joaquín's novel structure could be also "flattened":

http://bannalia.blogspot.com.es/2014/01/a-better-hash-table.html

> *Performance (primary reason for use):*
> It uses memory and cache very effectively and ends up being *very fast*.
> It's hard to benchmark real world situations, but in inserting, finding,
> erasing, and iterating over a dictionary it was up to six orders of
> magnitude faster than VS2013's unordered_map. This isn't a totally fair
> test because the cache efficiency really shines.

Good!

> *Standards compliance:*
> Differences from std::unordered_map are widely the same differences as
> flat_map listed here
> <http://www.boost.org/doc/libs/1_59_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx>.
> Also bucket related methods no longer apply, so will be removed. *That
> said there's one major standards compliance issue that I'm struggling with:*

Why bucket related methods don't apply? How do you handle equal values
in a flat_unordered_multimap container?

> The most optimal implementation of iterators for this container *does not
> have constant time traversal*. The implementation is a vector with
> elements scattered across it. To increment an iterator I continue to
> increment the internal vector iterator until I hit a new element. In
> practice elements should be scattered evenly, and will be close to one
> another, but in the worst case this is N-1. Additional memory or logic
> could be added to make this O(1), but I don't think that trade is worth
> it. In my applications of unordered_map I rarely iterate, and would not
> want to penalize insert, find, or erase to make iteration faster.

Not a big problem IMHO, the main use of a hash map is search and
insertion, not iteration. It's really a pity O(1) iteration and erasure
returning an iterator is required by the standard. It severely limits
implementation alternatives. Not sure if flat_unordered should respect
that (after all, flat_xx does not guarantee iterator validity or
insertion complexities)

> *Implementation details:*
> You can see Lua's implementation here
> <http://www.lua.org/source/5.3/ltable.c.html>. The applicable logic is in
> and below *luaH_newkey, *comments are useful.
>
> It's primarily a Coalesced hash table, but differs in collision behavior.
> When a new key collides with an old key, we check to see if the old key is
> in it's "*main position"*. The main position is the index that the key
> hashes to. If it's not in it's main position that means that it previously
> collided and was displaced. Since this is the new keys main position, we
> displace again the current key, and place ourselves there.

Umm. Typical open addressing-based implementations guarantee iterator
and stability until rehashing.

> This drastically decreases chain length, and improves the load factor that
> the hash map can optimally reach.

It also loses strong exception safety as the new element might need to
move an old one.

> Sorry about long post, can go into additional details upon request.

Yes, please.

Best,

Ion


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk