
Subject: Re: [boost] [ boost ] [ Trie ]
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-03-05 19:30:43

On 5 Mar 2015 at 13:49, Kenneth Adam Miller wrote:

> Well, my requirements are that I store range-offset encodings and be able
> to access them for any reason as fast as possible. So, what I was thinking
> about doing was storing a range-value encoding as a tree, but it could be
> any kind of self balancing tree. The thing with red-black trees is, I know
> how to make them wait-free by reading and implementing a whitepaper that
> UTDallas published.

The rebalancing generates a lot of cache line invalidation traffic,
and ruins concurrent modify performance. Academic theory often misses
the reality.

> Does your nedtree cover the ability to store range value encodings? Because
> my thinking is, with a trie, you can get the first 3 bytes of an integer
> without giving up too much space (it requires only slightly less than 32
> megabytes contiguously stored). Each entry in that 32 megabytes can be a
> pointer to a red-black tree node, wherein when you do an insertion, you
> fall back to the wait-free red-black tree implementation. So, you can
> still do range-value encodings, but it will also be far, far faster than if
> you didn't have this caching to accelerate it.

nedtries provides size_t-keyed index algorithms. You can build the
rest from that.

Bitwise tries are hugely superior to red black trees for close and
exact find lookups, but inferior for closest find lookups. You can
see some graphs at

> I estimate that it requires at most about 11 instructions in the absolute
> worst case scenario to do an insertion, typically 7 instructions on the
> usual cache miss, and an amortized 2 instructions for contiguous memory
> insertions. It is very highly scalable, and can be managed flexibly with
> various concurrency runtime patterns.

I think you may need to scale your estimates by about 10x. A cache
line miss costs about 250 CPU cycles alone, and any kind of tree
algorithm spends most of its time waiting on cache lines.

This is why bitwise tries are much faster than red black trees on
out-of-order CPUs: they execute out of order much better. On an
in-order core like Intel Atom bitwise tries are extremely slow, and
they're not great on even a Cortex A15 ARM either.

nedtries costs about 100 CPU cycles per insert/find/remove. That is
exceptionally well balanced: most algorithms are much slower at, say,
inserts than finds (e.g. hash tables). If you do equal amounts of
finding, inserting and removing, and your item count is less than
10,000, bitwise tries are unbeatable.


ned Productions Limited Consulting

Boost list run by bdawes at, gregod at, cpdaniel at, john at