
Subject: Re: [boost] [flat_map] Any interest in a flat_map variant that is much faster on insertion/erase at the price of a slower lookup?
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2015-03-22 12:40:26


Giacomo Drago wrote:
> On 2015-03-22 14:41, Phil Endecott wrote:
>> There is an in-memory B-tree at google code that looks decent, though
>> I've not personally used it:
>>
>> http://code.google.com/p/cpp-btree/
>>
>> If you click "Wiki pages - UsageInstructions" at the bottom left of
>> that page you'll find tables giving the memory overhead. A B-tree has
>> no per-item pointers, but it does have per-page pointers; you can
>> asymptotically approach zero overhead by increasing the page size.
>> With an infinite page size you effectively have a flat_map, with
>> a page size of one you have a binary tree.
>>
>> In practice, a more important influence on memory footprint than the
>> per-page pointers is the empty space in the pages that results from
>> splitting after insertion, and after deletion but before merging.
>> But a flat_map implemented on a std::vector has a similar issue, i.e.
>> that std::vector doubles its size when it re-allocates. So a
>> std::vector will be wasting up to half of the memory that it has
>> allocated.
>
> On a vector you can easily shrink_to_fit thus bringing the overhead to
> zero (if you neglect the vector size and a pointer to its buffer).

And you can compact ("vacuum") a B-tree, if you want.

> I still think a flat_map (that does not mean this
> flat_map in particular) has some reason to be.

For me, the main uses of flat data structures have been:

(a) When it's important to have no pointers at all, so that I can store
the data in a memory-mapped file or perhaps in interprocess shared
memory. (Though most recently I decided to use offset pointers from
Boost.Interprocess along with a Boost.Intrusive map, and that worked
well.)

(b) When the data has a "life cycle" with different phases, e.g. an
initial phase of bulk creation, followed by a phase of random lookup.
In this case you sort only once, so the computational complexity is
optimal: one O(n log n) sort at the phase boundary, then O(log n)
per lookup.

I would not want to use them either when you have fine-grained mixing
of insertion and lookup, or when the amount of data is large enough
that the poor locality of reference starts to dominate.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk