Boost logo

Boost :

Subject: Re: [boost] [flat_map] Any interest in a flat_map variant that is much faster on insertion/erase at the price of a slower lookup?
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2015-03-13 12:41:50


Peter Dimov wrote:
> Ion Gazta?aga wrote:
>> El 13/03/2015 a las 14:38, Phil Endecott escribi?:
>>
>> > (Personally, I have a flat_map that does batched insertions i.e. the new
>> > items are appended individually and then merged at the end of the batch.
>> > This will have significant benefits for some workloads.)
>>
>> Yes, this should be added to flat_map. The interface won't be very pretty,
>> but I would like to hear suggestions in this direction.

I think we discussed this before. My preferred approach is

flat_set S;
{
   flat_set::batch b(S);
   b.insert(xxx);
   b.insert(yyy);
} // b's dtor does the merge

Or, if you don't like doing things in destructors, give batch a
"commit" or
"merge" method.

One question is: should lookups while a batch is in progress (a) look
only at
pre-batch items, or (b) crash, or (c) work correctly (with a linear
scan of the
new items)? I prefer (a).

> It doesn't have to have an interface, at least in principle. flat_map can
> just append the new elements to the end, keeping them sorted, until it hits
> a threshold, then merge. Lookups will first look at the new element range
> (because those are likely to be in cache), if not found, at the old element
> range.

I once tried to work out the complexity of operations on this
structure, and
I don't think the results were very impressive.

Say you have N elements in the "main" container, and M elements in the
"recently added" portion. Insertion will be O(M) due to moving the older
"new" elements, plus some amortised amount for the eventual merge into the
main container. Lookup will be O(log N + log M). Traversal is O(N + M)
but with a significant constant factor as you have to compare main and new
elements as you go.

I think a useful maximum M is sqrt(N). If M is smaller than sqrt(N), then
you are doing the merge often enough that the amortised merge complexity
exceeds the insertion complexity.

It is also possible to make this a recursive data structure, i.e. the
"recently added" portion is the same sort of container with its own
"very recently added" portion, ad infinitum. I don't think I managed
to work out the complexity of that structure. I suspect that there is a
provable lower bound on what can be achieved with "flat" data structures,
though.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk