Subject: Re: [boost] [flat_set] When to sort?
From: Vladimir Batov (Vladimir.Batov_at_[hidden])
Date: 2017-03-26 11:00:31
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
> On 03/26/17 06:02, Vladimir Batov via Boost wrote:
>> I've been using boost::container::flat_set with quite a success until
>> recently when I profiled my app to see that 30-50% of the run-time was
>> spent in insert() for a container of around 50,000 in size.
>> So, I had to replace boost::container::flat_set with a vector
>> which inserts as a vector (no sorting). However, when it is the time
>> call begin(), end(), find(), it sorts once. That shaved the mentioned
>> overhead off. Say, one run is down from 17s to 10s and the likes.
> AFAIU, flat_set should not sort on insert. It should do lower_bound to
> find the insert position, which is O(log(N)). Sorting instead would
> give O(N*log(N)).
Yes, indeed. I expressed myself incorrectly. I did not mean sorting as
such. My apologies.
> What you're probably seeing is the overhead that comes from inserting
> in the middle of the vector that is used by flat_set internally. This
> especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair
of pointers. The comparison operation is expensive. So, it appears it
plays quite a role during initial construction. When I eliminated that
as described and sorted once... huge boost.
> containers are not well suited for frequent modifications, so that is
> kind of expected.
Understood. The described use-case constructs the flat_set only once
(but a big one and one insert at a time). So, on the surface all
appeared by the book -- constructed once, traversed/searched often.
Still, the observed construction overhead really stunned me.
>> I wonder if that should be the boost::container::flat_set default
>> behavior rather than sorting the underlying container with every
>> insert(). After all, as a user I do not care if the underlying
>> is sorted... until I start using/traversing it, right?
> No, this won't work in general because these methods are not supposed
> to modify the container. In particular, they don't invalidate
> iterators or references, and calling these methods concurrently is
I do not feel those begin() et al will invalidate anything because the
first call to, say, begin() will sort the underlying container once.
After that it all works as usual, right?
>> Or maybe I am
>> missing something in boost::container::flat_set which already allows
>> to do something of that sort?
> You can try using the methods and constructors that accept the
> ordered_unique_range tag. If you can prepare a sequence of ordered and
> unique elements, you can call insert(ordered_unique_range, s.begin(),
> s.end()) or similarly construct the flat_set to avoid ordering the
> sorted elements.
Yes, I am aware of ordered_unique_range tag. Unfortunately, (as you
described) it is getting messier and messier -- populate another sorted
container, then transfer to flat_set. My suggestion seems like an
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk