Boost logo

Boost :

Subject: Re: [boost] [devector] optimum growing/insertion policy
From: Thorsten Ottosen (tottosen_at_[hidden])
Date: 2017-10-17 09:23:08


Den 16-10-2017 kl. 22:57 skrev Ion Gaztañaga via Boost:
> On 16/10/2017 19:18, Thorsten Ottosen via Boost wrote:

>>> 2) If a xxx (xxx = left or right) insertion is needed and
>>> xxx_free_capacity >= the number of items to be inserted, a xxx
>>> insertion is done (minimizes moves).
>>> 3) Otherwise if a xxx (xxx = left or right) insertion is needed and
>>> xxx_free_capacity < N, the sequence containing new plus old elements
>>> should be placed in the middle of the buffer. (Note: This step should
>>> be limited if amortized constant time is desired for push_xxx
>>> insertions)
>>
>> There might be some corner cases here. Suppose we have
>> front_free_capacity() == 3, back_free_capacity() == 0 and is about to
>> insert to the right half. Does it then make sense to place elements in
>> the middle?
>
> We are optimizing the general case, we don't know where the user is
> going to insert the next element. Putting data in the middle seems the
> choice that minimizes worst-case scenarios. What if the next insertion
> is in the left half (say, via flat_map)? In any case if we want some
> kind of cache, we could just move new elements to the side where free
> elements were available in the old buffer with some kind of logic, like
> moving data more to left if right free capacity was exhausted.

Right. My point was simply that it pays (in the average case) to move to
the middle even when there is only a free space of 3 on the other side.

>>> I think the RF factor obtains an amortized constant time for one side
>>> insertion if that property is considered useful.
>>
>> I think that is a must.
>
> Ok. That's what's used in the alternative implementation in:
>
> http://larshagencpp.github.io/blog/2016/05/22/devector
>
> using a reallocation factor of 0.8. push_back benchmarks seem quite
> inline with vector and deque. But it uses some statistics based approach
> to relocate the buffer so that the push_back case is optimized instead
> of a random insertion scenario.

Yeah, I'm not too happy about push_back/push_front doing movements
except in few cases. I think such ideas belong in advanced use of the
resize policy.

>> I do want to focus on simple properties from a user's perspective.
>> Some cases to consider:
>>
>> # case 1: push_back from empty state
>> # case 2: push_back from unknown state
>> # case 3: insert from empty state
>> # case 4: insert from unknown state
>>
>> How do these properties work for a devector? If I understand your
>> strategy, we get the following:
>
> I think all of them work.
>
>> # case 1: push_back from empty state
>> ------------------------------------
>>
>>    devector<int> v;
>>    v.reserve( N );
>>
>> This does not guarantee N pushes without exceptions or without
>> reallocation. To get that guarantee we have to write
>
> If I'm not mistaken, you get the same guarantee. The difference is that
> elements are "relocated" (moved), but there is no "reallocation" (new
> buffer allocation). They are first moved when N/2 elements are
> push_backed and left/right_free_capacity() is zero, then after
> push_backing additional N/2 elements, etc (the number of relocations
> (moves) steps depend on the "relocation factor").

Well, we don't get an O(1) push_back in that case, right? And if we do,
we have to reallocate.

How about about this: we use Joaquin's idea that if the requested new
buffer is smaller/equal to capacity, push_back shifts the whole sequence
to the leftmost position (and wise versa for push_front).

Then we just make sure that the allocated buffer size is always even, so
if the user asks for 5, we allocate 6. Then when the the right hand side
is full, we ask for size()*GF (GF should be two) and this is exactly the
the capacity.

This could work when we start with an empty container, but still leaves

   devector<int> v;
   ...
   v.reserve( v.size() + N );

as a potential problem, doesn't it (unless we allocate twice as much as
requested)?

I think there is no prefect solution, only it is clear that optimal
vector usage and optimal flat container usage requires slightly
different behavior of reserve() & clear(). Do we then

A. make reserve() & clear() behave like in vector while providing
versions that balance around the middle, e.g.

    reserve( int front, int back );
    clear( int front_capacity );

B. make reserve() & clear() behave unlike vector, with balance around
the middle. vector code can be made optimal by using

    reserve_back( int capacity );
    clear_back();

C. Don't define reserve() and clear() for devector.

kind regards

Thorsten


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk