Boost logo

Boost :

Subject: Re: [boost] [devector] optimum growing/insertion policy
From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2017-10-15 15:59:46


El 15/10/2017 a las 15:50, thorsten.ottosen via Boost escribió:
> hi Joaquin,
>
> Thanks for this. Some comments below.
>
> Assumptions
>
> * The cost of allocating new memory is negligible when compared to the cost
> of moving elements around (either intra-buffer or when migrating to a
> new buffer);
> so, the growing/insertion policy should focus on minimizing number of
> elements
> moved. This implies fully occupying free space before reallocation is
> *not* the primary
> goal of the growing/insertion policy.
> —
> But in the real world allocations (+ the implied deallocation ) is not that
> cheap, especially not with non trivial destructors. For small containers,
> typical for flat containers, I’m not sure this is a good assumption.

The assumption is of an ever growing sequence. That said, I recorgnize
there are
scenarios not falling under this, most notably FIFO queues. I've given
the policy
a little more thought and think we can twist it to also accommodate
non-growing
scenarios, please keep on reading.

> Facts
>
> [...]
> * If a right insertion at position N arrives and there's no free back
> capacity (but there
> is free front capacity), we have only two options:
>   A. Reallocate to get back free capacity.
>   B. Shift elements to the left.
> The key observation here is that A is cheaper in the long run: as the
> scenario is one
> of an ever-growing sequence, reallocation will happen eventually and we can
> just amortize it in our analysis, so reallocating early saves us the
> extra movements
> incurred when left shifting on a right insertion, namely N - (size()-N)
> = 2*N - size()
>
> —
> How do you get that number ? I would say we save
> Incur 1/2 * size movements on average for shifting to the end with free
> space.

The number follows from: assume right insertion is at position N, that
means that
under normal conditions (there's back free capacity) we need to make
size()-N movements to the right. If instead of that we shift to the left
the resulting
number of movements is N, so the penalty for taking the "wrong" decision is
the difference N - (size()-N) = 2*N - size(). Did I explain myself now? 
The worst case
is when N is right at the end of the sequence (N=size()) and there's no
free space at
the back, the penalty is then N - (size()-N) = size(), that is, we have
to shift the
entire sequence to the left (in which case it's clear that reallocating
would have been
a better decision).

> Description of resulting policy
>
> [...]
>
> —
>
> Interesting analysis. Thanks. I agree if the sequence is ever growing, it’s
> better to allocate when one end is full.
>
> The idea of keeping track of left right insertions also interesting. It does
> assume additionally that the insert pattern in the future is the same as in
> the past, right?
>
> I think in the case where the user has called reserve, it’s a little
> problematic that insert may allocate anyway. That is, for many uses there is
> no infinitely growing sequence .
>
> The right_ member can become larger than size if elects are added to the
> right and removed from the left, not sure what to do about that.

Yes, you're right. right_ becoming greater thatn size() can't happen
under the ever-
growing assumtion, but it certainly can in the real world.

Let's slightly adjust the policy as follows. I think this copes with the
ever-growing
and fixed-size scenarios equally well. Let us remember that the goal
when reserving
new free space at both front and back is to make it so that both ends
get exhausted
at the same time --if they ever are.

* Fix the growing factor to G.
* The right balancing factor B is defined as the ratio between back free
capacity and total
free capacity. Let's say this is 50% at the start, that is, we have
front_free_capacity()=
back_free_capacity().
* Right insertions and left insertions shift to the right and left,
respectively (as before).
* When reallocation is needed, calculate the new balance factor as a
function of
front and back space consumption just prior to reallocation. Pseudocode
follows:

 Â  struct free_space
 Â  {
 Â Â Â  std::size_t back,front
 Â  };

 Â  free_space new_space(free_space original,free_space remaining)
 Â  {
 Â Â Â  float balance=(float)std::max(original.back-remaining.back,0)/
 Â Â Â Â Â  (original.back+original.front-remaining.back-remaining.front); //
denominator can't be 0
 Â Â Â  std::size_t
new_space=std::max(size()*G,size()+remaining.back+remaining.front),
 Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  new_free_space=new_space-size();
 Â Â Â  return {balance*new_free_space,(1-balance)*new_free_space};
 Â  }

This has the effect that if no insertions happened at one end (more
precisely,
more erasures than insertions took place at that end) then the new
reserved space
for that end will be zero. Safe guards may be applied.
* When reallocation results in the same total space being reserved (this
happens
when the size at reallocation time is equal or less than the size() at
the last reallocation),
there is no need to request memory and we can simply shift the sequence
so that
back and front free capacity are adjusted according to new_space
calculations. This
nicely handles the FIFO scenario, for instance: when the growing side
meets its end,
reallocaton resolves to shifting the entire sequence within the current
buffer so that
space is kept at that end: this keeps cycling forever with no memory
allocation and
minimal element shifting.

Thougths?

Joaquín M López Muñoz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk