|
Boost : |
Subject: [boost] [devector] optimum growing/insertion policy
From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2017-10-15 12:09:25
Hi,
Much an worthy discussion has taken place with respect to growing and
insertion
policies of boost::double_ended::devector. I think I can justify what
the optimum policy
is when the overall goal is to minimize number of element movements.
What follows
is *not* a formal optimality proof, though it's my hunch one could be
provided
relatively easily.
Assumptions
* The cost of allocating new memory is negligible when compared to the cost
of moving elements around (either intra-buffer or when migrating to a
new buffer);
so, the growing/insertion policy should focus on minimizing number of
elements
moved. This implies fully occupying free space before reallocation is
*not* the primary
goal of the growing/insertion policy.
* For the purposes of our discussion, we consider an ever-growing
devector without
intervening erasures --erasures can be accomodated later as we will see.
* Element insertion is modelled as a process where each new insertion
happens at a
random position within the current devector. The minimal
characterization (i.e. its
1st moment in stochastic process theory terminology) is just the
percentage R of
insertions that happen within the right half of the devector --we call these
*right insertions*; left insertions happen then with a frequency L=1-R.
push_back is a
right insertion, push_front a left insertion.
Facts
* When a right insertion arrives, if there's free back capacity the
optimum insertion
policy is to shift elements to the right, as this involves fewer
movements than
the other way around. The symmetrical applies for left insertions.
* If a right insertion at position N arrives and there's no free back
capacity (but there
is free front capacity), we have only two options:
 A. Reallocate to get back free capacity.
 B. Shift elements to the left.
The key observation here is that A is cheaper in the long run: as the
scenario is one
of an ever-growing sequence, reallocation will happen eventually and we can
just amortize it in our analysis, so reallocating early saves us the
extra movements
incurred when left shifting on a right insertion, namely N - (size()-N)
= 2*N - size()
movements. In fact, we save more than that as insertion is done
simultaneously with
reallocation, hence no shifting occurs additionally to the (amortized)
reallocation
cost. All this analysis applies symmetrically to left insertions, of course.
* The optimum growing policy is that which maximizes the probability
that *both*
front and back free capacities get exhausted at the same time. With our
1st moment
characterization of the insertion random process, this implies that when
reallocating
a percentage R of the new space should be devoted to the right side, and
a percentage
L to the left side.
Description of resulting policy
* Fix the growing factor to G (this can be 1.5 or 2 or anything greater
than one,
the actual figure is orthogonal to the discussion).
* Maintain a counter right_ of right insertions that gets incremented when a
right insertion occurs and *decremented* when an erasure is issued past the
middle of the devector.
* Right insertions trigger right shifting except when there's no back
free capacity,
in which case a reallocation is performed.
* Left insertions trigger left shifting except when there's no front
free capacity,
in which case a reallocation is performed.
* Reallocation reserves space for G*size() elements: of the resulting
G*size()-size()
free space, a fraction right_/size() is devoted to back free capacity,
the rest to
front free capacity. In practice, we may want to keep some minimum space
at each end as determined by some empirical threshold.
And that's it. Comments welcome.
JoaquÃn M López Muñoz
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk