Boost logo

Boost :

Subject: Re: [boost] [review] Heaps: size and stability
From: Brent Spillner (spillner_at_[hidden])
Date: 2011-06-04 11:27:26

On 6/2/2011 at 19:27, Tim Blechmann wrote:
> The stability issue relates to a previous discussion from a couple
> months ago (brought up by Cliff Green,
> Cliff was
> concerned that the implementation of stability added overhead to the
> normally unstable heaps (i.e., an extra size_t, I think) from the
> implementation. I worry that the use of a policy parameter to specify
> stability will result in "accidental overhead" in cases where users
> don't fully understand the impact of their configuration choices.

>it is not possible to write every heap algorithm as `stable' heap. i do not
>recall the exact point that will break stability if one just orders the nodes in
>the tree, but there was a non-trivial issue (possibly related to mutable heaps).

It should be possible to preserve the correctness of most operations
(certainly find, insert, delete, merge, and rebalancing) for ordinary
heaps by storing items with the same key value in leftist heap order,
i.e. preserving the invariants that:
(1) if a subtree contains multiple items with the same key as the
subtree root, the item at the root is logically first in the order
(i.e., standard heap ordering);
(2) if both the left and right subtrees contain items with the same
key as the subtree root, all items with that key in the left subtree
are logically ordered before any item having the same key in the right
subtree (leftist property); and
(3) if a subtree contains multiple items with a particular key value
K, every item with key value K has as its parent another item with key
value K, with the possible exception of the subtree root
(locality/compactness property)

These properties are preserved by right- or left- rotation around the
subtree root. To keep the heap at minimum height, you will also
occasionally need a prune-and-splice operation (items with key value K
in the right subtree become children of the rightmost node with key
value K in the left subtree), followed by further rotations. The
compactness property should allow find, insert, and delete to continue
running in O(lg n) time when rebalancing is not required. The
prune-and-splice operations add O(|K|), where |K| is the total number
of nodes having key value K, to the worst-case rebalancing operation,
but this |K| quite reasonably ~O(lg n) when the key values are
sufficiently distributed. Some merge algorithms would clearly violate
the invariants above, but stable merge is still possible in O(n) time.

>iac the node versioning seems to be the usual approach for implementing stable
>heaps. the type of the version should be exposed to the interface, though, so
>one can specify arbitrary-sized integer types to avoid overflows ...

Versioning is the simplest way to adapt a non-stable heap
implementation for stability, but the space overhead (and concomitant
loss of flexibility in using a preexisting array to store the heap)
and the risk of overflow make it unsuitable for some applications,
including the quite common event queue. With a 32-bit counter value, a
queue that receives and average of 100 events/second will overflow in
about 16 months--- i.e. 32 bits is just enough space for your library
to pass a reasonable test battery and still fail in the real world.
(Of course, you need not increment the counter on every single
insertion, as I just noted in the Code Collaborator review.)

Another option would be chaining, where each heap node contains either
a single item or a linked list of items with the same key value. This
would minimize the required changes to existing algorithms, support
fancier heap types and heaplike structures, and only impose some
storage overhead (one or two pointers into the list, perhaps a length
counter, and a means of synchronizing list access (which could be an
atomic update protocol for the head/tail pointers and/or length
counter)) when you actually need it, at the cost of some type erasure
within the generic heap structure and possibly extra copying at
runtime (when a singleton node first becomes a linked list.)

On the subject of counters:

On 6/2/2011 at 10:19, Tim Blechmann wrote:
>will have a look ... in general boost.heap does not throw any exceptions by
>itself, but of course the held type, the container or the allocator can throw

Lines 231 & 232 of the review version of stable_heap.cpp are:
        if (counter_ == std::numeric_limits<std::size_t>::max())
counter overflow"));

(to be fair, this does seem to be the only case in the library... but
Thorsten is right that it should be clearly documented.)

Boost list run by bdawes at, gregod at, cpdaniel at, john at