Boost logo

Boost :

Subject: Re: [boost] [heap][formal review]
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-06-13 10:18:22


hi thomas,

thanks for your formal review!

> -- One inconvenience for me was the use of max-heaps instead of min-heaps.
> Well, one could argue that a priority_queue should return the elements
> with the highest priority first (but as Wikipedia says: "some conventions
> consider lower priorities to be higher"), and heap-sort uses a max-heap.
> Since these are the two application where stl uses a heap, it's clear why
> stl implements a max-heap. But the applications where it is beneficial to
> call increase or decrease directly normally use a min-heap. Now I have to
> call increase even so the key value actually has decreased...

there are two conventions to consider: min-heaps (which are typically used in
textbooks) and max-heaps (which are used in the stl heap functions). i prefered
to be consistent with the stl.
likewise i am using the `top()' to extract the value with the highest priority,
not `find_min()' like it is often used in the literature ...

> -- I was
> left a bit with the feeling that I hadn't made the most efficient use of
> Boost.Heap.

the only way to find the best data structure is to test with a specific
workload. boost.heap does not provide a `one-size-fits-all' solution, but a
toolbox of different configurable data structures.

> For example, I quickly found out that calling update just with
> a handle (as I had implemented it first) was suboptimal for
> fibonacci_heap.

suboptimal in what way?

> - The performance numbers for Boost.Heap were neither bad nor outstanding.
> -- The numbers for b_heap are not convincing
> -- The numbers for fibonacci_heap seem to be the best.
> -- The pairing_heap has a smaller compare count than fibonacci_heap, but
> actually performs worse than fibonacci_heap in this test. Considering
> statements like "Experimental studies indicate that pairing heaps actually
> outperform Fibonacci heaps", I wonder whether the implementation of
> pairing_heap could be improved so that it actually outperforms
> fibonacci_heap.

for a performance comparison, one will have to consider
* expenses of comparison
* expenses of swapping data
* size of the managed data
* workload (mixture/order of API calls)

i suppose for every data structure one can come up with a synthetic benchmark,
so that this data structures outperforms every other implementation.

> > What is your evaluation of the design?
>
> I guess it's OK. What makes me a bit nervous is the statement "The library
> is not targeted in any specific domain nor directly related to any
> specific libraries". A good interface often is the result from an
> abstraction over different applications. The other way (coming from the
> data structure/algorithm) risks bloating the interface with unnecessary
> functionality. For example, somebody asked during the review for
> "iteration over the heap in priority order". Is this really something that
> many applications requiring a heap will benefit from? I heavily doubt
> this.

testing two heaps for equivalence will benefit from it. of course you can ask,
why do you want to test that ...

> > Do you think the library should be accepted as a Boost library?
>
> No, the library should not be accepted unless
> - The current review either magically goes back on track and at least 3
> other persons also submit a proper review, or another (mini?) review with
> proper participation is held later.

... i am a bit disappointed by the lack of reviews myself. given the fact that
this is the only one that actually contains a vote and the review stoped
yesterday i would appreciate other feedback

> - Either a convincing example where b_heap outperforms the
> other heaps is presented, or it's use should be discouraged in the
> documentation.

http://article.gmane.org/gmane.comp.lib.boost.devel/220245

> - The documentation says a bit more about the different
> heaps and their trade offs (amortized time <-> worst case), and generally
> tries to give recommendation to the user which heap will likely best fit
> his application, and which settings he should try (for example something
> like that arity<4> is a good compromise for a d_ary heap between cache
> locality and compare count).

a section of `best practices' may be helpful?

> I used the negative form of "Yes, the library should be accepted under the
> following conditions", because I'm unhappy with how the review went. The
> announcement was sent out too late, without stating the dates of the
> review, much of the initial discussion was about the code collaborator
> tool, but I haven't seen any "real" review, despite that fact that there
> seems to be interest in the library. There were some discussions on Code
> Collaborator, and the tool was actually not bad in pointing to concrete
> problems in the actual code. Tim quickly addressed these, but a good
> review would have some discussions and actual opinions and suggestions.

well, i would appreciate more reviews myself ... however if you are the only
person who includes a vote and this vote is no, boost.heap should be rejected.
no matter if the reason is `documentation', `code', `performance' or `review
process'.

> It
> is also unclear to me how well Boost.Heap performs with respect to
> boost/graph/detail/d_ary_heap.hpp. I could have tried to also include this
> heap in my heap test/benchmark, but as the review seemed to have silently
> died already, I wasn't motivated enough for doing this.

iirc bgl heap data structures are pretty specific (optimized for integers).

thanks for taking your time to review boost.heap.

tim




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk