|
Boost : |
From: Hervé Brönnimann (hervebronnimann_at_[hidden])
Date: 2007-03-20 02:29:39
Ion: I have limited time to make a review, so I'm sorry that I had
to concentrate on the documentation without reviewing the code, which
I will do at a later date. However, what I saw from the
documentation is by-and-large how I would have done a design, no
surprises there but some nice twists and a very well thought-out
interface. The devil being in the detail I'm glad you validated the
design with some working code.
This being said, this is a first-rate and very useful library, and I
vote whole-heartedly to accept it to boost.
I have a few spot-comments on the documentation, to follow at the end
of this message.
My main criticism is with one aspect of the design: in std::set and
other associative container, the key (part of the value_type that
determines the order) is constant, and cannot be changed (once
constructed) since the std::set interface only grants constant access
to it. This makes it possible to maintain the invariant of the
std::set class. It is no longer possible to guarantee this with
intrusive containers. You need to mention that upfront in the doc of
iset with a big warning sign!
But now I must mention that this is actually a plus!!! In geometric
algorithms (Fortune's sweep for Voronoi diagram, to be specific), the
key is an intersection with the sweep line, and varies over time. To
simplify a long story, the algorithm has discrete events at which the
order can change, but the invariant is maintained by swapping two
elements whose keys "cross". In order to guarantee logarithmic time
per event, some kind of balanced tree is needed. Naturally one would
like to use std::set, but because the key changes and the user (no
longer the class) is responsible for maintaining the set invariant,
one must go through hoops, and use mutable data members, etc. It
gets ugly! The worse part is that one must add two nodes which are
equal (at the critical sweep event, they'll get ordered as soon as
the sweep line starts to move), but std::set does not allow to add
the two nodes with guaranteed relative position, without performing
some comparisons (which have no geometric meaning: the comparison
object must be tweaked). It gets *very* ugly!
For a long time, I have advocated separating the rbtree algorithms
from the actual data structure, for reuse. I wrote a paper with
Jyrki Katajainen (now a CPH STL report) about this. One can imagine
applying rbtree algorithms, or avl_algorithms, or
splay_tree_algorithms, all of them without a class; each offers a
different tradeoff. In particular, I'd expect treaps to perform very
well. In essence, this is exactly what your library does: it
separates storage from algorithms, and thus offers the potential for
reuse. This is the promise of generic programming even beyond the C+
+ STL.
You might add my Fortune's sweep example to the others you provide
(legacy objects, etc.) I imagine kinetic data structures and some
graph algorithms would be fine consumers of this approach as well.
I think it is very natural, and would very much like to (but won't
have the time), extend your framework with (geometric) halfedge and
(combinatorial) graph-like data structures. I think it will be a
great winner there, even more than standard C++ containers (one
dimensional). Ultimately, because geometric algorithms typically
require building several data structures on top of geometric objects,
the framework here has even more value than the implementations.
I would really like to see a chapter in the documentation "Guidelines
for extending the framework to domain-specific containers". of
course, it is not a requirement for the library, merely a suggestion
but given the potential for extension, you will have questions. A
graph-like example compatible with the Boost Graph Library would
superbly demonstrate the power of the approach! Who knows, it could
even become part of the intrusive library one day. After reading
your library, I feel that this is how my generic HDSTL (half-edge
data structure library) should be implemented (it remains as a half-
baked research project, having been published in WAE'2001).
I will take a longer look but the end of the review approaching, I
feel it's best to dump this now and promise later :)
Thanks for a very promising library, and regards,
-- Herve Bronnimann PS: Here are the more important comments I jotted down while reading the docs: I spent about two hours doing so, so forgive the sketchiness and incompleteness of the review, but one more review is always better than nothing :) * documentation: I wanted to see, while browsing the doc and clicking on the class name, the synopsis for header, in addition to the class interface for the class (following the C++ standard). But actually, by the end of the documentation, we get the header synopsis so the physical architecture of the library is clear. * documentation: iterator and const_iterator types seems to be missing consistently for each class interface (from islist, ilist, but not from iset/imultiset.) * documentation: bool ConstantTimeSize = (default not provided in ilist and islist). * design: const/mutable value type? While it is possible to violate the invariants of a std::set, that requires willfull wrongdoing from the programmer, whereas it seems much easier to inadvertently do so with intrusive set. Must be pointed out with a big warning sign. * documentation: iterator validity guarantees for islist are part of individual functions, an overview of the iterator validity rules in the description would be welcome. * indeed, the two foremost advantages of intrusive containers are: non-invalidation rules for iunordered_set, and possibility to store derived objects in the intrusive container (without extra indirection, that is - it's possible with storing pointers to object in std containers). * combination of smart pointers + intrusive containers: the intrusive container can't use simple pointers if the object lifetimes are managed by smart pointers. Thus it appears that having managed objects actually either forces the intrusive containers to use same managed pointers, or else to use the auto-unhook feature. This is not well discussed in the relevant page (about smart pointers and intrusive containers). BTW, I don't understand the requirement on smart pointer that "It must have the same ownership semantics as a raw pointer." I don't see why a shared_ptr couldn't be used, in fact, I would find some advantages to doing so (preferable imho to the auto-unhook feature). But I haven't thought too long about it. * rbtree_algorithms: here's a great opportunity to incorporate the order-statistic tree (augmented rbtree) with additional requirements on the node traits. If there is an int to spare in the data, then you can use it. You only have to make sure you use the correct algorithms (if rotating, etc., then the node's size field must be recomputed too for the order-statistic operations to be correct). The extra field can be computed at any given time (not necessary to maintain it through the insertions/manipulation) if only needed in one section of the code, and even reclaimed later if needed. This is similar to having a singly or doubly linked list sometimes or in other situations (see page "custom valueTraits"). Here is another use for that schema of using storage and reclaiming it when no longer needed: insertion in singly linked list (which is faster) and some operations on it (e.g. move-to-front heuristic which only requires islist), then making doubly-linked list (in linear time by setting the reverse pointer) to allow more operations when that is need (e.g. entering another section of the code). Or reusing the two same list pointers for an iset if the ilist structure is no longer needed.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk