Boost logo

Boost :

From: David B. Held (dheld_at_[hidden])
Date: 2005-02-26 15:32:50


I see I'm a little late to the party. However, I'd like to
summarize what I have noticed as the important concepts that
have been exposed, and chip in my own 2c. I think the core
concept that should be recognized is that a tree is not a
container of objects, but a container of containers. In this
way, it is somewhat like a deque. However, a deque has a single
natural sequence over all contained elements, whereas a tree
has many natural sequences. Now technically, a tree is not a
container of *just* containers. It is a container of objects
plus containers. But the only other core property of trees
is that the contained containers are themselves of the same
type as the tree, as Justin originally observed. Beyond
that, I don't see a need to distinguish between a tree and a
node, and it's my intuition that eliminating the distinction
should result in a more elegant solution.

Second, trees are complicated because there are two sets of
iterators that need to be addressed. First is the iterator
over objects in the root, and second is the iterator over
children. However, they are two separate kinds of iterators.
Dereferencing the object iterator should give you an iterator,
while dereferencing a child iterator should give you a tree.
Since whole-container algorithms expect an iterator that
traverses the entire structure, it should be clear that a
linear iterator must lay outside the tree itself and manage
the fact that two different types of implementation iterators
need to be managed. The reason this distinction is so easily
overlooked is that most trees have a trivial key collection
of size 1, so it isn't clear that an iterator is needed at all.
But I think a generic tree interface should recognize that
some trees have non-trivial key sets and provide both kinds of
iterators.

I haven't been able to look at Jeff's library because I can't
seem to access his web site, but it sounds to me like a good
design would consist of a tree type (which some might intuitively
think of as a node type) which exposes fundamental operations
for accessing keys and children, and a set of algorithms which
allow one to transform one tree into another by moving around
subtrees. Finally, flat iterators would have to be built on
top of the two iterator types presented by the tree itself.
It makes sense that those iterators would be parameterized on
the tree type itself, since it needs to know about both kinds
of embedded iterators. I'll offer this as fodder for thought:

template <typename T, int K>
class key_tuple
{
public:
     typedef unspecified key_iterator;
public:
     key_iterator begin();
     key_iterator end();
     T const& get(int pos); // convenience function
     void set(int pos, T const& key); // convenience function
private:
     T key_[K];
};

template <typename Tree, int N>
class child_tuple
{
public:
     typedef unspecified child_iterator;
public:
     child_iterator begin();
     child_iterator end();
     Tree* get(int pos); // convenience function
     void set(int pos, Tree const* child); // convenience function
private:
     Tree* child_[N];
};

template <typename T, int N = 2, int K = 1>
class tree
{
public:
     typedef key_tuple<T, K> key_type;
     typedef child_tuple<tree, N> child_type;
public:
     key_type key();
     child_type child();
private:
     key_tuple<T, K> key_;
     child_tuple<tree, N> child_;
};

template <typename T, int N, int K>
bool is_leaf(tree<T, N, K>* t)
{
     return std::find(t.child().begin(), t.child().end(), 0)
         == t.child().end();
}

template <typename T, int N, int K>
void find_first(tree<T, N, K>* t)
{
     tree<T, N, K>* left = t.child().get(0);
     return left ? find_first(left) : t;
}

etc., etc.

Clearly, the tree invariants would need to be carefully preserved
by the algorithms which operate on them. But since different
trees have different invariants, it makes more sense to group
the algorithms by the set of common invariants they enforce,
rather than by a concrete tree type. Some algorithms will have
very few invariant requirements, such as find_first() and find_last(),
and will work on any ordered tree. Others may have more stringent
requirements, but defining the tree as a pair of tuples and a set
of algorithms on those tuples seems like the best factorization
of behaviors possible.

At some level, we could package a set of invariants and associated
algorithms as a "soft" (less parameterized) type that exposes
a clean member interface for tree operations. But given the
diversity of tree types, I think this open structure, which
might seem to break all the rules of encapsulation, is the right
way to go at the lowest level.

Dave


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk