Boost logo

Boost :

From: Martin Knoblauch Revuelta (mkrevuelta_at_[hidden])
Date: 2006-10-31 04:21:36


On 10/31/06, Rene Rivera <grafikrobot_at_[hidden]> wrote:
> Martin Knoblauch Revuelta wrote:
>
> > How about a sequence container (like a vector) with the advantage of
> > logarithmic time insertion/removal at any point of the sequence? The
> > disadvantage is that random access takes logarithmic time too (instead
> > of the usual constant time random access of a vector).
>
> Sorry if these where already mentioned, but I just got around to looking
> at the implementation:
>
> * There's no need for the m_next and m_prev fields. You can do inorder,
> preorder, postorder, and others without them.

Yes, I could, but operators ++ and -- would become O(log n). Many
operations benefit from the posibility of fast iteration through
m_next and m_prev. For example: begin() and end() are O(1).

> * I suspect you don't need the m_height either. But I don't know what
> you are using it for yet.

For balance. IMO, the logic of re-balancing becomes simpler by having
m_height instead of the usual {-1,0,+1}.

> * In my rank tree implementation the only extra field per node is the
> count, what you call width (as separate node and total). If I remember
> correctly in CLRS they describe the possibility of the count not
> directly reflecting the subtree+1 size to allow for sparse trees. I
> suspect that you can remove the need for the different m_node_width and
> m_total_width, using one only, by deducing the m_node_width from "count
> - left_count - right_count".

The width is based on the same concept as count, but they are not
related at all. Note that W might be float, for example.

I might deduce m_total_width from m_node_width by recursing through
the whole subtree every time. Obviously, this would spoil the whole
thing.

> * Overall you need to make use of STD utilities more. I think someone
> already mentioned something to this effect. Another example is using the
> iterator tags and other meta information. This is important to help out
> type specific algorithms.

I'll study this.

> * Also putting implementations out-of-line would help in reading what
> the various interfaces do.

I agree. I'll do so.

> I also dislike the use of abbreviations and
> single letter symbols, as it makes it harder to read. But this is mostly
> a matter of taste :-)

Yes, we might discuss this for ages. I try not to abuse them, using
longer names on variables that will live longer, or will be passed by
reference. Though, IMHO long names turn complex expessions unreadable
as operators get buried under lots of letters.

> * I don't see how not having a static allocator prevents moving nodes
> across equally typed containers.

It shouldn't, but it depends on how correct is the allocator class. I
read in the documentation of SGI that allocators must be fully static,
allowing deallocation of something by a different object of that one
used for allocation (perhaps even destructed in the meantime). Is this
true for _all_ std and boost allocators too?

> * I get the impression that there's is way more code than needed to
> implement this data structure.

I'm trying to refactorize some things, but it's a large interface... ;-)

> * You should use the allocator types instead of making types out of the
> value type directly.

I don't understand. Do you mean the rebind trick? I thought this was
the standard way.

> * Not sure why you would want the various unsigned/int/long vs size_type
> variations of methods. Having those extra methods just confuses the
> interface for users and introduces extra use cases to account for in the
> implementation.

It helps avoiding ambiguities when T is an unsigned/int/long. This is
how other container implementations solve it.

> * operator=(o) can't be O(max(N,M)). You have to both construct the new
> elements and destroy the existing elements which makes it O(M+N)

Right.

> and keeping exceptions in mind you do a swap in between :-)

Exceptions? Swap?

Currently, it looks like this:

  clear (); // delete old contents
  construct_nodes_list (..); // copy (T constr. might throw)
  build_known_size_tree (..); // still here? ok, build the tree

Do you mean I should do it this way instead?

  construct_nodes_list (..); // copy (T constr. might throw)
  clear (); // still here? ok, delete old contents
  build_known_size_tree (..); // and build the tree

> * You have a variety of C style casts, and you happen to be casting
> "this". That's frowned upon, and truthfully I can't see why you do it.

I'll change them to C++ casts.

With this trick, neither iterators nor tree nodes need to contain a
pointer to the avl_array object where they "belong", while the
interface allows working with iterators without requiring references
to the corresponding avl_array objects.

> * Algorithms rarely have a place in STL containers. You might want to
> move the various binary_search, sort, etc. methods to functions.
> Preferably having them use the same interface as the STD versions.

I'll study this too.

> * I would make the value sorted aspect a separate container, possibly as
> an adaptor to the rank container but possibly also some other form of
> tree. Sorted ordering is orthogonal to rank statistics.

Why? I wouldn't like to restrict the hybrid use of a container object.
I mean, using it as a sorted container for a while, then using it as
unsorted for a while, then sorting it again, maybe with a different
criterion, and so on... Note that iterators would remain valid through
the whole process...

> * You seem to be using a sentinel node as a hybrid root and tail node
> but making the avl_array itself that sentinel node. To say the least
> this is a bit strange to me.

In version 1.0, the sentinel was a member of the avl_array. I had to
use some ugly non-portable macros to get the avl_array from the
sentinel address. It's much better with inheritance.

> Why not have just a root pointer to a real
> root node?

An embedded sentinel (aggregated, or inherited) is IMHO far more
consistent with the idea of an "end" position. Additionally, it
simplifies the implementation.

> Now for naming...
>
> I don't think I'll ever be able to refer to this as anything other than
> a rank tree, even with the addition of a sparse version. Obviously you
> named it avl for the type of balancing you are doing. As for some of the
> other suggested names they fall into two categories: structure
> (log_list, llist, larray, chain, random_list, rlist, ralist,
> ra_sequence) and category (os_tree, order statistics dictionary, and the
> variety of synonyms for sequence).
>
> The category terms fail to account for implementations of other
> exemplars from that category. This is the reason why STL doesn't have a
> "container" class... I know it's an exaggerated example :-)
>
> The STL tends to stick closely to using implementation related terms
> which would fit more closely with using structure terms. Programmers
> find this intuitive because they learn data structures and algorithm,
> and hence they can use the names as an indication of what to expect from
> the implementation of the containers. Unfortunately this rules out all
> the proffered terms as they suggest implementations other than an
> augmented binary tree.

I would be pleased with anyone of the suggested names, including rank
tree if you don't mind.

Thanks a lot for your suggestions.

Best regards,

Martin


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk