|
Boost : |
From: Dietmar Kuehl (dietmar.kuehl_at_[hidden])
Date: 1999-12-07 08:54:04
Hi,
At 12:09 AM 12/7/99 -0500, jsiek_at_[hidden] wrote:
>I'm afraid this is email is a bit long, so here's a TOC :)
Hm... I should have done this with my e-mail then, too. This time I will
stick (again) to just giving comments, ie. the TOC basically stays this:
>I. Iterators & Graph Traversal
>II. Data Accessors & GGCL Decorators
>III. Algorithm Iterators & GGCL Visitors
>V. Implementation Experiments
>Preface
>-------
>One thing that should be carefully separated in this discussion of
>graph algorithm interface are the things that are merely syntactic
>differences, and the things which are fundamental differences. What
>I'm going to argue is that many of the differences between Dietmar's
>approach and GGCL is syntactic. However, this is not to belittle
>syntax, it is important for lots of technical reasons, especially
>interoperability with STL.
Some of the difference are, however, more than just syntactic difference
and have a rather huge effect on the actual work. The biggest issue in
this area is the difference between operator[]() and get()/set() (see
below).
>Also, I appreciate that both Dietmar and I have invested time in
>different interfaces so it may not be easy for either of us to be
>unbiased. Keeping this in mind, as I make arguments I'll try to be
>very specific so that each point can be easily evaluated.
The basic abstractions used by GGCL and me are identical: Iterators
and decorators/data accessors. Some of the differences are the syntax
which is not really a surprise. Other differences are details of these
abstractions which are IMO at least sometimes very important.
>I. Iterators & Graph Traversal
>------------------------------------
>
>First to address the differences in the interface for traversing
>through a graph with GGCL and with Dietmar's inteface... Dietmar,
>would you mind sharing what your current thinking is for your iterator
>interface?
That would be much easier if I were 100% clear about it: The problem
is that my current thinking in some sense goes a step back from where
I was coming. When I started doing this stuff I relatively soon kicked
out some kind of "graph object" which I have now reintroduced for
several reasons. But I'm not yet certain that it is really a good idea
although it solves at least some problems. Here are the details:
Node, edge, and incidence iterators are just forward iterators without
'operator*()'. That is, if 'it', 'it1', and 'it2' are one of these
iterators of type
'It', the following expressions are valid:
It it // default construction (value never used by algorithm)
It it2(it1) // copy construction
it2 = it1 // copy assignment
it1 == it2 // test whether the same object is identified
it1 != it2 // test whether different objects are identified
++it // advance to the next element
it++ // advance to the next element but return the current one
Basically this means that you can move any of these iterators and copy
them. All of these operations are also present in input iterators. Thus,
it is important to note that it is permittable to save copies of iterators
for later use: Unlike for input iterators, the value does not change
because a different iterator is advanced.
This is different from the interface I used in other published material
where I used a rather different interface at least for adjacency iterators
which became renamed to match their real use. The most notable
change is the removal of the 'cur_adj()' function. This change is mostly
due to the discussion with Alexander Stepanov where he basically
convinced me that the STL interface makes sense for incidence lists,
too. The biggest problem I had with the STL interface, namely where
to get the end iterator from without storing two iterators (eg. if an
iterator is stored in some form of priority queue) is now addressed by
the "graph object". Before discussing this I will briefly discuss the
ommission of 'operator*()' from this iterator interface.
Why not make these iterators STL forward iterators and add
'operator*()'? There are actually several issues:
- First, an iterator can, of course, have an 'operator*()': It is just never
used by any of the graph algorithms, at least not directly (indirectly
via a decorator/data accessor it may of course be used). That is, all
STL forward iterators are valid node, edge, and incidence iterators.
- Most interesting graph algorithms associate more than just one piece
of data with an object like a node or an edge, eg. in flow algorithms
there are upper and lower flow constraints as well as the current flow
associated with each edge. Which of these shall 'operator*()' access?
Basically, 'operator*()' addresses a similar problem that the decorators/
data accessors are intended to solve!
- 'operator*()' invariably returns a reference to some data, at least under
the current rules in the STL: The requirements on STL iterators
basically prohibit the use of a proxy class. Thus, 'operator*()' definitely
needs to return a reference to an object to be conforming to the STL
requirements. This implies serious restrictions (see below: operator[]()
vs. get()/set()).
Omitting the requirement for 'operator*()' has an interesting
consequence: Integers become valid iterators! I use this feature eg.
for grid graphs where the nodes can easily be numbers.
Given just these iterators is actually not very interesting since it is
necessary to get new iterator from a given iterator: You need to get
iterators for the edges incident to a node. Also, given an iterator for
an edge, either an edge iterator or an incidence iterator, it is also
necessary to access the two corresponding nodes, ie. the head and
the tail (since edges can always be considered to be directed, even
in undirected graphs, these are useful names, at least IMO).
At this point it is worth noting that often the identifier for the objects
is really used to identify only one object, that is it is not an iterator:
When obtaining an identifier for a node from an edge iterator it is
normally not necessary to get an iterator! Similarily, when storing
identifiers to objects eg. in a priority queue or with an object these
identifiers need not be iterators. Instead some form of reference or
pointer to the corresponding object is sufficient. Neither the term
reference nor pointer should be considered in the technical way as
used in C++: in German I'm using the term "Verweis" for these objects
but the only translations I can come up with are pointer (German: Zeiger)
and reference (German: Referenz; ie. "Verweis" is indeed something
different...). In my current code I just use the term "pointer" but these
pointers are neither used as iterators nor are they dereferenced (at
least not directly; a decorator/data accessor might dereference them).
The need for such a pointer became apparant in a situation where the
incidence list for a node is only accessible temporarily because a whole
subgraph was just a view. While the structure was only present
temporarily, objects within them could be uniquely identified and
pointers to them even needed to be stored. However, once the iteration
was finished, the pointers could only be used to identify objects but not
to continue the iteration. For incidence lists the same phenomenon
might occure: You can store a pointer to a node in the incidence list
but from this object you cannot necessary determine an iterator to the
list.
What I considered earlier only as an optimization technique (these
pointers tend to need less stored data than iterators although most
iterators tend to just store a pointer) seems to be a reasonable concept
of its own and I'm using these pointers in my current tests. The interface
for these pointers is very simple: You can default and copy construct
them, you can assign them, and you can destroy them ('P' a pointer type
and 'ptr', 'ptr1', 'ptr2' objects of this type):
P ptr // default construction; the value is never used
P ptr2(ptr1) // copy construction
ptr2 = ptr1 // copy assignment
The only extension to this interface I can image would be comparison
operators but I haven't needed them up to now. On the other hand,
using the default constructed pointer as null value and then comparing
these objects could be convenient. Currently I'm not using this feature
but I have to admit that I only have a few search algorithms and a viewer
(for X11) implemented for this interface.
Back to the structural aspects of graphs...:
To obtain other iterators from a given iterator, it necessary to have
some context information. For example, if there is indeed a node with
an incidence list stored (the typical textbook graph representation)
the context information is stored in the node object and the iterator
is likely to point to this node (or an edge pointing to this node). This
node object can then be used to create corresponding other iterators.
This does not work if there is no node stored at all. An obvious
alternative to storing context data would be in the corresponding
iterator or pointer. This turns out to lump lots of data into iterators
which can conveniently be "globally" available. As a result I introduced
a third option where corresponding data can be stored (in addition to
the nodes/edges and the iterators; storing the necessary data there is,
of course, still possible). This instance provides all sorts of global
data like various types and iterators. Of course, algorithms differ in
their requirements which features have to be supported by this object
but I have not grouped portions of this interface into useful abstractions
yet. Here is a relatively complete description of this interface.
- NP is a node pointer type and np is an object of this type
- EP is an edge pointer type and ep is an object of this type
- NIt, EIt, IIt are the types of node, edge, and incidence iterators
respectively
- nit, eit, iit are node, edge, and incidence iterator, respectively
Supported operations of a "graph object" 'go' of type 'GO':
- GO::node_pointer // node pointer type: NP
- GO::edge_pointer // edge pointer type: EP
- GO::node_iterator // node iterator type: NIt
- GO::edge_iterator // edge iterator type: EIt
- GO::incidence_iterator // incidence iterator type: IIt
- np = go.to_node_pointer(nit) // obtain a pnode ointer from an iterator
- np = go.tail(ep) // obtain a pointer to the start of the edge
- np = go.tail(eit) // obtain a pointer to the start of the edge
- np = go.tail(iit) // obtain a pointer to the start of the edge
- np = go.head(ep) // obtain a pointer to the destination of the edge
- np = go.head(eit) // obtain a pointer to the destination of the edge
- np = go.head(iit) // obtain a pointer to the destination of the edge
- ep = go.to_edge_pointer(eit) // obtain an ede pointer from an iterator
- ep = go.to_edge_pointer(iit) // obtain an ede pointer from an iterator
- nit = go.nodes_begin() // begin iterator for the "list" of all nodes
- nit = go.nodes_end() // past the end iterator for this "list"
- eit = go.edges_begin() // begin iterator for the "list" of all edges
- eit = go.edges_end() // corresponding past the end iterator
- iit = go.begin(np) // begin iterator for "incidence list" of the node
- iit = go.end(np) // past the end iterator for this list
- iit = go.begin(nit) // begin iterator for "incidence list" of the node
- iit = go.end(nit) // past the end iterator for this list
This is basically the complete thing. Extensions to this interface might
be operations mutating the underlying graph and potentially information
about the possible mutations.
Not all algorithms use the full thing and actually even if everything is
support, it can consist of significantly fewer functions because there
is no requirement that the pointer types are different from iterator types,
ie. 'node_pointer' and 'node_iterator' can be identical (correspondingly
for edges). If such an interface really works out (as I said, I'm currently
experimenting with this and it seems to work) it would make sense to
separate it into suitable categories of requirements.
Summary of the structural aspects of my approach:
- forward iterators for the structure
- "pointer" to identify individual objects
- a graph object to obtain associated structre.
> You gave a text description in your email, but not the
>details of the interface. I'm going to try to give a comparison here
>anyways, based on your previous publications (your thesis and some
>notes from the Dagstuhl conferences). If I misrepresent your
>interface I'm sorry!
Although the stuff from my thesis and the stuff in the notes from
Dagstuhl work fairly well, I think the interface I'm currently experimenting
with works better. Probably it is time to fix *one* interface and
implement algorithms according to this interface rather than
experimenting with always changing interface... I sure hope that we can
agree on a common interface which is then documented by the Boost
site. I will definitely submit at least simple algorithms using such an
interface (searches, spanning tree, connectivity, max flow). I still dream
of a free library of graph algorithms to allow companies to address non
trivial optimization problems (normally they shy away from corresponding
idea due to the "extraordinary complexity"; even if the complexity is
something trivial as maximum flows...).
>What I think the main difference is that in Dietmar's approach, one
>can obtain an adjacency iterator directly from another adjacency
>iterator with something like this:
>
>adj_iter2 = adj_iter1.cur_adj();
I gave up this idea although it was convenient: The restriction on the
iterator type are too strong.
>In GGCL the equivalent statement would be:
>
>adj_iter2 = adj(*adj_iter1).begin();
>
>where the dereference gives a vertex object, and then the adj()
>function gives the out-edge "list".
... but so are these! The information about the incidence list has to be
stored in the vertex (if it exists) or in the iterator. What I'm currently
using is this:
ajd_iter2 = g.begin(g.head(adj_iter1));
If this "extra" 'g.head(adj_iter1)' is a problem, a corresponding operation
can be moved into the "graph object". However, for edges it is not
necessarily clear which iterator is desired. Especially, if the iterator is
stored somewhere, the past the end iterator is probably the one for the
tail node...
[Discussion I basically agree with, reflected at least partially in the
changed interface I'm currently using, removed: I recognized some of
the problems already but I didn't express it clearly in my last e-mail]
>One perhaps non-obvious fact about the vertex and edge interface
>objects of GGCL is that they don't have to be persistent (on the
>heap), and in fact most efficient graph representations don't store
>"vertex" objects per se. In the GGCL implementation, the vertex and
>edge objects are typically very light-weight/stack based objects that
>are created "on-demand". They really only exist for purposes of the
>interface. My nickname for this kind of object is a "Mayfly".
At least I had assumed this because I'm used to dealing with virtual
object a lot: Generating "views" of graphs on the fly (eg. by ignoring
or adding edges by just extending iterators) is usual business with
graph algorithms. By adding a concept like the graph object this is
somewhat complicated because it becomes necessary not only to
create a wrapper for the iterator but also for the graph object to give
a consistent view. Still, the problem with the light weight objects
created on the fly persists: You need additional information, at least
occasionally. Storing data in the nodes/edges is not always an option
(eg. if these objects are not at all represented) and the only other place
are the iterators which turns the iterators into rather heavy weight
objects storing redundant data. The interface documented in my thesis
and in the Dagstuhl paper have the same problem. I only relatively
recently started to fiddle with a different interface.
>One possible argument against the vertex and edge objects would be
>that perhaps they create extra run-time overhead. This may be true of
>some compilers, but with a good modern optimizing compiler they
>introduce literally zero overhead. We have several sparse matrix
>ordering algorithms implemented using GGCL that match the original
>Fortran codes! (using the KAI C++ compiler).
The run-time overhead of such temporary objects seems to be
neglectable but I think it is problem to deal without a "global"
resource. However, this is a relatively new insight: This appeared
to me a while after discussing my interface with Alexander Stepanov
(this was in the week after the last Santa Cruz C++ committee meeting).
>II. Data Accessors & GGCL Decorators
>------------------------------------
>
>Dietmar and I agree that data accessors (GGCL decorators) are hugely
>important to graph algorithms.
There is actually broader agreement than this (eg. LEDA has a roughly
similar but less general concept: node and edge arrays; in LEDA these
are, however, specific classes without the flexibility to implement them
according to your needs). To avoid naming problems, I suggest that we
try to use a common name. Since I'm not really happy with "data
accessor" (although I'm used to this name by now) I can accept a
different name. However, "decorator" is a term already used for a design
pattern with a specific meaning and I don't think that the name really
matches the use: A decorators has to decorate something but often the
decorator is the data representation in the first place (at least if they are
used like data accessors :-)
"Data accessor" is, of course, my preferred name. If there is general
agreement that "decorator" is the correct term, I can live with that.
Other suggestions are welcome! Using a different name matching the
use can be a reasonable compromise...
> There really is only a syntactic
>difference here, and perhaps also some implementation flexibility
>issues. GGCL uses operator[]. Dietmar uses set() and get(). I agree
>that set() and get() are more flexible implementation wise, and I'm
>not entirely opposed to them. I would just be a bit sad to see the
>"for free" interoperability with random-access iterators (and builtin
>arrays) go away. (more about this in section IV.) Also, the bracket
>is nicer from a "pretty code" point of view, but this of course is a
>smaller concern. In addition, one can write read-only and write-only
>versions of a bracket operator, so I'm not entirely convinced of its
>inflexibility.
It is not just a syntactic difference or added flexibility! It is a coneptual
difference! There a basically four different categories of data:
- read-only, like constants or compute values
- write-only, like sequence numbers in a search
- read/write, like the current flow in a max flow algorithm
- lvalue data, like data actually stored in a node
It is obvious that you can easily model read-only, write-only, and lvalue
data using 'operator[]()'. However, you cannot model read/write data
using 'operator[]()' without using a proxy class! The operations for
reads and writes can be significantly different and you definitely need
to distinguish them. ... and it is not just an esoteric use where reads and
writes look different: For example, the free capacity during the max flow
algorithm can be conveniently maintained by a decorator/data accessor.
The tricky part about the free capacity is that it is computed differently
depending from which side you look at the edge: In one direction it
is just the flow (when looking against the direction of the edge), in the
other direction it is the upper bound minus the current flow (when
looking in the direction the edge is directed to). Instead of littering the
algorithm with this computation, it is conveniently done by a suitable
decorator/data accessor. Since the free capacity is not only read but also
stored eg. by the augmenting flow algorithm: changing the free capacity
also changes the flow and how this is done depends again on the
direction in which the algorithm looks at the edge.
Using 'operator[]()' this can be done by using a suitable proxy which
is returned from the function. However, using a proxy violates the
requirements for the corresponding STL operation anyway (basically
you have to return a reference). In addition, it is much easier to
distinguish the four decorator/data accessor categories depending
on the functions present in the interface:
- get(): read-only
- set(): write-only
- set() and get(): read/write
- set(), get(), at()/operator[](): lvalue
Note, that lvalue decorators/data accessors are only used to create
wrappers eg. to access members of the underlying object. The algorithms
only use the three other categories.
>III. Algorithm Iterators & GGCL Visitors
>----------------------------------------
>Next issue: algorithm iterators. I think they are a neat idea. Their
>motivation is the same motivation as for the Visitors in GGCL. Since
>algorithm iterators and Visitors are more about algorithm
>organization, I think it would be safe to postpose discussion on this
>issue until the core graph iterface is defined.
I agree that this is an issue which is not really related to the abstraction
for graphs. Thus, it can be discussed separately. However, I think it is
important to keep in mind that such object might be used because there
are some implication on the interface: If you are using functions, most
template arguments can be automatically deduced. You can do so at
least in some situations for algorithm objects, too (by creating
corresponding template functions) but it is easier if the template
arguments are kept to a minimum.
Maybe another thread should be opened on this issue :-) On the
other hand, the difference for the interface seem not to be that big
such that there is hope to reach an agreement relatively soon: At least
I don't have much problem with interface changes since I don't have a
large code base I have to change. Although this could be an argument
in favor of the GGCL interface I think there are some crucial issues which
have to be resolved in this interface, most notable the decorator/data
accessor stuff: 'operator[]()' is not acceptable!
>On a side note, I don't understand Dietmar's reaction against
>"call-backs". The STL uses the modern form of call-backs (functors)
>all over the place, and they seam to work very well.
My rumbling about callbacks was indeed a little bit unfounded: Actually,
you can view any virtual function and functor as some sort of callback.
... and in some situations, eg. exceptional conditions, callbacks are in
fact quite handy. However, the use of callbacks in algorithms is most
of the time rather inconvenient. Also, what I have stronly in mind with
respect to such things are things like various interfaces where
callbacks are used as a replacement for iterators.
>V. Implementation Experiments
>Dietmar's idea of doing some implementation experiments is great. I'd
>be happy to do up some examples in the GGCL interface and set up
>timing experiments, etc.
The way to go is IMO to define a bunch of [simple] algorithms and
implement them using a specific interface. These algorithms are then
to be applied to a variaty of [suitable] data structure: For the interface
to be truely generic, it is necessary that the algorithm can be applied
to an existing data structure. The result of these experiments should
show the efficiency of implementations using different interfaces and
the effort necessary to adapt a data structure to conform to a specific
interface (to do this, it should not be necessary to change the data
structure...).
In total, I think we have the following list of TODOs:
- Decide on a name for the decorator/data accessor concept. If
necessary we can draw vote.
- Decide on the exact interface for decorators/data accessors and
document it. This decision might be of broader interest than just
graph algorithms because the underlying concept is applicable to
other areas, too. In fact, I thought about implementing the STL
algorithms in terms of decorators/data accessors!
- Decide on a list of algorithms for the experiments and implementing
them with interfaces which seem to be suitable. Test cases to measure
should come with the corresponding data structure to which they are
applied. Real world examples would be preferable but since I can't
provide them, I would accept artificial test cases...
There is no TODO yet for a decision on the interface to the graph's
structure because I think this first requires some experiments to find
out whether there are indeed competing candidates... (I have no clue
whether my interface is really more efficient than other interfaces).
Maybe we should split the discussion into several threads: At least
the decorator/data accessor issues can be discussed independent
from the graph structure issues although there will probably be
arguments using graph algorithms in such a discussion.
Cheers,
dk
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk