Boost logo

Boost :

From: Dietmar Kuehl (dietmar_kuehl_at_[hidden])
Date: 1999-12-06 13:02:15


Hi,

{Yet another try... If the message was already approved, please reject
this copy. Thank you.}

At 03:49 AM 12/2/99 -0500, jsiek_at_[hidden] wrote:
>This is not a proposal for a library. It is a proposal for an
>interface (or set of "concepts" in the SGI STL meaning of the term).

Although I second the motion to define an interface to graphs for
the use in graph algorithms, I don't like the details of this proposal.
It is possible that this is just the NIH syndrom I'm suffering and I
will evaluate things carefully to find out whether they have the
capabilities as the things I have come up with. To allow someone
else to do the same comparison, Iwill [briefly] describe the interface
I'm using and then comment on the details of your proposal.

Unfortunately, I can't give you much current information on the stuff
I' m doing since there is nothing current written down which is ready
for release (although I hope to provide a starting point very soon: I
have been working on this stuff last weekend). There is some older
stuff which is mostly written up by Karsten Weihe from the University
of Constance, with whom I developed the basics during the work for
diploma. You can find several articles at written stuff from his home
page <http://www.fmi.uni-konstanz.de/~weihe/>.

The approach to generic graph algorithms I'm using involves basically
three aspects:

- iterators for nodes, edges, and edges incident to a node
- data accessors to access data associated with an object identified
  by an iterator
- algorithm iterators which step over the basic steps taken by the
  corresponding algorithm.

OK, a little bit more details are necessary to make sense of these :-)
The iterators are rather obvious in their use although not in their
interface: There are iterators for all nodes and all edges in a graph.
These are just iterators for sequences which can have an interface
similar to the interface of STL iterators although there is no need
for 'operator*()' (see below). I haven't found any need to define an
interface where these iterators come from because I actually haven't
found any graph algorithm which really needs to get these sets more
than once and in this case the iterators can conveniently be passed
in as arguments.

The other kind of iterators used is more interesting and actually the
basic type manipulated by all graph algorithms I have implemented:
Iterators for the edges incident to a node (falsely called "adjacency
iterator" in most of the work I have done with Karsten; however, we
where aware of this error at the time of the writing but we could not
drop our habit continuing the use of this name after we had used it
for over a year...). Basically the view taken by these iterators is
that
they represent an edge incident to a fixed node leading to another
node. Whether the edge is "incoming" or "outgoing" is irrelevant
to the iterator and it is up to the algorithm to give it the correct
interpretation if it is relevant to the algorithm. After long
hesitation
I recently decided that these iterator should also have an STL like
interface for iterators (more precisely forward iterators) but again
also without an 'operator*()': In the papers with Karsten you will
find a different interface which makes, however, some assumptions
resulting in unnecessary restrictions and unnecessary performance
penalities.

I think the most important idea are data accessors which are either a
generalization or idenditcal to your decorateors
 (I think they are a generalization but I'm not yet sure): All a data
accessor does is to, well, access data! More precisely, given a data
accessor object 'da' of type 'DA', an iterator 'it', and a value 'val'
of the
type of the data accessed by 'da', the following expresssions are
valid:

  DA::value_type val = get(da, it);
  set(da, it, val);
  DA::value_type& val = at(da, it);

Before any false assumption comes up: Not all data accessors provide
all three operations 'get()', 'set()', and 'at()'. In fact, there are
four
categories of data accessors, similar to the iterator catagories:

- read-only data accessors supporting the 'get()' method
- write-only data accessors supporting the 'set()' method
- read/write data accessors supporting both 'get()' and 'set()'
- lvalue data accessors supporting all three methods

lvalue data accessors are not used by algorithms but are used by
generic data accessors. For example, a "member data accessor"
might use a data accessor where the 'value_type' is a struct to
provide efficient access to one specific member of this struct. To so
efficiently, the underlying data accessor has to provide reference
access to the struct.

Here is a list of examples what data can be accessed by a data
accessor:
- Constant data which is not even represented: The passed iterator is
  just ignored.
- Data stored in a node or an edge (depending what object the data
  accessor accesses). This is similar to the "interior decorator
category".
- Data stored in some container which somehow indexed using the given
  iterator. This is similar to the "exterior decorator category".
- The data is computed by the data accessor using eg. different data
  accessors. For example, the length of an edge in a graph with
Euclidean
  distances can be computed from the position of the nodes.
- All kind of object properties like various degrees, nodes incident to
an
  edge, dual edges in planar graphs, the index of a node or an edge,
etc.
- The representing object in a union find algorithm can be computed by
  a data accessor.

What kind of data is represented by a data accessor is up to the
algorithm. The data may be problem data like position of nodes or
weights or capacities of edges. Or the data may be auxiliary data like
labels, flow excess, current flow, references into secondary containers
like priority queues, etc.

Data accessors are very simple but still a very powerful and useful
technique. Although they are simple and nearly obvious it took us
over a year to discover the simplicity of the solution: We wanted to
have the algorithms being independent from the actual representation
of the data and used various approaches. Once discovered, data
accessors are obviously the right approach to keep algorithms
independent from the actual data representation.

It is worth noting that data accessors distinguish between read and
write access: Actually, this is the basic difference between data
accessors and containers as found in the STL. Everything but the
difference in reading and writing can be achieved with containers, too,
and there would be no need for a new concept. Sure, a container
accessing the data in some object like the internal data of a node
would
be a rather strange beast but still reasonable. It took a rather
lengthy
discussion to find out this difference between containers and data
accessors.... However, it is often very important that different
methods
can be used for reading and writing. For example, the whole
representation of free capacity and current flow in network flow
algorithms is moved into a data accessor which does different things
depending on the direction an edge is currently interpreted with.
However, this whole stuff is moved out of the algorithm implementation
and localized to only the data accessor implementation!

OK, enough advertising data accessor. Next issue: Algorithm iterators.
For flexible algorithms this is just crucial. The idea is, again, very
simply
and obvious, once you know it: Graph algorithms are often extended
in various ways to achive additional goals. Just a few examples:

- During search, you want to record additional data like eg. the parent
  or a sequence number in which the nodes are found (additional
  operations).
- You want to terminate a search once you have found the correct node
  when searching a path between two nodes (premature termination).
- You want to augment additional flow after the capacity of an edge is
  increased in a maximum flow problem (resumption).
- You want to visualize the current state of an algorithm (inspection).
- You want to run two more algorithms parallel but in the same process
  and thread (preemption).

This list can be extended by yet a bunch of other things you want to
do to non-trivial algorithms while the algorithm is running. ... and it
is
fairly easy to satisfy them all: Turn the algorithms into a class,
provide
access to the logical state of an algorithm (eg. for a search algorithm
this is the contents of the queue), and have function which just
performs
a single step of the algorithm, eg. in a search advance to the next
node.

That's all! The algorithm iterator iterates of the different stages of
an
algorithm. Of course, it is not always simple to determine the exact
granularity of the steps (eg. for searches in general, advancing to the
next nodes is appropriate; for depth first search, there are much more
interesting states, eg. just before popping a node off the stack) and
sometimes this even results in multiple implementation of the same
algorithm just with different granularity of the steps. It might still
be
same class but different functions advancing the algorithm.

To round off this discussion, I have to admit that I'm still rather
undecided about certain aspects of the interface. For example, the
access to the list of incident edges (which can be, of course,
represented very different than a list if it is represented at all) can
be
done be data accessors, potentially a special data access then to
provide access to both the start and the end iterator. Alternatively,
it can be done with members of the node iterator which is, however,
no good idea for certain graph representation or with some sort of a
"graph descriptor" (probably the wrong name but I don't want to call
it a "graph class" because there does not need to be such a thing)
which has general graph access methods. This could be a good place
for a bunch of typedefs and access methods to the whole graph like
the list of all nodes, etc.

I'm continuesly fiddling around with this stuff. Getting a fixed
interface
would probably result in an implementation of quite a few algorithms in
rather short time (for example, I have implemented both the augmenting
path and the preflow/push algorithms for solving the maximum flow
probem dozens of times, using ever changing interaces...).
 
>In many ways, the interface is more important that the library itself.

Agreed. However, just the interfaces are useless. Deciding on a common
interface for graph might however stimulate different people to
implement
corresponding algorithms and adaptors for graph classes.

>The interface
>is designed to leave open the implementation of many different graph
>representations, including the typical adjacency-list and
>adjacency-matrix classes of implementation, but also including just an
>iterator-range of edges, or even a solely pointer-based graph
>representation. The interface is also designed to make graph
>algorithm highly reusable, similar to the way iterators and
>functors make STL reusable.

My favorite graph representations are those which don't actually
represent nodes or edges at all. For example, a normal test graph I'm
using is a grid graph: Everything necessary to represent this graph
are the dimensions!

>The first two concepts are pretty straightforward, Vertex and Edge.

Except that I can't see any use of vertices and edges in graph
algorithms!
Although the *underlying* concepts used by graph algorithms are
vertices and edges, these entities are never directly manipulated at
all.
What *is* manipulated are iterators to vertices (don't mind if call
these
"nodes" instead; I'm more used to calling them nodes) and edges and
data associated with these "objects" (they might not be objects in the
program, eg. because they are only implicitly represented: see the grid
graph example above).

> A
>traits class is defined for each concept for access to associated
>types, and the operations are expressed as global functions. I choose
>global functions because it is the most extensible access method, and
>leaves open the most possibilities for reuse, ie. the objects do not
>have to be of class type, and if they are imported from a third party
>(or if they are part of a legacy code) it is easy to add the global
>functions after the fact.

Since there is some need to implement some kind of adaption for third
party libraries anyway, I think it would be appropriate to add a traits
class which collects the appropriate functionality and typedefs. This
is the way I'm currently using. It is convenient, especially as certain
traits are template arguments to the algorithms anyway.

>V is a type that models Vertex
>u is an object of type V
>v is an object of type V
>
>Valid Expressions Description
>----------------------------------- -----------------------
>vertex_traits<V>::edgelist_type A type that models
ContainerRef*
>vertex_traits<V>::vertexlist_type A type that models ContainerRef
>out_edges(v) Returns on object of
edgelist_type
>in_edges(v)* Returns on object of
edgelist_type
>adj(v) Returns on object of
vertexlist_type

For these function it even makes sense to have some kind of graph
descriptor object! Otherwise you sometimes simply have to store
information in the iterators (I'm plainly assuming that the 'v' you are
talking of is basically an iterator) which is global to the graph! For
example, again in the grid graph example, a node iterator can be just
an integer, if you have some graph descriptor which has access to the
grid's dimension. With your approach, each iterator would have to
store either the dimensions or some reference to a graph descriptor.

Is this micro optimizing in advance? Since generic programming at least
attempts to provide implementation which are supposed to be as fast
as implementation specific to a data structure, I think it is necessary
to
think about such issues!

>The adj() function is somewhat redundant, since the adjacent edges can
>be obtained through the out_edges() method, however I figured it might
>be a nice convenience.

Just a note on this one which is not really interface related: Although
most descriptions of graph algorithms start with "it is assumed that
the graph does not contain loops (ie. edges leading from a node to the
same node) and parallel edges" (the exact wording might differ) this
assumption is *very* rarely used in the algorithm! It is normally only
used in the proof for the asymptotic performance. Thus, the adjacent
nodes, if they can indeed be determined efficiently, and the incident
edges might be rather different! There may be *much* more incident
edges than adjacent nodes. In particular, algorithm implementations
should be careful to allow parallel edges and loops, especially since
many real world graphs simply have them and it is an unnecessary
restriction. Of course, if it is a necessary restriction (for different
purposes than elegantness of the proofs), there is nothing which
can be done about it...

>concept VertexWithID

IMO a useless concept! Use a data accessor if you need an index.

>concept ContainerRef

For what purpose is this concept used in algorithms? I'm specifically
asking because I can't see any need for any container in certain
graphs.
Thus, using a ContainerRef in algorithms migh be an unnecessary
restriction.

>concept Edge
>============
>
>This is really not much more than a pair warmed-over to fit the graph
>abstraction terminology. For undirected edges we still use the
>source(e) and target(e) functions, but they lose their normal
>connotations.

Like for nodes, I see the use of edge iterators in graph algorithms but
no use of edges. Maybe a minor difference but conceptually IMO an
important one... Also, implementation of graph algorithms don't have
a concept of "undirected edge": When you access an edge, it *always*
leads from one node to another one, thus imposing a direction. Whether
it is important for the algorithm that this direction stays the same or
not
depends on the algorithm and its interpretation.

>concept GraphDecorator
>======================
>
>There is always a need to associate properties with vertices and edges
>such as color, weight, distance, etc. Generic graph algorithms need a
>consistent syntax for accessing these properties. I propose
>operator[]. This is consistent with the notation in several textbooks,
>and it is also compatible with the C++ builtin array types, or any
>RandomAccessIterator. Of course, there are lots of ways that property
>access can be implemented, but this concept just requires that the
>operator[] be the syntax used. One downside of the operator[] is that
>it must be a member function, which makes this less extensible than
>using a non-member function. However, I felt that the benefits of
>ease-of-use with RandomAccessIterators more important. Here are some
>examples of GraphDecorator's being used:

I'm *very strongly* opposed to using just one function for both read
and write access! You simply need different read and write methods
in a lot of situations! Don't put the burdon on the algorithm
implementer,
take it out of the algorithms wherever possible. My proposal is to use
data accessors instead! Although I think your proposed interface has
several restrictions, using the same function to read and write data is
the most severe one!

You can probably get me to accept everything else in your proposal,
although I don't like certain aspects, but I will never create
implementations with this! It is simply too restrictive.

>concept GraphVisitor
>====================
>
>The STL algorithms use functors to parameterize generic algorithms.
>Similarly, one would want to supply call-backs in order to
>parameterize generic graph algorithms. We identified several common
>"events" at which basic graph algorithms such as BFS and DFS should be
>parameterized, an made each of these a member function of the
>GraphVisitor.

Callbacks are a great invention for procedural languages. I stopped
using procedural languages ten years ago... Some techniques carried
over quite a while but I'm positive that I stopped using callbacks (in
interfaces I'm designing; some interfaces I have to use still use
callbacks)
at least five years ago. Defining callbacks under whatever name as a
concept today seems backward to me. I'm well aware that the visitor is
a pattern in the GOF book but that does not mean that it is a good
approach. It only means that people are still using it.

I think it much more convenient to use an iterator: The code using the
iterator is very local and not separated over multiple functions. Also,
it is rather obvious how associated data is to passed to the user of
the algorithm: The user just calls appropriate access function on the
algorithm iterator. No need to add artificial arguments which provide
the user with necessary context information (the state of the
algorithm,
user data, etc.) either.

> //initialization
> for (; first != last, ++first) {
> color[*first] = WHITE; // perhaps use color_traits instead
> d[*first] = INF;
> }

Of course, this initialization would also be taken out of the
algorithm:
Especially when exploring a graph multiple times where potentially
only a portion of the graph is inspected (eg. in the augment path
algorithm for the maximum flow algorithm) it can be much too expensive
to go through the graph each time. Instead, only a current value for a
counter can be stored: A node is marked if and only if it stores the
corresponding value. ... and, of course, if there are just two values,
these are probably best called 'true' and 'false', even though
textbooks
might use much nicer names...

>You are welcome to browse the web site describing the Generic Graph
>Component Library (I know, its a horrible name, but GTL was already
>taken).

The project using GTL uses it inappropriately: It has nothing to do at
all with STL, although they do claim similarities! It would be more
appropriate to GGCL to be called GTL but IMO it is still not there!

> Jeremy Siek
> Ph.D. Candidate email: jsiek_at_[hidden]
> Univ. of Notre Dame work phone: (650) 933-8724
> and cell phone: (415) 377-5814
> C++ Library & Compiler Group fax: (650) 932-0127
> SGI www: http://www.lsc.nd.edu/~jsiek/

Have you discussed you approach with Alex? What did he say about
it? It took me quite a discussion not to get ripped apart with my
proposal when I discussed it with him and actually this discussion
resulted in several changes in my approach.

Thinking of this discussion: I wanted to implement a bunch of
relatively
simple algorithms to compare their runtime with implementations using
a different approach. I got a little distracted by tasked to finish
first
(eg. the priority queues) but I still haven't finished it. Anyway, I
propose
that we define a set of simple algorithms used to compare different
approaches: This would show several aspects like

- the runtime aspects of the approach
- ease of use
- ease of implementation (which is, IMO, the least important)
- generality

Using a set of graph classes, eg. the LEDA graph (well, this is no
really good idea because working at commercial company requires that
I buy a license...), some graph data structure(s) from GGCL or GTL, my
grid graph, we could see how things work.

Regards,
  dk
__________________________________________________
Do You Yahoo!?
Thousands of Stores. Millions of Products. All in one place.
Yahoo! Shopping: http://shopping.yahoo.com


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk