Boost logo

Boost :

From: Dietmar Kuehl (dietmar.kuehl_at_[hidden])
Date: 1999-12-09 10:22:59


Hi,
At 09:01 AM 12/9/99 -0500, Dave Abrahams wrote:
>Dietmar wrote:
>>Dave wrote:
>>>As far as I understand the term, a node descriptor is just what you get
from
>>>dereferencing a node iterator. If you can get to the property from the
>>>descriptor, you can of course get there from the iterator.
>>
>> ... because you can dereference the iterator?
>
>Or because you can get at the descriptor with a property accessor.
>Take your pick of interfaces.

As it turned out, I made my peace with 'operator*()': I was somewhat
confused about what it should return but it makes perfect sense to
have it return a corresponding descriptor. This way even another
problem is solved or at least reduced: What type of arguments is
used by the property accessor. Originally it though about using all
kinds of iterators for the argument identifying the object whose
properties are accessed. This can be reduced to only the respective
descriptor type.

>> Or because the property
>> accessors handles both iterators and descriptors? There is a subtle
>> difference between these two: The former assumes that there is indeed
>> a dereference operator ('operator*()') while the latter does not.
>
>I'm not sure I understand your latter alternative. Is it just the same as
>what I was saying?

I think what I had in mind was slightly different thank what you are
saying. Basically, I assumed that the access functions are overloaded
for different iterator and descriptor types but for the same property
accessor. What you are saying (at least what I think you are saying)
is that there are different property accessors, some of them are just
auxiliary to get at the descriptor object.

However, I was thinking towards a dead end. It is much simpler to
use 'operator*()' for iterator types to get at the respective descriptor
type. This makes perfect sense.

>> I'm still not convinced that a dereference operator for the iterators is
>> really the right thing: Currently I'm using an int eg. for node iterator
>> in my grid graph. This works just fine...
>
>Right. A dereference operator would force users to write a complex (?)
>wrapper. Of course, a templated wrapper which always returned itself as the
>proxy might be possible....

Yes, a template wrapper would be possible and it is a small price to pay
for the advantage of the 'operator*()'. Since the resource accessed with
the dereference operator in this case is essentially immutable, it isn't
even necessary to use a proxy.

[Discussion of iterators vs. descriptors deleted]

>You've lost me here. In what way would these differ from iterators?

The major difference between a descriptor and an iterator is that you
cannot iterate with a descriptor! A simple example could be this:
Consider iteration over a list. Once you have found a certain element,
you can just store a pointer to the elements data, ie. if you are using
a 'std::list<T>' you would store a 'T*'. If you never need to iterate but
only need to identify the object to get at the objects properties, this
is exactly what you want to do. Similarily when storing predecessors
in a spanning tree of a graph: You just need to store an identification
of the node, not the iterator. Since iterators *might* contain additional
information this can be more efficient. There are even cases where the
iterator uses temporary information which is gone once the iteration
was completed. In such cases you simply cannot store iterators, you
need to store something which is more persistent.

>I think a better name is needed for "descriptor". It is far too general.

We can use the German word "Verweis" :-) I looked at several
dictionaries but the translations to English are either "reference" or
something which is indeed related in the German use of "Verweis"
but does not match the use in this context (ie. "reprimand"). For a
better English word I'm probably the wrong person :-)

>node iterator = something which can be incremented through the collection of
>all nodes and whose value uniquely distinguishes a node within the context
>of a particular graph

Exactly. All iterators used in the context of graph algorithms should
be similar to "forward iterators" with the exception that you cannot
assign to '*it*. That is, these are "constant forward iterators".

>Important: you may need the graph object to get at the "contents" of the
>node (this includes the node's edge list) from the node iterator.

Yes. However, this is no difference to a node descriptor! Even with
a node descriptor you need the graph object to get at the "contents",
ie. the nodes properties, and associated structure, eg. the list of
incident edges.

>node descriptor = something which refers to the "contents" of the node, as
>opposed to its identity.

No: The node descriptor still only refers to the nodes identity! However,
you cannot use the descriptor for iteration. That is, the common
interface for node descriptors is actually empty: There are no required
functions beyond the normal stuff to pass these objects around (ie.
copy constructor, copy assignment, destructor) and to compare for
identity, ie. operators '==' and '!='. A default constructor, if required,
could be a convenient approach to get a "null descriptor", ie. a
descriptor not refering to any object (eg. the root node in a spanning
tree would store such a null descriptor to identify that there is no
parent).

>[Can you use a descriptor to distinguish identity?

I would think so. It is not necessarily necessary to distinguish identity
using descriptors but probably convenient...

>This one is really a vague concept to me... please elaborate]

Basically a descriptor is like pointer except that you cannot use
pointer arithmetic on it and you cannot dereference it. What remains
is basically an opaque object which somehow identifies some entity
in a graph, depending on the context a node or an edge (one
descriptor uniquely identifies either a node or an edge...).

This is the algorithms view of a descriptor! A property accessor can,
of course, dereference a descriptor if it knows that the descriptor is
actually a pointer. Or it can use the descriptor as an index into an
array if it know that the descriptor is an integer suitable as index.

The terms "pointer", "identifier", and "reference" would probably
all be suitable to describe this concept. However, they are all already
considerably overloaded with meaning attached to them.

>[do you expect to be able to compare node iterators for equality?

Yes, I think so. On the other hand, if equality is defined for the
descriptors, this might be sufficient.

>What about op<()?

No, probably not: The iterators are used to get at the objects in a
set. This set is normally not ordered in any particular way. The only
exception to this rule I'm aware of are planar graphs but these are
very special beasts anyway: They need several special treatment to
take advantage of their special structure eg. access to the dual of
an edge.

>Hashing?

Unless the answer to 'op<()' already answered this question, this
question is a little bit too brief to me to understand it...

>What about searches where the space of all nodes isn't easily captured --
>answer this at the bottom!]

First of all, the algorithms I have dealt with never need the list all nodes
for a different purpose than initialization of properties which is better
done differently anyway! However, this question might as well apply
to the list of all edges incident to a node. Here is an attempt to provide
input without actually understanding the question exactly: If you
cannot iterate over these sets of objects but they are need by an
algorithm, you are lost! Now, the iterators used are not just input
iterators but require a little bit more, ie. basically some support for
multiple passes and object identity (not for the iterator itself but for
the object currently identified). I don't think that you can successfully
apply graph algorithms to structure where these rather basic
requirements cannot be fulfilled. Potentially the requrement for
[amortized] constant time operations cannot be provided... Do you
have examples where you see a problem?

>>>OK, no problem... except that you don't visit nodes which have already been
>>>visited so you would only ever get a single incoming edge in the graph's
>>>"copy" (it would be a tree).
>
>I didn't notice that you had any way to address this below. The crucial
>difference is whether we are marking edges or nodes as visited, I think.

Well, nodes are marked as visited. However, an algorithm iterator might
have different break points. For a graph search an obvious break point
is when encountering an unmarked node. However, often the "non-tree"
edges are also interesting to the algorithm (eg. to compute biconnected
components). In this case there should be a break point for each edge
being processed.

What worth is the algorithm iterator then? It maintains the auxiliary
data structures and copes with marking nodes, storing edges in a
queue, etc. The user of the algorithm just acts on certain conditions.

>> It is indeed not at all easy to determine the right points where an
>> algorithms should stop and there are often multiple points. The
>> approach I'm taking in these cases is Duff's Device in the advance
>> function (eg. operator++()): A template argument is used to select
>> the break points using a bitmask. Each break point corresponds
>> to a state in an underlying state machine defined by the algorithm
>> (the state machines are there but normally not explicitly mentioned
>> in the description of the algorithms). If the algorithm breaks in a
>> certain state, this state is remembered and later used as an entry
>> point into the algorithm. In this context, it is convenient to use
>> Duff's Device to resume processing at the right point.
>
>Doesn't this slow down the whole process somewhat? Since you go through the
>same interface (op++()) to resume at any point, the algorithm would need to
>dispatch to the correct phase at run time each time op++ was called.

Honestly, I don't know! I have measured the effect of using this
approach compared with and written code (I used DFS or BFS for
a path search where the predecessors are stored in the nodes) and
got a penalty of a few percent (I think something like 5%). The effect
should definitely be measured. However, due to the rather structured
use I can imagine that compiler can optimize things a lot! BUT: This
is just a guess, not the result of measurements.

On the other hand, using this approach makes it viable to reuse
algorithms in the first place rather than implementing them from
scratch over and over again. The callback interface is not necessarily
better because it gives less control over the algorithm. What should
be used and what is more effective should definitely be measured.
Having a stable interface for the graph structure should give the
opportunity to do so!

Regards,
  dk


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk