Boost logo

Boost Users :

From: rzako23 (rzako_at_[hidden])
Date: 2002-05-06 01:03:01


Dear Craig,

--- In Boost-Users_at_y... #757, Craig Hicks <hicks_at_k...> wrote:

> A graph can represent associations between two nodes by means of the
> edge (or edges) which connect those two nodes. The nodes are,
> needless to say, distinct objects. It's up to the programmer to
> design a node data type which can capture the necessary information
> to distinguish between nodes. Obviously what I'm saying is obvious,
> so I think it's my turn to say "I'm missing your point", and ask for
> clarification.
...

I don't know who's missing what, but I could have been more clear
before and will attempt to be so now. I'll also provide a somewhat
realistic example, which is what you originally asked for (in message
#594).

Consider an application used by a school for keeping track of students,
courses, teachers, rooms, books, meals, buses and other objects that
are part of the school.

The application uses a Student class:

    class Student {
        int id;
        string name;
        string address;
        Date birthdate;
        ...
    };

The Student class has a lot of data members, and there are a lot of
(member and free) functions for handling students. Indeed, there is
an extensive graphical user interface that allows the user to add,
modify, delete and find students maintained by the application.

The application also uses a Course class:

    class Course {
        int number;
        string name;
        string description;
        string instructor;
        string room;
        string meeting_times;
        ...
    };

Like the Student class, the Course class has a lot of data members,
and there are a lot of (member and free) functions for handling
course. And there is an extensive graphical user interface that
allows the user to add, modify, delete and find courses maintained
by the application.

What I want you to imagine is that there are two independent and
well-developed classes that do not yet have an association between
them. One might even imagine -- as is actually realistic -- that
the two classes are parts of different applications written by
different programmers.

Of course, requirements change and software evolves. At some point,
there is a requirement to associate students and courses: to keep
track of which students are enrolled in which classes. Obviously,
this is a many-to-many association.

As I understand it, in order to use the BGL -- indeed to use any graph
library with which I am familiar -- one would have to derive both the
Student and Course classes from some common base class:

    class SchoolObject { ... };

    class Student : public SchoolObject { ... };

    class Course : public SchoolObject { ... };

Then one could create a graph for which each node is a SchoolObject.

While this approach might work, it is unnatural. A Student is not a
Course, and there really is no such concept as a "SchoolObject" in the
application domain. Indeed, such an approach would allow links
(graph edges) between students and students, or between courses and
courses -- neither of which makes sense. Thus there would need to be
source code to explicitly forbid such illogical links.

There are two main problems with using the BGL (or any general-purpose
graph library with which I am familiar).

First, a mathematically idealized graph generally does not distinguish
between nodes of different types: Any node can be linked to any other
node via an edge. The same can be said of computer implementations of
graphs. In brief, a graph is typically an association between a type
(class) and itself.

Yes, mathematically, one can consider bipartite graphs in which nodes
can be classified into different classes. But the graph libraries I
have seen do not implement this idea as a graph involving two
different (C++) classes of nodes.

Second, the graph libraries I am familiar with emphasize abstract nodes
linked by edges. The nodes and edges may each have data associated with
them (the BGL does this using property maps), but the data associated
with nodes and edges is considered secondary. In effect, the idea of a
graph makes the association primary and the participating class(es) of
nodes (objects) secondary.

But in typical applications -- as I try to indicate in my example --
the emphasis is the object. The classes of objects (students and
courses) are primary and are predetermined. One does not have the
flexibility to recode these classes to conform to, say, the BGL, the
Graph Template Library (GTL) or some other graph library. The
association between the classes is a relatively minor detail, and
associations can be added, modified or removed from an application
perhaps more frequently than the classes of objects themselves.

I am looking for a library that allows me to take the preexisting
Student and Course classes and to add an Enrollment association between
the two with but minimal modification of existing source code.

> Are you saying it would be useful to have a bipartite graph (or
> N-partite graph) with each part holding separate types? Is that how
> Tilevich's association library worked? I can't recall.

Yes, as I note above, I am saying it would be useful to have a
bipartite graph, i.e., a graph with nodes in two different classes.

Yes, this is how Tilevich does it in the Sept. 2001 C/C++ Users Journal
article. He gives an example involving employees and projects, which is
conceptually the same as my example involving students and courses:

    ManyMany<Employee*, Project*> Projects;

> I dare say the BGL (or any C++ based graph) could handle nodes with
> different data types via inheritance (all nodes contain a pointer to
> base data class).

Yes, I think it could. But as I note above, it is unnatural to derive
fundamentally different classes from some artificial base class.

Moreover, in so doing, you allow for unnatural edges linking, say, a
student to a student rather than to a course.

> Regardless, the issue of classes is one of syntax. My statement was
> meant semantically, without regard to syntax.
>
> 1. Do you agree with the statement "Any implementation of an
> many-many assoication container must have an underlying graph
> lurking within (semantically speaking)"?

Yes, I agree, if one allows that the underlying graph may be a
bipartite graph that has nodes in two different classes.

> 2. Do you agree with the statement "The BGL is capable of being the
> underlying mechanism for implementing a many-many class (e.g., one
> with the I/F defined by Tilevich's interface) ?
>
> I'm fairly certain about 1, and think 2 is correct, but am not
> certain especially since I don't have Tilevich's class at hand
> right now.

Yes, it is capable of being the underlying mechanism.

But it is also true that it is possible to write an object-oriented
program with inheritance, polymorphism and virtual functions in
plain C. It is possible, but it is not very natural or convenient.

The point of the STL -- and I believe of Boost extensions of the STL
-- is to make common but involved tasks *easy* by providing templates
that are efficient and easy to use.

> The 3rd question is, if it can be implemented with BGL, is it worth
> it? You'd be able to reuse the exchangeability of data structures.
> You'd have a host of algorithms available. Anything new you
> developed might be reused for graphs in general. You'd have more
> time to concentrate on the I/F.

This question gets at my point about graphs emphasizing the
association, i.e., the edges, whereas most C++ application classes
that participate in associations are emphasized more than than the
associations between them.

To put it a different way, much of the functionality that a graph
library provides -- beyond being a mechanism for implementing the
association -- is not desired.

Would one do a depth-first or breadth-first search of the graph
students and courses? What would that mean and what use would it be?

Would one want to find connected components, i.e., clusters of students
and courses that form a kind of school-within-a-school? One could do
that, but it would not be high on the list of requirements for a school
management application.

What is desired is being able to access "neighbors": all the courses a
particular student is enrolled in or all the students enrolled in a
particular course. But oned oes *not* need an entire graph library to
achieve this low-level functionality. One just needs an association
library.

And that is what I seek: An efficient, flexible, easy-to-use
association library that allows me to add, modify and delete different
kinds of associations (1-to-1, 1-to-many, many-to-many, unidirectional,
bidirectional, etc.) between classes A, B, C, etc., in which some
associations are between a class and itself (hence a general graph)
but other associations are between distinct classes.

Craig, thank you for your good questions and for giving me the
opportunity to think out loud.

Thanks,
Rob Zako

P.S. So far we have been discussing only *binary* associations. If we
were to discuss ternary and higher-order associations, which link 3 or
more objects together, then the idea of an association as a graph would
break down.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net