Boost logo

Boost Users :

Subject: Re: [Boost-users] [BGL] Threshold for betweenness_centrality_clustering()
From: Cedric Laczny (cedric.laczny_at_[hidden])
Date: 2011-01-20 14:22:05


On Thursday, 20. January 2011 19:36:05 Jeremiah Willcock wrote:
> On Thu, 20 Jan 2011, Cedric Laczny wrote:
> > Hi,
> >
> > I would like to use boost::betweenness_centrality_clustering() to search
> > a graph for specific clusters. The documentation talks about a "Done"
> > object, which I assume is a functor, similar to the predicates used in
> > filtered_graph().
> > So in order to decide whether the algorithm should stop the clustering,
> > it probably needs a threshold to decide if it should further iterate and
> > remove edges to decompose the graph.
> >
> > My questions now are:
> > - Does someone perhaps have an example for the usage of this algorithm? I
> > only found documentation on "betweenness.centrality.clustering {RBGL}"
> > for Gnu R, which doesn't bring me very far.
> > - Is there some sort of rule-of-thumb to start with for finding this
> > termination threshold or if there is some more-or-less methodical way of
> > finding out a reasonable threshold?
>
> I do not know too much about the algorithm myself, but would the
> documentation at
> <URL:http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/bc_clustering.html
> > help you?

Nice idea, but unfortunately not. I thoroughly read this, but besides the
general characteristics of this calculation, there was no "rule-of-thumb" to
be found. I came up with the idea of simply letting it run once one my network
and then get the "highest" results to see what they would look like. Of course
this is not very agile, error-prone (as I expereienced myself) and highly
static. But that's probably simply a general problem of these "supervised"
approaches.

However, I have come up with some difficulties in working out how to use the
algorithm.
First, the implementation seems to need an edge_index as long as no external
EdgeCentralityMap is provided. This should be definitely a thing to include in
the documentation, especially as, for clustering, this map is not mandatory,
oppositely to brandes_betweenness_centrality(). This is indeed problematic if
the majority of graphs are based on unique edges, thus setS. I don't know if
people more often use vecS or other containers.
In this context I would like to provide the solution I used to create the
EdgeCentralityMap when using an OutEdgeList different from vecS:

#include <boost/graph/iteration_macros.hpp>

    typedef std::map<edge_descriptor, int> MyEdgeIndexMap;
    MyEdgeIndexMap my_edge_index;
    typedef associative_property_map< MyEdgeIndexMap > EdgeIndexMap;
    EdgeIndexMap edge_index(my_edge_index);

    int i = 0;
    BGL_FORALL_EDGES(edge, tmp_g, graph_type) {
        my_edge_index.insert(std::pair< edge_descriptor, int >( edge, i));
        ++i;
    }

    // Define EdgeCentralityMap
    std::vector< double > e_centrality_vec(num_edges(tmp_g), 0.0);
    // Create the external property map
    iterator_property_map< std::vector< double >::iterator, EdgeIndexMap >
        e_centrality_map(e_centrality_vec.begin(), edge_index);

Maybe there are simpler solutions.
Second, it was really confusing that the EdgeCentralityMap is based on
absolute centrality-values, while the termination-criterion ("Done") defaults
to relative by normalising the edge-centrality values. Especially for large
graphs, it can be quite frustrating to realize that the chosen threshold was >
1 and thus "per se" bigger and the algorithm will directly fail, but not after
having computed the betweenness-centrality one time. One might then also
wonder why it took its time but the resulting graph was the same as the
original. But I think this is rather a problem in understanding and might be
clarified by updating the documentation.

To summarize my experience with the bc-clustering, I was actually a little bit
disappointed. This might very well be as this clustering technique is not
directly suitable for my needs, also because my graphs are not really small
(20k vertices, 300k edges). It would be nice to see more clustering techniques
directly in the BGL. I do know that implementations of these techniques are
probably not the easiest tasks and probably the need is also rather low, as
otherwise they maybe would have already been integrated into BGL.

Nevertheless, thank you for your help.

> -- Jeremiah Willcock

Best,

Cedric


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net