
Subject: Re: [Boost-users] [BFL] Efficient neighbourhood search in trees
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2010-11-03 20:10:35


Jeremiah Willcock wrote:
> On Wed, 3 Nov 2010, Jeremiah Willcock wrote:
>
>> On Wed, 3 Nov 2010, Vincent Spruyt wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to calculate the average edge weight within a certain
>>> neighbourhood for each vertex in a tree.
>>> Basically, I have a (minimum spanning) tree and I want to remove
>>> all edges whose weight is much larger than the weight of edges
>>> close by.
>>> The result would be a forest of several connected components (I'm
>>> working on a clustering algorithm).
>>>
>>> Currently I naively iterate over each edge. For each edge, I find
>>> the source and target vertex. Then, for both the source and the
>>> target vertices, I calculate the average weight in a certain
>>> neighbourhood using a recursive function, as the following simplified
>>> pseudocode indicates:
>>>
>>>
>>> E: List of edges in MST
>>>
>>> getNeighbourhoodStats( depth, startvertex, edge, totweight ){
>>>
>>>     out_edges = getIncidentEdges(startvertex)
>>>
>>>     foreach out_edges as out
>>>         if( out != edge ){
>>>             totweight += getWeight(out);
>>>
>>>             if( depth > 0 )
>>>                 getNeighbourhoodStats( depth-1, target(out), out, totweight )
>>>         }
>>>     end
>>> }
>>>
>>> main(){
>>>     foreach E as e
>>>
>>>         //Get the total weight on the left of the edge
>>>         int totweight_left = 0;
>>>         getNeighbourhoodStats( 5, source(e), e, totweight_left )
>>>
>>>         //Get the total weight on the right of the edge
>>>         int totweight_right = 0;
>>>         getNeighbourhoodStats( 5, target(e), e, totweight_right )
>>>     end
>>> }
>>>
>>> However, this approach is rather inefficient, as a lot of the
>>> calculations are redundant (it takes about 10 ms for 300 edges, which
>>> is way too slow for the purpose I had in mind). Any ideas for a better
>>> solution?
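
(A minimal runnable C++ rendering of the recursion sketched above, using a plain edge list and adjacency lists rather than BGL; all of the names and the tiny example tree are illustrative, not the original code:)

#include <cstdio>
#include <vector>

struct Edge { int u, v; double weight; };

// Sum of edge weights reachable within `depth` further steps from
// startVertex, never crossing excludedEdge (the edge under test, or the
// edge we just arrived on).
double neighbourhoodWeight(const std::vector<Edge>& edges,
                           const std::vector<std::vector<int>>& adj,
                           int depth, int startVertex, int excludedEdge) {
    double total = 0.0;
    for (int ei : adj[startVertex]) {
        if (ei == excludedEdge) continue;
        total += edges[ei].weight;
        if (depth > 0) {
            int next = (edges[ei].u == startVertex) ? edges[ei].v
                                                    : edges[ei].u;
            total += neighbourhoodWeight(edges, adj, depth - 1, next, ei);
        }
    }
    return total;
}

int main() {
    // Tiny example tree: 0-1 (w=1), 1-2 (w=5), 2-3 (w=1).
    std::vector<Edge> edges = {{0, 1, 1.0}, {1, 2, 5.0}, {2, 3, 1.0}};
    std::vector<std::vector<int>> adj(4);
    for (int i = 0; i < (int)edges.size(); ++i) {
        adj[edges[i].u].push_back(i);
        adj[edges[i].v].push_back(i);
    }
    for (int i = 0; i < (int)edges.size(); ++i) {
        double left  = neighbourhoodWeight(edges, adj, 5, edges[i].u, i);
        double right = neighbourhoodWeight(edges, adj, 5, edges[i].v, i);
        std::printf("edge %d: left=%g right=%g\n", i, left, right);
    }
}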
>>
>> I would suggest using dynamic programming. Keep a map from <vertex,
>> depth> to total weight. First, fill in <v, 0> with zero for all v,
>> then recursively fill in <v, n+1> with the sum of <w, n> for all
>> neighbors w of each v. This will not exclude the edge you're
>> currently testing from the sum, though.
>
> The way I suggested can also count some edge weights multiple
> times, so the update formula may need to be <v, n+1> = sum(w in
> neighbors(v), <w, n>) - degree(v) * <v, n-1> or something like that.
> You might want to do some small examples by hand to make sure you get
> the formula right.
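
(One way to make that tabulation exact on a tree is to index the table by directed edge rather than by vertex, so a path can never immediately double back and the degree(v) correction term disappears; a hedged sketch, with illustrative names throughout:)

#include <vector>

struct Edge { int u, v; double weight; };

// table[d][ei][dir]: total weight of edges reachable by first crossing
// edge ei in direction dir (0: u->v, 1: v->u) and then taking up to d
// more non-backtracking steps; on a tree each edge is counted once.
std::vector<std::vector<std::vector<double>>>
buildTables(const std::vector<Edge>& edges,
            const std::vector<std::vector<int>>& adj, int maxDepth) {
    const int m = (int)edges.size();
    std::vector<std::vector<std::vector<double>>> table(
        maxDepth + 1,
        std::vector<std::vector<double>>(m, std::vector<double>(2)));
    for (int ei = 0; ei < m; ++ei)
        table[0][ei][0] = table[0][ei][1] = edges[ei].weight;
    for (int d = 1; d <= maxDepth; ++d)
        for (int ei = 0; ei < m; ++ei)
            for (int dir = 0; dir < 2; ++dir) {
                int head = (dir == 0) ? edges[ei].v : edges[ei].u;
                double sum = edges[ei].weight;
                for (int out : adj[head])
                    if (out != ei) {            // never re-cross ei
                        int outDir = (edges[out].u == head) ? 0 : 1;
                        sum += table[d - 1][out][outDir];
                    }
                table[d][ei][dir] = sum;
            }
    return table;
}

// The depth-5 total on the source side of a tested edge e is then the sum
// of table[5][out][dir-away-from-source(e)] over the other edges incident
// to source(e), matching the recursion in the original post.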

He said it was a tree, so there shouldn't be multiple paths from any vertex to another vertex in its neighborhood. You should be able to accumulate total weight with a bottom-up traversal in topological order starting at the leaves, then do the reverse topological order and sum the two. Don't count your own weight twice. Store the total weight as a vector indexed by depth: at each node, shift off the last element of each child's vector, merge the vectors from the out edges, and add your own weight at the beginning of the vector as you traverse up the tree. Delete the vectors on the out edges as you go up to save memory. Even if the tree has no "root" it has leaves we can start from, and no cycles, so it should be simple.

However, my understanding of the problem at hand could be completely wrong. I'm guessing the graph is bidirectional and that in and out have the same weight because they are really just one edge. Perhaps if we had a clear specification of what exactly the desired result of the clustering algorithm is, and what exactly the input is, we could solve the problem instead of solving the solution?
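
One way to realize this, after picking an arbitrary root so that "up the tree" is well defined, is a leaf-up pass followed by a root-down pass over per-vertex vectors indexed by depth. A hedged sketch; the Tree layout and all names are illustrative:

#include <vector>

// Tree rooted at vertex 0; parent[0] == -1, and pweight[v] is the weight
// of the edge (v, parent[v]).
struct Tree {
    std::vector<int> parent;
    std::vector<double> pweight;
    std::vector<std::vector<int>> children;
};

// down[v][d]: total weight of edges within d steps below v.
void downPass(const Tree& t, int v, int maxDepth,
              std::vector<std::vector<double>>& down) {
    down[v].assign(maxDepth + 1, 0.0);
    for (int c : t.children[v]) {
        downPass(t, c, maxDepth, down);
        for (int d = 1; d <= maxDepth; ++d)
            down[v][d] += t.pweight[c] + down[c][d - 1];
    }
}

// up[c][d]: total weight of edges within d steps of c lying across its
// parent edge; derived from the parent's totals minus c's own branch.
// Call with up[0] pre-filled with zeros (nothing lies above the root).
void upPass(const Tree& t, int v, int maxDepth,
            const std::vector<std::vector<double>>& down,
            std::vector<std::vector<double>>& up) {
    for (int c : t.children[v]) {
        up[c].assign(maxDepth + 1, 0.0);
        for (int d = 1; d <= maxDepth; ++d) {
            double branch = (d >= 2) ? t.pweight[c] + down[c][d - 2] : 0.0;
            up[c][d] = t.pweight[c] + up[v][d - 1] + down[v][d - 1] - branch;
        }
        upPass(t, c, maxDepth, down, up);
    }
}

// The depth-d neighbourhood weight of v is down[v][d] + up[v][d]; the two
// sides of the edge (c, parent[c]) are down[c][d] on the child side and
// down[p][d] + up[p][d] - (pweight[c] + down[c][d-1]) on the parent side,
// with p = parent[c].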

In any case, it should be O(n * depth), where n is the number of nodes, to cache all the neighborhood weights on all the nodes. If you then want to remove edges iteratively and update the weights stored on all the nodes in each removed edge's neighborhood, each update is as expensive as summing those weights in the first place, but it is O(edges removed) rather than O(edges) neighborhood traversals, which should at least be better. I can't see how the algorithm could be much worse than O(n^2) if you remove O(n) edges and have O(n) nodes with, on average, O(n) nodes within the neighborhood of any given node throughout. In normal usage I would expect << O(n) edges removed and << O(n) nodes in the neighborhood, but then I have no idea what the trees or depths that would be considered "normal" might look like. You'll probably want a priority queue of edges that are candidates for removal, removing and replacing entries as their nodes' total weights get updated, so you don't have to do a linear scan over the edges at each iteration to find the next edge to remove, assuming that is what you have in mind.
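
Since std::priority_queue has no decrease-key, one workable scheme for that queue is lazy invalidation with a per-edge version stamp; a sketch, with the scoring criterion left illustrative:

#include <cstdio>
#include <queue>
#include <vector>

// A candidate for removal; `score` might be, say, edge weight divided by
// the mean weight of its neighbourhood (the criterion is up to you).
struct Candidate {
    double score;
    int edge;
    int version;   // stamp taken when this entry was pushed
    bool operator<(const Candidate& o) const { return score < o.score; }
};

int main() {
    // Stale entries stay in the queue and are skipped on pop by comparing
    // version stamps, instead of being updated in place.
    std::vector<int> version(3, 0);
    std::priority_queue<Candidate> pq;
    pq.push({2.5, 0, version[0]});
    pq.push({1.1, 1, version[1]});
    pq.push({0.7, 2, version[2]});

    // Edge 1's neighbourhood totals changed: bump its version and push the
    // re-scored candidate; the old {1.1, 1, 0} entry is now stale.
    pq.push({3.0, 1, ++version[1]});

    while (!pq.empty()) {
        Candidate c = pq.top();
        pq.pop();
        if (c.version != version[c.edge]) continue;  // stale, skip
        std::printf("remove edge %d (score %g)\n", c.edge, c.score);
        // ...here: actually remove the edge, update the cached totals in
        // its neighbourhood, and re-push affected candidates with bumped
        // versions.
    }
}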

Regards,
Luke

