|
Boost Users : |
From: Feldman, Yulik (yulik.feldman_at_[hidden])
Date: 2005-07-19 04:14:06
Hi,
Our project has recently run into severe performance issue with
remove_edge() function of Boost graph. In our particular case, we use
adjacency list specialized as:
typedef adjacency_list<
listS,
vecS,
bidirectionalS,
CdfgNodeProperty,
CdfgEdgeProperty> CdfgType;
At some point in the flow, we need to remove great amount of edges from
the graph, an amount comparable to the total number of edges in the
graph. Unfortunately, since the complexity of removal of each edge is
about O(E/V) the complexity of removal of all edges is about O(E * E/V).
This appears to be too much, when the number of edges is big.
What is actually interesting about this situation, is that I see no way
to remove all edges (let's say we need to remove all edges) with any
Boost function and with any adjacency list specialization in O(E)
complexity, not necessarily the one we're currently using. The O(E/V)
complexity of the remove_edge() comes from the fact that there is a need
to remove the edge from in/out edge lists of the vertices connected by
the edge. Even though I know I will not need any of these lists after
all edges are removed, remove_edge() doesn't know that, so it is doing
the "extra" work of updating the lists one-by-one. Moreover, in case of
edge lists such as vector/list/multiset, i.e. containers that allow
parallel edges in the graph, the search is linear in the size of
containers (resulting in O(E/V) complexity of the operation). The
question is what can be done, in any way, to do this "simple" operation
of removing all edges of the graph in "more reasonable" complexity O(E)
instead of O(E * E/V)?
It doesn't help to try to remove many edges at once, such as using
clear_in_edges() function to remove all in-edges of the given node, or
clear_vertex() to remove all edges of the given node, because these
functions still have to remove all the edges being removed from the edge
list of the connected nodes. This results in the same complexity of the
operation, as with using repetitive remove_edge() calls.
It also doesn't help using other containers for storing in/out edge
lists, such as list or multiset, instead of vector. Although the removal
operation is faster in these containers, the search operation that looks
for the edge to be removed is still in the order of the size of the
container. For containers such as set or hash set, the search operation
is faster, but such containers do not allow parallel edges, since the
edges are indexed by the node identifier on the other side of the edge.
The search operation that looks for the edge to be removed is in the
order of the size of the container because it is necessary to find all
the parallel edges. Therefore, for implementations allowing parallel
edges, the implementation goes over all edges stored in the container
and checks whether the edge connects to the node whose edges are being
removed (in case of clear_in_edges()/clear_vertex()). If it is, the edge
is removed from the in/out edge list. In our graph implementation, we
need to support parallel edges. Even without parallel edges, the total
complexity of removal of all edges still can't be brought down to O(E)
(only to O(E * log(E/V))).
If I would be free to change Boost's graph implementation, one possible
solution could be to store in each edge two iterators pointing to the
position of this edge in the corresponding in/out edge lists. This would
allow removal of the edge from "list" container in O(1). However, it
doesn't seem to be possible to do with current Boost implementation.
Still, even if it would be possible, the memory consumption would
increase due to the storage of the two iterators (at least two
additional extra pointers per edge).
Another approach could be to index the edges in in/out edge lists by
edge identifier (which is not a part of Boost graph infrastructure),
instead of indexing them by node identifier. This would allow using
"multiset" container to find the edges to remove faster (in logarithmic
time). But this still doesn't seem to be possible to do with current
Boost implementation and it will also incur a memory overhead (due to
usage of complex container as multiset). And still the complexity will
be more than O(E).
Is there a way to remove all edges in O(E) complexity? Do I miss
something, or there is an inherent performance issue in current Boost
graph implementation with regard to the edge removal? Even if the
current situation is by design due to memory consumption trade offs, I
wish the graph implementation would be flexible enough to let the user
control performance/memory balance, instead of requiring the performance
hit. Or, at least, to provide an atomic function for removal of all
edges of the graph, which can be easily implemented to work fast and
without incurring any memory overhead with all underlying in/out edge
container implementations (just clear all edge-related containers).
Any thoughts are welcome.
Thanks,
Yulik.
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net