Hi,
Our project has recently run into severe performance issue with remove_edge() function of Boost graph. In our particular case, we use adjacency list specialized as:
typedef adjacency_list<
listS,
vecS,
bidirectionalS,
CdfgNodeProperty,
CdfgEdgeProperty> CdfgType;
At some point in the flow, we need to remove great amount of edges from the graph, an amount comparable to the total number of edges in the graph. Unfortunately, since the complexity of removal of each edge is about O(E/V) the complexity of removal of all edges is about O(E * E/V). This appears to be too much, when the number of edges is big.
What is actually interesting about this situation, is that I see no way to remove all edges (let’s say we need to remove all edges) with any Boost function and with any adjacency list specialization in O(E) complexity, not necessarily the one we’re currently using. The O(E/V) complexity of the remove_edge() comes from the fact that there is a need to remove the edge from in/out edge lists of the vertices connected by the edge. Even though I know I will not need any of these lists after all edges are removed, remove_edge() doesn’t know that, so it is doing the “extra” work of updating the lists one-by-one. Moreover, in case of edge lists such as vector/list/multiset, i.e. containers that allow parallel edges in the graph, the search is linear in the size of containers (resulting in O(E/V) complexity of the operation). The question is what can be done, in any way, to do this “simple” operation of removing all edges of the graph in “more reasonable” complexity O(E) instead of O(E * E/V)?
It doesn’t help to try to remove many edges at once, such as using clear_in_edges() function to remove all in-edges of the given node, or clear_vertex() to remove all edges of the given node, because these functions still have to remove all the edges being removed from the edge list of the connected nodes. This results in the same complexity of the operation, as with using repetitive remove_edge() calls.
It also doesn’t help using other containers for storing in/out edge lists, such as list or multiset, instead of vector. Although the removal operation is faster in these containers, the search operation that looks for the edge to be removed is still in the order of the size of the container. For containers such as set or hash set, the search operation is faster, but such containers do not allow parallel edges, since the edges are indexed by the node identifier on the other side of the edge. The search operation that looks for the edge to be removed is in the order of the size of the container because it is necessary to find all the parallel edges. Therefore, for implementations allowing parallel edges, the implementation goes over all edges stored in the container and checks whether the edge connects to the node whose edges are being removed (in case of clear_in_edges()/clear_vertex()). If it is, the edge is removed from the in/out edge list. In our graph implementation, we need to support parallel edges. Even without parallel edges, the total complexity of removal of all edges still can’t be brought down to O(E) (only to O(E * log(E/V))).
If I would be free to change Boost’s graph implementation, one possible solution could be to store in each edge two iterators pointing to the position of this edge in the corresponding in/out edge lists. This would allow removal of the edge from “list” container in O(1). However, it doesn’t seem to be possible to do with current Boost implementation. Still, even if it would be possible, the memory consumption would increase due to the storage of the two iterators (at least two additional extra pointers per edge).
Another approach could be to index the edges in in/out edge lists by edge identifier (which is not a part of Boost graph infrastructure), instead of indexing them by node identifier. This would allow using “multiset” container to find the edges to remove faster (in logarithmic time). But this still doesn’t seem to be possible to do with current Boost implementation and it will also incur a memory overhead (due to usage of complex container as multiset). And still the complexity will be more than O(E).
Is there a way to remove all edges in O(E) complexity? Do I miss something, or there is an inherent performance issue in current Boost graph implementation with regard to the edge removal? Even if the current situation is by design due to memory consumption trade offs, I wish the graph implementation would be flexible enough to let the user control performance/memory balance, instead of requiring the performance hit. Or, at least, to provide an atomic function for removal of all edges of the graph, which can be easily implemented to work fast and without incurring any memory overhead with all underlying in/out edge container implementations (just clear all edge-related containers).
Any thoughts are welcome.
Thanks,
Yulik.