Boost logo

Boost :

Subject: Re: [boost] BGL CSR Format (logarithmic edge search)
From: Ryan Lichtenwalter (rlichten_at_[hidden])
Date: 2011-11-21 16:11:36


> Hello,
>
> According to:
>
>
http://www.boost.org/doc/libs/1_48_0/libs/graph/doc/compressed_sparse_row.html#edge-access
>
> [Edge search] requires linear time in the number of edges outgoing from u.
>
> The purpose of the CSR format is to support highly efficient computation
> with minimal memory requirements at the cost of mutability and additional
> time in construction. Is there a reason, then, that the adjacencies are
> stored unsorted within their vertex ranges? With minimal additional time
> during construction, edge access could be reduced to logarithmic time in
> the number of edges outgoing from u without sacrificing any functionality
> elsewhere that I can determine.
>
> Can somebody explain this design decision to me? If there is no
> explanation, does anybody have plans to adjust the representation? I would
> probably be willing to contribute an adjusted representation if nobody
else
> has plans to do so.

The edges from each vertex used to be sorted, but that was changed when
linear-time graph construction was introduced. One problem is that the
standard sorting routines cannot sort corresponding ranges of elements
(such as edge targets and properties); also, sorting edges increases the
time complexity of the graph construction algorithm. Note that the CSR
constructors that do sorting, except for the in-place versions, apply
stable sorts to the edges you input, so you could give it a sorted list of
edges and then write a logarithmic-time search routine using out_edges().

-- Jeremiah Willcock

Hi,

Thanks for your response. I'm sorry to hear about the move away from sorted
adjacencies.

I'm not sure if this is open for discussion, but I'd like to offer my two
cents.

The construction cost (of sorting) is minimal, especially in sparse
networks for which the compressed sparse row format is specifically
targeted where it is essentially asymptotically constant-time unless the
degree distribution is highly skewed.

On the other hand, certain aspects of link prediction, subgraph counting,
and any algorithm that requires neighbor set intersections will benefit
greatly from the ordering.

I actually rolled exactly the solution you suggested for my own needs.
Thank you so much.

Ryan Lichtenwalter


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk