Subject: Re: [boost] [Graph] iterator range splitting
From: Tim Keitt (tkeitt_at_[hidden])
Date: 2008-12-15 12:21:54
On Sun, Dec 14, 2008 at 3:22 PM, Tim Keitt <tkeitt_at_[hidden]> wrote:
> On Sun, Dec 14, 2008 at 2:31 PM, David Abrahams <dave_at_[hidden]> wrote:
>> on Sun Dec 14 2008, "Tim Keitt" <tkeitt+list-AT-gmail.com> wrote:
>>> I've run into some problems with the Graph library as there is a lot
>>> of code that splits iterator ranges. An example of this is:
>>> (from depth_first_search.hpp)
>>> template <class VertexListGraph, class DFSVisitor, class ColorMap>
>>> depth_first_search(const VertexListGraph& g, DFSVisitor vis, ColorMap color)
>>> // should the below not be:
>>> // graph_traits<VertexListGraph>::iterator vi, vend;
>>> // tie(vi, vend) = vertices(g);
>>> // if ( vi == vend ) ... ???
>>> // or perhaps to be more terse
>>> // if ( empty_range(vertices(g)) )
>>> // where empty_range(g) compares first and second
>>> if (vertices(g).first == vertices(g).second)
>>> depth_first_search(g, vis, color, *vertices(g).first);
>>> This is causing problems in my code where I have defined custom graph
>>> classes. In my case
>>> out_edges(g) != out_edges(g)
>>> because memory is allocated within the out_edges call and successive
>>> calls will return different ranges (vertex ranges are not a problem in
>>> my case as I use a simple counting iterator). Must out_edges(g) ==
>>> out_edges(g) for a graph class to meet the Graph library requirements?
>> Hmm. Well, it doesn't look like that's a documented requirement, but
>> maybe it should be (actually, out_edges(u,g) == out_edges(u,g)). After
> right -- forgot the vertex descriptor
>> all, writing algorithms without making that assumption could be very
>> difficult. I can even imagine that one might need to store these
>> iterators (in a heap or something) and come back to them compare them
>> with the end iterator later, which would be awkward at best and
>> impossible at worst unless one could make that assumption.
> Its actually quite straightforward. All you need to do is store an
> iterator pair rather than an iterator. It was not hard to modify eg
> push_relabel_max_flow to store iterator pairs (currently it holds
> vectors of iterators that are incremented across several loops and if
> I recall across some function calls).
>>> If not, then a good bit of the code needs to be altered and possibly
>>> some of the iterator macros deprecated.
>>> Below I've located every instance where the first or second member of
>>> an iterator pair is thrown away. These cases are only problematic if
>>> the retained iterator is later compared to an iterator from another
>>> The fix is simple. One only needs to retain both members of the
>>> iterator pair. I am willing to provide a patch if it is agreed that
>>> the current code is wrong.
>> I'm not entirely against it, but I guess I'd need to be convinced that
>> it was really a worthwhile generalization.
>> Since your out_edges() calls are already heavyweight, why don't you do
>> something to make iterators from consecutive calls compare equal? For
>> example, you could store the source vertex descriptor and offset in
>> them. If that's at all practical, I would consider it a fairly good
>> argument for tightening the IncidenceGraph requirements a little bit so
>> the assumption was valid.
> What I am trying to achieve is to not have the entire graph in memory.
> Currently I use shared_container_iterator to iterate over out edges so
> they are deallocated when there are no more references. I can easily
> change that by holding onto a reference, but then I might as well copy
> everything into an adjacency_list since pretty much any use of the
> IncidenceGraph concept will call out_edges on every vertex.
> Storing an offset into the array is possible, but awkward as every
> call to out_edges issues a SQL query and allocates memory, so I would
> incur that overhead simply to retrieve the out degree (which would
> then be used to compare to the current value of the offset).
Thinking about this more, it seems that if you are going to return
iterator pairs, then it makes sense to assume they are unique per call
and you should not compare iterators from different ranges (ie
subsequent calls). Otherwise, I would suggest that calls like
out_edges be split into out_edges_begin and out_edges_end because you
may want very different things to happen when retrieving the beginning
of a range and the end of a range.
I will probably revert my code to simply copying the entire graph into
an adjacency_list. Its a time-space trade-off and for many of the
algorithms run-time will trump space efficiency anyway. I liked the
idea of graph classes that use a database for storage, but it seems
impractical with the current Graph library.
>> Dave Abrahams
>> BoostPro Computing
>> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
> Timothy H. Keitt
-- Timothy H. Keitt http://www.keittlab.org/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk