Boost logo

Boost Users :

Subject: Re: [Boost-users] [Graph] Stable topological_sort? And topological_sort woes in general.
From: Dominique Devienne (ddevienne_at_[hidden])
Date: 2013-09-30 08:41:41


On Fri, Sep 27, 2013 at 9:54 PM, Jeremiah Willcock <jewillco_at_[hidden]>wrote:

> On Fri, 27 Sep 2013, Dominique Devienne wrote:
>
>> For my own education, why "singleton" vertices and duplicate edges
>> make topological_sort break?
>>
>
> What behavior do you get from them? What is broken in those two cases

If I comment out this code:

    // Boost.Graph does not like duplicate edges, so remove them.
    std::sort(edges.begin(), edges.end());
    std::vector<IndexPair>::iterator new_edges_end =
        std::unique(edges.begin(), edges.end());
    edges.erase(new_edges_end, edges.end());

then I get

Running 46 test cases...
unknown location(0): fatal error in "test_get_schema_tables_dup_edges_bug":
C:\Program Files (x86)\Microsoft Visual Studio 10
.0\VC\include\vector(932) : Assertion failed: vector subscript out of range

..\..\..\..\..\..\..\src\tests\core\utils\Relational\main.cpp(2105): last
checkpoint
../../../../../../../src/lib/pdgm/core/utils/Relational.cpp(3243): fatal
error in "test_deterministic": Assertion failed (0 <
= vrtx_index && vrtx_index < count), in function class std::vector<class
pdgm::core::relational::Table,class std::allocator<c
lass pdgm::core::relational::Table> > __cdecl
pdgm::core::relational::PhysicalSchema::tables(bool) const

*** 2 failures detected in test suite "Master Test Suite"
*** failure detail:
(1) :
pdgm_core_utils_RelationalTests/test_get_schema_tables_dup_edges_bug
failed. C:\Program Files (x86)\Microsoft Visua
l Studio 10.0\VC\include\vector(932) : Assertion failed: vector subscript
out of range
(2) : pdgm_core_utils_RelationalTests/test_deterministic failed.

If I comment out the if (info.isGraphVertex()) test in the code below (and
do the code inside unconditionally):

    // Boost.Graph does not like "holes" (missing vertex indices),
    // so assign vertex indices only to tables participating in graph to
sort,
    // and recording mapping back to TopoSortInfo given the vertex index.
    std::vector<TopoSortInfo*> vrtxIndex2Info;
    BOOST_REVERSE_FOREACH (TopoSortInfoMap::value_type& entry, infoMap) {
        TopoSortInfo& info = entry.second;
        if (info.isGraphVertex()) {
            info.vertex_index_ = vrtxIndex2Info.size();
            vrtxIndex2Info.push_back(&info);
        }
    }
    const size_t vrtx_count = vrtxIndex2Info.size();

with

        bool isGraphVertex() const { return indegree_ + outdegree_ > 0; }

I get plenty of failures (28). Many of them is simply the case where all
tables are standalone, so I end up with no edges... Since edges is the only
input to topo-sort, it cannot return those nodes I now realize. Which
forces me to preprocess the input to topo-sort.

I also had to

            // ignore self-referential FKs, which do not influence
topo-sort,
            // (and seem to confuse Boost-Graph too...)
            if (cnstr.isFK() && (cnstr.parentTableName() != table.name())) {

Since the graph would not be a DAG then.

In the end, by using topological_sort(), it almost feel like I've written
more code to fit the input it needs, than if I had used that simple algo
that iterates on 0-outdegree nodes (the roots of the graph), remove those
as dependencies to other nodes (and update their indegree/outdegree) and
goes on until all nodes are consumed (or there's a cycle).

> But what brings me here today is to ask about a "stable" topo-sort. When
>> I have several vertices which are all
>> "unrelated" and thus sort at the same "level", topological_sort does not
>> preserve the relative order of those
>> "sibling" vertices, and I'd like something fully deterministic that I can
>> count on.
>>
>
> It is probably deterministic (looking at the code), assuming your graph
> data structure preserves edge order.

Yes it does.

> What are my options? Does BGL provide something like this?
>>
>> As a concrete example, given tables A, B, C, with A depending on both B,
>> C, topological_sort gives me the C, B, A
>> order (or adding the "levels" (C, B), A), while I'd like B, C, A, i.e.
>> preserving the relative order of B, C.
>>
>
> Is the order of outgoing edges always backwards from what you want (I
> suspect it is)? If so, reversing the order when putting the edges into the
> graph would be an easy fix to the issue.

It appears that way. So I've simply switched some of my loops to
using BOOST_REVERSE_FOREACH to have them alpha-sorted. Thanks.

> Thanks for any insight. --DD
>>
>> PS: What about getting a topo-sort, but which also gives you the
>> "levels", so that you can schedule those
>> independent tasks (on each level) in parallel for example? Does BGL
>> provide something like this? I've done it in the
>> past with the help of the Sedgewick book, but I see a single topo-sort in
>> BGL.
>>
>
> The right way to do that, most likely, is to use a proper multi-processor
> scheduling algorithm (not straight topological sort, which only works
> ideally for one processor) to assign tables to processors. Work-stealing
> is likely to behave well, also, is simple, and can handle the case where
> task times are unpredictable.

The algo I describe above "feels" like work-stealing in a way, but I'm no
expert.

Thank you for your input. I don't think there's anything wrong
with topological_sort(), it's just that it doesn't fit my use case that
much after all, that's all.

Thanks, --DD



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net