On Fri, Sep 27, 2013 at 9:54 PM, Jeremiah Willcock <jewillco@crest.iu.edu> wrote:
On Fri, 27 Sep 2013, Dominique Devienne wrote:
For my own education, why "singleton" vertices and duplicate edges make topological_sort break?

What behavior do you get from them?  What is broken in those two cases

If I comment out this code:

    // Boost.Graph does not like duplicate edges, so remove them.
    std::sort(edges.begin(), edges.end());
    std::vector<IndexPair>::iterator new_edges_end =
        std::unique(edges.begin(), edges.end());
    edges.erase(new_edges_end, edges.end());

then I get

Running 46 test cases...
unknown location(0): fatal error in "test_get_schema_tables_dup_edges_bug": C:\Program Files (x86)\Microsoft Visual Studio 10
.0\VC\include\vector(932) : Assertion failed: vector subscript out of range

..\..\..\..\..\..\..\src\tests\core\utils\Relational\main.cpp(2105): last checkpoint
../../../../../../../src/lib/pdgm/core/utils/Relational.cpp(3243): fatal error in "test_deterministic": Assertion failed (0 <
= vrtx_index && vrtx_index < count), in function class std::vector<class pdgm::core::relational::Table,class std::allocator<c
lass pdgm::core::relational::Table> > __cdecl pdgm::core::relational::PhysicalSchema::tables(bool) const

*** 2 failures detected in test suite "Master Test Suite"
*** failure detail:
(1)     : pdgm_core_utils_RelationalTests/test_get_schema_tables_dup_edges_bug failed. C:\Program Files (x86)\Microsoft Visua
l Studio 10.0\VC\include\vector(932) : Assertion failed: vector subscript out of range
(2)     : pdgm_core_utils_RelationalTests/test_deterministic failed.


If I comment out the if (info.isGraphVertex()) test in the code below (and do the code inside unconditionally):

    // Boost.Graph does not like "holes" (missing vertex indices),
    // so assign vertex indices only to tables participating in graph to sort,
    // and recording mapping back to TopoSortInfo given the vertex index.
    std::vector<TopoSortInfo*> vrtxIndex2Info;
    BOOST_REVERSE_FOREACH (TopoSortInfoMap::value_type& entry, infoMap) {
        TopoSortInfo& info = entry.second;
        if (info.isGraphVertex()) {
            info.vertex_index_ = vrtxIndex2Info.size();
            vrtxIndex2Info.push_back(&info);
        }
    }
    const size_t vrtx_count = vrtxIndex2Info.size();

with 

        bool isGraphVertex() const { return indegree_ + outdegree_ > 0; }

I get plenty of failures (28). Many of them is simply the case where all tables are standalone, so I end up with no edges... Since edges is the only input to topo-sort, it cannot return those nodes I now realize. Which forces me to preprocess the input to topo-sort.
 
I also had to 

            // ignore self-referential FKs, which do not influence topo-sort,
            // (and seem to confuse Boost-Graph too...)
            if (cnstr.isFK() && (cnstr.parentTableName() != table.name())) {

Since the graph would not be a DAG then.

In the end, by using topological_sort(), it almost feel like I've written more code to fit the input it needs, than if I had used that simple algo that iterates on 0-outdegree nodes (the roots of the graph), remove those as dependencies to other nodes (and update their indegree/outdegree) and goes on until all nodes are consumed (or there's a cycle).


But what brings me here today is to ask about a "stable" topo-sort. When I have several vertices which are all
"unrelated" and thus sort at the same "level", topological_sort does not preserve the relative order of those
"sibling" vertices, and I'd like something fully deterministic that I can count on.

It is probably deterministic (looking at the code), assuming your graph data structure preserves edge order.

Yes it does.
 
What are my options? Does BGL provide something like this?

As a concrete example, given tables A, B, C, with A depending on both B, C, topological_sort gives me the C, B, A
order (or adding the "levels" (C, B), A), while I'd like B, C, A, i.e. preserving the relative order of B, C.

Is the order of outgoing edges always backwards from what you want (I suspect it is)?  If so, reversing the order when putting the edges into the graph would be an easy fix to the issue.

It appears that way. So I've simply switched some of my loops to using BOOST_REVERSE_FOREACH to have them alpha-sorted. Thanks.
 
Thanks for any insight. --DD

PS: What about getting a topo-sort, but which also gives you the "levels", so that you can schedule those
independent tasks (on each level) in parallel for example? Does BGL provide something like this? I've done it in the
past with the help of the Sedgewick book, but I see a single topo-sort in BGL.

The right way to do that, most likely, is to use a proper multi-processor scheduling algorithm (not straight topological sort, which only works ideally for one processor) to assign tables to processors.  Work-stealing is likely to behave well, also, is simple, and can handle the case where task times are unpredictable.

The algo I describe above "feels" like work-stealing in a way, but I'm no expert.

Thank you for your input. I don't think there's anything wrong with topological_sort(), it's just that it doesn't fit my use case that much after all, that's all.

Thanks, --DD