Boost logo

Boost :

From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-05-12 17:09:21


Joachim Faulhaber wrote:
>...
>interval_set:
>itl::interval_set implements all functionality of std::set as a set
>of intervals and in addition +=, -= and *= for set union, difference
>and intersection. On insertion of neighbouring or overlapping
>intervals inserted intervals are joined.
>
>split_interval_set:
>split_interval_set works like interval_set but borders of the
>inserted intervals are preserved, so it can be used as time grid.
>Using itl::split_interval_set you can partition other interval
>containers using the *= operator to perform intersections.
>
>split_interval_map:
>A split_interval_map implements a map as a map of intervals
>and associated values. For dealing with the insertion
>of overlapping intervals the principle of "aggregation on
>overlap" has proved to be very useful: It is shown by example:
>
>typedef set<string> guests;
>split_interval_map<time, guests> party;
>guests mary; mary.insert("Mary");
>guests harry; harry.insert("Harry");
>party.insert(make_pair(rightopen_interval(20:00, 22:00), mary));
>party.insert(make_pair(rightopen_interval(21:00, 23:00), harry));
>// party now contains
>[20:00, 21:00)->{"Mary"}
>[21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap
>[22:00, 23:00)->{"Harry"}
>
>As can be seen from the example a split_interval_map has both
>a decompositional behavior (on the time dimension) as well as
>a accumulative one (on the associated values).
>
>Quality, validation, portability:
>
>The library code of the itl has been validated with a law based
>test automaton (LaBatea;) which is included in
>the current release. It includes a testcase generator and law tester
>that checks laws like e.g. commutativity<interval_set<double>, +=>:
>the commutativity of interval_set<double> w.r.t. the operation +=.
>
>The code has been compiled and tested with three compilers
>(gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three
>implementations of the stl (linux/gnu, stlport, msvc8) under
>suse linux 10.1. and ms-windows xp.
>
>I'd be happy if you found interval conainers as presented in this
>post an interesting contribution to look at.
>
>If so, I am looking forward to an inspiring discussion.

I found this in your implementation:
    /// Container type for the implementation
    typedef typename itl::set<interval_type,exclusive_less,Alloc>
ImplSetT;
as the underlying data structure for the base type of your interval maps
and sets.

This is a coincidence, because I happen to have implemented the same
underlying split interval map data structure for use as the scanline
data structure for my algorithm that computes the connectivity graph of
overlapping planar geometries. This data structure results in O(n) map
entries for O(n) overlapping intervals, where each entry has O(n)
elements in a set (in the worst case.) That turns out to be pretty
suboptimal data structure because it has a storage space complexity
O(n^2). A superior data structure for the same would be akin to a 1D
R*Tree, center-cutline implementation of a quad-tree or similar. Such a
data structure which would store only O(n) elements of constant size and
provide log(n) + m (where m is n in the worst case) amortized lookup
time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n
in the worst case.) Imagine everyone at your example party has
overlapping intervals and different arrival and departure times.
Looking up who was at the party while a given person was present would
require looking at each set of names for each sub-interval where the set
of people was different and results in reading the same name many times
because it appears in many adjacent sub intervals.
 
For my application the target use case is two layer connectivity
extraction, where the maximum number of polygons (people) in the
scanline (party) is bounded at two (pretty dead party) and the
complexity of the algorithm based on the split interval map type data
structure is optimal. For n overlapping polygons I know the algorithm
to be suboptimal, and I know what data structure to apply to fix it. I
would be more interested in the optimal data structure than a generic
wrapper around this usage of std containers. I'm not sure how having
your interval set code would have helped me write my connectivity
extraction algorithm using this data structure vs. what I was able to do
quite readily with the stl alone.

By the way, a map of sets has poor performance when rebalancing itself.
Getting rid of this data structure in my code is on my todo list. Once
I have a good replacement (and I have several quad-tree and R*Tree
implementations internally and in the open source community to choose
from) I would make it generic and provide it as part of my library (in
addition to using it where appropriate in my algorithms.) There is also
a generic RTree implementation that I have added to the library since
submitting it to the vault, but that hasn't been cleared for external
publication yet.

I suggest you research R*Trees and related data structure for spatial
query and think about how your problem resembles 1D spatial query.

Thanks,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk