Boost logo

Boost :

From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2008-05-12 21:20:18


2008/5/12, Simonson, Lucanus J <lucanus.j.simonson_at_[hidden]>:
> Joachim Faulhaber wrote:
> >...
> >interval_set:
> >itl::interval_set implements all functionality of std::set as a set
> >of intervals and in addition +=, -= and *= for set union, difference
> >and intersection. On insertion of neighbouring or overlapping
> >intervals inserted intervals are joined.
> >
> >split_interval_set:
> >split_interval_set works like interval_set but borders of the
> >inserted intervals are preserved, so it can be used as time grid.
> >Using itl::split_interval_set you can partition other interval
> >containers using the *= operator to perform intersections.
> >
> >split_interval_map:
> >A split_interval_map implements a map as a map of intervals
> >and associated values. For dealing with the insertion
> >of overlapping intervals the principle of "aggregation on
> >overlap" has proved to be very useful: It is shown by example:
> >
> >typedef set<string> guests;
> >split_interval_map<time, guests> party;
> >guests mary; mary.insert("Mary");
> >guests harry; harry.insert("Harry");
> >party.insert(make_pair(rightopen_interval(20:00, 22:00), mary));
> >party.insert(make_pair(rightopen_interval(21:00, 23:00), harry));
> >// party now contains
> >[20:00, 21:00)->{"Mary"}
> >[21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap
> >[22:00, 23:00)->{"Harry"}
> >
> >As can be seen from the example a split_interval_map has both
> >a decompositional behavior (on the time dimension) as well as
> >a accumulative one (on the associated values).
> >
> >Quality, validation, portability:
> >
> >The library code of the itl has been validated with a law based
> >test automaton (LaBatea;) which is included in
> >the current release. It includes a testcase generator and law tester
> >that checks laws like e.g. commutativity<interval_set<double>, +=>:
> >the commutativity of interval_set<double> w.r.t. the operation +=.
> >
> >The code has been compiled and tested with three compilers
> >(gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three
> >implementations of the stl (linux/gnu, stlport, msvc8) under
> >suse linux 10.1. and ms-windows xp.
> >
> >I'd be happy if you found interval conainers as presented in this
> >post an interesting contribution to look at.
> >
> >If so, I am looking forward to an inspiring discussion.
>

dear Luke,
thank you for looking at my code. Since this was my first
posting in boost it this is kind of adventurous for me.

> I found this in your implementation:
> /// Container type for the implementation
> typedef typename itl::set<interval_type,exclusive_less,Alloc>
> ImplSetT;
> as the underlying data structure for the base type of your interval maps
> and sets.
>
> This is a coincidence, because I happen to have implemented the same
> underlying split interval map data structure for use as the scanline
> data structure for my algorithm that computes the connectivity graph of
> overlapping planar geometries. This data structure results in O(n) map
> entries for O(n) overlapping intervals, where each entry has O(n)
> elements in a set (in the worst case.) That turns out to be pretty
> suboptimal data structure because it has a storage space complexity
> O(n^2).

Yes you are right. The implementation of my split_interval_maps is suboptimal.
That is where the 'boost' of boost comes in: It brings in broader views,
contexts and new requirements.

The problem domains in which my interval containers have been used so
far (processes related to patient management in hospitals) were all
characterized by small numbers of overlaps. Split_interval_maps
have been very useful to decompose different histories into a single
product history like in the 'history' example included in my library
code.

In many instances the associated split_interval_map<Time, set<T>>
were sets or other containers, but searching was not performed on
the sets. In other instances split_interval_map<Time, Quantity> were
used as a quantity on overlap aggregator. In all those problem domains
the worst case situation that you are addressing was not relevant.

For other problem domains the current implementation might not be
sufficient.

> A superior data structure for the same would be akin to a 1D
> R*Tree, center-cutline implementation of a quad-tree or similar. Such a
> data structure which would store only O(n) elements of constant size and
> provide log(n) + m (where m is n in the worst case) amortized lookup
> time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n
> in the worst case.) Imagine everyone at your example party has
> overlapping intervals and different arrival and departure times.
> Looking up who was at the party while a given person was present would
> require looking at each set of names for each sub-interval where the set
> of people was different and results in reading the same name many times
> because it appears in many adjacent sub intervals.
>
> For my application the target use case is two layer connectivity
> extraction, where the maximum number of polygons (people) in the
> scanline (party) is bounded at two (pretty dead party) and the
> complexity of the algorithm based on the split interval map type data
> structure is optimal. For n overlapping polygons I know the algorithm
> to be suboptimal, and I know what data structure to apply to fix it. I
> would be more interested in the optimal data structure than a generic
> wrapper around this usage of std containers. I'm not sure how having
> your interval set code would have helped me write my connectivity
> extraction algorithm using this data structure vs. what I was able to do
> quite readily with the stl alone.
>
> By the way, a map of sets has poor performance when rebalancing itself.

Yes, what I've been thinking about here is that every
map<T1, set<T2>> can be represented by an equivalent
set<pair<T1,T2>> or multi_map<T1,T2>. In other words
maps of sets can be flattened.

> Getting rid of this data structure in my code is on my todo list. Once
> I have a good replacement (and I have several quad-tree and R*Tree
> implementations internally and in the open source community to choose
> from) I would make it generic and provide it as part of my library (in
> addition to using it where appropriate in my algorithms.) There is also
> a generic RTree implementation that I have added to the library since
> submitting it to the vault, but that hasn't been cleared for external
> publication yet.
>
> I suggest you research R*Trees and related data structure for spatial
> query and think about how your problem resembles 1D spatial query.

I will follow this advice
>
> Thanks,
> Luke

Thanks, Joachim


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk