Boost logo

Boost :

From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2008-05-13 12:40:48


2008/5/12, Simonson, Lucanus J <lucanus.j.simonson_at_[hidden]>:
> Joachim Faulhaber wrote:
> >...
> >interval_set:
> >itl::interval_set implements all functionality of std::set as a set
> >of intervals and in addition +=, -= and *= for set union, difference
> >and intersection. On insertion of neighbouring or overlapping
> >intervals inserted intervals are joined.
> >
> >split_interval_set:
> >split_interval_set works like interval_set but borders of the
> >inserted intervals are preserved, so it can be used as time grid.
> >Using itl::split_interval_set you can partition other interval
> >containers using the *= operator to perform intersections.
> >
> >split_interval_map:
> >A split_interval_map implements a map as a map of intervals
> >and associated values. For dealing with the insertion
> >of overlapping intervals the principle of "aggregation on
> >overlap" has proved to be very useful: It is shown by example:
> >
> >typedef set<string> guests;
> >split_interval_map<time, guests> party;
> >guests mary; mary.insert("Mary");
> >guests harry; harry.insert("Harry");
> >party.insert(make_pair(rightopen_interval(20:00, 22:00), mary));
> >party.insert(make_pair(rightopen_interval(21:00, 23:00), harry));
> >// party now contains
> >[20:00, 21:00)->{"Mary"}
> >[21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap
> >[22:00, 23:00)->{"Harry"}
> >
> >As can be seen from the example a split_interval_map has both
> >a decompositional behavior (on the time dimension) as well as
> >a accumulative one (on the associated values).
> >
> >Quality, validation, portability:
> >
> >The library code of the itl has been validated with a law based
> >test automaton (LaBatea;) which is included in
> >the current release. It includes a testcase generator and law tester
> >that checks laws like e.g. commutativity<interval_set<double>, +=>:
> >the commutativity of interval_set<double> w.r.t. the operation +=.
> >
> >The code has been compiled and tested with three compilers
> >(gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three
> >implementations of the stl (linux/gnu, stlport, msvc8) under
> >suse linux 10.1. and ms-windows xp.
> >
> >I'd be happy if you found interval conainers as presented in this
> >post an interesting contribution to look at.
> >
> >If so, I am looking forward to an inspiring discussion.
>
> I found this in your implementation:
> /// Container type for the implementation
> typedef typename itl::set<interval_type,exclusive_less,Alloc>
> ImplSetT;
> as the underlying data structure for the base type of your interval maps
> and sets.
>
> This is a coincidence, because I happen to have implemented the same
> underlying split interval map data structure for use as the scanline
> data structure for my algorithm that computes the connectivity graph of
> overlapping planar geometries. This data structure results in O(n) map
> entries for O(n) overlapping intervals, where each entry has O(n)
> elements in a set (in the worst case.) That turns out to be pretty
> suboptimal data structure because it has a storage space complexity
> O(n^2). A superior data structure for the same would be akin to a 1D
> R*Tree, center-cutline implementation of a quad-tree or similar. Such a
> data structure which would store only O(n) elements of constant size and
> provide log(n) + m (where m is n in the worst case) amortized lookup
> time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n
> in the worst case.) Imagine everyone at your example party has
> overlapping intervals and different arrival and departure times.
> Looking up who was at the party while a given person was present would
> require looking at each set of names for each sub-interval where the set
> of people was different and results in reading the same name many times
> because it appears in many adjacent sub intervals.
>
> For my application the target use case is two layer connectivity
> extraction, where the maximum number of polygons (people) in the
> scanline (party) is bounded at two (pretty dead party) and the
> complexity of the algorithm based on the split interval map type data
> structure is optimal. For n overlapping polygons I know the algorithm
> to be suboptimal, and I know what data structure to apply to fix it. I
> would be more interested in the optimal data structure than a generic
> wrapper around this usage of std containers. I'm not sure how having
> your interval set code would have helped me write my connectivity
> extraction algorithm using this data structure vs. what I was able to do
> quite readily with the stl alone.
>
> By the way, a map of sets has poor performance when rebalancing itself.
> Getting rid of this data structure in my code is on my todo list. Once
> I have a good replacement (and I have several quad-tree and R*Tree
> implementations internally and in the open source community to choose
> from) I would make it generic and provide it as part of my library (in
> addition to using it where appropriate in my algorithms.) There is also
> a generic RTree implementation that I have added to the library since
> submitting it to the vault, but that hasn't been cleared for external
> publication yet.
>
> I suggest you research R*Trees and related data structure for spatial
> query and think about how your problem resembles 1D spatial query.
>
> Thanks,
> Luke

Dear Luke,

When I returned yesterday from a very mild Berlin spring
evening with some extra grappas consumed I was so happy
to find a long answer to my first boost posting that I
joyfully conceded my interval_container classes to be
suboptimal.

Well this was pretty suboptimal with respect to my
intention to demonstrate their benefits. Next day I
realized that this was an exaggeration. So please let
me correct a little;)

Seen from the perspective of a special application
interval_containers appear to be inferior to classes
and algorithms that are tailored for that field. Yet
interval_containers are not designed to be optimized
to a special sort of application.

Interval_sets and interval_maps are intended to be much
more general in scope: As stated in my original posting
the basic idea is that an interval_set implements
a set as a set of intervals. An interval_map implements
a map as a map of intervals with associated values.

So the character of interval_sets/maps is twofold. You
can use them with the same interface as stl::sets/maps.
And you can use its interval structure by iterating
over them. Or by performing intersection operations.

The additional task in implementing interval_sets/maps,
compared to common sets/maps, is the handling of overlaps:

1. interval_set: overlapping/bordering intervals are joined.
2. split_interval_set: overlapping/bordering intervals are
   preserved.
   simple so far!
3. split_interval_map: overlapping/bordering intervals are
   also preserved. Then we must decide what to do with the
   associated values on the overlapping interval ranges.

It unfolds quite naturally that the associated values are
'added' on the overlap parts. So their type has to conform
concept Addable, more specifically I chose InplaceAddable
(+= has to exist) for efficiency reasons.

>From this design follows, that a split_interval_map is a
map that that does not overwrite values on additional
insertion but it accumulates them. All kinds of data_types
are accumulated according to their 'InplaceAddability'.

(a) So in the most simple case where data_type is int and
   we insert only value_pair([a,b),1). The resulting object
   is an overlap counter.
(b) For numeric data_types and arbitrary associated values
   you are getting a quantity counter. E. g. number of
   grappas consumed in time intervals by party guests.

In the cases (a) and (b) the problem of inefficient worst case
storage complexity is not present.

(c) Only if data_type is set<T> or some other 'InplaceAddable'
container<T> the worst case O(n^2) storage problem can arise.
In this case one may resort to a more efficient datastructure
or accept that behavior, if e.g. the overlap structure is not
excessive.

Finally: The problem of inefficient search on the associated
values: Since a split_interval_map is basically a map, it has
the downside of it's stl brother. Lookup on the data_values
is inefficient. So, in the current design efficient search
on associated values is not provided.

If this is crucial for an application a more specialized
class has to be chosen instead of split_interval_map. In
other words, like with stl::maps not every type instance
of a split_interval_map has optimal efficiency properties for
a given context of usage, but others have. The developer
has to decide then whether to use a general class or a
specialized one.

The design of my interval_containers is more oriented on
structure and orthogonality than on optimizing for specific
applications.

Nevertheless your points were very valuable and I will
consider them to improve the code.

Thank you again
Joachim


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk