|
Boost : |
Subject: Re: [boost] [Itl] Review
From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2010-02-23 19:02:46
Hi Phil,
2010/2/23 Phil Endecott <spam_from_boost_dev_at_[hidden]>:
> My review of the proposed Interval Template Library follows. This is based
> only on studying the documentation, and previous discussion of the library
> on the Boost mailing list (e.g.
> http://lists.boost.org/Archives/boost/2009/03/149396.php ).
>
> Joachim has clearly put a lot of work into what is a substantial and
> well-documented library. Unfortunately, it seems to me that it is built on
> a weak foundation.
>
> One of the most well-known problems involving intervals is to determine
> which of a given set of intervals contains a particular point. How to store
> a set of intervals in such a way as to make this problem efficient has been
> extensively studied. If an "interval tree" (See e.g. Wikipedia:
> http://en.wikipedia.org/wiki/Interval_tree) is used, then the problem can be
> answered in logarithmic time and linear space. (More precisely: if there
> are N intervals in the container and M overlap the point of interest,
> finding them should take O(M + log N) time and the data structure should
> take O(N) space.)
>
> I was hoping to find that this library was an implementation of an interval
> tree. But it is not: it is built on top of normal standard-library
> containers. The implementation used can find the intervals that overlap a
> point in O(M + log N) time, but to do so it needs worst-case O(N^2) space
> and I think it needs O(N^2 log N) time to build that structure.
This is a very limited view on the library. I never claimed it has it's
strengths in performing stabbing queries. I has so many more aspects
and applications where it is useful and efficient.
If you insert N intervals in an interval_map the number of intervals
(nodes) in the map is limited to 2*N-1. So space is O(N). A single
addition of an interval into an interval_map has worst case time performance
of O(N). So the worst case time for the addition of N interval value pairs is
O(N^2). Which is not as bad as you write.
The most frequent usage of interval_maps are aggregations, where
you need all of the aggregated values. If, in the worst case N^2 aggregations
are involved, you have to perform them anyway. If you'd use an interval
tree to hold the data, you would not be able to store the the aggregation
results for N+k segments in your N nodes. For what interval containers
and specifically interval_maps are used most of the time, interval_trees
are not an adequate data structure as far as I can see.
Regards
Joachim
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk