|
Boost : |
Subject: Re: [boost] [Itl] Review
From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2010-02-24 14:07:47
2010/2/24 Phil Endecott <spam_from_boost_dev_at_[hidden]>
> Joachim Faulhaber wrote:
>
>> 2010/2/23 Phil Endecott <spam_from_boost_dev_at_[hidden]>:
>>
>>> I was hoping to find that this library was an implementation of an
>>> interval
>>>
>>> tree. ?But it is not: it is built on top of normal standard-library
>>> containers. ?The implementation used can find the intervals that overlap
>>> a
>>> point in O(M + log N) time, but to do so it needs worst-case O(N^2) space
>>> and I think it needs O(N^2 log N) time to build that structure.
>>>
>>
>> This is a very limited view on the library. I never claimed it has it's
>> strengths in performing stabbing queries. I has so many more aspects
>> and applications where it is useful and efficient.
>>
>
> It might help the discussion to describe some of these applications in
> depth. I can only think of examples where the O(N^2) space happens.
> Example: consider a building access-control system where users check in and
> out with cards; the system records a set of (in_time,out_time)->name
> records. If N people work in the building, it will take O(N^2) space to
> store each day's data.
>
The party example that I chose, simply because it is so easy to understand,
seems to turn out unfortunate, because people tend to think of it as THE use
case of interval_maps. We had similar use cases with relatively small sets
and moderate overlaps where it works nicely. But for big sets and heavy
overlap this is depreciated and the worst case space is O(N^2) indeed. But
the much more important applications are those where the associated value
are numerical types or lightweight objects that can be aggregated.
If, in your access-control system, you only want to count the number of
people in the building space is only O(N), because the associated values,
the aggregation results, stay small.
Another non trivial use of interval_map, which I think is space efficient
compared to other methods is the example of large bitsets which is
implemented by means of itl::interval_maps:
http://www.herold-faulhaber.de/boost_itl/doc/libs/itl/doc/html/boost_itl/projects.html
>
> If you insert N intervals in an interval_map the number of intervals
>> (nodes) in the map is limited to 2*N-1. So space is O(N). A single
>> addition of an interval into an interval_map has worst case time
>> performance
>> of O(N). So the worst case time for the addition of N interval value pairs
>> is
>> O(N^2). Which is not as bad as you write.
>>
>
> No, I believe my assessment is correct. See Luke's explanation.
Yes, your big Os are correct. And they don't care if that's unpleasant for
me :-/
I am aware that using an itl::interval_map of sets or other collections can
be inefficient and I have noted this in the documentation. I agree that, for
interval_maps of collections a more efficient implementation is highly
desirable and that an interval_tree is a good candidate to achieve this. For
this version of the ITL, I wanted to concentrated on the interface and the
semantical properties that I have checked to get the "big picture" right.
Which is in line with the boost guidelines from
http://www.boost.org/development/requirements.html:
"Aim first for clarity and correctness; optimization should be only a
secondary concern in most Boost libraries."
Also I developed a very thorough suite of law based tests that will support
the correct development of any optimization within my library design.
A more efficient implementation of interval_maps of sets is one of the
things I wanted to do next.
The most frequent usage of interval_maps are aggregations, where
> you need all of the aggregated values. If, in the worst case N^2
> aggregations
> are involved, you have to perform them anyway. If you'd use an interval
> tree to hold the data, you would not be able to store the the aggregation
> results for N+k segments in your N nodes. For what interval containers
> and specifically interval_maps are used most of the time, interval_trees
> are not an adequate data structure as far as I can see.
>
I don't entirely understand that paragraph. In an interval tree, I don't
> store the "aggregation results"; I just store the key-intervals and data.
> Aggregation occurs when I perform a lookup; I find all of the intervals
> that overlap my point or interval of interest, and merge their data in some
> appropriate way at that time.
>
This is one of the differences between an interval tree and an
itl::interval_map. An addition of an interval value pair to an interval map
always performs an aggregation of associated values where and inserted
interval value pair overlaps. The aggregation results are always updated and
they are stored together with the segments they belong to. Using an interval
tree you need to iterate the implicit segments (I don't know if this can be
done is easily), perform the aggregations on lookups and then you need
another container to collect the results. I admit this kind of lazy
evaluation can be beneficial.
If the associated values are small e.g. numeric, the current
itl::interval_map implementation is efficient as well.
Regards,
Joachim
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk