Boost logo

Boost :

Subject: Re: [boost] [Itl] Review
From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2010-03-07 19:35:47


2010/3/7 Phil Endecott <spam_from_boost_dev_at_[hidden]>:
> Joachim Faulhaber wrote:
>>
>> Hi Phil,
>>
>> thank you again for your review.
>
> And thank you for finally addressing this issue.  I do not have much time to
> reply, unfortunately.
>
>> There are no data structures that are superior as such. Each has its
>> pros and its cons and the user has to take her decision of choosing
>> the appropriate one, dependent on the characteristics of her specific
>> problem domain and use case.
>
> Selection of appropriate data structures is vital.  What I believe you have
> done is to take one data structure and build an overly-generic

... too much honour for me, generic will do

> library on
> top of it.  It seems to me that it would be better to either restrict the
> scope of the library to those things that it can do efficiently,

I think that's overly-restrictive.

There are a lot of generic data structures in std and boost, that can
be used in inefficient ways and there are no specific restrictions
that guard against that.

Examples: (1) You can call insert and erase on std::vector. So you can
happily use some large vectors in applications that do a lot of
dynamic insertions and erasures on them preferably at their front
parts. Who restricts you?

(2) You can use std::map<T, SomeLargeCollection<SomeMoreStuff...> >.
Not restrictions. Sometimes people deliberately want to do this for
some quick solutions where efficiency does not matter. No
restrictions.

(3) You can use say boost::dynamic_bitset to work with sparse bitsets
that say contains bit 0 and bit 2^N-1. Works, as long your workstation
has enough memory. Restrictions?

BTW my library has more than one very space efficient solution to work
with sparse bitsets and I can not see how an interval tree
implementation can make these any better.

Moreover, restricting say split_interval_map<D, Collection<T> > from
being used, because of a potential worst case scenario, would imply to
patronize the skilled user who is well aware of the complexity problem
involved but decides, that the characteristics of his problem domain
will not be problematic (e.g. small collections and moderate
overlaps).

> or to
> extend the implementation of the library to efficiently support all of the
> advertised capabilities.

which I have already agreed upon.

>> the segmentation of the interval_map, (its
>> separation into elementary intervals that the aggregation results are
>> associated to), is only implicit in the [interval tree]. Moreover the
>> interval_tree, in contrast to the "eager implementation" does not
>> "recognize", that neighbouring intervals can be joined, because of
>> equality of the aggregated values associated to the elementary
>> segments.
>
>> Most obvious is the case of an interval_set<T>. The previously
>> discussed "worst case" (large intervals, that are all overlapping) is
>> the best case for the eager evaluating implementation here. There will
>> always be only a single interval node in the implementing std::set. So
>> space and time of all operations are constant. An interval_tree would
>> store all N intervals.
>
> I think that you could quite easily update an interval tree in such a way
> that it would store an "eager" implementation.  In the case of an
> interval_set, to insert an interval into an interval tree that is
> representing an interval_set you would
>
> 1. Erase all intervals that are contained within the new interval.
> 2. If an interval overlaps the bottom end of the new interval, erase it and
> extend the bottom end of the new interval.
> 3. If an interval overlaps the top end of the new interval, erase it and
> extend the top end of the new interval.
> 4. Insert the modified new interval.

This is pretty much what the current implementation is doing. But in
addition you have worse space performance due to the additional
information stored in each node of an interval tree: The value of the
right endpoint of the right subtree of each node.

> I guess that the other cases would be similar,

In order to join neighbouring segments of an interval_map in case of
equal aggregates, you have to either store up to 2*N-1 aggregation
results in your N-node interval tree, or you have to recompute up to
2*N-1 aggregates on every insertion ...

As I already said, there might be nifty solutions, but you would send
me back to square one, as Paul put it. And I don't think this would be
appropriate for a library that already proved useful and obviously
sufficiently efficient in pretty demanding contexts, see e.g. the
review of Eric Jonas.

> and they would have the same
> time and space complexity as your current std::set implementation.

and maybe not.

Don't get me wrong. I am not refusing to work on your proposed
optimized implementation. I fact I am pretty keen to improve my
library this way, but as for everyone else, my time and energy
resources are limited. And after putting a lot of work in a library
that seems to be pretty useful and that has found many use cases in
different fields, that comes with a formal spec and an associated tool
that allows for repeatable validation, I think it is not adequate to
devaluate the whole project on the grounds of an optimization that is
IMO important but also partial.

> Sorry for my short reply, but we are in the last few hours of the review
> period.

Thank you again for contributing on this important issue.

Regards, Joachim.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk