Boost logo

Boost :

From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2008-05-16 22:16:35


2008/5/16, vicente.botet <vicente.botet_at_[hidden]>:

> Hello,
>
> I'm mainly interested in an implementation of set<T> using set<interval<T>
> and on how it comapres to the stl::set in space and time, when the set are
> dense or scattered.
>
> I have just take a look to the documentation and the implementation, and I
> have some questions?

Thank you for looking at my code. I am quite thrilled to get so many
good responses.

Let me note, before answering your questions, that I followed the
boost advice from the web page to post suggestions *before* investing
work on adjusting or producing library code to conform the boost
standards and requirements, in order to avoid unnecessary work.

I know the library might not conform some boost requirements yet, but
I am of course happy to adjust the code and design if there is
significant interest in my work.

To keep the text simple I will talk mostly of sets and interval_sets,
when I want to say things about sets/maps and
interval_sets/interval_maps because most statements apply to both.

> * Why not follow the STL set and map interfaces as far as possible?

I intended to follow the STL and there is a large set of functions and
operators common to std::set/map and itl::interval_sets/maps. But
there are differences
also.

(1) One very interesting point is that interval_sets in the design presented
do have a twofold character. An interval_set is intended to implement
set. Yet at the same time it is a set of intervals. So the question
how a function like e.g.
find should be implemented has no obvious answer.

(2) In my design I am exploiting concepts (Inplace)Addable and
(Inplace)Subtractable in many ways. That is I do rely on the existence
of a += operator for almost all and -= for many datatypes in my
library. So I have added +=, -= and also *= (intersection) to
interval_set/map. A propagation of += and -= through container
operations is a core element of my design and a base of the useful
'aggregate on overlap' (aggrovering;) mechanics of interval_maps.
(see examples party and history from http://sourceforge.net/projects/itl.)

I have defined o= operators as memberfunctions because this seemed
natural to me. But I think the whole library can be easily refactored,
making them global function templates.

> * Why do you use virtual functions? Are the classes sets, and
> interval_base_set public? I think this can and should be avoided.

The original design was classical OO-design, defining an interface
class that allows different implementations being used uniformly by a
weak polymophic interfacepointer. Furthermore there is a body of
baseclass functions that are common to interval_set and
split_interval_set. They only vary in the handling of interval
borders. This has been expressed by basecalss interval_base_set,
interitence and a virtual function 'handle_neighbours'.

All those virtual functions can be devirtualized without problems. The
interface class templates can be omitted and handle_neighbours might
be expressed via a policy template parameter.

> * Does it works with boost::interval?
>
Unfortunately not. Interval containers rely on the notion of open
interval bounds. This was necessary to express operations for
flotingpoint numbers:
interval_set<double> is;
is.insert([1,pi)); is.insert([pi,4]); // is=={[1,4]} merging INTENDED
is.insert([1,pi)); is.insert((pi,4]); // is=={[1,pi), (pi,4]} must not merge

As I've read in the archives an extension for open interval bounds
have been suggested for boost::interval in the past.

> > The benefits of interval sets and maps are
> > 1. The implementation of sets/maps via interval_set/map is very
> > space and also time efficient.
>
> Have you done some messures, in space and time? In the worst case {1, 11,
> 21...}?
>
Not yet. Space can be reckoned easily. Wost case is that every
interval inserted has exactly one value element x per interval [x,x].
In this case you have double data content and one byte extra for
borderinformation so roughly 2 times the space of stl::set.
The 'nice' case is that you can represent a set of uncountable values like
{[0.0, pi)} quite compact ;)

>
> Do you plan to boostify your code?
>
I would very much like to boostify the code. I admit thou that I am
not yet familiar with all of the boostification ;) I hope guidance
will be provided.

> I'm no sure interval_set is a model of Unique Associative Container.
>
> Given the following :
> interval_set<int> is; //not quite: interval_set<interval<int>> is;
> is.insert([1,3));
> is.insert([3,5));
>
> What is the result of
> assert(find([1,3))!= end);
> And of
> assert(find([1,5))== end);
>
I thought about find more than twice and decided not to provide it.
You can say
interval_set<int> found;
is.intersect(found, rightopen_interval([1,5]); // Intersect to find contents

As interval_set<T> implements set<T> if find is implemented it should
deliver an iterator pointing to a value, but such iterators are not
available. The closest one can provide is an iterator that points to
the interval containing the value or end(), if such interval does not
exist.

> I'm no sure interval_set is a model of Unique Associative Container.
No, I'm afraid it is not.

> Which operator would you use for set_symmetric_difference? This is a detail
> but it will be better to provided the function version of this operators as
> well.
don't know, probably function name. Are there boost suggestions for that?
>
> > split_interval_map:
> > A split_interval_map implements a map as a map of intervals
> > and associated values. For dealing with the insertion
> > of overlapping intervals the principle of "aggregation on
> > overlap" has proved to be very useful: It is shown by example:
> >
> > typedef set<string> guests;
> > split_interval_map<time, guests> party;
> > guests mary; mary.insert("Mary");
> > guests harry; harry.insert("Harry");
> > party.insert(make_pair(rightopen_interval(20:00, 22:00), mary));
> > party.insert(make_pair(rightopen_interval(21:00, 23:00), harry));
> > // party now contains
> > [20:00, 21:00)->{"Mary"}
> > [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap
> > [22:00, 23:00)->{"Harry"}
> >
> > As can be seen from the example a split_interval_map has both
> > a decompositional behavior (on the time dimension) as well as
> > a accumulative one (on the associated values).
>
> Quite interesting!
>
> > Quality, validation, portability:
> >
> > The library code of the itl has been validated with a law based
> > test automaton (LaBatea;) which is included in
> > the current release. It includes a testcase generator and law tester
> > that checks laws like e.g. commutativity<interval_set<double>, +=>:
> > the commutativity of interval_set<double> w.r.t. the operation +=.
> >
> > The code has been compiled and tested with three compilers
> > (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three
> > implementations of the stl (linux/gnu, stlport, msvc8) under
> > suse linux 10.1. and ms-windows xp.
>
> > I'd be happy if you found interval conainers as presented in this
> > post an interesting contribution to look at.
>
> I'm sure that this library will be a good candidate to boost.
>
> > If so, I am looking forward to an inspiring discussion.
>
> Best regards
>
> Vicente Botet
>
Thank you for your good questions and valuable feedback on my work.

Best regards
Joachim Faulhaber


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk