|
Boost : |
Subject: Re: [boost] [Review] ITL review starts today, February 18th
From: Barend Gehrels (barend_at_[hidden])
Date: 2010-02-28 11:48:23
Hi List, Hartmut, Joachim,
This is my review of the Interval Template Library (ITL). YES, I vote
+1, this library should be accepted into Boost. It is very very useful.
During my review my enthousiasm was only growing. Even if there was a
thing confusing on the operators (for me). Yes, I can use this library
in real life and it will solve problems for me.
1) WHAT I DID
For this review I've done three projects of which two are related to my
day-to-day or hobby programming efforts.
a) The first project is related to genealogy. I took up the idea of Phil
and generated an interval map of livetimes of all persons in my
genealogical database. This database contains ~3000 persons.
Many of the persons are still alive. ITL contains a right open interval
but that denotes the right-hand-side excluding, while I need an interval
that is unlimited at one side. Of course I could workaround using 9999
as year of death. An additional problem is that for many people death
date is unknown. I don't expect ITL to handle fuzzy intervals so I
ignore this.
Everything went smoothly. There was not any delay noticable (see below).
I started with a provided example but changed some things, such as using
a const_iterator and const references, not yet using the documentation,
and everything is standard and works as expected. The function I drafted
is below.
b) The second project is related to planning and "leave requests". I've
implemented this some years ago and remember that it got quite complex,
with weekends, national holiday days and people working sometimes 40
hours a week, but sometimes not on Friday afternoon etc. So I tried if
Boost.Interval would have simplified my live there. And yes, it would
have done for sure!
Starting with the manpower example I made adaptions, actually only
simplifying it, plus subtracting days-of-per-week, until I had what I
wanted. The whole routine is in one screen now and definitely logic and
simple. The drafted functions are below.
c) I did some small other studies as well, especially on the map
operators, and to draft an alternative "intro for dummies"... Of this,
pieces are here in this review.
Let me repeat that I found this library VERY USEFUL and that I'm
surprised that there are not many reviews yet.
2) DOCUMENTATION
The documentation is extensive and good. I've some small remarks about
the introduction.
When I started I was confused by the definitions of interval_set and
interval_map, and when to use them. It might be explained a little
better for people not having worked with intervals before, probably with
some sample code and images. The documentation contains, at the start:
"It means that we can use an interval_set or interval_map like a set or
map of elements. It exposes the same functions."
and then gives the example " mySet.contains(42)". But contains is NOT an
method of std::set, but YES from interval_set. This was confusing.
I then expected a sample for the map below but not there... Furthermore,
that threelined set-sample means that the *interval* 42-42 is inserted,
which was also not clear to me at first sight... I would change that
section. I would suggest something like this:
void intro_sets_for_dummies()
{
namespace itl = boost::itl;
// STANDARD std::set works with values
std::set<int> std_set;
std_set.insert(3);
std_set.insert(7);
BOOST_FOREACH(int const& value, std_set)
{
std::cout << " " << value;
}
//std::cout << " " << (std_set.contains(3) ? "YES" : "NO"); //not
implemented in std::set
std::cout << std::endl;
// BOOST.ITL itl::interval_set works with intervals instead of values
itl::interval_set<int> itl_set; // Interval's of integer, e.g.
seconds or years
itl_set.insert(itl::interval<int>(3,5));
itl_set.insert(itl::interval<int>(4,8)); // Overlap with previous is
detected automatically
itl_set.insert(1); // Interval of (1,1)
BOOST_FOREACH(itl::interval<int> const& interval, itl_set)
{
std::cout << " " << interval.first() << ".." << interval.last();
}
std::cout << " " << (itl_set.contains(3) ? "YES" : "NO") << std::endl;
// 1..1 3..8 YES
}
The same for map. The itl::interval_map is REALLY very useful. My
suggestion:
void intro_maps_for_dummies()
{
namespace itl = boost::itl;
// STANDARD std::map
std::map<int, int> std_map;
std_map[3] = 4;
std_map[7] = 8;
for(std::map<int,int>::const_iterator it = std_map.begin(); it !=
std_map.end(); ++it)
{
std::cout << " " << it->first << ": " << it->second;
}
std::cout << std::endl;
// BOOST.ITL itl::interval_map
itl::interval_map<int, int> itl_map;
itl_map += std::make_pair(itl::interval<int>(3,5), 4); // add 4 to [3,5]
itl_map += std::make_pair(itl::interval<int>(4,8), 8); // add 8 to
[4,8] -> will generate 3 intervals
itl_map -= std::make_pair(itl::interval<int>(5,5), 2); // subtract 2
-> will split another interval
for(itl::interval_map<int, int>::const_iterator it =
itl_map.begin(); it != itl_map.end(); ++it)
{
itl::interval<int> const& interval = it->first;
std::cout << " " << interval.first() << ".." << interval.last()
<< ": " << it->second;
}
std::cout << std::endl;
// OUTPUT: 3..3: 4 4..4: 12 5..5: 10 6..8: 8
}
This would give me a clearer picture how it works. At the same time I
now realize that during our discussion on +=, -=, |= , I didn't realize
enough that += really does mathematics behind the screens. So it adds
values where to the map on the specified interval, it subtracts values
from the map on the specified interval. This goes beyond "set theoretic
operations", "set union" and especially "set difference". See below for
more on this.
I realize that a similar sample is there in the documentation, and that
this principle, "aggregate on overlap" is well explained there. However,
a sample with the numeric values is (for me) somewhat more clear than
the given sample with names which are added to a set; the sample of
interval_map with itl::set probably does too much at a time...
But let me repeat that the overall documentation is impressive and complete!
3) PERFORMANCE
Phil claimed that:
> "consider trying to store the lifespans of people in a genealogy
> program: recording a few thousand people living over a few hundred
> years would approach the worse-case space complexity and need
> megabytes of storage"
This was part of my motivation to do project 1 above, because these few
thousand people over a few hundred years is exactly what I have.
Boost.Itl calculates and reports the livespans in 0.031 seconds. So,
measured in time, no problem at all. I compared it with another
implementation which run in 0.032 seconds, so the same. I will not say
that it cannot be enhanced somehow or whatever, I didn't study it in
depth, I would only say: the library is well performing, also if the
input set grows larger. It should not be rejected for the assumption
that it could have been built otherwise.
Besides that, Boost.Itl does more than normal intervals: the
interval_map associates values to intervals.
4) SOME NOTES
- the samples do not use it, but sets and maps are working with Boost.Range
- sets are also working with BOOST_FOREACH
- in the code samples, "using" is used in many places (or everywhere).
While many Boost libraries do this, I would say that this is unclear for
users visiting for the first time. What is that "date"? Is it a
gregorian date, or an ITL date? So I would say something like
namespace gr = boost::gregorian;
namespace itl = boost::itl;
and use this like "gr::date" and "itl::interval<...>" everywhere, such
that the origin of all those terms is immediately clear. As said, this
is not criticism on Boost.ITL, it is a hint for enhancing Boost docs for
first time users. I'm often struggling with the origin of terms
especially if two libraries are used together. I realize we've done this
ourselves in Geometry as well and I think changing it would make it more
clear.
5) OPERATORS and FUNCTIONS
During the review period I had a discussion on the list with Joachim
about the operators and the naming of functions. However, I thought that
the += and |= were both doing a pure union, but it appears that they are
doing a union PLUS a (sometimes mathematic) addition. So my current
opinion is:
For sets (o.a. interval_set), the following operators are appropriate,
as also used in Boost.dynamic_bitset:
&= for intersection
|= for union
^= for symmetric difference
-= for difference
So this is how it is implemented but I would remove += as a synonym for |=
For interval_map, it is currently much less clear to me now. The
operators do two things at the same time, which is confusing me.
Let's see the following code pieces:
1| itl::interval_map<int, int> map1, map2;
2| map1 += std::make_pair(itl::interval<int>(3,5), 4);
3| map2 += std::make_pair(itl::interval<int>(4,6), 5);
4| map1 |= map2;
This gives: 3..3: 4 4..5: 9 6..6: 5. So it is a set-theoretic
union, PLUS mathematical addition. Aggregation on overlap. It IS very
useful, but it does two things at the same time, and should be
implemented and described carefully.
Changing the last line to intersection:
4| map1 &= map2
gives 4..5: 9, so it is a set-theoretic intersection on intervals, PLUS
mathematical addition. I still agree.
But now:
4| map1 -= map2
gives 3..3: 4 4..5: -1 This is actually unexpected. It is not
set-difference, because that would leave only 3..3 in the set and remove
4..5 completely. So what it actually does is subtracting 5 from all
there was before in map1, ONLY mathematical operation (minus), no
set-theory here...
The documentation describes it like this: "Subtraction on Maps
implements a map difference function similar to set difference", and it
forwards to a link with more info about this ("While addition and
subtraction on Sets are implemented as set union and set difference,"),
I still don't agree, it is not set difference here.
To be complete:
4| map1 ^= map2
gives 3..3: 4 6..6: 5. So yes, this IS the set-symmetric
difference and (by nature of that) does or does not mathematical
addtion, that is not important here.
Therefore I find the interval_map operators still confusing, even more
confusing than before. Are they doing mathematics or set-theory? So I
would suggest something like this:
4| map1 |= something(map2, std::plus);
to make it more explicit. This is the set-theoretic union where on all
overlaps the two sets are added, and
4| map1 |= something(map2, std::minus); where the two sets are
unioned, and the values of map2 are subtracted from map1. This is
slightly different from the current -= implementation.
4| map1 |= something(map2, std::multiplies); item, but values are
multiplied
And then the same for &= (intersection), -= (set theoretic difference).
Then ^= probably need no policy because it always handled the intervals
which are not common, an operator is not applicable here.
There are combine functors described (I saw later), but they are used
different (I think) than this suggestion.
This would make it less confusing AND probably would give it more
functionality, because the *= is currently not implemented (and might be
reserved for scaling).
This was map + map. The same applies for the operator-= on map,
interval, so:
1| itl::interval_map<int, int> map;
2| map += std::make_pair(itl::interval<int>(3,5), 4);
3| map -= std::make_pair(itl::interval<int>(4,6), 3);
result in 3..3: 4 4..5: 1
This could actually be stated clearer by using the std::map approach, on
intervals, I tried it intuitively but it does not compile:
map[itl::interval<int>(3,5)]+= 4;
map[itl::interval<int>(4,6)]-= 3;
I would not have any problems with this, and I would expect still doing
aggregate on overlap behind the scenes, and not set-theory, resulting in:
3..3: 4 4..5: 1 6..6: -3
6) STANDARD QUESTIONS
> - What is your evaluation of the design?
>
The library is well designed. The comments above were only on one detail.
> - What is your evaluation of the implementation?
>
I had only a glance to the implementation. It is clear, nice short
functions everywhere, and well documented. It is build on top of the
std::set and std::map, which is reported to be problematic, but I don't
think it is. My test runs very fast.
> - What is your evaluation of the documentation?
>
Impressive and quite complete. The introduction might add a section or
tutorial for real-starters
> - What is your evaluation of the potential usefulness of the library?
>
Very useful. I mentioned two projects where the library would have
helped me if it had been into Boost earlier, and I'm considering
rewriting them such that the problems make use of the library, and
simplify them in that way.
> - Did you try to use the library? With what compiler? Did you have any
> problems?
>
I used it using Visual C++ 2008 Express Edition and I didn't have any
problems.
> - How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
>
About 10 hours spread over three days.
> - Are you knowledgeable about the problem domain?
>
Finally, thanks to Joachim for developing and submitting this excellent
library, and to Hartmut for managing this review.
Regards,
Barend Gehrels,
Amsterdam
Ad a:
void CreateTimeline(std::ofstream& stream)
{
namespace itl = boost::itl;
typedef itl::interval_map<int, int> map;
map livetimes;
for (TGenealogyVertexIterator vi = vertices(m_graph).first;
vi != vertices(m_graph).second;
++vi)
{
CGenealogyVertexProperty p = GetProperty(*vi);
if (p.indi.type == TYPE_PERSON && p.indi.birth.date.hasdate)
{
int death_date = p.indi.death.date.hasdate ?
p.indi.death.date.year : 9999;
livetimes +=
std::make_pair(itl::interval<int>(p.indi.birth.date.year, death_date), 1);
}
}
stream << "LIVETIMES: " << std::endl;
for (map::const_iterator it = livetimes.begin();
it != livetimes.end(); ++it)
{
boost::itl::interval<int> const& when = it->first;
stream << when.first() << " - " << when.upper() << " : " <<
it->second << std::endl;
}
}
Ad b:
void calculate_leave_request()
{
namespace gr = boost::gregorian;
namespace itl = boost::itl;
itl::interval<gr::date> scope
(
gr::from_simple_string("2010-04-01"),
gr::from_simple_string("2010-05-31")
);
itl::interval_set<gr::date> weekends_plus_festival_days(scope);
weekends_plus_festival_days -= weekends(scope);
weekends_plus_festival_days -= gr::from_simple_string("2010-04-05");
// Eastern
weekends_plus_festival_days -= gr::from_simple_string("2010-04-30");
// Queensday
weekends_plus_festival_days -= gr::from_simple_string("2010-05-13");
// Ascension day
weekends_plus_festival_days -= gr::from_simple_string("2010-05-24");
// Whitsun
itl::interval_map<gr::date, double> request_in_hours;
// Add 8 hour per day
request_in_hours += std::make_pair(scope, 8.0);
request_in_hours &= weekends_plus_festival_days;
// Subtract 4 hour for all Fridays
{
gr::date date = gr::next_weekday(scope.first(),
gr::greg_weekday(gr::Friday));
gr::week_iterator it(date);
while(it <= scope.last())
{
itl::interval<gr::date> scope(*it, *it);
request_in_hours -= std::make_pair(scope, 4.0);
++it;
}
}
report("Request", scope, request_in_hours);
}
// where report is:
template <typename Scope, typename Map>
double report(std::string const& caption, Scope const& scope, Map const&
map)
{
namespace itl = boost::itl;
namespace gr = boost::gregorian;
std::cout << caption << ": " << scope.first() << " - " <<
scope.last() << std::endl;
double hours = 0;
for(typename boost::range_iterator<Map const>::type it =
boost::begin(map);
it != boost::end(map); ++it)
{
itl::interval<gr::date> const& interval = it->first;
// Add 1, because Boost.Date counts [2010-05-01 ... 2010-05-02
as one day]
int days = 1 + gr::date_period(interval.first(),
interval.last()).length().days();
std::cout << interval.first() << " - " << interval.last()
<< " -> " << days << " * " << it->second << std::endl;
hours += days * it->second;
}
std::cout << "Total hours: " << hours << std::endl;
return hours;
}
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk