From: Jeff Garland (jeff_at_[hidden])
Date: 2005-07-23 12:50:56
On Fri, 22 Jul 2005 22:22:56 +0200, Friedrich Wilckens wrote
> I'm somewhat confused and I fear we don't quite understand each
> other. Let me try to clarify.
>
> If you interpret the periods as subsets of the integer number line Z,
> so that date_periods are sets of days, you run into trouble, because
> it is unclear what the "openness" of the intervals should mean. For
> example, [1, 2) = [1, 1] = {1}, [1, 3) = [1, 2] = {1, 2}, and so on.
Yes.
> Z with the usual distance metric carries the discrete topology, i.e.,
> *every* subset is open and closed. You would get a consistent
> system if you define the length of [a, b + 1) = [a, b] as b - a (i.e.,
> last() - begin()). Then, a single day period [a, a + 1) would
> indeed have length 0, and [1, 3) would have length 1 (not 2).
Agree. And the current implementation doesn't do this :-(
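
To make the Z-interpretation concrete, here is a minimal sketch. It uses plain day numbers rather than the library's date type, and every name in it (day_number, day_set, from_half_open, z_length, day_count) is hypothetical and only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a date: a plain day number on the integer line Z.
using day_number = long;

// A closed set of days {first, ..., last}. This is NOT the Boost date_period class.
struct day_set {
    day_number first;
    day_number last;
};

// Build the set from half-open notation [begin, end); requires end > begin.
inline day_set from_half_open(day_number begin, day_number end) {
    assert(end > begin);
    return day_set{begin, end - 1};
}

// Length under the Z-interpretation: last - first, so a single day has length 0.
inline long z_length(const day_set& s) { return s.last - s.first; }

// Number of days in the set, for comparison with z_length.
inline long day_count(const day_set& s) { return s.last - s.first + 1; }
```

With these definitions [1, 2) has z_length 0 and [1, 3) has z_length 1, while day_count gives 1 and 2 respectively.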
> What I had in mind is a different interpretation, and I believe it
> better captures what we have in mind when we use the periods. I consider
> periods as intervals on the real number line R. Now, a day is not a
> "point" on R, it is an interval itself. I would (as I said above) treat
> it as the right-open interval lasting from 00:00 to 23:59:59.999....
> Similarly, a second like 2005-May-1 10:13:30 is a right-open interval,
> lasting from 10:13:30.0 until 10:13:30.9999... The points of R are time
> instants with zero duration, whereas a second has a duration of,
> well, a second.
>
> What *is* a point on R is the moment at which a day starts. We have a
> little confusion here since in notations like [2005-May-1, 2005-May-2),
> "2005-May-1" does not refer to the whole day, but to its start, so it
> should be considered as a shorthand for 2005-May-1 00:00:00.0; likewise
> for 2005-May-2.
>
> In this interpretation, a date_period is not a set of days, but a
> union of days. It lasts from the beginning of its starting day to the
> end of its last day. It is a truly half-open interval, so
> [2005-May-1, 2005-May-2) is different from the closed interval
> [2005-May-1, 2005-May-2] (though they have the same length); it is also
I'm having trouble with the idea that open and closed ranges have the same
length. It seems like that will lead to problems. More below...
> different from the closed interval [2005-May-1, 2005-May-1] which is
> a single point of length 0.
This makes sense.
> The interpretation as half-open
> intervals is built into the semantics of date_period, so closed
> intervals cannot be expressed at all. [2005-May-1, 2005-May-1) can
> be expressed and is the empty set (of length 0).
I agree with this.
> begin() returns the first day (not its starting point) contained in the
> date_period, and last() returns its last day (again, not its starting
> point).
Ah, interesting...
> The length of a period can generally be defined as end() - begin() (not
> as last() - begin() as in the Z-interpretation above). It turns out that
> [2005-May-1, 2005-May-2) just describes the whole day 2005-May-1 and
> has length 1. [2005-May-1, 2005-May-1) is the empty set and has
> length 0.
Yes.
> In this interpretation, time_periods (based on ptime) are likewise
> subsets of R. The difference is only that time_periods allow a much
> finer resolution. Every date_period could be expressed as a time_period.
> For time periods, we can ask if a certain microsecond (again, this is
> not a point, but a half-open interval of finite length) is contained
> in it; for date_periods, days are the smallest objects we consider.
Agree.
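
As a rough sketch of "every date_period could be expressed as a time_period": a half-open day interval maps onto a half-open microsecond interval by scaling both endpoints. The types below (day_period, time_period_us) are hypothetical stand-ins built on plain integers, not the library's classes, and the sketch assumes every day is exactly 86400 seconds long.

```cpp
#include <cstdint>

// Hypothetical units: day numbers and microsecond counts from a shared epoch.
using day_number = std::int64_t;
using micros = std::int64_t;

// Assumption: no leap seconds or DST, so a day is exactly 86400 seconds.
const micros micros_per_day = 86400LL * 1000000LL;

// Half-open [begin, end) in whole days.
struct day_period {
    day_number begin;
    day_number end;
};

// Half-open [begin, end) in microseconds.
struct time_period_us {
    micros begin;
    micros end;
};

// A day period covers from the start of its first day to the start of day `end`.
inline time_period_us to_time_period(day_period p) {
    return time_period_us{p.begin * micros_per_day, p.end * micros_per_day};
}

// True if the microsecond starting at t, i.e. [t, t + 1), lies inside p.
inline bool contains(time_period_us p, micros t) {
    return p.begin <= t && t < p.end;
}
```

The one-day period [0, 1) becomes [0, 86400000000) microseconds: the last microsecond of the day is contained, and the first microsecond of the next day is not.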
> Sorry, this email became somewhat lengthy, but I do not know how to
> express what I mean with fewer words.
No problem. This seemingly simple subject is actually quite tricky.
So after looking at this for a while, here's what I'm thinking. Periods have two
kinds of constructor: one that accepts two points in 'half-open' form, and one
that takes a point and a duration. So as I understand your proposal, we would have
the following:
date d(2005,Jan,1); //0 date
date_period dp1(d, date(2004,Dec,31)); //len=-1 last=Jan1 is_null=true
date_period dp2(d, days(-1));
//dp1 == dp2 (and so on for the rest of these examples)
date_period dp3(d, d); //len=0 last=Jan1 is_null=true
date_period dp4(d, days(0));
date_period dp5(d, date(2005,Jan,2)); //len=1 last=Jan1 is_null=false
date_period dp6(d, days(1));
date_period dp7(d, date(2005,Jan,3)); //len=2 last=Jan2 is_null=false
date_period dp8(d, days(2));
As it turns out, this is pretty close to the originally intended behavior with
respect to lengths and nulls. Unfortunately, as you've seen, there are some bugs
that prevent the library from behaving this way currently.
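
The intended length and null behavior listed above can be sketched with a small stand-in class. This is not the library's date_period: day_point, day_span, and half_open_period are hypothetical names, dates are reduced to day numbers, and the is_null rule (end <= begin) is my reading of the table.

```cpp
// Hypothetical date: days since some arbitrary epoch.
struct day_point { long n; };

// Hypothetical duration in whole days.
struct day_span { long count; };

// Stand-in for a half-open period [begin, end); not the Boost class.
class half_open_period {
public:
    // Construct from two points in half-open form.
    half_open_period(day_point begin, day_point end)
        : begin_(begin.n), end_(end.n) {}

    // Construct from a starting point and a duration.
    half_open_period(day_point begin, day_span len)
        : begin_(begin.n), end_(begin.n + len.count) {}

    // Length is end - begin, so it may be zero or negative.
    long length() const { return end_ - begin_; }

    // Assumed rule: a period with no days in it is null.
    bool is_null() const { return end_ <= begin_; }

    bool operator==(const half_open_period& other) const {
        return begin_ == other.begin_ && end_ == other.end_;
    }

private:
    long begin_;
    long end_;
};
```

Taking Jan 1 as day 0, the pair (day_point{0}, day_point{-1}) and the pair (day_point{0}, day_span{-1}) build equal periods of length -1 that are null, matching dp1 and dp2.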
The main difference between your proposal and the current design is that the
current design will report last == Dec 31 and not Jan 1 for the zero-length
periods. So if you print the above periods you will get:
[2005-Jan-01/2004-Dec-30] //dp1 negative one length
[2005-Jan-01/2004-Dec-31] //dp3 zero length
[2005-Jan-01/2005-Jan-01] //dp5 one length
[2005-Jan-01/2005-Jan-02] //dp7 two length
Under your proposal we would get:
[2005-Jan-01/2005-Jan-01] //dp1 negative one length
[2005-Jan-01/2005-Jan-01] //dp3 zero length
[2005-Jan-01/2005-Jan-01] //dp5 one length
[2005-Jan-01/2005-Jan-02] //dp7 two length
I can see arguments for both, but the main thing that makes me think the
current approach is better is that it correctly distinguishes the lengths, even
though the zero and negative length cases are hard to read. Under your proposal,
with last being the same in three cases, there is no way to distinguish between
the zero, one, and negative one length periods. So serialization of periods in
this form becomes a problem.
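
Here is a small sketch of that serialization problem, using day numbers in place of dates (Jan 1 is day 0). The names are hypothetical, and last_proposed is only my reading of the printed results listed for the proposal (last never falls before begin); it is not code from either implementation.

```cpp
#include <algorithm>
#include <string>

// Hypothetical half-open period [begin, end) over plain day numbers.
struct day_period {
    long begin;
    long end;
};

// Current design: last is always end - 1, even for null periods.
inline long last_current(day_period p) { return p.end - 1; }

// Assumed reading of the proposal: last is clamped so it never precedes begin.
inline long last_proposed(day_period p) { return std::max(p.begin, p.end - 1); }

// Print in the closed "[begin/last]" form used in the listings.
inline std::string print(long begin, long last) {
    return "[" + std::to_string(begin) + "/" + std::to_string(last) + "]";
}
```

For the periods of length -1, 0, and 1 starting at day 0, the current rule prints three different strings, while the clamped rule prints the same string for all three, so the length cannot be recovered from that form.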
So I recommend that we fix the length bug and leave last alone. Make sense?
Jeff
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk