Boost logo

Boost :

Subject: Re: [boost] [gsoc 2013] draft proposal for chrono::date
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2013-05-04 02:56:35


Le 04/05/13 01:36, Howard Hinnant a écrit :
> On May 3, 2013, at 7:19 PM, "Vicente J. Botet Escriba" <vicente.botet_at_[hidden]> wrote:
>
>> Le 04/05/13 01:05, Vicente J. Botet Escriba a écrit :
>>> Le 03/05/13 19:53, Howard Hinnant a écrit :
>>>> On May 3, 2013, at 1:46 PM, "Vicente J. Botet Escriba" <vicente.botet_at_[hidden]> wrote:
>>>>
>>>>> Howard, I believe I start to understand why you don't want day/month/year to validate its input.
>>>>> I suspect that the rationale is quite simple, you are relaying on undefined behavior when the value of these parameters is out of range, isn't it?
>>>> class day
>>>> {
>>>> int d_;
>>>> public:
>>>> constexpr
>>>> explicit
>>>> day(int d) noexcept
>>>> : d_(d)
>>>> {}
>>>>
>>>> constexpr operator int() const noexcept {return d_;}
>>>> };
>>>>
>>>> day d(623); // no undefined behavior
>>>>
>>> The UB is when you need to build a date if the day is not on the range. E.g. the implementation could use some arrays indexed with the day and access memory that has no sense.
>>> If we are sure that a day is in the range 1..31, this kind of assumptions can be done on the implementation. The implementation could not do it if the day value has not been validated previously or behavior would be undefined.
>>>
>>>> The rationale for not checking if day is within a valid range (like I did for http://home.roadrunner.com/~hinnant/bloomington/date.html), is simply performance.
>>>>
>>>> One can not completely validate a day without the knowledge of the year and month. There are two design spaces to explore:
>>>>
>>>> 1. Validate as much as you can as early as you can. This was the approach taken by my 2011 paper and which was criticized for low performance.
>>>>
>>>> 2. Validate only once when you have the complete information.
>>>>
>>>> 1. didn't work. So now I'm proposing 2. I'm flexible. :-) There is no undefined behavior involved.
>>>>
>>>>
>>> Are you saying that you would check the day is in range 1..31 before accessing such an array when using the value to build a valid date?
>>>
>>> I'm also flexible, and this was the reason I moved to no_check. Maybe this is ugly, but covers the needs.
>>> * No validation if not desired so no loss of performances.
>>> * Validation is easier than no validation, so the user code is more robust.
>>>
>>>
>>>
>> To be concrete: In my implementation I have the following tables (some of them based on your implementation Howard)
>>
>> day::rep days_in_month_[2][13];
>> day_of_year::rep days_in_year_before_month_[2][13];
>> month::rep day_of_year_to_month_[2][366];
>> day::rep day_of_year_to_day_of_month_[2][366];
>> day_of_year::rep month_day_to_day_of_year_[2][12][31];
>>
>> Access to these tables needs that the month is in range 1..12, day(_of_month) in 1..31, day_of_year 1..366.
>
> In:
>
> Unchecked interface:
> --------------------
>
> date(year y, month m, day d);
>
> If any of the y, m or d are out of range, it would index into those tables upon field->serial conversion (if asked for) and result in undefined behavior.
>
> In:
>
> Checked interface:
> ------------------
>
> date d = year(2013) / month(5) / day(3);
> date d = year(2013) / month(5) / 3;
> date d = month(5) / day(3) / year(2013);
> date d = month(5) / day(3) / 2013;
> date d = day(3) / month(5) / year(2013);
> date d = day(3) / month(5) / 2013;
>
> There would be a single point of validation at the point where all of y, m and d are known, and there would be an error. Presumably that error would be an exception if any of the inputs are not constexpr, or a compile-time failure if all of the inputs are constexpr.
>
>
Just to be sure I'm not trying to solve an issue that doesn't needs to
be solved:
Given

   date d = day(33) / month(5) / 2013;

Note that I has a typo on the day. This is equivalent on my
implementation to

date d(year(2013), month(5), day(33));

If I want to throw a bad date exception I would need to check that the
day/month and year are in range. Next follows the code:

     days_date::days_date(year y, month m, day d)
     {
       if (set_if_valid_date(y, m, d))
         return;
       throw bad_date("day " + to_string(d) + " is out of range for "
           + to_string(y) + '-' + to_string(m));
     }

     bool days_date::set_if_valid_date(year y, month m, day d) noexcept
     {
       // [0]
       bool leap = is_leap(y.value()); // [1]
       const day_of_year::rep* year_data = days_in_year_before(leap);

       if (!(d.value() <= year_data[m.value()] -
year_data[m.value()-1])) // [2]
       {
         return false;
       }
       year::rep by = y.value() + 32799; // [3]
       x_ = days_before_year(by) + year_data[m.value()-1] + d.value();
// [4]
       return true;
     }

[1] and [3] presumes the year is in range.
[2] and [4] presumes the day and month are in range.

If some of 3 values are not in range the function set_if_valid_date
could return true or false arbitrarily.
So if the day/month and year are just names and don't ensure range I
would need to start by validating their range in [0] so that if some of
them is out of range the function return false;

This mean that the user is not able, using the type system, to help the
implementation to reduce needed check.
I have no measures of the impact of these 3 range checks (6 comparisons).
I just wanted to avoid them.

Even if we use other kind of cache, I don't see how the cached data can
be accessed without checking the range are correct.

E.g. An alternative implementation using a cache for the year_month_cache

     bool days_date::set_if_valid_date(year y, month m, day d) noexcept
     {
       // [0]
       auto c = year_month_cache(y, m); // [1]
       if (d.value() >= c.days_in_month) return false;
       x_ = c.days_before + d.value(); // [2]
       return true;
     }

This needs that y and m anre in range at [1] and d in in range at [2].

Howard, could you tell me if you find something wrong on my analysis?
I want to be too purist?

Best,
Vicente


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk