|
Boost : |
Subject: Re: [boost] [chrono/date] Performance goals and design summary
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2013-05-05 14:45:11
Le 05/05/13 18:39, Howard Hinnant a écrit :
> On May 5, 2013, at 4:15 AM, "Vicente J. Botet Escriba" <vicente.botet_at_[hidden]> wrote:
>
>>> I agree completely that we heed several dates and the standard (the library) must describe the performances provided by each one.
>>> <snip>
>>>
>>> For the purpose of showing the Date concept I will use a template
>>>
>>>
>>> template <typename Date1, typename Date2>
>>> days_date
>>> modifiedFollowing(const Date1& targetDate,
>>> const Date2& startDate,
>>> const int* calendar)
>>> {
>>> ymd_date ymdTargetDate(targetDate); // serial->field or nothing (so that the month() is efficient - we could store the month also).
>>> days_date date(targetDate); // field->serial or nothing, but at least there is a conversion (as we are making day arithmetic)
>>> days_date sdate = date;
>>> while (isNonBusinessDay(date, startDate, calendar))
>>> ++date;
>>> if (ymdTargetDate.month() == date.month()) // serial->field ++ No need to convert explicitly
>>> return date;
>>> date = sdate;
>>> do
>>> {
>>> --date;
>>> } while (isNonBusinessDay(date, startDate, calendar));
>>> return date;
>>> }
>>>
>>>
>>> As you can see the two dates with the same interface have not only drawbacks ;-) Providing the same interface and been convertible to each other helps the user.
>>>
>> One additional advantage of using date.month() and not using a specific conversion:
>> * class days_date could have a better algorithm that converting the days_date to ymd_date, e.g. can convert it to an ordinal_date and use a table to get the month from the is_leap() and day_of_year().
>>
>> How better than the class days_date could know how to make these operations more efficient? itself or the user?
> This concerns me. The client of days_date is going to silently walk beyond the range where days_date.month() is efficient if it is implemented with a Bloomberg-style cache. The only answer from the vendor's point of view to make this less likely to happen is to make the cache bigger. The Bloomberg cache (64Kb) is already larger than some vendors can ship (myself included). And it only spans 56 years. The cache solution is great for a special purpose implementation (financial calculations on large computers connected to permanent power sources). But is unacceptable for a history application running on a ram-constrained mobile device.
>
> In the serial->field conversion formulas I'm currently using (which are different from those shown in my 2011 paper), by the time the ordinal_date (day_of_year) and is_leap are computed, the expensive part of the serial->field conversion is done. At that point computing the month and day_of_month is small and fast table lookups (just as you describe above). The only thing that could be eliminated by days_date.month(), without a Bloomberg-sized cache, is the table lookup which converts month and is_leap to day_of_month. Said differently, days_date.month() is 98% the expense of a serial->field conversion. Not a single division operation is eliminated. days_date.day() is 100% of the cost of serial->field conversion. days_date.year() does eliminate 1 out of 6 divisions (I'm not counting divisions by powers of 2) from a full serial->field conversion, and so maybe days_date.year() is 20% cheaper than a full conversion.
>
> I note that in my 2011 paper, to_year_and_doy() contains 5 divisions (not counting division by powers of 2), but the very first of those must be a 64 bit division (This is significantly more expensive than a 32 bit division on a 32 bit machine). In that implementation (DESIGN == 2), all of day(), month() and year() are about the same cost as a full serial->field conversion. to_year_and_doy() is nearly the entire expense.
>
> Putting month() (and day() and year()) on days_date (which I myself am guilty of in my 2011 paper) is directly analogous to giving std::forward_list a size() member, or giving std::list a size() member and saying: it /should/ have constant complexity, wink, wink. It tricks people into writing inefficient code.
>
> Survey evidence of this phenomenon written concerning list::size() in 2005 (long before C++11 changed "should" to "shall"):
>
> http://home.roadrunner.com/~hinnant/On_list_size.html
>
I understand your concern and the parallelism with list::size(). From
the performance point of view the single operations provided would be
the ones that are efficient enough and maybe this is the single part
that can be standardized.
However if I take your days_date year() example the user must know that
on order to get the year it is better to go through a ordinal_date
instead of a ymd_date.
I don't know how yet, but as a end-user I would like to get this 20%
gain in a less intrusive way.
Maybe days_date shouldn't provide a year() function but can provide
another function that will not ensure the minimal performances.
year calculate_year();
I don't know what others think, but I would prefer to type
dd.calculate_year()
than
ordinal_date(dt).year()
The use of calculate_ could be a sympthom of possible deficiencies of
the algorithm.
Suggestion for a better name for the calculate_ approach are welcome.
Best,
Vicente
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk