|
Boost : |
Subject: Re: [boost] [Review:Algorithms] Boost.Algorithms Review
From: Marshall Clow (mclow.lists_at_[hidden])
Date: 2011-10-01 20:31:05
On Oct 1, 2011, at 4:58 PM, Stewart, Robert wrote:
> Boost.Algorithms should be accepted into Boost.
Thanks!
> Many issues have been raised during the review and during the pre-review using Code Collaborator. I have confidence that Marshall will carefully weigh all of the ideas and concerns that have been raised and apply those that are sensible and appropriate.
>
> __________
> Design
>
> I have a problem with the physical design. all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp. I realize that none_of must test all elements in a range in the case when no element matches the predicate, but that doesn't make me think of all.hpp as the place to find it.
>
> I think "is_ordered_until" is a better name than "is_ordered" for the reasons previously elucidated. As to your dislike of "is_ordered_until", just read it as "the range is ordered until the element referenced by the return value".
>
> As there may well be other search algorithms than those you've provided thus far, I think each should be in a separate header and documented separately. Otherwise, search.hpp may grow to be ridiculously large.
I've gone back and forth with this idea.
I think that's a good idea for the searching routines, but not for any/all/none - but it's a matter of taste; because the any/of/all routines are so small.
I'd like to avoid the general case where each algorithm has it's own header file, because that adds to programmer annoyance (and compile time).
That being said, I don't think that all the algorithms in the library should go into the same header file, either.
> Implementation
>
> I find the naming of your formal parameters to be inconsistent: first, last, p, lo, hi, val, and range. A more consistent set of names would be first, last, predicate, low, high, value, and range. I'd even accept "pred" in that list.
>
> Likewise, the template parameter names are inconsistent: InputIterator, Predicate, R, V, Pred, FI, patIter, and corpusIter. A more consistent set of names would be InputIterator, Predicate, Range, Value, ForwardIterator, PatternIterator, CorpusIterator. If you'd prefer to abbreviate "Iterator" with "It" or "Iter", then I'd abbreviate the categories, too: InIt, FwdIt, RandomIt, etc.
>
> You presume that all C++ implementations with a specific value of __cplusplus provide std::all_of, std::any_of, and std::none_of. Is that portable?
I got that from n3290, which says in section 16.8 (predefined macro names):
_ _ cplusplus
The name _ _ cplusplus is defined to the value 201103L when compiling a C++ translation unit. [157]
[157] It is intended that future versions of this standard will replace the value of this macro with a greater value. Non-conforming compilers should use a value with at most five decimal digits.
On the other hand, if Boost.Config has a specific "compiling with C++11" flag, using that would be better. I couldn't find one, though.
The flag "BOOST_PP_VARIADICS" triggers on that, and also BOOST_WAVE_SUPPORT_CPP0X depends on it.
>
> Indentation is inconsistent.
OK.
>
> The use of mpl::identity for clamp's arguments should be documented for future maintainers.
OK.
> is_ordered() uses a one-line if/return and a two-line for no apparent reason. That inconsistency increases the chances of a maintainer failing to notice one or the other.
Ok.
>
> search.hpp uses BOOST_ALGO_DEBUG. "ALGO" should be spelled out. Many of the functions controlled by that manifest constant use camelcase and should be all lower case.
Ok.
> I was expecting more algorithm documentation in search.hpp. The code is not overly complex, but there should at least be pointers to literature on the implementation of each algorithm. However, as such literature may not always be accessible, copying pertinent information to the header would be helpful to future maintainers.
There are some links there.
I have no problems with adding more - until it starts obscuring the code ;-)
For example in search.hpp, line 147:
/*
A templated version of the boyer-moore searching algorithm.
References:
http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/
http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf
Explanations: boostinspect:noascii (test tool complains)
http://en.wikipedia.org/wiki/Boyer%96Moore_string_search_algorithm
http://www.movsd.com/bm.htm
http://www.cs.ucdavis.edu/~gusfield/cs224f09/bnotes.pdf
and so on
> Documentation
>
> Specific notes and corrections are listed at the end of this message.
>
> I'd like to see the complexity, behavioral, and iterator information be documented for each algorithm separately rather than for a group, such as you've done in paragraphs 3 and 4 of all.html. Note that the complexity is described below the clamp algorithms and above the all_of, any_of, and none_of algorithms. This should be consistent.
>
> The algorithms are all function templates, so "the function <algorithm-name>" is incorrect. Thus, instead of "the function all_of", write "all_of". Look for other uses of "function" and "functions" for similar terminology issues.
>
> While is_increasing, is_decreasing, is_strictly_increasing, and is_strictly_decreasing are implemented using is_ordered, that is by no means a requirement. Consequently, I'd prefer that each was documented separately.
>
> Complexities written as should be written in all upper case: O(N), O(M x N), etc.
>
> The treatment of the two internal tables used by the search algorithms is inconsistent ("first", "second", "skip", etc.). I'm not even sure they should be mentioned as you do. I realize that you want to document memory usage and customization options, but perhaps the better approach is to clearly document the traits class and its usage as well as the memory usage in terms of the pattern and corpus character sets. For example, K-M-P allocates N bytes per element in the pattern.
>
> The Reference section for each algorithm should include "#include <boost/algorithm/foo.hpp>", to permit copying and pasting, plus complexity and template parameter requirements.
>
> Template parameters should be documented just as function parameters. Thus, for example, in the reference page for knuth_morris_pratt_search(), there should be a section that describes the two iterator types including their iterator category as well as their relationship (which elsewhere you mention requiring that their value_types be identical).
>
> The boyer_moore_search() and boyer_moore_horspool_search() algorithms do not have reference pages.
>
> There are no examples in the text. There should be an example for each algorithm.
>
> The search algorithms' traits classes are not documented.
>
> __________
> Usefulness
>
> This library, in its current state has certain utility. As a place to incorporate future algorithms that don't deserve a library of their own, it has enormous utility.
>
> __________
> Usage
>
> I did not try to use the library due to time constraints.
>
> __________
> Effort
>
> I spent many hours studying and commenting on the code during the pre-review and now, reading the documentation, etc.
Yes, and I appreciate it.
> Documentation Issues
[ A long list snipped ]
Thank you for your attention to detail, Robert!
-- Marshall
Marshall Clow Idio Software <mailto:mclow.lists_at_[hidden]>
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
-- Yu Suzuki
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk