Subject: [boost] [Review:Algorithms] Boost.Algorithms Review
From: Stewart, Robert (Robert.Stewart_at_[hidden])
Date: 2011-10-01 19:58:35
Boost.Algorithms should be accepted into Boost.
Many issues have been raised during the review and during the pre-review using Code Collaborator. I have confidence that Marshall will carefully weigh all of the ideas and concerns that have been raised and apply those that are sensible and appropriate.
I have a problem with the physical design. all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp. I realize that none_of must test all elements in a range in the case when no element matches the predicate, but that doesn't make me think of all.hpp as the place to find it.
I think "is_ordered_until" is a better name than "is_ordered" for the reasons previously elucidated. As to your dislike of "is_ordered_until", just read it as "the range is ordered until the element referenced by the return value".
As there may well be other search algorithms than those you've provided thus far, I think each should be in a separate header and documented separately. Otherwise, search.hpp may grow to be ridiculously large.
I find the naming of your formal parameters to be inconsistent: first, last, p, lo, hi, val, and range. A more consistent set of names would be first, last, predicate, low, high, value, and range. I'd even accept "pred" in that list.
Likewise, the template parameter names are inconsistent: InputIterator, Predicate, R, V, Pred, FI, patIter, and corpusIter. A more consistent set of names would be InputIterator, Predicate, Range, Value, ForwardIterator, PatternIterator, CorpusIterator. If you'd prefer to abbreviate "Iterator" with "It" or "Iter", then I'd abbreviate the categories, too: InIt, FwdIt, RandomIt, etc.
You presume that all C++ implementations with a specific value of __cplusplus provide std::all_of, std::any_of, and std::none_of. Is that portable?
Indentation is inconsistent.
The use of mpl::identity for clamp's arguments should be documented for future maintainers.
is_ordered() uses a one-line if/return and a two-line for no apparent reason. That inconsistency increases the chances of a maintainer failing to notice one or the other.
search.hpp uses BOOST_ALGO_DEBUG. "ALGO" should be spelled out. Many of the functions controlled by that manifest constant use camelcase and should be all lower case.
I was expecting more algorithm documentation in search.hpp. The code is not overly complex, but there should at least be pointers to literature on the implementation of each algorithm. However, as such literature may not always be accessible, copying pertinent information to the header would be helpful to future maintainers.
Specific notes and corrections are listed at the end of this message.
I'd like to see the complexity, behavioral, and iterator information be documented for each algorithm separately rather than for a group, such as you've done in paragraphs 3 and 4 of all.html. Note that the complexity is described below the clamp algorithms and above the all_of, any_of, and none_of algorithms. This should be consistent.
The algorithms are all function templates, so "the function <algorithm-name>" is incorrect. Thus, instead of "the function all_of", write "all_of". Look for other uses of "function" and "functions" for similar terminology issues.
While is_increasing, is_decreasing, is_strictly_increasing, and is_strictly_decreasing are implemented using is_ordered, that is by no means a requirement. Consequently, I'd prefer that each was documented separately.
Complexities written as should be written in all upper case: O(N), O(M x N), etc.
The treatment of the two internal tables used by the search algorithms is inconsistent ("first", "second", "skip", etc.). I'm not even sure they should be mentioned as you do. I realize that you want to document memory usage and customization options, but perhaps the better approach is to clearly document the traits class and its usage as well as the memory usage in terms of the pattern and corpus character sets. For example, K-M-P allocates N bytes per element in the pattern.
The Reference section for each algorithm should include "#include <boost/algorithm/foo.hpp>", to permit copying and pasting, plus complexity and template parameter requirements.
Template parameters should be documented just as function parameters. Thus, for example, in the reference page for knuth_morris_pratt_search(), there should be a section that describes the two iterator types including their iterator category as well as their relationship (which elsewhere you mention requiring that their value_types be identical).
The boyer_moore_search() and boyer_moore_horspool_search() algorithms do not have reference pages.
There are no examples in the text. There should be an example for each algorithm.
The search algorithms' traits classes are not documented.
This library, in its current state has certain utility. As a place to incorporate future algorithms that don't deserve a library of their own, it has enormous utility.
I did not try to use the library due to time constraints.
I spent many hours studying and commenting on the code during the pre-review and now, reading the documentation, etc.
I have no experience designing string search algorithms. I have used many algorithms. I have designed a good deal of generic code. Thus, I have broad domain knowledge, but nothing extraordinary.
Description and Rationale
2nd para, 3rd sent: missing "and" from "tested, reviewed, documented"
Written with this review in mind, this section is fine, but for an accepted library, it isn't right. I suggest something like this:
"This library will be extended by algorithms submitted or suggested by Boost developers and users. The Adobe Source Library (http://stlab.adobe.com/), for example, contains many useful algorithms with documentation and test cases that might be culled for inclusion in this library. Knuth's _The Art of Computer Programming_ is full of algorithm descriptions, too. Various algorithms have been created and documented in magazines, online forums, etc., or even just for personal use. All of these sources, and more, can provide fodder for algorithms to add to this library.
"Adding new algorithms will require some form of review, but will not be subject to the usual Boost review process. The Boost.Algorithm maintainer would have the ability to add algorithms without review, but this will not be the case for non-trivial algorithms. Non-trivial algorithms, and possibly even trivial algorithms, will be subjected to a mini-review. This will be a call for Boost developers to study the proposed algorithms with the goal to get more minds to consider their design and utility to ensure their broad utility."
The heading can just be "all.hpp".
1st para, 1st sent: Strike "The header file".
1st para, 2nd sent: any_of does not verify that every element has a particular property.
2nd para: All three of these algorithms is in C++11. Try this wording: "The iterator-based versions are part of the standard library in C++11 (§25.2 Non-modifying sequence operations). Boost.Algorithms will use the standard library versions of these algorithms when available."
4th para: The algorithms will not work on output iterators. Therefore, you should indicate that they require input iterators.
The heading can just be "clamp.hpp".
1st para, 1st sent: Strike "The header file". The description is insufficient, which you seem to understand since you quoted "clamp". I suggest this instead: "clamp.hpp contains function templates to constrain a value to be within the inclusive range formed from supplied boundary values."
2nd para, bullets: s/iff/if. "iff" is overly proscriptive on the implementation. That is, lo could be returned if v == lo without affecting the result.
3rd para: "also" this sentence refers the previous paragraph, which uses < in the bullets, but this is a bit awkard. It would be better to describe each version of clamp separately. The second would have the same bulleted list, but would use p(v, lo) and p(hi, v).
The heading can just be "ordered.hpp".
1st para, 1st sent: Strike "The header file". Strike the parentheses.
2nd para, 4th sent: Surely is_ordered returns the end iterator for sequences of less than two elements.
3rd para: s/are will not/will not/
5th para, 1st sent: I'd prefer, "There are also a number of function templates that use is_ordered to determine whether an entire sequence satisfies a particular ordering."
The heading can just be "search.hpp".
1st para, 1st sent: Strike "The header file".
3rd para: merge with the 2nd para
4th para: Change from first person and quoting to "For each of these search algorithms, the sequence being searched is referred to as the <i>pattern,</i> and the sequence being searched as the <i>corpus.</i>"
5th para, 1st sent: Change to "These search algorithms generate tables to make searches faster."
5th para, 3rd sent: s/To help with this/To provide control over the overhead/
5th para, 3rd sent: s/;/:/
5th para, 4th sent: s/the search/the search, thereby using the same generated table(s) for multiple searches./
7th para: s/and here are/and here is/
8th para, last sent: Must I1::value_type be the same as I2::value_type or must they merely be comparable?
1st para: The wording at the end is confusing. I think you mean, "benchmark used in the literature on practical string searching."
merge with the 1st para
s/target string (pattern) that is being search for/pattern/
s/algorithm, while still...can have/algorithm has O(N) complexity, where N is the corpus length, but can have/
s/algorithms; it doesn't/algorithms because it doesn't/
s/gets faster as the key being search for becomes longer/is faster for longer patterns/
s/skip table (the second one) as stored as an/skip table (the second one); it is stored as a/
s/are (say) UTF-16 text; in that case, the/are UTF-16 text, for example, since the/
s/The B-M and B-M-H objects take a 'traits' template parameter which lets/The traits template parameter of the B-M and B-M-H class templates allow/
1st para: s/Boyer-Moore, and has/Boyer-Moore, but has/
s/algorithm an internal/algorithm uses an internal/
You should mention that the traits template parameter allows using a different container to store that data so the memory usage can change. Unless this reference is to the "first" table.
2nd para: s/looking for the match - instead of just at the next entry/looking next for a match rather than just advancing to the next element/
I don't think the all.hpp content should be in reference.html. It should be found by a link like the other three headers.
The declarations for all.hpp are compressed too much. Some vertical whitespace would be helpful.
Why is "Marshall Clow" in the "Header <boost/algorithm/all.hpp>" section? It also appears in the clamp.hpp reference page.
Function template clamp (both)
Returns: Could be improved: "lo if p(val, lo) returns true, hi if p(hi, val) returns true, and val otherwise."
The declarations are compressed too much. Some vertical whitespace would be helpful.
Function template is_ordered (iterators)
Returns: Could be improved: "the iterator it, in the range [first, last), for which p(*it) returns false, or last if there is no such iterator."
Function template is_ordered (range)
Returns: Could be improved: "the iterator it, in range, for which p(*it) returns false, or boost::end(range) if there is no such iterator."
Function template is_increasing (iterators)
Returns: "true if each element in [first, last) is greater than or equal to the previous element or [first, last) has fewer than two elements."
Function template is_increasing (range)
Returns: "true if each element in range is greater than or equal to the previous element or range has fewer than two elements."
Function template is_decreasing (iterators)
Returns: "true if each element in [first, last) is less than or equal to the previous element or [first, last) has fewer than two elements."
Function template is_decreasing (range)
Returns: "true if each element in range is less than or equal to the previous element or range has fewer than two elements."
Function template is_strictly_increasing (iterators)
Returns: "true if each element in [first, last) is greater than the previous element or [first, last) has fewer than two elements."
Function template is_strictly_increasing (range)
Returns: "true if each element in range is greater than the previous element or range has fewer than two elements."
Function template is_strictly_decreasing (iterators)
Returns: "true if each element in [first, last) is less than the previous element or [first, last) has fewer than two elements."
Function template is_strictly_decreasing (range)
Returns: "true if each element in range is less than the previous element or range has fewer than two elements."
The declarations are compressed too much, though the horizontal whitespace does help in the latter half. Some vertical whitespace would be helpful.
Class template boyer_moore
The constructor deserves information on memory allocated, what exceptions might be emitted, etc.
The function call operator should include information on its behavior, complexity, etc.
The private member functions should not be listed.
Class template boyer_moore_horspool
Class template knuth_morris_pratt
Rob Stewart robert.stewart_at_[hidden]
Software Engineer using std::disclaimer;
Dev Tools & Components
Susquehanna International Group, LLP http://www.sig.com
IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk