Boost logo

Boost :

From: Thorsten Ottosen (nesotto_at_[hidden])
Date: 2003-10-26 00:54:34


Here is my review of the string library.

*****summary of evaluation*******

*** What is your evaluation of the design?
In general it seems fine. The library is mostly easy and intuitive to use.

** What is your evaluation of the implementation?
The implementation looks very clean and tidy.

** What is your evaluation of the documentation?
It's reasonable. If all the discussed issues is resolved, it will be very
good.

** What is your evaluation of the potential usefulness of the library?
It's very useful.

**Did you try to use the library? With what compiler? Did you have any
problems?
I tried to use most of the layer-one functions. I could compile successfully
on these
platforms: gcc 3.3 cygwin, vc71, como4.3.0.1.

**How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
Ca. one days of work. I read the docs carefully, and tried to estimate the
usefulness of many of the functions.

**Do you think the library should be accepted as a Boost library?
Yes; the library already seem very useful. There is still room for
improvement, but many issues have been adressed during the review.

**************************
******** discussion *********

I'm now going to discuss some issues that I ran into.

WRT classification predicates, I think it has been mentioned that it would
be nice with
is_none_of() to complement is_any_of().

I get a crash when I try to call to_lower( char* ); The debugger stops
inside std::transform(). Should we not be able to do this?
Consider the use of erase_range::

erase_range( s, make_range( s.begin(), s.end() - 3 ) ) );

To have an iterator_range<> as the second argument is ok if I got one from
e.g find(), but in the above example I would have liked to provide two
iterators instead. I do think this is a minor problem and it will be solved
when the generic algorithms becomes part of the interface.

Consider algorithms with 'nth', eg.:

ierase_nth_copy( s, "s", 1 );

this does not erase the 1st copy of "s", but the second. What's the
rationale for this? I don't find it intuitive.

I seem to miss predicate taking functions for eg. erase() and replace(). I
would like to say

erase_all_if( s, is_lower() );

Or is this functionality provided by regular expressions?

Documentation: what happens when eg. find() cannot find anything? I mean
what is the condition of the retuned iterator_range? I assume it's r.empty()
== true. I think it has been proposed that the returned object should
support implicit conversion to bool to allow tests in if-statements and I
support that idea.

Consider find_head( s, 2 ); What are we finding here? AFAICT it a synonym
for make_range( s.begin(), s.begin()+ 2 ). I think, therefore, that this
algorithm is not needed.

I don't like the bool parameter on this function:

r = find_token( s, is_T(), false );

When I read the code, I simply can't figure out what it means. As it has
been suggested, I support that there is either two functions
(find_token/find_repeated_token) or an enum parameter (enum {
enable_compression | disable_compression } ). Also, what is the difference
between find_token( s, is_T() ) and find_first( s, "T" ); ? I not sure I
understand the need for this function.

The iterator_range class. Maybe it should be called sub_string to put the
focus more on string algorithms instead of generic algortihms. The border
between string and non-string algorithms seems to be fuzzy. Implicit
conversion to basic_string<T> ??

Why is the function called make_range, when the class is called
iterator_range? It seems to be normal practice to call the function
make_iterator_range() in that case.

By looking at php, it might be beneficial to provide special functions for
these jobs:

I thing that might be missing would be functions that made it possible to
escape various charactors; eg. html characters or regex strings.

Another thing might be functions for counting occurrences of substrings.

Perhaps a function that made the first letter of each word upper-case?

***** comments to documentation *****

I think the documentation is in general ok. I have the same concerns that
others have mentioned regardingexception-safety, the detailed explanation of
each function, examples, and overview table.

One thing about the synopsis that I particularly dislike is that reserved
words are in bold. I think it really disturbes reading.

More precise specification of when I can use a char* or not. Eg. maybe it
works with trim(), but not trim_copy() even though the template arguments
are called the same. It would be sufficient with some general comments.
Maybe trim_copy should return a basic_string (or a substring...see below) if
the argument is char*/[]?

In the section "First example" under the "Naming" bullet it says that the
naming follows the cnovetions used in the standard library. Novertheless,
predicate versions does not use the suffix _if. I think it would be good to
follow the STL convention and change the names accordingly.

Regarding the iterator_range class, then I think it might make the
documentation a little harder to grasp. Something as general as this could
definitely be useful on its own (although I would prefer a shorter name,
like subrange). The basic usage for iterator_range in the string library is
as a sub-string, and therefor it might be more useful with a substring class
which provided implicit conversion to basic_string. And the library could
then provide two typedefs

typedef basic_substring<char> substring; typedef basic_substring<wchar_t>
wsubstring;

And all algorithms would then specify their interface in terms of
basic_substring<Ch>. So I guess that I would like to see a substring concept
instead.

Locales. Is there any reason why a locale need to be a parameter to each
function? If one want's to use a specific locale in one function call, it
seems likely that the same locale would be used throught the entire program.
I'm just wondering if it wouldn't be sufficient with some kind of set_locale
function which then affected all the algortihms?

 best regards

Thorsten


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk