Boost logo

Boost :

From: Pavol Droba (droba_at_[hidden])
Date: 2003-10-26 04:58:28


On Sun, Oct 26, 2003 at 04:54:34PM +1100, Thorsten Ottosen wrote:
> Here is my review of the string library.

[snip]
>
> WRT classification predicates, I think it has been mentioned that it would
> be nice with
> is_none_of() to complement is_any_of().

It has been already identified, that clasification predicated need some rework.
I have already added logical combinators( like and/or/not) and enabled
easy Boost.Lambda integration.
Next step would be to eliminate the need for explicit char-type sepcification.

With this new features, you will not need is_none_of anymore. You simply
write

        class_not( is_any_of("abc") );

(Note, class_not is not the final name, I have to find a better one)

or alternatively you can use Boost.Lambda

   trim( str1, !bind( is_any_of("abc"), _1 ));

However, I will add is_from_range predicate, which is obviosly missing.

 
> I get a crash when I try to call to_lower( char* ); The debugger stops
> inside std::transform(). Should we not be able to do this?

This should work, could you please send me an example? I might be possible,
that you have used a pointer fro write protected segment. But this is just my guess.

> Consider the use of erase_range::
>
> erase_range( s, make_range( s.begin(), s.end() - 3 ) ) );
>
> To have an iterator_range<> as the second argument is ok if I got one from
> e.g find(), but in the above example I would have liked to provide two
> iterators instead. I do think this is a minor problem and it will be solved
> when the generic algorithms becomes part of the interface.

Rationale for not providing the 2 iterators alternative is simple. To reduce number
of overloads. But it is not a problem if the practical experience shows a need for it.

> Consider algorithms with 'nth', eg.:
>
> ierase_nth_copy( s, "s", 1 );
>
> this does not erase the 1st copy of "s", but the second. What's the
> rationale for this? I don't find it intuitive.

The index is starts from zero. It is explicitly specified in the docs.
Rationale for it is taht all c-array start with 0 and for cycles it is
usual practice to go from 0.

> I seem to miss predicate taking functions for eg. erase() and replace(). I
> would like to say
>
> erase_all_if( s, is_lower() );

It is possible to add this to interface. Curently it is possible to do it like this

namespace sa=boost::string_algo;
sa::replace_all( s, sa::token_finder(is_lower()), sa::empty_formater(s) );

It is not a problem the add this to 1-layer interface. Question would be, what would
you await from replace_all_if?

> Or is this functionality provided by regular expressions?
>
> Documentation: what happens when eg. find() cannot find anything? I mean
> what is the condition of the retuned iterator_range? I assume it's r.empty()
> == true. I think it has been proposed that the returned object should
> support implicit conversion to bool to allow tests in if-statements and I
> support that idea.

You are right, if find does not find anything, returned range should be empty.
This is missing in the docs, and I will add it there.

About the conversion to safebool, in my personal opinion, it is not explicit
enough. If th result is an empty range, it is obvious, that operation did not
found anything.
However, because iterator_range is not used only by find algorithms,
cast to bool can be misleading. After all, it is a kind of container.
 
> Consider find_head( s, 2 ); What are we finding here? AFAICT it a synonym
> for make_range( s.begin(), s.begin()+ 2 ). I think, therefore, that this
> algorithm is not needed.

Problem is when your string is not long enough. So would need to check
all boundary conditions. _head does it for you. Because this operation
is quite common, it is included int he library.

> I don't like the bool parameter on this function:
>
> r = find_token( s, is_T(), false );
>
> When I read the code, I simply can't figure out what it means. As it has
> been suggested, I support that there is either two functions
> (find_token/find_repeated_token) or an enum parameter (enum {
> enable_compression | disable_compression } ). Also, what is the difference
> between find_token( s, is_T() ) and find_first( s, "T" ); ? I not sure I
> understand the need for this function.

I will add the enum, as already suggested. find_token is primary designed to be
used by split operations. And obiosly you cannot do find_token( s, is_any_of("abc") )
with just find_first.

> The iterator_range class. Maybe it should be called sub_string to put the
> focus more on string algorithms instead of generic algortihms. The border
> between string and non-string algorithms seems to be fuzzy. Implicit
> conversion to basic_string<T> ??

Whole library is designed in the way that it is not restricted to std::basic_string.
Therefor such a convention would break the rule.

If you need to convert an iterato_range ot a particular container, you can use
functinon copy_range ( f.e. s = copy_range<string>( range ); ).

> Why is the function called make_range, when the class is called
> iterator_range? It seems to be normal practice to call the function
> make_iterator_range() in that case.

I was battling two opposide arguments when naming this function. I din't
want to make it logn to write (because it is heavily used) and I din't
want to name iterator_range, just a "range" because it is too general name.

I understand, that currently, this naming is a little bit inconsistent.
I'd like other to form their opinion about this, how it should be changed.

> By looking at php, it might be beneficial to provide special functions for
> these jobs:

I wanted ot provide a set of reneric algorithms. Special functions like
escaping the string, should go IMHO in some special purpose HTML processing
libs. However, such a things could be made quite easily using string_algo,
so I can provide example how to do it.
 
> I thing that might be missing would be functions that made it possible to
> escape various charactors; eg. html characters or regex strings.
>
> Another thing might be functions for counting occurrences of substrings.

This seems as an useful utility, I will add it to my todo list.

> Perhaps a function that made the first letter of each word upper-case?
>
> ***** comments to documentation *****
>
> I think the documentation is in general ok. I have the same concerns that
> others have mentioned regardingexception-safety, the detailed explanation of
> each function, examples, and overview table.
>
> One thing about the synopsis that I particularly dislike is that reserved
> words are in bold. I think it really disturbes reading.

This issue being a highly personal matter, I would suggest you to start a new thread about it.
 From the implementation POV, it is the matter of boost-book formating.

In my opinion, highlighting make the code more readable.

> More precise specification of when I can use a char* or not. Eg. maybe it
> works with trim(), but not trim_copy() even though the template arguments
> are called the same. It would be sufficient with some general comments.
> Maybe trim_copy should return a basic_string (or a substring...see below) if
> the argument is char*/[]?

This issue will be solved by proper concept specifications. As already requested.

> In the section "First example" under the "Naming" bullet it says that the
> naming follows the cnovetions used in the standard library. Novertheless,
> predicate versions does not use the suffix _if. I think it would be good to
> follow the STL convention and change the names accordingly.

As far as I know, this only applies to trim functions. It is possible to rename them,
if it bring some benfits (like clarity).

> Regarding the iterator_range class, then I think it might make the
> documentation a little harder to grasp. Something as general as this could
> definitely be useful on its own (although I would prefer a shorter name,
> like subrange). The basic usage for iterator_range in the string library is
> as a sub-string, and therefor it might be more useful with a substring class
> which provided implicit conversion to basic_string. And the library could
> then provide two typedefs

Again, I strongly disagree with adding some unnecessary affinity of the lib to the
basic_string. It goes fundamentaly againts the basic design principles of this library.

Library defintion of a string is not the "basic_string" rather "an arbitrary sequence
of characters" and it is designed on this principle.

> typedef basic_substring<char> substring; typedef basic_substring<wchar_t>
> wsubstring;
>
> And all algorithms would then specify their interface in terms of
> basic_substring<Ch>. So I guess that I would like to see a substring concept
> instead.
>
> Locales. Is there any reason why a locale need to be a parameter to each
> function? If one want's to use a specific locale in one function call, it
> seems likely that the same locale would be used throught the entire program.
> I'm just wondering if it wouldn't be sufficient with some kind of set_locale
> function which then affected all the algortihms?

If your program uses just one locales at the time, you are perfectly ok with
default values and you can use std::locale::global to set your prefered locale
settings.

However there are applications when there is a need to used more locales simultaneosly.
Such a behaviour is fully supported by C++ standard library, so it is natural
to support it also by the string_algo lib.

Regards,

Pavol


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk