
From: Pavol Droba (droba_at_[hidden])
Date: 2006-11-10 05:43:37


Hi,

Martin Adrian wrote:
> I have used the string_algo library for a long time and am used to how it works.
> Recently I had to introduce a colleague to it and started to write a document, but
> it wasn't easy.
>

First of all, thanks for the comments. I'm glad that somebody pays so much
attention to this library. And sorry for the delay.

I'd like to start my reply by explaining one misconception
that I have seen in your mail. Many algorithms in this library use a
user-supplied functor, but not all functors are predicates, as you
incorrectly assumed. Finder and Formatter are typical examples of
other functors used in the library.

Now to your points.

> Here are a few points:
>
> 1. The word predicate is used to describe anything which can return a boolean
> value. While this is technically correct, it creates confusion since there are many
> different predicates used in the library.
> Example: Both trim_if and starts_with take a PredicateT argument, but they are
> not the same predicate types.
>
> I have found the following predicates:
> - Unary token predicate:
> These are all the classification predicates: is_space, is_from_range etc
> - Binary token predicate:
> These are the token comparison predicates: is_equal, is_iless etc
> - Unary string predicate:
> These are the string validation predicates: all, regex_match
> - Binary string predicate:
> These are the substring match predicates: starts_with, contains etc.
>
> Note here that the "string predicates" are different from the "token predicates"
> since they don't return functors (and can't be used directly with standard
> algorithms).
>
> My suggestion is to clear this up in such a way that all predicate types are
> given a name (e.g. UnaryTokenPredicateT) and that they are available as both a
> free function and a boolean functor. Naming will be tricky to get right.
>
> is_space:
> functor: boost::is_space( Locale )
> free function: std::isspace( Char, Locale )
> is_iequal:
> functor: boost::is_iequal( Locale )
> free function: ??(Char1, Char2, Locale)
> starts_with:
> functor: ??(String, String)
> free function: starts_with(String, String)
>

This is a valid point. However, it is always quite clear from the
algorithm description which predicate can be used. In addition, not all
predicates in the library are just utilities for other algorithms.

Even in the algorithms that accept a predicate, the user is free to supply
her own, not necessarily one from the string_algo library.

If you look through the documentation, you will see that no
algorithm uses "string predicates". Actually, only two kinds of
parametric predicates are used: "Character classification" and
"Character comparison". I think it is quite clear which predicate
to use with a given algorithm. Even if confusion arose, the
compiler would issue an error immediately, since these two categories
have different signatures.

> 2. There are a number of suffixes used within the library. Here is my
> interpretation:

Let me correct this:

>
> nosuffix: Mutable version

No suffix means the "default" variant. Not all algorithms perform a
transformation (the aforementioned predicates, for example). However, as you
correctly stated, in transformation algorithms the mutable version is the
default.

> _copy: const version returning a copy

This is correct.

> _if: version using a (non-default) predicate

The _if suffix is used only when it is impossible to use the standard
overloading rules to distinguish between algorithms. Currently
there are only two cases: trim and join.

> _regex: version using a regex predicate

This is not correct. The _regex suffix designates the regex variant of an
algorithm. There is no such thing as "a regex predicate".

> _first, _last, _nth, _all, _head, _tail: for search/replace/erase algorithms
> _range: Not sure what this is supposed to mean
>

All these suffixes define what kind of search operation is used, or in
other words, which part of the string will be affected.

The _range suffix means that the algorithm will not perform an actual
search; instead, the 'working range' is specified explicitly by
the user.

This is useful, for instance, if you have a special algorithm that
performs the "search" and you just want to use string_algo to perform
the replacement.

> Most are fine but there are some inconsistencies:
> - find/replace/erase algorithms can work with 2 different predicates (token
> predicate and Finder). The naming isn't consistent.
> find_all takes a token predicate while make_find_iterator uses a Finder.
> - trim taking a unary token predicate is called trim_if but find taking the same
> argument is called find_token
> - replace with predicate is called find_format
> - join taking a regex argument is called join_if
> - starts_with etc use overload instead of _if suffix

This whole paragraph is very confusing. As I stated at the beginning,
you are mixing up functors and predicates. They are not the same:
a Finder does not even return bool.

join exists in two variants. Plain 'join' simply concatenates all
elements of the input container, inserting the separator between them.
'join_if' excludes those that do not satisfy the given condition. There
are actually two variants of 'join_if': one that takes an arbitrary
predicate and another that accepts a regular expression.

>
> 3. There are lot of (randomly) missing algorithms.
> - erase_if, erase_all_if, erase_copy_if, erase_all_copy_if taking a Finder
> argument (all these are available as replace with _formatter suffix)

The erase* and replace* algorithms are specializations of find_format
provided for convenience. They simply construct a finder and a formatter
from the input parameters and forward the call to find_format.

There is already quite a large number of algorithms, and I see no reason
to provide additional variants of functionality that is already easily accessible.

Now we have:

replace_first(input, str,str)
erase_first(input, str)

that both forward to
find_format(input, finder, formatter)

If you want erase_if (by the way, _if is totally inconsistent here, since there
is no predicate involved), you can simply use
find_format(input, finder, empty_formatter(input)).

> - is_token, is_itoken, is_ianyof

What is is_token supposed to do?

is_ianyof might be useful.

> - ifirst_finder, ilast_finder, inth_finder

Finder constructors are considered to be "advanced stuff", therefore
they are limited to basic functionality. I don't intend to bloat the
interface with additional "convenience" wrappers.

> - find_all and split taking a finder argument

These algorithms actually exist. They are called iter_find and
iter_split. Currently they are part of the implementation layer, but
due to several requests from users, I will move them higher. This means
that they will also be documented.

> - make_split_iterator and make_find_iterator with token predicate

Use token_finder here. The convenience layer for find iterators is provided
in the form of the find_all and split algorithms.
This might change in the future, but I don't have concrete plans for it.

>
> 4. Other stuff
> - ilexicographical_compare is missing locale argument

This is obviously a bug. Thanks for the report.

> - range_finder seems odd. What is the purpose?

See the explanation of the _range suffix above.

> - why is the predicate called not_greater instead of less_equal (compare to
> std::less_equal)

Not sure. I think both names are equally good.

> - regex_search_result is derived from iterator_range but iterator_range doesn't
> have a virtual destructor. It might be safe but I couldn't verify it.

Due to the templated nature of the implementation, regex_search_result is
never used indirectly as an iterator_range. Therefore there is no problem
that I'm aware of.

> - there is no special string_algo "regex_match" predicate. The regex version is
> limited to basic_string argument.
>

The regex part of the string_algo library was never intended to be a full
replacement for Boost.Regex, so it is natural that some
functionality is missing.

There have already been some requests to add regex awareness to the string
predicates, so a kind of "regex_match" predicate might appear there.

Best Regards,
Pavol


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk