Boost logo

Boost :

From: Gennadiy Rozental (gennadiy.rozental_at_[hidden])
Date: 2004-06-06 15:32:10


> > 1. Your library is dedicated to string algorithms. Why is it use range
if
> > iterators so frequently then? As an input and an output. I did found
that
> > having range of it iterators as a model of substring is very convenient.
> > That what my basic_cstring is used for (actually, I may say that in my
> > development this is single most widely used class). But it is
full-fledged
> > string as good as std::basic_string. And it does not based in iterators,
but
> > on character type.
> > Iterator range does may be usable, BUT outside of string algo
library.
>
> It was clearly stated during the review the "string" is not equal to
std::basic_string.
> The string algorithm library is not std::basic_string extension, rather it
> is a set of string-related algorithms. However with the help of
collection/range
> traits it provides almost the same level of convenience as if it would
> be specialized just for a single string class.
>
> Narrowing the implementation would be a giant leap back.

I don't propose to narrow implementation only to std::string. My point is:

String algorithm always take string as a whole. And if I do not want it,
there should not be single ::iterator or .begin() assumed by interface to be
used by user.

> And for the iterator range: it provides a reference into the original
string.
> It if most efficient way to doing so, because you don't pay anything if
you
> want just read the content there. For other operation you can simply use
> boost::copy_iterator_range, to extract the match to your favorite
container.

I agree that we need something to efficiently model substring. IMO something
like basic_cstring is better choice for use with string algorithms. It may
be less generic than iterator_range, but should be generic enough to cover
99% cases.

> > 2. Finder concept maybe useful in implementing different string search
> > algorithm. BUT as implementation detail or generic case solution. I do
not
> > see any reason for interface that assumes explicit specification of
finders
> > provided by the library. There should be a generic algorithm/iterator
that
> > expect User defined finder, but there should be explicit
> > algorithms/iterators dedicated for each type of search. I do that for
> > algorithms, why not for iterators?
> >
>
> They are there actually for most of the algorithms.
> find/split_iterator is quite new stuff. Originally it has been
encapsulated by
> find_all and split algorithms (they are still there, see split.hpp).
During the review it has
> been expressed, that pure iterator-base interface is better for split
operations.
> Therefore I have refactored the implementation so it can be used directly.

IMO to be convenient you should've split iterators on many as you did with
algorithms. Also you can keep generic find_iterator for use with custom
finder.

> > > find_iterator is also based on generic iterator concept, however, the
> > > iterator is the only
> > > template parameter, so you can easily specialize for string.
> > >

> > > Simple usage looks like this:
> >
> > In fact from your sources it seems that I have to write like this
> >
> > boost::split_iterator<std::string::iterator> it( str.begin(), str.end(),
> > boost::token_finder(boost::is_any_of(";,"));
> >
>
> > IMO it's way to verbose, while still missing some of the functionality
> > provided by the tokenizer library.
> >
>
> Actually you have missed the important constructor. You can write:
>
> boost::split_iterator<std::string::iterator> it( str,
boost::token_finder(boost::is_any_of(";,"));
>
> str will be expanded automatically. And you can also use char*, wchar_t of
vector<char>.

Did I? Could you point on file:line?
Even this is too much IMO. There should not be ::iterator.

I prefer boost::basic_split_iterator<char> ( = split_iterator ).

> > My choice:
> >
> > boost::token_iterator it( str, boost::dropped_delimiters = ";," );
>
> It is very easy to implement forwarder to this kind of syntax. As I said,
find_iterator
> is relatively new, so I have extended the interface to full extend yet.
> I will keep this in mind. Thanks for idea.

It should be separate class for every predefined finder IMO. Also keep in
mind that boost::tokenizer provide additional functionality that is not
covered by above finder.

> > Second concern about your solution, that makes it in a sense even worse
than
> > the one provided by boost::tokenizer, is that you actually does not
specify
> > type of the Finder in the iterator specification. The price is that
> > eventually you have to, this way or another, pay with runtime overhead.
This
> > would be unacceptable to me.
>
> This is a design point. First implementation had templated specification
of the finder.
> However, the usage of such a class was very inconvenient due to
complicated specification.
> Therefore it was changed to the current state. The runtime overhead
implied
> by the current implementation is rather small. I actually only adds one
indirection in
> the increment operation. So it is neglible.

You had to make this decision namely because you are trying to pull several
independent (though similar) algorithms under the same interface hood and do
not want burden of explicit type Finder type specification. But what is so
common among these iterators to do so? On the other hand, in many cases all
that the increment does is a couple comparisons. There is chance it could be
inlined. Now you need to pay for function invocation.

> > There are couple of things that your solution provide in addition to
what my
> > and boost::tokenizer solution provide. Particularly it's an ability to
> > specify several addition delimitation policies. I thought about this.
But
> > it isn't on top of my priorities (mostly because it is comparatively
rarely
> > needed). In any case this generic solution should not interfere with
most
> > convenient one used in majority of the cases.
>
> Rationale in find_iterator is quite simple. There is a well defined
concept of Finder
> in the lib and there is also a bunch of finders already implemented there.
find_iterator
> is a natural candidate to use this facility.

I don't mind find_iterator by itself. But there should be separate
token_iterator with different tradeoffs: It more specific (less flexible),
but more convenient and efficient.

> I'll be grateful if you can show me where do you think, the string_algo
lib is
> lacking in the usability. Preferably, provide also codesnippes how it is
and how it
> should look like. Both sides can benefit from improvements.

I thought I did.

> Regards,
>
> Pavol

Regards,

Gennadiy.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk