Boost logo

Boost :

From: Pavol Droba (droba_at_[hidden])
Date: 2004-06-07 05:25:54


Hi

Sunday, June 6, 2004, 10:32:10 PM, you wrote:

[snip]

> I don't propose to narrow implementation only to std::string. My point is:

> String algorithm always take string as a whole. And if I do not want it,
> there should not be single ::iterator or .begin() assumed by interface to be
> used by user.

I don't know if I understand you correctly. User will almost never be
forced to use .begin(). Collection traits are incorporated in the
whole lib, so .begin() and etc. is used automaticaly, depending on the
type of the collection.
::iterator is used only once in the defintion of the find_iterator
type. It it only a single typedef, that user has to use.
There are more options then char or wchar_t, and it is up to the user
to decide. For instance, you can iterate over const_iterator of
mutable iterator. That's the reason why these typedef are not part of
the library. If it would be requested, I can include most common ones.

>> And for the iterator range: it provides a reference into the original
>> string.
>> It if most efficient way to doing so, because you don't pay anything if
>> you
>> want just read the content there. For other operation you can simply use
>> boost::copy_iterator_range, to extract the match to your favorite
>> container.
>>
> I agree that we need something to efficiently model substring. IMO something
> like basic_cstring is better choice for use with string algorithms. It may
> be less generic than iterator_range, but should be generic enough to cover
> 99% cases.

I have checked your basic_cstring. IMHO it is not usable in the
context of the string_algo lib. It is restricted to char* (I
definitely need iterators). It bundles algorithms inside a class -
this violates the decapulated design policy. I don't realy know what
do you have agains iterator_range. It is used only as a reference to
the searched input. It is up to the user what will do with this
reference. Why whould we want to enforce the user to use something,
she does not want to?
Can you please be more specific, what features/use-cases are you
lacking with the iterator_range approach when compared to
basic_cstring?

>> > 2. Finder concept maybe useful in implementing different string search
>> > algorithm. BUT as implementation detail or generic case solution. I do
>> > not
>> > see any reason for interface that assumes explicit specification of
>> > finders
>> > provided by the library. There should be a generic algorithm/iterator
>> > that
>> > expect User defined finder, but there should be explicit
>> > algorithms/iterators dedicated for each type of search. I do that for
>> > algorithms, why not for iterators?
>> >
>>
>> They are there actually for most of the algorithms.
>> find/split_iterator is quite new stuff. Originally it has been
>> encapsulated by
>> find_all and split algorithms (they are still there, see split.hpp).
>> During the review it has
>> been expressed, that pure iterator-base interface is better for split
>> operations.
>> Therefore I have refactored the implementation so it can be used directly.
>>
> IMO to be convenient you should've split iterators on many as you did with
> algorithms. Also you can keep generic find_iterator for use with custom
> finder.

I will do this. Actualy it is not needed to specify a class for every
type of iterator. Just a constructor functions.

>> > > find_iterator is also based on generic iterator concept, however, the
>> > > iterator is the only
>> > > template parameter, so you can easily specialize for string.
>> > >

>> > > Simple usage looks like this:
>> >
>> > In fact from your sources it seems that I have to write like this
>> >
>> > boost::split_iterator<std::string::iterator> it( str.begin(), str.end(),
>> > boost::token_finder(boost::is_any_of(";,"));
>> >
>>
>> > IMO it's way to verbose, while still missing some of the functionality
>> > provided by the tokenizer library.
>> >
>>
>> Actually you have missed the important constructor. You can write:
>>
>> boost::split_iterator<std::string::iterator> it( str,
>> boost::token_finder(boost::is_any_of(";,"));
>>
>> str will be expanded automatically. And you can also use char*, wchar_t of
>> vector<char>.
>>
> Did I? Could you point on file:line?

I'm sorry, I'm starting to have a little mess. I forgot, that I
haven't commited the latest update. It is there now.

> Even this is too much IMO. There should not be ::iterator.
> I prefer boost::basic_split_iterator<char> ( = split_iterator ).

And how do you distinguish between "const_iterator" and "iterator" ?
And how can you distinguish between iteration over vector<char>,
rope<char> or basic_string<char>. Or even (with a little adaptation in
collection_traits) CString?

>> > My choice:
>> >
>> > boost::token_iterator it( str, boost::dropped_delimiters = ";," );
>>
>> It is very easy to implement forwarder to this kind of syntax. As I said,
>> find_iterator
>> is relatively new, so I have extended the interface to full extend yet.
>> I will keep this in mind. Thanks for idea.
>>
> It should be separate class for every predefined finder IMO. Also keep in
> mind that boost::tokenizer provide additional functionality that is not
> covered by above finder.

I didn't mean to mimic tokenizer. Although I will try to incorporate
as much of its functionality as possible. As I have written above,
there is not a need for a separate class for every type of finder.

>> > Second concern about your solution, that makes it in a sense even worse
>> > than
>> > the one provided by boost::tokenizer, is that you actually does not
>> > specify
>> > type of the Finder in the iterator specification. The price is that
>> > eventually you have to, this way or another, pay with runtime overhead.
>> > This
>> > would be unacceptable to me.
>>
>> This is a design point. First implementation had templated specification
>> of the finder.
>> However, the usage of such a class was very inconvenient due to
>> complicated specification.
>> Therefore it was changed to the current state. The runtime overhead
>> implied
>> by the current implementation is rather small. I actually only adds one
>> indirection in
>> the increment operation. So it is neglible.
>>
> You had to make this decision namely because you are trying to pull several
> independent (though similar) algorithms under the same interface hood and do
> not want burden of explicit type Finder type specification. But what is so
> common among these iterators to do so? On the other hand, in many cases all
> that the increment does is a couple comparisons. There is chance it could be
> inlined. Now you need to pay for function invocation.

You might have a look into the implementation. There is realy only
one additional dereference per increment. Finding operation is usualy at least
O(n), so one far jmp does not bring any degradation in performance.
find_iterator needs to consult the finder only there. Local state is
stored directly in the iterator so comparisons and dereference have
no additional overhead.

>> > There are couple of things that your solution provide in addition to
>> > what my
>> > and boost::tokenizer solution provide. Particularly it's an ability to
>> > specify several addition delimitation policies. I thought about this.
>> > But
>> > it isn't on top of my priorities (mostly because it is comparatively
>> > rarely
>> > needed). In any case this generic solution should not interfere with
>> > most
>> > convenient one used in majority of the cases.
>>
>> Rationale in find_iterator is quite simple. There is a well defined
>> concept of Finder
>> in the lib and there is also a bunch of finders already implemented there.
>> find_iterator
>> is a natural candidate to use this facility.
>>
> I don't mind find_iterator by itself. But there should be separate
> token_iterator with different tradeoffs: It more specific (less flexible),
> but more convenient and efficient.

Maybe thats the direction where boost::tokenizer should head.
find_iterator is an extension to Finder. If you have a bunch of
finders, it provides a useful functionality. However, I'd like to
offer as much of usablity and convenience as possible.

Regards,

Pavol


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk