From: Aleksey Gurtovoy (alexy_at_[hidden])
Date: 2000-08-24 11:22:42


----- Original Message -----
From: <jbandela_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Wednesday, August 23, 2000 9:32 AM
Subject: [boost] Re: Interest in a token iterator???

> I had thought about doing it. However, it seems the main benefit
> would be when using istream_iterators.

The main benefit is that with such an interface you could use a pair of
iterators of *an arbitrary type* to specify the input for your token
iterator; you would not be forced to use 'std::string' or any other
container-like object just to specify the input sequence to be tokenized.
IMO, a pair of iterators is a much more natural way to do it, and it's more
generic.
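
To make the idea concrete, here is a minimal sketch of a token iterator that
takes its input as a pair of iterators. The names 'token_iterator',
'collect_tokens' and 'example' are hypothetical, the single-character
delimiter and the 'std::string' token type are assumptions made for brevity,
and this is not the interface under discussion:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: hypothetical names, char-based input assumed.
template <class Iter>
class token_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::string value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const std::string* pointer;
    typedef const std::string& reference;

    // End-of-sequence iterator.
    token_iterator() : cur_(), last_(), delim_(' '), at_end_(true) {}

    // Tokenize [first, last), splitting on 'delim'.
    token_iterator(Iter first, Iter last, char delim)
        : cur_(first), last_(last), delim_(delim), at_end_(false) {
        read_next();
    }

    reference operator*() const { return token_; }
    pointer operator->() const { return &token_; }
    token_iterator& operator++() { read_next(); return *this; }
    token_iterator operator++(int) {
        token_iterator old(*this);
        read_next();
        return old;
    }

    friend bool operator==(const token_iterator& a, const token_iterator& b) {
        if (a.at_end_ || b.at_end_) return a.at_end_ == b.at_end_;
        return a.cur_ == b.cur_;
    }
    friend bool operator!=(const token_iterator& a, const token_iterator& b) {
        return !(a == b);
    }

private:
    // Skip delimiters, then buffer characters up to the next delimiter.
    void read_next() {
        token_.clear();
        while (cur_ != last_ && *cur_ == delim_) ++cur_;
        if (cur_ == last_) { at_end_ = true; return; }
        while (cur_ != last_ && *cur_ != delim_) { token_ += *cur_; ++cur_; }
    }

    Iter cur_, last_;
    char delim_;
    bool at_end_;
    std::string token_;
};

// Convenience helper: gather all tokens of [first, last) into a vector.
template <class Iter>
std::vector<std::string> collect_tokens(Iter first, Iter last, char delim) {
    return std::vector<std::string>(token_iterator<Iter>(first, last, delim),
                                    token_iterator<Iter>());
}

// The same template works on a raw char buffer and on a stream, with no
// intermediate std::string holding the whole input.
inline void example() {
    const char buf[] = "one two  three";
    std::vector<std::string> a = collect_tokens(buf, buf + sizeof(buf) - 1, ' ');
    assert(a.size() == 3 && a[0] == "one" && a[2] == "three");

    std::istringstream in("x,y,z");
    std::vector<std::string> b = collect_tokens(
        std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(), ',');
    assert(b.size() == 3 && b[1] == "y");
}
```

This sketch always buffers the current token and tags itself as an input
iterator, which is the most conservative choice; it says nothing yet about
what the category should be for stronger underlying iterators.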

> Other than that, a char buffer
> is easily converted to a string.

Sometimes the overhead of that conversion (copying) may be unacceptable, or
the data may simply be too big to hold in memory (and you don't want to
process it in chunks).

> The problem with istream_iterators
> is that they are input iterators. If you have two input iterators
> referencing the same sequence, modifying one, modifies the other.
> Thus, if the token iterator were implemented in terms of an input
> iterator, having independent copies of the iterator that reference
> the same character sequence would be impossible. This would mean that
> you could not pass token iterators into algorithms such as copy or
> find, without having your original token iterators modified.

Strictly speaking, that's not true ;) Your statement is correct only for
'token_iterator< istream_iterator<...> >', or whatever other
'token_iterator<>' parameterized by an iterator type which satisfies only
input iterator requirements. But in that case you are getting what you've
asked for.

I think what you need is a proper definition of the concept, which states
that the iterator category of the token iterator depends on the iterator
category of the iterators used to traverse the original input sequence, e.g.
'token_iterator<std::string::const_iterator>::iterator_category' can be
'std::forward_iterator_tag'.
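
One possible way to express that dependency is a small trait that picks the
token iterator's category from the underlying iterator's category. The trait
name 'token_iterator_category' is hypothetical, and the sketch assumes
'<type_traits>', a facility that is newer than this message; treat it as an
illustration of the rule rather than a proposed implementation:

```cpp
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical trait: forward (or better) underlying iterators yield a
// forward token iterator; anything weaker yields an input token iterator.
template <class Iter>
struct token_iterator_category {
    typedef typename std::iterator_traits<Iter>::iterator_category base;
    typedef typename std::conditional<
        std::is_base_of<std::forward_iterator_tag, base>::value,
        std::forward_iterator_tag,
        std::input_iterator_tag>::type type;
};

// Multi-pass source: the token iterator can be a forward iterator.
static_assert(
    std::is_same<token_iterator_category<std::string::const_iterator>::type,
                 std::forward_iterator_tag>::value,
    "string iterators give a forward token iterator");

// Single-pass source: the token iterator is only an input iterator.
static_assert(
    std::is_same<token_iterator_category<std::istream_iterator<char> >::type,
                 std::input_iterator_tag>::value,
    "stream iterators give an input token iterator");
```

A 'token_iterator<Iter>' could then expose this 'type' as its own
'iterator_category', so the single-pass restriction applies only when the
user chose a single-pass source.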

> This
> could also seriously affect algorithms that depend on lookahead
> features (ie = vs == in C/C++).

If you accept my definition above, it will not (or it will only if you
want it to :).
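
As a rough illustration of why lookahead is unaffected when the underlying
iterators are forward iterators: a copy of the position can be advanced to
peek at the next character while the original stays put. The function name
'read_equals_operator' and 'example_lookahead' are made up for this sketch:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if the sequence at 'pos' starts with '=' or "==",
// consume it and return it; otherwise return "" and leave 'pos' unchanged.
// Requires forward iterators, since 'peek' must be an independent copy.
template <class FwdIter>
std::string read_equals_operator(FwdIter& pos, FwdIter last) {
    if (pos == last || *pos != '=') return std::string();
    FwdIter peek = pos;  // look one character ahead on a copy
    ++peek;
    if (peek != last && *peek == '=') {
        pos = ++peek;    // consume both characters
        return "==";
    }
    pos = peek;          // consume the single '='
    return "=";
}

inline void example_lookahead() {
    const std::string s = "a==b=c";
    std::string::const_iterator it = s.begin() + 1;
    assert(read_equals_operator(it, s.end()) == "==");
    assert(*it == 'b');
}
```

With input iterators such as 'istream_iterator' this would not be valid,
which is exactly the case where the user has asked for single-pass behaviour.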

> Finally, there is the ownership
> issue. If the sequence is modified or deleted, the token iterators
> could become corrupt.

I don't think that's a problem. After all, if you modify or delete a vector,
all its iterators become invalid too :)

> Based on all this, I decided to use strings
> that the token iterator owned. In addition, since the string is never
> modified, the string is reference counted and shared between all
> copies of a token iterator. This makes copying them pretty cheap.
>

Sorry, I don't like this. First, you create a copy of the StringType
parameter on the heap, which may be quite expensive. Second, using
reference counting may be a show stopper for potential users of the class who
have to deal with multi-threading (it's not thread-safe, is it? ;).
Third, IMHO, the problems you were trying to solve might not really be
problems at all, so we can get rid of all these complications if you agree
with my points.

Does it make any sense to you?

--Aleksey

