Date: 2001-10-18 08:40:10
From: "George A. Heintzelman" <georgeh_at_[hidden]>
> I was spending some time staring at my profiling results, and found
> something a little bit disturbing in the tokenizer.
> The token_iterators ultimately contain a copy of the policy object,
> which contains the function which decides where the tokens begin and
> end. Some manner of reference to the function is necessary, of
> course. However, the iterator contains the policy by value, and the
> policy is allowed to have state.
> However, there is no distinguishing between the aspects of the
> policy which are constant and unchanging, as with the strings used
> to hold separator lists in char_delimiters_separator and
> escaped_list_separator, and the aspects which really are stateful,
> as with the current offset in the offset_separator.
> So, that means that whenever I copy a token_iterator, I wind up
> making two unnecessary string copies.
> Most of the STL algorithms take iterators by value, sometimes
> recursively, assuming that iterators are cheap to copy....
I've had the same concerns. I've found that on my implementation
sizeof(token_iterator<char_delimiters_separator>) == 44. This
is really gross.
I've ended up with a solution that makes copying cheap, at the
expense of "degrading" the token iterator from a ForwardIterator
to an InputIterator. The "degradation" occurs because the
tokenizer function instance and the current token are kept in a
state object that is shared between iterator instances. As usual,
it's the "additional level of indirection" approach (original
code slightly edited for brevity):
    template <class TokenizerFunc, class Iterator, class Type>
    class token_stream : TokenizerFunc {
    public:
        struct iterator : std::iterator<std::input_iterator_tag, Type> {
            token_stream* state;  // 0 means past-the-end
            iterator(token_stream* s = 0) : state(s) {}
            Type const& operator*() const { return state->dereference(); }
            Type const* operator->() const { return &state->dereference(); }
            iterator& operator++() {
                if (!state->increment()) state = 0;
                return *this;
            }
            bool operator==(iterator rhs) const {
                // either they differ in past-the-end-ness, or they
                // refer to the same stream
                assert(!state || !rhs.state || state == rhs.state);
                return state == rhs.state;
            }
            bool operator!=(iterator rhs) const { return !(*this == rhs); }
        };
        friend struct iterator;

        token_stream(Iterator begin, Iterator end,
                     TokenizerFunc const& f = TokenizerFunc())
            : TokenizerFunc(f), m_begin(begin), m_end(end)
        { m_valid = (m_begin != m_end) ? func()(m_begin, m_end, m_tok) : false; }
        // constructors taking a Container const& c, with and without a
        // TokenizerFunc const& f, omitted for brevity

        iterator begin() { return iterator(m_valid ? this : 0); }
        iterator end()   { return iterator(); }

    private:
        TokenizerFunc& func() { return *this; }
        bool increment() { return m_valid = func()(m_begin, m_end, m_tok); }
        Type const& dereference() const { return m_tok; }

        Iterator m_begin, m_end;
        Type m_tok;
        bool m_valid;
    };
I've had no problems with this approach. Of course, there are
some lifetime issues (the lifetime of the token_stream must
exceed the lifetime of its iterators), but the situation is
the same with any container. The only real problem is that the
iterators are no longer ForwardIterators. I *think* that this
problem can be solved by redesigning the library to
distinguish between const and mutable tokenization state, but
I have to admit that I haven't put much thought into this direction.
Thinking a bit more: in what usage scenarios will token iterators
be passed to algorithms that require ForwardIterators? Does
anyone have some examples? I haven't run into such cases so far.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk