
From: Sean Parent (sparent_at_[hidden])
Date: 2006-07-18 14:12:11


On Jul 18, 2006, at 6:38 AM, boost-request_at_[hidden] wrote:

>
> Sebastian Redl said: (by the date of Tue, 18 Jul 2006 14:06:59
> +0200)
>
>> Janek Kozicki wrote:
>>
>>> boost::tokenizer<>::iterator pos=tok.begin();
>>> - is this IMO bad design intentional?
>>>
>>>
>> Boost.Tokenizer is written so as to model an iterator. Have you ever
>> seen an iterator throw an exception on reaching the end of the
>> sequence?
>
> The rationale for container iterators is that iterating over
> containers should not be slowed down by exceptions. Those iterators
> don't even check the iterator's validity! And trying to access beyond
> the container is not detected by the program, but simply causes a
> segmentation fault. That's fine for ordinary containers.

The requirements on iterators don't come from containers - they come
from algorithms. It is not valid to dereference an end() iterator of
any kind. Checking for end() inside each dereference would only
duplicate a check the calling algorithm already performs.
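
To make that concrete, here is a minimal sketch of the shape a typical
algorithm loop takes; the comparison against the end iterator is
already the loop condition, so a second check inside operator*() would
buy nothing:

template <typename I, // I models InputIterator
          typename F> // F models UnaryFunction
F for_each_sketch(I first, I last, F f)
{
        while (first != last) // the algorithm already guards the dereference
        {
                f(*first);    // only reached when first != last
                ++first;
        }
        return f;
}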

>
>> So yes, the design is probably intentional.
>
> However, tokenizer::iterator is neither a typical container nor a
> typical iterator. It is even unusual in that it checks the iterator's
> validity (trying to access beyond the container causes assert(valid_)
> to fail).

tokenizer::iterator is not a container at all, it is an iterator
adaptor.

Some interesting bits about iterator adaptors: any context free
grammar can be expressed as an iterator adaptor - or, put differently,
a context free grammar satisfies the requirements of an iterator
adaptor. Tokenizer is just one example of this. If the base iterator
satisfies InputIterator it can be adapted as an InputIterator; if the
base iterator satisfies ForwardIterator it can be adapted as a
ForwardIterator (Bidirectional and RandomAccess iterators can also be
adapted to Forward). Any InputIterator adaptor can also be adapted to
an OutputIterator adaptor - hence any context free grammar can be
expressed as either (you can tokenize on the way in or on the way out).

Consider expressing it as:

token_iter = std::copy(tok.begin(), tok.end(),
                       std::back_inserter(my_vector));
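
Spelled out in full (a rough, self-contained sketch - the string and
the variable names are just illustrative), that is something like:

#include <boost/tokenizer.hpp>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

int main()
{
        std::string text("tokenize on the way in");
        boost::tokenizer<> tok(text);
        std::vector<std::string> my_vector;

        // std::copy already compares against tok.end() on every step,
        // so a second end() check inside the tokenizer would be redundant.
        std::copy(tok.begin(), tok.end(), std::back_inserter(my_vector));
}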

Now you can see that an additional check for end() on each step of
the tokenizer is redundant. You might also consider writing a bounded
algorithm:

template <typename I, // I models InputIterator
          typename O> // O models OutputIterator
std::pair<I, O> copy_bounded(I in_first, I in_limit,
                             O out_first, O out_limit)
{
        while (in_first != in_limit && out_first != out_limit)
        {
                *out_first = *in_first;
                ++in_first;
                ++out_first;
        }
        return std::make_pair(in_first, out_first);
}

A collection of such bounded functions would be a nice contribution
to boost (or ASL).
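
A quick usage sketch (untested, names purely illustrative), copying
into a fixed-size buffer with the copy_bounded above:

#include <string>
#include <utility>

int main()
{
        std::string src("some text");
        char buffer[4];

        // Stops at whichever range is exhausted first and reports
        // where each sequence left off.
        std::pair<std::string::iterator, char*> result =
                copy_bounded(src.begin(), src.end(),
                             buffer, buffer + sizeof(buffer));

        // result.first  - first character not copied from src
        // result.second - one past the last byte written to buffer
        (void)result;
}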

You could also write a bounds-checking iterator adaptor which throws
on dereferencing end() [or on incrementing past end()]. I would be
careful using such an adaptor with the standard algorithms - I'd have
to survey the library to come up with a list, but some algorithms are
memory adaptive and are not likely to be expecting a failure at these
points. It should be safe, though, to bounds check output iterators:
they can throw on assignment through the dereference, which is
equivalent to throwing on the dereference itself. Microsoft provides
bounds-checking iterators in their library which are intended to
improve security (a very misguided notion, but I won't comment further
on that topic) - such iterators simply terminate if the bounds are
violated, and for their intended purpose that is correct behavior.
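
For what it's worth, a rough sketch of the output-iterator flavor -
the class and message below are my own illustration, not from any
library - throwing std::out_of_range when a write would pass the limit:

#include <iterator>
#include <stdexcept>

// Illustrative sketch only: wraps an output position and a limit.
template <typename Iter>
class checked_output_iterator
{
public:
        typedef std::output_iterator_tag iterator_category;
        typedef void value_type;
        typedef void difference_type;
        typedef void pointer;
        typedef void reference;

        checked_output_iterator(Iter first, Iter limit)
                : pos_(first), limit_(limit) {}

        // Dereference and increment are no-ops; the work (and the
        // bounds check) happens on assignment, which is equivalent to
        // throwing on the dereference.
        checked_output_iterator& operator*() { return *this; }
        checked_output_iterator& operator++() { return *this; }
        checked_output_iterator& operator++(int) { return *this; }

        template <typename T>
        checked_output_iterator& operator=(const T& value)
        {
                if (pos_ == limit_)
                        throw std::out_of_range("output bound exceeded");
                *pos_ = value;
                ++pos_;
                return *this;
        }

private:
        Iter pos_;
        Iter limit_;
};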

Sean

