Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2009-07-20 09:30:46


On Sat, Jul 18, 2009 at 15:20, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
> The idea of an "iterator that knows where its end is" is something that
> comes up fairly often; do the Range experts have any comments about it?
>
> I think that in this case an iterator that can be incremented and
> dereferenced in some limited way beyond its end would be sufficient.  For
> example, a std::string normally has a 0 byte beyond its end so that c_str()
> can work, so it is safe (for some value of safe!) to keep advancing a
> std::string::iterator until a 0 is seen, without looking for end().  A UTF-8
> decoding algorithm that processes multi-byte characters by continuing until
> the top bit is not set would safely terminate in this case.

(I don't think I'm a Range expert.) I think there are problems with
this example. Adding '\0' at the end is not mandated by the standard,
right? '\0' could also occur in the middle of the sequence. And
strictly speaking, I don't think iterators allow dereferencing the
past-the-end position.
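
To make the embedded-'\0' problem concrete, here is a minimal sketch. The
function names are made up for illustration, and it scans c_str() rather than
an iterator so that it stays within defined behaviour:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts bytes up to the first 0 byte, the way a sentinel-based scan would.
// Scans the buffer returned by c_str(), which is null-terminated, instead of
// dereferencing a std::string::iterator at end().
std::size_t length_by_sentinel(const std::string& s)
{
    const char* p = s.c_str();
    std::size_t n = 0;
    while (p[n] != '\0')
        ++n;
    // The terminator at index size() bounds the scan, so it can stop
    // early but never late.
    assert(n <= s.size());
    return n;
}

// A five-byte string with a 0 byte in the middle: 'a', 'b', 0, 'c', 'd'.
std::string make_embedded_nul()
{
    return std::string("ab\0cd", 5);
}
```

With this input the sentinel scan stops after two bytes although size()
reports five, so everything after the embedded 0 is silently dropped.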

> For iterators that don't offer this sort of behaviour you can provide a
> wrapper that knows where end is and returns a sentinel 0 in that case.

Wouldn't this end up requiring two if-statements: one inside the
wrapper to test for the end, and one in the caller to test for the
sentinel? In general, a sentinel that is itself a valid value of the
value type would be an awkward sentinel.
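
A rough sketch of such a wrapper, as I understand the proposal, shows where
the two tests end up. sentinel_iterator and count_until_zero are hypothetical
names, not an existing Boost interface:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper: pairs an iterator with its end and yields 0 once
// the end has been reached.
template <typename Iterator>
class sentinel_iterator
{
public:
    sentinel_iterator(Iterator pos, Iterator end) : pos_(pos), end_(end) {}

    // Test 1: the wrapper compares against end on every dereference.
    char operator*() const { return pos_ == end_ ? '\0' : *pos_; }

    sentinel_iterator& operator++()
    {
        assert(pos_ != end_);  // advancing past the end is a caller error
        ++pos_;
        return *this;
    }

private:
    Iterator pos_;
    Iterator end_;
};

// Counts elements up to the first 0, whether real or synthesized.
std::size_t count_until_zero(const std::string& s)
{
    sentinel_iterator<std::string::const_iterator> it(s.begin(), s.end());
    std::size_t n = 0;
    while (*it != '\0') {  // Test 2: the caller checks for the sentinel.
        ++it;
        ++n;
    }
    return n;
}
```

The caller also cannot tell a synthesized 0 from a genuine 0 element, which
is what makes an in-range sentinel awkward.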

However, I can see where you're coming from. Being able to tell from
an iterator whether it's at the end of its range is often useful. Is
operator*() the right place to implement this functionality?
Wouldn't a free function
    is_at_end(Iterator)
make more sense?
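
Something along these lines, as a sketch only. bounded_iterator is a
hypothetical type that carries its own end, since a plain iterator has no way
to answer the question:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical iterator that carries its own end.
template <typename Iterator>
struct bounded_iterator
{
    Iterator pos;
    Iterator end;
};

// The end test lives outside operator*, so no value of the element type
// has to be reserved as a sentinel.
template <typename Iterator>
bool is_at_end(const bounded_iterator<Iterator>& it)
{
    return it.pos == it.end;
}

// Counts every element, embedded 0 bytes included, with one test per step.
std::size_t count_all(const std::string& s)
{
    bounded_iterator<std::string::const_iterator> it = { s.begin(), s.end() };
    std::size_t n = 0;
    while (!is_at_end(it)) {
        ++it.pos;
        ++n;
    }
    assert(n == s.size());
    return n;
}
```

Here a string containing a 0 byte in the middle is traversed in full, and the
past-the-end position is never dereferenced.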

Cheers,
Rogier

