Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-07-18 10:20:29
Mathias Gaunard wrote:
> An error policy isn't really enough though, because to do full checks
> you need each iterator to know about the begin and the end of the range
> it's working on which could be avoided altogether when trusting the input.
The idea of an "iterator that knows where its end is" is something that
comes up fairly often; do the Range experts have any comments about it?
I think that in this case an iterator that can be incremented and
dereferenced in some limited way beyond its end would be sufficient.
For example, a std::string normally has a 0 byte just beyond its end so that
c_str() can work, which makes it safe (for some value of safe!) to keep
advancing a std::string::iterator until a 0 is seen, without ever comparing
against end().
A UTF-8 decoding algorithm that consumes the trailing bytes of a multi-byte
character for as long as they look like continuation bytes (top two bits 10)
would terminate safely in this case, since a 0 byte never looks like one.
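A minimal sketch of such a decoder, assuming trusted (well-formed) input; decode_one and decode_all are hypothetical names, not from any Boost library. It walks the buffer returned by c_str(), which the standard guarantees to be 0-terminated, rather than a std::string::iterator, since dereferencing end() itself is undefined. An embedded 0 in the string would also stop it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: decode one code point starting at p and advance p.
// It never compares p against an end pointer; it relies on the input being
// 0-terminated (as the buffer returned by std::string::c_str() is).
// A 0 byte is not a continuation byte (10xxxxxx), so a truncated sequence
// stops at the terminator instead of running off the end.
// No validation (overlong forms, surrogates) is done: trusted input only.
char32_t decode_one(const unsigned char*& p)
{
    unsigned char lead = *p++;
    if (lead < 0x80)
        return lead;                    // single-byte (ASCII) character

    // Keep only the payload bits of the lead byte.
    char32_t cp;
    if      ((lead & 0xE0) == 0xC0) cp = lead & 0x1F;   // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) cp = lead & 0x0F;   // 1110xxxx
    else                            cp = lead & 0x07;   // 11110xxx

    // Append 6 bits from each continuation byte; the 0 terminator ends it.
    while ((*p & 0xC0) == 0x80)
        cp = (cp << 6) | (*p++ & 0x3F);
    return cp;
}

// Hypothetical driver: decode a whole string without ever consulting end().
std::vector<char32_t> decode_all(const std::string& s)
{
    std::vector<char32_t> out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.c_str());
    while (*p != 0)                     // the terminator is the only end test
        out.push_back(decode_one(p));
    return out;
}
```

The only cost of skipping the end check is that the input must really be terminated; a string truncated mid-character still stops at the 0 rather than reading past it.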
For iterators that don't offer this sort of behaviour you can provide a
wrapper that knows where end is and returns a sentinel 0 in that case.
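A sketch of such a wrapper, assuming that only dereference and pre-increment are needed; zero_sentinel_iterator, count_code_points and count_code_points_in are hypothetical names for illustration. The end comparison lives only in the wrapper, and the counting loop only tests for 0, so the same loop also runs unchanged over a raw c_str() pointer.

```cpp
#include <cstddef>
#include <list>

// Hypothetical adaptor: wraps [cur, end) and yields a sentinel 0 when
// dereferenced at end, so "0-terminated" algorithms can run over ranges
// that have no real terminator (e.g. a std::list<char>).
template <typename It>
class zero_sentinel_iterator
{
public:
    zero_sentinel_iterator(It cur, It end) : cur_(cur), end_(end) {}

    unsigned char operator*() const
    {
        return cur_ == end_ ? 0 : static_cast<unsigned char>(*cur_);
    }

    zero_sentinel_iterator& operator++()
    {
        if (cur_ != end_)   // stay parked on the sentinel once it is reached
            ++cur_;
        return *this;
    }

private:
    It cur_;
    It end_;
};

// Count code points in well-formed UTF-8 that ends at a 0: every byte that
// is not a continuation byte (10xxxxxx) starts a new character.
template <typename Iter>
std::size_t count_code_points(Iter it)
{
    std::size_t n = 0;
    for (; *it != 0; ++it)
        if ((*it & 0xC0) != 0x80)
            ++n;
    return n;
}

// Convenience: run the 0-terminated algorithm over an arbitrary range.
template <typename It>
std::size_t count_code_points_in(It first, It last)
{
    return count_code_points(zero_sentinel_iterator<It>(first, last));
}
```

A production version would need the full iterator interface (traits, equality, post-increment), which Boost.Iterator's iterator_facade is designed to help with; this sketch only supplies what the loop uses.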
Just a thought...