Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-07-20 12:50:31
Rogier van Dalen wrote:
> Non-checking iterator adaptors can be faster. That would be useful
> when you know that a string is safe, for example, in a UTF string type
> that has a validity invariant.
I suppose that type of string should probably use optimized iterators
that take advantage of the fact that it is stored in contiguous, properly
aligned memory anyway, so it will need special code.
> I think this means that all iterator adaptors can be constructed from
> 3 iterators (begin, position, end) and the ones that don't check the
> input can also be constructed from 1 iterator. For a checking forward
> iterator, only two iterators are necessary (position, end). This is
> how I implemented this, at any rate.
Indeed, that makes 3 cases per encoding and I'm only handling the
broadest case for now.
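
To make the three cases concrete, here is a minimal sketch for UTF-8
decoding. Every name in it (sequence_length, decode_checked,
unchecked_decoder, checked_forward_decoder, checked_bidirectional_decoder)
is hypothetical and not taken from the proposed library, and the checking
is deliberately simplified. It only shows why each case needs 1, 2 or 3
iterators.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Sequence length implied by a lead byte; 0 means "not a valid lead byte".
inline int sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Decodes one sequence at `it` and advances it. Simplified checking: rejects
// bad lead bytes, truncation and bad continuation bytes, but not every
// overlong form or surrogate.
template <typename It>
char32_t decode_checked(It& it, It end) {
    if (it == end) throw std::out_of_range("read past end");
    unsigned char lead = static_cast<unsigned char>(*it);
    int n = sequence_length(lead);
    if (n == 0) throw std::invalid_argument("invalid lead byte");
    char32_t cp = (n == 1) ? lead : (lead & (0xFF >> (n + 1)));
    ++it;
    for (int i = 1; i < n; ++i, ++it) {
        if (it == end) throw std::invalid_argument("truncated sequence");
        unsigned char c = static_cast<unsigned char>(*it);
        if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation");
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

// Case 1: non-checking. The input is trusted, so one iterator is enough.
template <typename It>
class unchecked_decoder {
public:
    explicit unchecked_decoder(It pos) : pos_(pos) {}
    char32_t operator*() const {
        It it = pos_;
        unsigned char lead = static_cast<unsigned char>(*it);
        int n = sequence_length(lead);
        char32_t cp = (n == 1) ? lead : (lead & (0xFF >> (n + 1)));
        for (int i = 1; i < n; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(*++it) & 0x3F);
        return cp;
    }
    unchecked_decoder& operator++() {
        std::advance(pos_, sequence_length(static_cast<unsigned char>(*pos_)));
        return *this;
    }
private:
    It pos_;
};

// Case 2: checking, forward only. Needs (position, end) to detect truncation.
template <typename It>
class checked_forward_decoder {
public:
    checked_forward_decoder(It pos, It end) : pos_(pos), end_(end) {}
    char32_t operator*() const { It it = pos_; return decode_checked(it, end_); }
    checked_forward_decoder& operator++() { decode_checked(pos_, end_); return *this; }
private:
    It pos_, end_;
};

// Case 3: checking, bidirectional. Also needs begin so that stepping
// backwards cannot run off the front of the range.
template <typename It>
class checked_bidirectional_decoder {
public:
    checked_bidirectional_decoder(It begin, It pos, It end)
        : begin_(begin), pos_(pos), end_(end) {}
    char32_t operator*() const { It it = pos_; return decode_checked(it, end_); }
    checked_bidirectional_decoder& operator++() { decode_checked(pos_, end_); return *this; }
    checked_bidirectional_decoder& operator--() {
        if (pos_ == begin_) throw std::out_of_range("moved before begin");
        do { --pos_; }
        while (pos_ != begin_ && (static_cast<unsigned char>(*pos_) & 0xC0) == 0x80);
        return *this;
    }
private:
    It begin_, pos_, end_;
};

// Usage: the same bytes walked with one iterator and with three.
inline void demo() {
    const std::string s = "a\xC3\xA9\xE2\x82\xAC";  // 'a', U+00E9, U+20AC
    typedef std::string::const_iterator It;
    unchecked_decoder<It> u(s.begin());
    ++u;
    assert(*u == 0xE9);
    checked_bidirectional_decoder<It> b(s.begin(), s.end(), s.end());
    --b;
    assert(*b == 0x20AC);
}
```

The unchecked variant has undefined behaviour on malformed input, which
is the price of dropping the extra iterators.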
> It makes sense to design for correctness. It's probably worth keeping
> in mind, though, whether conceivable extensions and optimisations are
> possible in your design.
I suppose you could attach traits to select more efficient iteration methods.
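
One way such traits could look is sketched below, under the assumption
that a string type can advertise a validity invariant. The trait
is_known_valid, the type valid_utf8_string and the function
count_code_points are all hypothetical names invented for this
illustration; nothing like them has been defined in the library yet.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>

inline bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Checked path: verifies lead/continuation structure while counting.
// Simplified: overlong forms and surrogates are not rejected.
inline std::size_t count_checked(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t n = lead < 0x80 ? 1
                      : (lead >> 5) == 0x06 ? 2
                      : (lead >> 4) == 0x0E ? 3
                      : (lead >> 3) == 0x1E ? 4 : 0;
        if (n == 0 || i + n > s.size()) throw std::invalid_argument("malformed UTF-8");
        for (std::size_t k = 1; k < n; ++k)
            if (!is_continuation(static_cast<unsigned char>(s[i + k])))
                throw std::invalid_argument("malformed UTF-8");
        i += n;
    }
    return count;
}

// Hypothetical string type: validates once on construction, so its
// contents can be trusted afterwards.
class valid_utf8_string {
public:
    explicit valid_utf8_string(const std::string& s) : data_(s) { count_checked(data_); }
    const std::string& bytes() const { return data_; }
private:
    std::string data_;
};

// Hypothetical trait: true only for types that guarantee valid contents.
template <typename S> struct is_known_valid : std::false_type {};
template <> struct is_known_valid<valid_utf8_string> : std::true_type {};

// Fast path: no checks, just count the bytes that start a sequence.
inline std::size_t count_impl(const valid_utf8_string& s, std::true_type) {
    std::size_t count = 0;
    for (char c : s.bytes())
        if (!is_continuation(static_cast<unsigned char>(c))) ++count;
    return count;
}

// Safe path for strings without the invariant.
inline std::size_t count_impl(const std::string& s, std::false_type) {
    return count_checked(s);
}

// The trait picks the overload at compile time.
template <typename S>
std::size_t count_code_points(const S& s) {
    return count_impl(s, is_known_valid<S>());
}

inline void demo() {
    assert(count_code_points(std::string("a\xC3\xA9")) == 2);
    assert(count_code_points(valid_utf8_string("a\xE2\x82\xAC")) == 2);
}
```

Tag dispatch is only one option here. The same selection could be made
with a policy template parameter on the iterator adaptor itself.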
> I like the idea of the Pipe and related concepts. I am wondering,
> however, whether the UTF-8 decoding iterator can be fast enough given
> the current specification. I think Pipe (or another concept) might
> have to support decoding of exactly one output element. Correct me if
> I'm wrong.
I don't really understand what you mean.
Calling Pipe::ltr or Pipe::rtl decodes only one "element" (UTF-8 decoding
means a multibyte sequence is read and a code point is written; UTF-8
encoding means a code point is read and a multibyte sequence is written).
> The actual implementation of extensions and optimisations can be
> delayed until the need appears. I'd be happy to contribute checking
The mechanism to do so has yet to be defined unfortunately ;).