From: Erik Wien (wien_at_[hidden])
Date: 2004-10-20 13:53:27


"John Maddock" <john_at_[hidden]> wrote in message
> Interesting: funnily enough I've just started experimenting with Unicode
> support for Boost.Regex (based initially on top of ICU, but it could
> equally sit on top of Boost.Unicode or whatever). The first thing I had
> to do was write a bunch of iterators for interconverting between encoding
> forms (I needed Bidirectional Iterators which code conversion facets
> don't/can't provide). So I guess we're all on a similar page here, can
> your encoding converters provide efficient iterator-based interconversion?

Well... "efficient" is probably not the word I would use ;), not yet, that
is. The way it is implemented right now, the value_type of an encoded_string
iterator of any encoding is 32 bits wide (a Unicode code point). So when
iterating over any encoding, the external interface always looks like a
sequence of code points. Consequently, you can use iterators from one string
(UTF-8) to initialize another string (UTF-16), and the conversion between
the two encodings happens automatically. I'm guessing this is similar to
what you have.
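
To make the idea concrete, here is a minimal sketch of a read-only code point iterator over UTF-8 and a function that fills a UTF-16 string from any code point range. It is not the actual encoded_string code. The names utf8_code_point_iterator, to_utf16 and utf8_to_utf16 are made up for illustration, and the input is assumed to be well-formed UTF-8 (no validation is done).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator: walks UTF-8 bytes but yields whole code points.
// Assumes well-formed UTF-8; a real implementation would have to validate.
class utf8_code_point_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;          // always a 32-bit code point
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;           // decoded on the fly, returned by value

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(const char* p) : p_(p) {}

    char32_t operator*() const {
        assert(p_ != nullptr);
        const unsigned char b0 = static_cast<unsigned char>(p_[0]);
        if (b0 < 0x80) return b0;                   // ASCII, one byte
        const std::size_t n = length(b0);
        char32_t cp = b0 & (0xFF >> (n + 1));       // payload bits of the lead byte
        for (std::size_t i = 1; i < n; ++i)         // 6 payload bits per trail byte
            cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        return cp;
    }

    utf8_code_point_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) { return a.p_ == b.p_; }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) { return a.p_ != b.p_; }

private:
    // Length in bytes of the sequence introduced by lead byte b.
    static std::size_t length(unsigned char b) {
        if (b < 0x80) return 1;
        if ((b >> 5) == 0x6) return 2;
        if ((b >> 4) == 0xE) return 3;
        return 4;
    }
    const char* p_ = nullptr;
};

// Builds a UTF-16 string from any range whose elements are code points.
template <class CodePointIt>
std::u16string to_utf16(CodePointIt first, CodePointIt last) {
    std::u16string out;
    for (; first != last; ++first) {
        char32_t cp = *first;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                    // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Convenience wrapper: UTF-8 in, UTF-16 out, via code point iterators.
inline std::u16string utf8_to_utf16(const std::string& s) {
    const char* b = s.data();
    return to_utf16(utf8_code_point_iterator(b),
                    utf8_code_point_iterator(b + s.size()));
}
```

Because the receiving side only ever sees code points, to_utf16 does not need to know which encoding the source range uses. That is the sense in which the conversion happens "automatically".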

I also have a rather hackish implementation that can provide non-const
(assignable) code point iterators on any encoding. This involves a lot of
trickery with iterators changing the size of the container they are
iterating over, and proxy classes as the reference type of the iterator
(something that is not allowed (yet) in standard C++, but is in Boost). As
you can imagine, this implementation is anything but efficient. Kinda neat
though! ;)
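
A rough sketch of the proxy idea, under the assumption that the underlying storage is a std::string holding well-formed UTF-8: dereferencing an assignable iterator would hand back an object like the one below instead of a real reference. The names code_point_ref, utf8_length and encode_utf8 are hypothetical and not taken from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte b.
inline std::size_t utf8_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b >> 5) == 0x6) return 2;
    if ((b >> 4) == 0xE) return 3;
    return 4;
}

// Encodes one code point as UTF-8 (assumes cp is a valid scalar value).
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical proxy standing in for "reference to a code point".
class code_point_ref {
public:
    code_point_ref(std::string& s, std::size_t pos) : s_(&s), pos_(pos) {
        assert(pos < s.size());
    }

    // Reading: decode the code point stored at the current byte offset.
    operator char32_t() const {
        const unsigned char b0 = static_cast<unsigned char>((*s_)[pos_]);
        const std::size_t n = utf8_length(b0);
        char32_t cp = (n == 1) ? b0 : (b0 & (0xFF >> (n + 1)));
        for (std::size_t i = 1; i < n; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>((*s_)[pos_ + i]) & 0x3F);
        return cp;
    }

    // Writing: re-encode and splice in, which may grow or shrink the string.
    code_point_ref& operator=(char32_t cp) {
        const std::size_t old_len =
            utf8_length(static_cast<unsigned char>((*s_)[pos_]));
        s_->replace(pos_, old_len, encode_utf8(cp));
        return *this;
    }

private:
    std::string* s_;   // container being edited
    std::size_t pos_;  // byte offset of the code point's lead byte
};
```

Every assignment can shift all the bytes after it, so any other iterator or proxy pointing further along the same string ends up with a stale offset. That, plus the re-encoding on each write, is a large part of why this approach is slow.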

