From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-19 12:58:55


In article <cl3hl9$g4e$1_at_[hidden]>, "Erik Wien" <wien_at_[hidden]> wrote:

> The basic idea I have been working around is to make an encoded_string
> class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made
> possible through an encoding_traits class which contains all necessary
> implementation details for working on strings of code units.

I generally agree with this design approach, but I don't think that code point
iterators alone are sufficient. Iteration over encoded characters and abstract
characters would be needed for some algorithms to function sensibly. For
example, the simple task of:

find(begin, end, "ü")

needs to use abstract characters in order to be able to find precomposed and
decomposed versions of ü.
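To make the find example concrete, here is a minimal sketch using plain std::u32string as a stand-in for a sequence of code points. The helper names contains_code_point, contains_u_umlaut and demo are hypothetical and not part of the proposed encoded_string. The second helper is a toy that knows about exactly one character; a real implementation would need Unicode normalization data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Code-point-level search: true only if `needle` occurs as a single code point.
inline bool contains_code_point(const std::u32string& text, char32_t needle) {
    return std::find(text.begin(), text.end(), needle) != text.end();
}

// Toy abstract-character search for "ü" only (an assumption for illustration):
// matches the precomposed U+00FC, or U+0075 (u) immediately followed by
// U+0308 (COMBINING DIAERESIS). It ignores other intervening combining marks.
inline bool contains_u_umlaut(const std::u32string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == 0x00FC) return true;
        if (text[i] == 0x0075 && i + 1 < text.size() && text[i + 1] == 0x0308)
            return true;
    }
    return false;
}

// Small usage check comparing the two searches.
inline void demo() {
    const std::u32string precomposed = U"f\u00FCr";   // f, U+00FC, r
    const std::u32string decomposed  = U"fu\u0308r";  // f, u, U+0308, r

    assert(contains_code_point(precomposed, U'\u00FC'));
    assert(!contains_code_point(decomposed, U'\u00FC'));  // misses decomposed form
    assert(contains_u_umlaut(precomposed));
    assert(contains_u_umlaut(decomposed));
}
```

The code point search finds only the precomposed form, which is the failure described above; matching both forms requires looking at more than one code point at a time.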

> You could use the encoded_string class like this:
>
> // Constructor converts the ASCII string to UTF-16.
> encoded_string<utf16> some_string("Hello World");
> // Run some standard algorithm on the string:
> std::for_each(some_string.begin(), some_string.end(), do_some_operation);

Again, taking this example, let's say that do_some_operation performs
canonicalization to some Unicode canonical form; you can't do this by iterating
over code points.
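As a rough illustration of why a per-code-point callback is not enough, consider just one step of canonicalization: canonical ordering of combining marks. The sketch below assumes a hypothetical two-entry table (toy_combining_class) in place of the real Unicode Canonical_Combining_Class data, and toy_canonical_order is likewise an invented name. It has to see a whole run of combining marks at once in order to reorder them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Unicode Canonical_Combining_Class property.
// Only two marks are listed; everything else is treated as a starter (class 0).
inline int toy_combining_class(char32_t cp) {
    switch (cp) {
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0308: return 230;  // COMBINING DIAERESIS
        default:     return 0;
    }
}

// Toy canonical ordering: stably sorts each maximal run of non-starters by
// combining class. This needs the whole run, not one code point at a time.
inline std::u32string toy_canonical_order(std::u32string text) {
    std::size_t i = 0;
    while (i < text.size()) {
        if (toy_combining_class(text[i]) == 0) { ++i; continue; }
        std::size_t j = i;
        while (j < text.size() && toy_combining_class(text[j]) != 0) ++j;
        std::stable_sort(text.begin() + i, text.begin() + j,
                         [](char32_t a, char32_t b) {
                             return toy_combining_class(a) < toy_combining_class(b);
                         });
        i = j;
    }
    return text;
}

// Small usage check: u + diaeresis + dot below is reordered to
// u + dot below + diaeresis; an already ordered string is unchanged.
inline void demo_order() {
    assert(toy_canonical_order(U"u\u0308\u0323") == U"u\u0323\u0308");
    assert(toy_canonical_order(U"u\u0323\u0308") == U"u\u0323\u0308");
}
```

A function handed to std::for_each sees each code point in isolation and cannot swap it with its neighbour, which is why the operation has to work on larger units than single code points.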

> I am aware that this implementation will be less than ideal for integration
> with the current C++ standard, but it's issues like that I would like to get
> deeper into during the development.

You should explain what problems with integration you foresee.

meeroh

