From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-19 12:58:55
In article <cl3hl9$g4e$1_at_[hidden]>, "Erik Wien" <wien_at_[hidden]> wrote:
> The basic idea I have been working around, is to make an encoded_string
> class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made
> possible through an encoding_traits class which contains all necessary
> implementation details for working on strings of code units.
I generally agree with this design approach, but I don't think that code point
iterators alone are sufficient. Iteration over encoded characters and abstract
characters would be needed for some algorithms to function sensibly. For
example, the simple task of:
find(begin, end, "ü")
needs to use abstract characters in order to be able to find precomposed and
decomposed versions of ü.
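To make this concrete, here is a minimal sketch, assuming C++11 char32_t
strings as a stand-in for a sequence of code points. The helper names
(contains_code_point, compose_u_diaeresis) are hypothetical, and the
"composition" handles only the single pair u + U+0308; a real implementation
would need the full Unicode composition data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Code points used in the example.
constexpr char32_t kLatinSmallU        = 0x0075; // u
constexpr char32_t kCombiningDiaeresis = 0x0308; // combining diaeresis
constexpr char32_t kUDiaeresis         = 0x00FC; // precomposed u with diaeresis

// Plain code point search: compares one code point at a time.
bool contains_code_point(const std::u32string& s, char32_t cp) {
    return std::find(s.begin(), s.end(), cp) != s.end();
}

// Hypothetical, deliberately tiny composition step: rewrites only the
// sequence u + U+0308 into U+00FC and copies everything else unchanged.
std::u32string compose_u_diaeresis(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == kLatinSmallU && i + 1 < in.size() &&
            in[i + 1] == kCombiningDiaeresis) {
            out.push_back(kUDiaeresis);
            ++i; // skip the combining mark that was just consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

Searching the decomposed string { U+0075, U+0308 } for U+00FC with
contains_code_point fails, because no single code point matches. The same
search succeeds once the string has been passed through
compose_u_diaeresis, which is roughly what an abstract character view would
have to do for every search.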
> You could use the encoded_string class like this:
> // Constructor converts the ASCII string to UTF-16.
> encoded_string<utf16> some_string("Hello World");
> // Run some standard algorithm on the string:
> std::for_each(some_string.begin(), some_string.end(), do_some_operation);
Again, taking this example, let's say that do_some_operation performs
canonicalization to some Unicode canonical form; you can't do this by iterating
over code points.
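One way to see the difficulty, as a sketch under stated assumptions: an
operation applied to each code point in place maps one element to one
element, while canonical decomposition can turn one code point into several.
The function below is hypothetical and handles only U+00FC; it has to take
the whole sequence and produce a new one of a different length, which a
per-element std::for_each callback cannot express.

```cpp
#include <string>

constexpr char32_t kSmallU        = 0x0075; // u
constexpr char32_t kDiaeresisMark = 0x0308; // combining diaeresis
constexpr char32_t kPrecomposedU  = 0x00FC; // precomposed u with diaeresis

// Hypothetical, deliberately tiny decomposition: expands only U+00FC into
// u + U+0308 and copies every other code point unchanged.
std::u32string decompose_u_diaeresis_only(const std::u32string& in) {
    std::u32string out;
    for (char32_t cp : in) {
        if (cp == kPrecomposedU) {
            out.push_back(kSmallU);
            out.push_back(kDiaeresisMark);
        } else {
            out.push_back(cp);
        }
    }
    return out;
}
```

A one-element input containing only U+00FC comes back as two code points, so
the result cannot be written back through the same iterator position that
do_some_operation was handed.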
> I am aware that this implementation will be less than ideal for integration
> with the current C++ standard, but it's issues like that I would like to get
> deeper into during the development.
You should explain what problems with integration you foresee.