From: Daryle Walker (darylew_at_[hidden])
Date: 2005-08-22 09:15:33


On 8/22/05 9:32 AM, "Graham" <Graham_at_[hidden]> wrote:

> From: Daryle Walker <darylew_at_[hidden]>
[SNIP]
> Maybe I misunderstood you - you seem to be asking for code page conversion
> here not UTF conversion.

Yes, because it seems that the Standard does not _acknowledge_ UTF
conversion!

> Code page conversion is anything but simple and requires data conversion
> tables for many of the code pages. It is not very practical at the character
> level as you can see from the calls below, but would require conversion
> objects to be created that can then implement these functions.
>
> int_fast32_t char_to_Unicode(std::string locale, char c );
> // very inefficient
> int_fast32_t wchar_to_Unicode(std::string locale, wchar_t c );
> // very inefficient
>
> convertor *conv1250 = new convertor(locale); // better
> conv1250->char_to_Unicode(c);
>
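
To make the converter-object idea quoted above concrete, here is a minimal sketch. It assumes a single-byte code page, so one 256-entry table built once per converter is enough. The names `code_page_converter`, `to_unicode`, and `make_partial_cp1250` are hypothetical, not part of any existing library, and the table holds only three Windows-1250 entries for illustration; every other non-ASCII byte maps to U+FFFD.

```cpp
#include <array>
#include <string>
#include <utility>
#include <vector>

// Hypothetical converter object: the lookup table is built once in the
// constructor and reused for every character, instead of looking up the
// locale on each call.
class code_page_converter {
public:
    // 'overrides' lists the (byte, code point) pairs above the ASCII range.
    explicit code_page_converter(
        const std::vector<std::pair<unsigned char, char32_t>>& overrides)
    {
        table_.fill(char32_t{0xFFFD});            // unmapped bytes: U+FFFD
        for (unsigned i = 0; i < 0x80; ++i)       // ASCII maps to itself
            table_[i] = static_cast<char32_t>(i);
        for (const auto& entry : overrides)
            table_[entry.first] = entry.second;
    }

    // Convert one code-page byte to a Unicode code point.
    char32_t to_unicode(char c) const {
        return table_[static_cast<unsigned char>(c)];
    }

    // Convert a whole code-page string, one byte at a time.
    std::u32string to_unicode(const std::string& s) const {
        std::u32string out;
        out.reserve(s.size());
        for (char c : s) out.push_back(to_unicode(c));
        return out;
    }

private:
    std::array<char32_t, 256> table_;
};

// Illustration only: a deliberately incomplete Windows-1250 table.
inline code_page_converter make_partial_cp1250() {
    std::vector<std::pair<unsigned char, char32_t>> overrides = {
        {0x8A, 0x0160},  // LATIN CAPITAL LETTER S WITH CARON
        {0xE8, 0x010D},  // LATIN SMALL LETTER C WITH CARON
        {0xE9, 0x00E9},  // LATIN SMALL LETTER E WITH ACUTE
    };
    return code_page_converter(overrides);
}
```

A caller would build the converter once and reuse it for every character. Multi-byte code pages would need more than a flat table, which this sketch does not attempt.
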
>> [1] In other words, any extended native character (i.e. not a character C++
>> uses for parsing) must be mapped to one C++ Unicode name, which maps to a
>> single code-point.
>
> That depends on what you mean. For example, <e><acute> can be one Unicode
> character, <e acute>, or two, <e><acute>, depending on how it is normalised.
> A character parser should understand this if it wants to present Unicode
> graphemes, which are the default unit of parsing in Unicode and which should
> be supported by any compliant native handler, even one working with the
> local code page.
[TRUNCATE]

The issue that I discovered is that the Standard isn't compatible with
characters (either native or Unicode) that take up multiple code-points.
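
As a rough illustration of the multiple-code-point problem, the sketch below stores the same visible character, e with acute, in both forms mentioned in the quoted text: precomposed (U+00E9) and decomposed (U+0065 followed by U+0301). The helper `approx_grapheme_count` is hypothetical and deliberately crude: it only treats the Combining Diacritical Marks block (U+0300 to U+036F) as "attaches to the previous character", whereas real grapheme segmentation is far more involved.

```cpp
#include <cstddef>
#include <string>

// One code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
inline std::u32string e_acute_precomposed() {
    return std::u32string(1, char32_t{0x00E9});
}

// Two code points: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
inline std::u32string e_acute_decomposed() {
    std::u32string s;
    s.push_back(char32_t{0x0065});
    s.push_back(char32_t{0x0301});
    return s;
}

// Hypothetical helper: counts code points that are NOT in the Combining
// Diacritical Marks block, as a crude stand-in for a grapheme count.
inline std::size_t approx_grapheme_count(const std::u32string& s) {
    std::size_t count = 0;
    for (char32_t cp : s) {
        const bool combining = (cp >= 0x0300 && cp <= 0x036F);
        if (!combining) ++count;
    }
    return count;
}
```

Both strings hold one user-visible character, yet the decomposed form has two elements and compares unequal to the precomposed one. An interface that takes a single character value at a time never sees the second form as one character.
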

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
