Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-14 07:45:58


Hi Mathias,

Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:
> Phil Endecott wrote:
>> UTF-16
>> ....
>> This is the recommended encoding for dealing with Unicode.
>>
>> Recommended by who? It's not the encoding that I would normally recommend.
>
> The Unicode standard, in some technical notes:
> http://www.unicode.org/notes/tn12/
> It recommends the use of UTF-16 for general purpose text processing.
>
> It also states that UTF-8 is good for compatibility and data exchange,
> and UTF-32 uses just too much memory and is thus quite a waste.

From that document:

     Status

     This document is a Unicode Technical Note. It is supplied
     purely for informational purposes and publication does not
     imply any endorsement by the Unicode Consortium.

     ....

     Conclusion

     Unicode is the best way to process and store text. While there
     are several forms of Unicode that are suitable for processing,
     it is best to use the same form everywhere in a system, and to
     use UTF-16 in particular for two reasons:

        1. The vast majority of characters (by frequency of use) are
        on the BMP.
        2. For seamless integration with the majority of existing
        software with good Unicode support.

I don't find either of those claims very convincing. I hope that your
library will not try to make UTF-16 some sort of default encoding, or
otherwise give it special treatment.
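
To put rough numbers on the size trade-off between the three encoding forms mentioned above, here is a minimal sketch that computes how many bytes a single code point occupies in each. The function names are hypothetical and not part of any existing or proposed Boost interface, and the code assumes its input is a valid Unicode scalar value (at most 0x10FFFF, not a surrogate).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only.
// Each assumes cp is a valid Unicode scalar value
// (cp <= 0x10FFFF and not in the surrogate range 0xD800..0xDFFF).

// Bytes needed to encode one code point in UTF-8 (1 to 4).
constexpr std::size_t utf8_bytes(std::uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed in UTF-16: one 16-bit unit on the BMP,
// a surrogate pair (two units) above it.
constexpr std::size_t utf16_bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Bytes needed in UTF-32: always one 32-bit unit.
constexpr std::size_t utf32_bytes(std::uint32_t) {
    return 4;
}

// Compile-time checks on a few sample code points.
static_assert(utf8_bytes(0x0041) == 1 && utf16_bytes(0x0041) == 2, "U+0041");
static_assert(utf8_bytes(0x4E2D) == 3 && utf16_bytes(0x4E2D) == 2, "U+4E2D");
static_assert(utf8_bytes(0x1D11E) == 4 && utf16_bytes(0x1D11E) == 4, "U+1D11E");
```

With these, U+0041 takes 1, 2 and 4 bytes in UTF-8, UTF-16 and UTF-32 respectively; U+4E2D (a BMP CJK ideograph) takes 3, 2 and 4; and U+1D11E (outside the BMP) takes 4 in all three. Which form is more compact therefore depends on the mix of text: ASCII-heavy data is smaller in UTF-8, while text made up mostly of BMP characters above U+07FF is smaller in UTF-16.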

Regards, Phil.
