Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-14 07:45:58
Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:
> Phil Endecott wrote:
>>> This is the recommended encoding for dealing with Unicode.
>> Recommended by who? It's not the encoding that I would normally recommend.
> The Unicode standard, in some technical notes:
> It recommends the use of UTF-16 for general purpose text processing.
> It also states that UTF-8 is good for compatibility and data exchange,
> and UTF-32 uses just too much memory and is thus quite a waste.
From that document:

  This document is a Unicode Technical Note. It is supplied
  purely for informational purposes and publication does not
  imply any endorsement by the Unicode Consortium.

  Unicode is the best way to process and store text. While there
  are several forms of Unicode that are suitable for processing,
  it is best to use the same form everywhere in a system, and to
  use UTF-16 in particular for two reasons:

  1. The vast majority of characters (by frequency of use) are
     on the BMP.
  2. For seamless integration with the majority of existing
     software with good Unicode support.

I don't find either of those claims very convincing. I hope that your
library will not try to make UTF-16 some sort of default encoding, or
otherwise give it special treatment.
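
For reference, the size trade-off under discussion can be sketched in a few
lines of C++. This is only an illustration, not taken from the technical note
or from any proposed library: the function names are hypothetical, and the
input is assumed to be a valid Unicode scalar value (no validation is done).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: bytes needed to encode ONE code point in each
// encoding form. Surrogate values and values above 0x10FFFF are not
// rejected here; the caller is assumed to pass a valid scalar value.

constexpr std::size_t utf8_bytes(std::uint32_t cp) {
    return cp < 0x80    ? 1    // ASCII
         : cp < 0x800   ? 2
         : cp < 0x10000 ? 3    // rest of the BMP
         :                4;   // supplementary planes
}

constexpr std::size_t utf16_bytes(std::uint32_t cp) {
    // One 16-bit unit on the BMP, a surrogate pair (two units) above it.
    return cp < 0x10000 ? 2 : 4;
}

constexpr std::size_t utf32_bytes(std::uint32_t /*cp*/) {
    return 4;                  // always one 32-bit unit
}
```

With these definitions, U+0041 takes 1, 2 and 4 bytes in UTF-8, UTF-16 and
UTF-32 respectively; U+4E2D takes 3, 2 and 4; and U+1D11E, which is outside
the BMP, takes 4 in all three. So UTF-16 is itself a variable-width encoding,
and which form is most compact depends on the text being stored.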