From: Peter Bindels (dascandy_at_[hidden])
Date: 2006-09-17 12:10:19


On 17/09/06, Aristid Breitkreuz <aribrei_at_[hidden]> wrote:
> On Saturday, 16.09.2006, at 19:55 +0200, loufoque wrote:
> > Aristid Breitkreuz wrote :
> [snip]
> > > That's fine. Do you have plans on which Unicode encoding to use
> > > internally?
> >
> > UTF-8, UTF-16 and UTF-32 would all be available for implementations, and
> > each one would be able to take or give the other ones for input/output.
>
> I guess that every single supported type is extra complexity, right?
> Would not UTF-8 (for brevity and compatibility) and UTF-32 (because it
> might be better for some algorithms) suffice?

That's not entirely accurate. UTF-8 is centred on basic Latin (ASCII):
only those characters take a single byte, so plain Latin text is
compact and quick to process, while everything else takes two to four
bytes per character and more work to decode. UTF-16 is common-centric,
in that it stores every character of the scripts in common use in a
single 16-bit unit and needs two units (a surrogate pair) only for the
few characters outside the Basic Multilingual Plane. Choosing UTF-8
over UTF-16 would make the implementation (and accompanying software)
slower in all parts of the world that aren't solely using basic Latin
characters. That would be most of Europe, Asia, Africa, South America
and a number of people in North America and Australia. Forcing them to
UTF-32 would cost far more memory than could reasonably be expected.
I see quite a lot of use for the UTF-16 case, perhaps even more than
the UTF-8 one.
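
To put rough numbers on the size side of that argument, here is a
minimal sketch that counts how many bytes a sequence of code points
needs in each encoding. The names (utf8_bytes, utf16_bytes,
utf32_bytes, EncodedSizes, encoded_sizes) are made up for this
illustration and are not part of any proposed library interface; the
input is assumed to contain only valid Unicode scalar values.

```cpp
#include <cstddef>
#include <vector>

// Bytes needed to encode one code point in UTF-8 (1 to 4 bytes).
constexpr std::size_t utf8_bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed in UTF-16: one 16-bit unit inside the Basic
// Multilingual Plane, two units (a surrogate pair) outside it.
constexpr std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// UTF-32 always uses one 32-bit unit per code point.
constexpr std::size_t utf32_bytes(char32_t) {
    return 4;
}

// Hypothetical helper type: total encoded size of a text, in bytes.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sums the per-code-point sizes over the whole text.
// Assumes every element is a valid Unicode scalar value.
inline EncodedSizes encoded_sizes(const std::vector<char32_t>& text) {
    EncodedSizes total{0, 0, 0};
    for (char32_t cp : text) {
        total.utf8 += utf8_bytes(cp);
        total.utf16 += utf16_bytes(cp);
        total.utf32 += utf32_bytes(cp);
    }
    return total;
}
```

For example, the two CJK code points U+4E2D U+6587 come out at 6 bytes
in UTF-8, 4 in UTF-16 and 8 in UTF-32, whereas five ASCII letters take
5, 10 and 20 bytes respectively. This only measures storage; it says
nothing about decoding speed, which would need actual benchmarks.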

