Subject: Re: [boost] [rfc] Unicode GSoC project
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-05-15 10:12:12


Scott McMurray wrote:

> I really think UTF-8 should be the recommended one, since it forces
> people to remember that it's no longer one unit, one "character".
>
> Even in Beman Dawes's talk
> (http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/07/filesystem.pdf)
> where slide 11 mentions UTF-32 and remembers that UTF-16 can still
> take 2 encoding units per codepoint, slide 13 says that UTF-16 is
> "desired" where "random access critical".

I don't plan on supporting random access for UTF-16.
UTF-16 is still faster to process than UTF-8, because UTF-8 requires more
complex decoding.
UTF-16 has only two cases (a single unit or a surrogate pair), which makes
it easier to optimize the branch into a likely case and an unlikely case.
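
To illustrate the difference in decoding complexity, here is a minimal sketch of both decoders. It is not code from the proposed library: the names decode_utf16 and decode_utf8 are hypothetical, and both functions assume well-formed input (no validation of surrogates, continuation bytes, or overlong forms).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point from well-formed UTF-16,
// starting at s[pos] and advancing pos past the units consumed.
inline char32_t decode_utf16(const std::u16string& s, std::size_t& pos) {
    assert(pos < s.size());
    char16_t lead = s[pos++];
    // Likely case: a single unit that is not a lead surrogate.
    if (lead < 0xD800 || lead > 0xDBFF) {
        return lead;
    }
    // Unlikely case: a lead surrogate followed by a trail surrogate.
    assert(pos < s.size());
    char16_t trail = s[pos++];
    return 0x10000
         + ((static_cast<char32_t>(lead) - 0xD800) << 10)
         + (static_cast<char32_t>(trail) - 0xDC00);
}

// Hypothetical helper: decode one code point from well-formed UTF-8.
// The lead byte selects one of four sequence lengths.
inline char32_t decode_utf8(const std::string& s, std::size_t& pos) {
    assert(pos < s.size());
    unsigned char b0 = static_cast<unsigned char>(s[pos++]);
    if (b0 < 0x80) {
        return b0;                       // 1 byte: ASCII
    }
    int extra;
    char32_t cp;
    if (b0 < 0xE0) {                     // 2-byte sequence
        extra = 1; cp = b0 & 0x1F;
    } else if (b0 < 0xF0) {              // 3-byte sequence
        extra = 2; cp = b0 & 0x0F;
    } else {                             // 4-byte sequence
        extra = 3; cp = b0 & 0x07;
    }
    for (int i = 0; i < extra; ++i) {
        assert(pos < s.size());
        // Each continuation byte contributes its low 6 bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    }
    return cp;
}
```

The UTF-16 path has a single test on the lead unit, whereas the UTF-8 path has to distinguish four sequence lengths before it can assemble the code point.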

