Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: John B. Turpish (jbturp_at_[hidden])
Date: 2011-01-14 16:34:45


On Fri, Jan 14, 2011 at 1:36 PM, Alexander Churanov
<alexanderchuranov_at_[hidden]> wrote:
> John,
>
> As I understand the choice is between UTF-8 and UTF-16, since UTF-32
> is a waste of memory. Given that, there is never fixed size for a
> character or linear times - both UTF-8 and UTF-16 are variable-size
> encodings of UTF-32.

Yes, my remark was in response to a comment about using UTF-32 as an
internal encoding. I'd only use UTF-16 if the APIs I used required it,
and the conversion could be done at the interface (for example, in a
facade). What interests me is whether there's a good reason to use
UTF-8 internally and give UTF-32 the same treatment as UTF-16, or vice
versa. I do find the simplicity of a fixed-width encoding alluring.

By the way, I disagree with Peter's assessment that "you rarely, if
ever, need to access the Nth character," but I will gladly concede
that this depends on your problem domain.
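
To make the trade-off concrete, here is a minimal sketch of "access the Nth character" under both encodings. The helper names utf8_offset_of and utf32_at are hypothetical (they come from neither Boost nor this thread), the UTF-8 input is assumed to be well-formed, and "character" here means a code point, which is not always what a user perceives as one character (combining marks, for instance).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the n-th code point (0-based) in a
// UTF-8 string, or std::string::npos if the string is too short.
// Assumes well-formed UTF-8. Cost is linear in the length of the string.
std::size_t utf8_offset_of(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes have the form 10xxxxxx; any other byte
        // starts a new code point.
        const bool starts_code_point =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: with UTF-32 the same lookup is plain indexing,
// constant time. The caller must ensure n < s.size().
char32_t utf32_at(const std::u32string& s, std::size_t n) {
    return s[n];
}
```

Whether the linear scan matters depends, as noted above, on how often the problem domain actually needs indexed access rather than sequential iteration.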

