Subject: Re: [boost] [General] Always treat std::strings as UTF-8? (was [Process] List of small issues)
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-13 12:17:05
On Thu, 13 Jan 2011 06:35:53 -0800 (PST)
Artyom <artyomtnk_at_[hidden]> wrote:
[...]
> Notes:
>
> 1. You can also always assume that strings under windows are UTF-8
> and always convert them to wide string before system calls.
>
> This is I think better approach, but it is different from what
> most of boost does.
[...]
An interesting thought... I developed a set of ASCII/UTF-8/16/32
classes for my company not too long ago, and I became fairly familiar
with the UTF-8 encoding scheme. There was only one issue that stopped
me from assuming that all std::string values are UTF-8-encoded: what if
the string *isn't* meant as UTF-8-encoded, and contains characters with
the high bit set?
There's nothing technically stopping that from happening, and there's
no way to determine with complete certainty whether even a string that
seems to be valid UTF-8 was intended that way, or whether the UTF-8-like
characters are really meant as their high-ASCII values.
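To make the ambiguity concrete, here is a minimal sketch. It assumes
ISO-8859-1 (Latin-1) as the competing "high-ASCII" encoding, and both
helper names (looks_like_utf8, latin1_to_utf8) are hypothetical, not part
of Boost or of the classes mentioned above. The two bytes 0xC3 0xA9 pass a
structural UTF-8 check (they decode to U+00E9), yet they are an equally
legal two-character Latin-1 string, so the check alone cannot reveal what
the author intended.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is structurally well-formed UTF-8
// (correct lead/continuation bytes, no overlong forms, no surrogates,
// nothing above U+10FFFF). It says nothing about the author's intent.
bool looks_like_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;   // continuation bytes expected
        unsigned long cp = 0;    // code point being decoded
        unsigned long min = 0;   // smallest code point for this length
        if (c < 0x80)                { cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;       // stray continuation or invalid lead byte
        if (n - i - 1 < extra) return false;   // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;     // overlong / too big
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: reinterpret every byte of `s` as a Latin-1
// character and re-encode the result as UTF-8.
std::string latin1_to_utf8(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With these, looks_like_utf8("\xC3\xA9") is true, but latin1_to_utf8 on the
same bytes yields a different, equally valid string. A validity check can
only rule UTF-8 out (a lone 0xE9 byte fails it); it cannot confirm that
UTF-8 was meant.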
Maybe you know something I don't, that would allow me to change it? I
hope so, it would simplify some of the code greatly.
-- 
Chad Nelson
Oak Circle Software, Inc.