Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2011-01-14 09:42:25
> On Fri, 14 Jan 2011 10:42:44 +0100, Matus Chochlik wrote:
> > Hi,
> On Thu, Jan 13, 2011 at 8:21 PM, Artyom <artyomtnk_at_[hidden]> wrote:
> > Hello All,
> > I wanted to talk about it for a loooooong time.
> > however never got there.
> > -------------------------------------------------
> > Proposal Summary:
> > ===================
> > - We need to treat std::string, char const * as
> >   UTF-8 strings on Windows and drop support for
> >   the so-called ANSI API.
> > - Optional but recommended:
> >   Deprecate wide strings as an unportable API.
> Fully agree. Two years ago I would very probably be advocating
> some kind of TCHAR/wxChar/QChar/whatever-like character type
> switching, but since then I've spent a lot of time developing portable
> GUI applications and found out the hard way that it is better
> to dump all the ANSI CPXXXX / UTF-XY encodings and stick
> to UTF-8 and defer the conversion to whatever the native API
> uses until you make the actual call.
I'm opposed to this strategy simply because it differs from the way
existing libraries treat narrow strings. Not least the STL. If you open
an fstream with a narrow filename, for instance, this isn't treated as a
UTF-8 string. It's treated as being in the local codepage.
What the Visual Studio implementation of the STL actually does is pretty
much the same as how Boost.Filesystem v3 treats paths:
It uses mbstowcs_s to convert the narrow string to the wchar_t form and
then uses _wfsopen to open the file. Importantly, mbstowcs_s treats the
narrow string as being in the local codepage which on Windows _won't_ be
UTF-8. If you tried to open an fstream by handing it a UTF-8 encoded
string, you would end up with severe problems.
For shits and giggles I tried to open a std::fstream with
"שלום-سلام-pease-Мир.txt" as the filename. What it ends up doing is
creating a file called "×©×××-Ø³ÙØ§Ù-pease-ÐÐ¸Ñ.txt"!
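To make the effect concrete, here is a minimal, portable sketch of what goes wrong. It does not call mbstowcs_s or _wfsopen; instead a hypothetical helper, widen_as_single_byte, stands in for a single-byte code page conversion by mapping every byte to one character. Latin-1 is assumed as the stand-in code page (the real Windows code page, e.g. CP1252, maps some bytes differently). The second helper, utf8_code_points, is also hypothetical and only counts characters in well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widen each byte to exactly one code point, the way a
// single-byte code page conversion would. Latin-1 is an assumed stand-in
// for the local Windows code page.
std::u32string widen_as_single_byte(const std::string& narrow)
{
    std::u32string wide;
    wide.reserve(narrow.size());
    for (unsigned char byte : narrow)
        wide.push_back(static_cast<char32_t>(byte));
    return wide;
}

// Hypothetical helper: count code points in a well-formed UTF-8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

With the UTF-8 bytes of the last part of the filename, "\xD0\x9C\xD0\xB8\xD1\x80.txt", utf8_code_points sees 7 characters, while widen_as_single_byte produces 10: each two-byte Cyrillic letter turns into two unrelated characters (0xD0 becomes "Ð", 0xB8 becomes "¸", 0xD1 becomes "Ñ"), which is the kind of name that ends up on disk.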
While this behaviour isn't great, it is standard. I don't think we should
make Boost produce UTF-8 narrow strings on Windows. A programmer would
expect to be able to take such a string and pass it to STL functions. As
you can see, that wouldn't work.
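For comparison, this is roughly the explicit step a caller would need if narrow strings did carry UTF-8: decode to UTF-16 first, then hand the result to a wide-character API. The function name utf8_to_utf16 is hypothetical and the decoder is only a sketch (it rejects malformed lead and continuation bytes but does not check for overlong encodings). On Windows one would more likely call MultiByteToWideChar with CP_UTF8 rather than hand-roll this.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Sketch only: overlong encodings are not rejected.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point not encodable in UTF-16");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The point is that this conversion has to be done by the caller: the narrow-string overloads of the standard library will not do it.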
-- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)