
Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2011-01-14 09:42:25


On Fri, 14 Jan 2011 10:42:44 +0100, Matus Chochlik wrote:
>
> Hi,
>
> On Thu, Jan 13, 2011 at 8:21 PM, Artyom <artyomtnk_at_[hidden]> wrote:
> > Hello All,
> >
> > I have wanted to talk about this for a loooooong time,
> > but never got around to it.
> >
> > -------------------------------------------------
> >
> >
> > Proposal Summary:
> > ===================
> >
> > - We need to treat std::string and char const * as
> >  UTF-8 strings on Windows and drop support for the
> >  so-called ANSI API.
> >
> > - Optional but recommended:
> >
> >  Deprecate wide strings as an unportable API.
>
> Fully agree. Two years ago I would very probably have been advocating
> some kind of TCHAR/wxChar/QChar/whatever-like character type
> switching, but since then I've spent a lot of time developing portable
> GUI applications and found out the hard way that it is better
> to dump all the ANSI CPXXXX / UTF-XY encodings, stick to UTF-8,
> and defer the conversion to whatever the native API uses until
> you make the actual call.
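A minimal sketch of the quoted approach, assuming strings are kept as UTF-8 everywhere and converted to UTF-16 only at the point of a native wide call. The helper name utf8_to_utf16 is hypothetical; on Windows, where wchar_t is 16 bits, its result could be copied into a std::wstring and handed to a wide API such as _wfopen, but the sketch itself only does the conversion so it runs on any platform.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 std::string and re-encode it as UTF-16.
// The idea is to call this only at the boundary, right before a native wide
// API, so everything else in the program can keep passing UTF-8 around.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point each sequence length may encode (rejects overlong forms).
    static const char32_t min_value[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len > in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong encodings, values past U+10FFFF and encoded surrogates.
        if (cp < min_value[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        i += len;

        if (cp >= 0x10000) {
            // Outside the BMP: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For example, utf8_to_utf16("\xD0\x9C\xD0\xB8\xD1\x80"), the UTF-8 bytes of "Мир", yields three UTF-16 code units. Keeping the conversion in one helper used only at the call site is what lets the rest of the code treat every std::string as UTF-8.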

-1

I'm opposed to this strategy simply because it differs from the way
existing libraries treat narrow strings, not least the STL. If you open
an fstream with a narrow filename, for instance, it isn't treated as a
UTF-8 string; it's treated as being in the local codepage.

What the Visual Studio implementation of the STL actually does is pretty
much the same as how Boost.Filesystem v3 treats paths:

It uses mbstowcs_s to convert the narrow string to the wchar_t form and
then uses _wfsopen to open the file. Importantly, mbstowcs_s treats the
narrow string as being in the local codepage which on Windows _won't_ be
UTF-8. If you tried to open an fstream by handing it a UTF-8 encoded
string, you would end up with severe problems.

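For shits and giggles I tried to open a std::fstream with a UTF-8 encoded
"שלום-سلام-pease-Мир.txt" as the filename. What it ends up doing is
creating a file whose name is mojibake: the UTF-8 bytes are read as
local-codepage characters, so only the ASCII parts come through intact!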
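A portable sketch of what goes wrong in that experiment, assuming the local codepage is a single-byte one. The function name decode_single_byte is hypothetical, and Latin-1 stands in for the real codepage; the sketch models only the byte-to-character step, not mbstowcs_s or _wfsopen themselves.

```cpp
#include <string>

// Simplified model of reading a narrow string "in the local codepage" when
// that codepage is single-byte: each byte becomes exactly one character.
// This mapping is ISO-8859-1 (Latin-1), used here as a stand-in; Windows-1252
// differs only in 0x80-0x9F, so the exact glyphs vary but the effect is the same.
std::u32string decode_single_byte(const std::string& narrow) {
    std::u32string wide;
    wide.reserve(narrow.size());
    for (const unsigned char byte : narrow) {
        // Byte value 0xNN maps to code point U+00NN.
        wide.push_back(static_cast<char32_t>(byte));
    }
    return wide;
}
```

Fed the UTF-8 bytes of "Мир" (D0 9C D0 B8 D1 80), it returns six characters instead of three; under Windows-1252 those same bytes display as "ÐœÐ¸Ñ€". ASCII such as "-pease-" passes through unchanged.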

While this behaviour isn't great, it is standard. I don't think we should
make Boost produce UTF-8 narrow strings on Windows. A programmer would
expect to be able to take such a string and pass it to STL functions. As
you can see, that wouldn't work.

Alex

-- 
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk