Subject: Re: [boost] Environment Variables Library?
From: Peter Dimov (lists_at_[hidden])
Date: 2015-05-23 09:50:13
Bjørn Roald wrote:
> I think encoding is going to be a challenge.
> On Posix I think you are right that one can assume the character encoding
> is defined by the system and that may be a multi or a single byte
> character strings, whatever is defined in the locale.
On POSIX, the system doesn't care about encodings. You get from getenv
exactly the byte string you passed to setenv.
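A minimal sketch of that round-trip, assuming a POSIX system where setenv is available; the helper name roundtrip_env is made up for illustration and is not part of any proposed library:

```cpp
#include <cassert>
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX, not ISO C++)
#include <string>

// Hypothetical helper: stores `value` under `name`, reads it back, and
// returns the bytes the environment hands out. No locale lookup or
// encoding conversion happens anywhere on this path.
std::string roundtrip_env(const char* name, const std::string& value)
{
    assert(name != nullptr);
    if (setenv(name, value.c_str(), /*overwrite=*/1) != 0)
        return std::string();            // setenv failed (bad name or ENOMEM)
    const char* stored = std::getenv(name);
    return stored ? std::string(stored) : std::string();
}
```

Passing a value such as "caf\xe9\xff", which is not valid UTF-8, should return the same bytes unchanged, since nothing in between interprets them.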
> File paths in Windows are stored in double byte character strings encoded
> as UCS-2 which is fixed width 2 byte predecessor of UTF-16.
No, file paths on Windows are UTF-16.
I'm not quite sure how SetEnvironmentVariableA and SetEnvironmentVariableW
interact, though; I don't see it documented. The typical behavior for an A/W
pair is for the A function to be implemented in terms of the W one, using
the current system code page for converting the strings.
The C runtime getenv/_putenv functions actually maintain two separate copies
of the environment, one narrow, one wide.
The problem therefore is that it's not quite possible to provide a portable
On POSIX, programs have to use the char* functions, because they don't
encode/decode and therefore guarantee a perfect round-trip. Using wchar_t*
may fail if the contents of the environment do not correspond to the
encoding that the library uses.
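To illustrate the failure mode, here is a sketch that assumes, purely for the sake of example, a library that decodes environment values as UTF-8 before widening them. The function name is_valid_utf8 is hypothetical; it only checks whether such a decode could succeed:

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates, and code
// points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }          // ASCII

        std::size_t len;
        unsigned long cp, min;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;                        // invalid lead byte

        if (i + len > n) return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

A variable holding the Latin-1 bytes "caf\xe9" is a perfectly good char* value, but this check rejects it, so a wchar_t* interface built on UTF-8 decoding would have nothing faithful to return for it.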
On Windows, programs have to use the wchar_t* versions, for the same reason.
Using char* may give you a mangled result if the environment
contains a file name that cannot be represented in the current encoding.
(If the library uses the C runtime getenv/_putenv functions, those will
likely guarantee a perfect round-trip, but this will not solve the problem
with a preexisting wide environment that is not representable.)
Many people - me included - have adopted a programming model in which char
strings are assumed to be UTF-8 on Windows, and the char API calls the
wide Windows API internally, then converts between UTF-16 and UTF-8 as
appropriate. Since the OS X POSIX API is UTF-8 based and most Linux systems
are transitioning or have already transitioned to UTF-8 as default, using
UTF-8 and char results in reasonably portable programs.
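A rough sketch of that model applied to reading an environment variable. The names getenv_utf8, widen and narrow are made up for illustration; the Windows branch uses GetEnvironmentVariableW, MultiByteToWideChar and WideCharToMultiByte, and the other branch simply forwards to getenv:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <vector>

// UTF-8 -> UTF-16
static std::wstring widen(const std::string& s)
{
    if (s.empty()) return std::wstring();
    const int len = static_cast<int>(s.size());
    const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), len, NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(), len, &w[0], n);
    return w;
}

// UTF-16 -> UTF-8
static std::string narrow(const std::wstring& w)
{
    if (w.empty()) return std::string();
    const int len = static_cast<int>(w.size());
    const int n = WideCharToMultiByte(CP_UTF8, 0, w.data(), len,
                                      NULL, 0, NULL, NULL);
    if (n <= 0) return std::string();
    std::string s(static_cast<std::size_t>(n), '\0');
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &s[0], n, NULL, NULL);
    return s;
}
#endif

// Hypothetical helper: returns false if the variable is not set. On
// success `out` holds UTF-8 on Windows and the raw bytes elsewhere.
bool getenv_utf8(const std::string& name, std::string& out)
{
#ifdef _WIN32
    const std::wstring wname = widen(name);
    // First call asks for the required size, terminator included.
    const DWORD need = GetEnvironmentVariableW(wname.c_str(), NULL, 0);
    if (need == 0) return false;
    std::vector<wchar_t> buf(need);
    const DWORD got = GetEnvironmentVariableW(wname.c_str(), &buf[0], need);
    if (got >= need) return false;  // value changed between the two calls
    out = narrow(std::wstring(&buf[0], got));
    return true;
#else
    const char* v = std::getenv(name.c_str());
    if (!v) return false;
    out = v;
    return true;
#endif
}
```

The caller sees the same char-based signature on both platforms; only the Windows branch converts, and it never touches the A functions or the ANSI code page.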
This however doesn't appeal to people who prefer to use another encoding,
and makes the char API not correspond to the Windows char API (the A
functions) as those use the "ANSI code page" which can't be UTF-8.
Boost.Filesystem sidesteps the problem by letting you choose whatever
encoding you wish. I don't particularly like this approach.