Subject: Re: [boost] Environment Variables Library?
From: Bjørn Roald (bjorn_at_[hidden])
Date: 2015-05-23 08:07:39
On 23. mai 2015 02:18, Michael Ainsworth wrote:
> On 22 May 2015, at 8:21 pm, Klaim - Joël Lamotte <mjklaim_at_[hidden]> wrote:
>> By the way, what would be the encoding of the strings returned by or
>> passed to the Environment library?
> Given that std::getenv returns a char*, I think the library should
> work with std::string, although we did discuss supporting
> std::wstring using templates. Whether std::string is encoded in ASCII
> or UTF8 would be an OS specific thing I imagine.
> Someone with more experience with character encodings might want to
> weigh in here.
[Michael, I took the liberty of rearranging your response a bit as you
are top posting, see http://www.boost.org/community/policy.html]
Disclaimer: I am no character encoding expert, so take care to verify
claims by me here.
I think encoding is going to be a challenge.
On POSIX I think you are right that one can assume the character
encoding is defined by the system, and that it may be a multi-byte or a
single-byte encoding, whatever is defined in the locale. As the POSIX
getenv and setenv functions are simply char* based and make no
statement about encoding, it is possible to let the system determine
the encoding.
UTF-8 will likely be used for UNICODE support, as other options make
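To illustrate the POSIX case: below is a minimal sketch, assuming a
POSIX system where setenv is available. The helper name round_trip_env
and the variable name used in the usage note are made up for this
example. It only shows that the bytes handed to setenv come back from
getenv untouched, so the encoding is whatever the caller put in.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX, not part of standard C++)
#include <string>

// Hypothetical helper: store `value` under `name`, then read it back.
// Returns an empty string if the variable could not be set or found.
std::string round_trip_env(const std::string& name, const std::string& value)
{
    // The third argument (1) asks setenv to overwrite an existing value.
    if (setenv(name.c_str(), value.c_str(), 1) != 0)
        return std::string();

    // getenv hands back a plain char*; no encoding information is attached.
    const char* stored = std::getenv(name.c_str());
    return stored ? std::string(stored) : std::string();
}
```

Calling round_trip_env("BOOST_ENV_DEMO", "sm\xC3\xB8r") should return
the same five bytes, here the UTF-8 encoding of a word containing one
non-ASCII character. Nothing in the API checks or records that the
bytes are UTF-8.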
On Windows, however, there are variants of the Windows API for
environment variables:
BOOL WINAPI SetEnvironmentVariable(
  _In_     LPCTSTR lpName,
  _In_opt_ LPCTSTR lpValue
);
Unicode and ANSI names: SetEnvironmentVariableW (Unicode) and
SetEnvironmentVariableA (ANSI)
The regular SetEnvironmentVariable uses LPCTSTR, and according to the
Windows documentation an LPCTSTR is an LPCWSTR if UNICODE is defined,
an LPCSTR otherwise.
#ifdef UNICODE
 typedef LPCWSTR LPCTSTR;
#else
 typedef LPCSTR LPCTSTR;
#endif
File paths in Windows are stored as 16-bit wide character strings,
historically encoded as UCS-2, the fixed-width 2-byte predecessor of
UTF-16 (current Windows versions treat them as UTF-16). Other string
data may not be wide character strings, and ASCII and ANSI strings will
certainly exist in C++ code. Nevertheless, it seems the conversions
should happen when the API is setting or getting the variables. I am
not sure how the Unicode and ANSI variants of the API functions
interact with the actual storage of the variables in the environment
block, but it makes sense that program code needs to use them to
convert when a conversion is needed. A standard C++ library needs to
facilitate these conversions as well. I am not sure how that is best
done, but I can imagine the Boost.Filesystem library has considered
options for a very similar problem.
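To make the conversion idea a bit more concrete, here is a rough sketch
of a getter that always hands UTF-8 to the caller. The name
get_env_utf8 is hypothetical and this is not a proposal for the actual
library interface. On Windows it assumes the W variant of the API plus
MultiByteToWideChar and WideCharToMultiByte for the conversions;
elsewhere it simply trusts that the system encoding is UTF-8, which is
an assumption and not something getenv guarantees.

```cpp
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <vector>
#endif

// Hypothetical helper: returns the value of `name` as UTF-8.
// An unset variable and an empty variable both yield an empty string.
std::string get_env_utf8(const std::string& name)
{
#ifdef _WIN32
    // Convert the UTF-8 name to UTF-16; with -1 the count includes the terminator.
    int wlen = MultiByteToWideChar(CP_UTF8, 0, name.c_str(), -1, nullptr, 0);
    if (wlen <= 0) return std::string();
    std::vector<wchar_t> wname(wlen);
    MultiByteToWideChar(CP_UTF8, 0, name.c_str(), -1, wname.data(), wlen);

    // With a zero-sized buffer the call reports the required size in
    // characters, including the terminator, or 0 if the variable is missing.
    DWORD needed = GetEnvironmentVariableW(wname.data(), nullptr, 0);
    if (needed == 0) return std::string();
    std::vector<wchar_t> wvalue(needed);
    DWORD written = GetEnvironmentVariableW(wname.data(), wvalue.data(), needed);
    if (written == 0 || written >= needed) return std::string();

    // Convert the UTF-16 value to UTF-8 for the caller.
    int len = WideCharToMultiByte(CP_UTF8, 0, wvalue.data(),
                                  static_cast<int>(written),
                                  nullptr, 0, nullptr, nullptr);
    if (len <= 0) return std::string();
    std::string result(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wvalue.data(), static_cast<int>(written),
                        &result[0], len, nullptr, nullptr);
    return result;
#else
    // Assumption: the system encoding is UTF-8, so the bytes pass through.
    const char* value = std::getenv(name.c_str());
    return value ? std::string(value) : std::string();
#endif
}
```

The point of the sketch is only that the conversion sits at the API
boundary, so calling code sees one encoding on every platform. Whether
UTF-8 std::string is the right choice for the interface is a separate
question.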
As the UNICODE macro determines whether your Windows program has single
or double byte characters in its environment block, with ANSI or Unicode
UCS-2 value encoding respectively, a conversion may be needed when
creating child processes. The CreateProcess function seems to support
that, see the section on the lpEnvironment argument here
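As I read the documentation, an environment block is a sequence of
null-terminated "name=value" strings followed by one extra terminating
null. A minimal sketch of building a wide character block from a map
could look like the following. The function name
make_environment_block is made up for this example, and the handling of
an empty map (two nulls) is my assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: flatten name/value pairs into a wide character
// environment block: "name=value\0name=value\0\0".
std::vector<wchar_t> make_environment_block(
    const std::map<std::wstring, std::wstring>& vars)
{
    std::vector<wchar_t> block;
    for (const auto& entry : vars) {
        const std::wstring line = entry.first + L"=" + entry.second;
        block.insert(block.end(), line.begin(), line.end());
        block.push_back(L'\0');  // terminates this string
    }
    if (vars.empty())
        block.push_back(L'\0');  // assumption: an empty block is two nulls
    block.push_back(L'\0');      // terminates the whole block
    return block;
}
```

On Windows the result would be passed as block.data() for lpEnvironment,
with CREATE_UNICODE_ENVIRONMENT included in dwCreationFlags so that
CreateProcess treats the block as wide characters. This sketch does not
attempt any ordering of the entries beyond what std::map provides.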
It is annoying that Microsoft ended up using UCS-2. Other operating
systems waited a bit longer to decide how to support Unicode, I think,
and thus had a better option available with UTF-8. But the situation is
what it is and we have to deal with it.