
Hello,
Surprisingly enough, C++ file-based streams can be opened with a char * string (for the filename) only, while modern computer systems have Unicode filenames. I see boost::iostreams::basic_file also gets constructed from a char * only.
My project name has Korean characters, and I work on a Latin1 Windows system, and on my system the narrow-character set simply does not contain Korean characters.
Is there a (good) way to open a file with a wstring in Boost?
Thank you,
Timothy Madden

I use:
typedef boost::filesystem::wpath SlmWPath;
typedef boost::filesystem::wfstream SlmWfstream;
typedef boost::filesystem::wofstream SlmWOfstream;
typedef boost::filesystem::wifstream SlmWIfstream;
Then you can use wchar_t. I have not tested with non-ASCII filenames, though.
On Mon, Aug 2, 2010 at 11:46 AM, Timothy Madden <terminatorul@gmail.com> wrote:
Hello
Surprisingly enough, C++ file-based streams can be opened with a char * string (for the filename) only, while modern computer systems have Unicode filenames. I see boost::iostreams::basic_file also gets constructed from a char * only.
My project name has Korean characters, and I work on a Latin1 Windows system, and on my system the narrow-characters set simply does not contain Korean characters.
Is there a (good) way to open a file with a wstring in boost ?
Thank you, Timothy Madden
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Bo Jensen wrote:
I use :
typedef boost::filesystem::wpath SlmWPath; typedef boost::filesystem::wfstream SlmWfstream; typedef boost::filesystem::wofstream SlmWOfstream; typedef boost::filesystem::wifstream SlmWIfstream;
Yes, I can create wide streams; what I want is to pass a wide string as the file name to be opened.
Thank you,
Timothy Madden

On Mon, Aug 2, 2010 at 2:03 PM, Timothy Madden <terminatorul@gmail.com> wrote:
Yes, I can create wide streams; what I want is to pass a wide string as the file name to be opened.
This should work:
boost::filesystem::wfstream test;
test.open(L"somepath");
The above typedefs were just to show the tools you need from Boost.Filesystem.

Bo Jensen wrote:
This should work:
boost::filesystem::wfstream test;
test.open(L"somepath");
The above typedefs were just to show the tools you need from Boost.Filesystem.
Oh, yes, you are right! It must be the 'Additions to <fstream>' thing, which constructs an ifstream from a filesystem::path/wpath. Actually I do not even have a wchar_t * in my program, I use a wpath! :)
I guess Bjarne was right, the Boost documentation is a challenge for a newcomer. :) And I like to think of myself as a tough guy ...
Thank you,
Timothy Madden

On Mon, Aug 2, 2010 at 8:32 PM, Timothy Madden <terminatorul@gmail.com> wrote:
It must be the 'Additions to <fstream>' thing, which constructs an ifstream from a filesystem::path/wpath. Actually I do not even have a wchar_t * in my program, I use a wpath! :)
I don't know all the details, but on Windows I think filenames are UTF-16 only. On Linux you should be safe, whatever locale you use. I would be interested to hear how it worked out.

On Mon, Aug 2, 2010 at 9:39 PM, Bo Jensen <jensen.bo@gmail.com> wrote:
I don't know all the details, but on Windows I think filenames are UTF-16 only. On Linux you should be safe, whatever locale you use. I would be interested to hear how it worked out.
We had a program compiling under Microsoft Visual Studio 2003 that was running in different regions around the world and allowing users in their regions to open their files ok. This program deals at the char* level, working with strings encoded using local code pages.
When we upgraded to Microsoft Visual Studio 2008, opening files through std::ofstream/std::ifstream stopped working because Microsoft changed some internals of the runtime library. To fix this (I think we should have been doing this all along anyway!) we needed to call std::setlocale(LC_CTYPE, "") at program startup so that the runtime library knew how to convert the char* to a wide-character string. The runtime library uses mbstowcs() to convert that char* to a wchar_t*, and mbstowcs() needs to know the code page.
If the program had been Unicode, we would not have faced the issue above.
Pete
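The startup fix Pete describes can be sketched roughly as follows. This is only an illustration of the setlocale/mbstowcs mechanism he mentions, not the actual runtime library code; the helper names adopt_user_locale and widen_with_current_locale are made up for the example.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical startup helper: switch character classification and
// conversion from the default "C" locale to the user's environment locale.
bool adopt_user_locale()
{
    return std::setlocale(LC_CTYPE, "") != nullptr;
}

// Hypothetical helper: converts a narrow string in the current C locale's
// encoding to a wide string. Returns an empty string if the bytes are not
// valid in that encoding.
std::wstring widen_with_current_locale(const std::string& narrow)
{
    // A wide string never has more characters than the narrow one has bytes,
    // so this buffer also has room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();
    assert(written <= narrow.size());
    return std::wstring(buffer.data(), written);
}
```

Without the call to adopt_user_locale(), mbstowcs() runs in the "C" locale and has no knowledge of the user's code page, which matches the failure described above.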

On 8/2/2010 15:21, PB wrote:
We had a program compiling under Microsoft Visual Studio 2003 that was running in different regions around the world and allowing users in their regions to open their files ok. This program deals at the char* level, working with strings encoded using local code pages.
The only way this approach could possibly work is if users only used file names that could be encoded in their default code page, which is simply not true a lot of the time. For example, I run under a Japanese locale, but I regularly deal with files with Chinese names or German names that cannot be represented in CP932, the Japanese code page under Windows. -- Rainer Deyke - rainerd@eldwood.com

On 03/08/10 00:06, Rainer Deyke wrote:
The only way this approach could possibly work is if users only used file names that could be encoded in their default code page, which is simply not true a lot of the time. For example, I run under a Japanese locale, but I regularly deal with files with Chinese names or German names that cannot be represented in CP932, the Japanese code page under Windows.
If only Windows supported a UTF-8 locale like most other systems...

If only Windows supported a UTF-8 locale like most other systems...
I think that's a C Standard Library implementation issue, so it is a property of the compiler, not the OS. The multi-byte string support (_mbslen etc.) is designed around double-byte character sets, not UTF-8. Windows supports UTF-8 as a "code page" for the fundamental wide/narrow conversion functions, so it should take that fine for any narrow-string API function. Note that file names have their own setting separate from the main code page; that might be confusing matters.
--John
TradeStation Group, Inc. is a publicly-traded holding company (NASDAQ GS: TRAD) of three operating subsidiaries, TradeStation Securities, Inc. (Member NYSE, FINRA, SIPC and NFA), TradeStation Technologies, Inc., a trading software and subscription company, and TradeStation Europe Limited, a United Kingdom, FSA-authorized introducing brokerage firm. None of these companies provides trading or investment advice, recommendations or endorsements of any kind. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.

Bo Jensen wrote:
I don't know all the details, but on Windows I think filenames are UTF-16 only. On Linux you should be safe, whatever locale you use. I would be interested to hear how it worked out.
All Windows API functions have an ANSI version, including file system functions, despite NTFS having Unicode filenames. I do not know what happens when an ANSI function has to return some Korean/Japanese file name from the file system, on a computer with some Latin locale. Does anyone care to try?
Anyway, I find that I have to explicitly #include <boost/filesystem/fstream.hpp>, and then I get the new ifstream and ofstream classes in the boost::filesystem namespace. They work just like std::ifstream and std::ofstream, except that they can also be constructed from a wpath, and as such they were able to create a Korean file name on the disk, where my current locale is Latin-2.
I would like to point out that this std::ifstream / std::ofstream issue is not an operating system problem, but a C++ library issue. Indeed Linux may allow UTF-8 as the current locale, and that makes the char * constructors enough to open any file name on any filesystem. However, I find the UTF-8 locale to be only a system-specific work-around, as I may simply not want such a locale for my application (think about some limited devices/systems), and C++ implementations are not required to implement the UTF-8 locale. So I think the iostreams C++ standard library simply lacks wchar_t * constructors for file streams.
Probably this stems from the misleading idea that the wchar_t * I would use to open a file can always be converted to a char *. The truth is that such a conversion limits the filename in the resulting char * string to characters from the narrow-character set only, so the problem of wchar_t filenames for ifstream/ofstream remains unsolved.
I have seen this misleading conversion used also as the reason for the main() function only having a char *argv[] argument (and no wchar_t *argv[]), and for std::exception::what() returning only a narrow-character string, and no wchar_t * message. Again, this limits the actual content in the strings to characters from the narrow-character set only.
Timothy Madden
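The point about the wchar_t * to char * conversion being lossy can be checked with a small sketch. It uses the standard wcstombs() function and assumes nothing beyond the C library; the helper name try_narrow is hypothetical. In the default "C" locale that every program starts in, a Korean name is expected to fail the conversion, while an ASCII name goes through.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: tries to convert `wide` to the narrow encoding of the
// current C locale. Returns false if some character has no representation
// in that encoding, in which case `narrow` is left untouched.
bool try_narrow(const std::wstring& wide, std::string& narrow)
{
    // MB_CUR_MAX is the largest number of bytes one character can need
    // in the current locale.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return false;   // at least one character cannot be encoded
    narrow.assign(buffer.data(), written);
    return true;
}
```

A file name for which try_narrow() fails cannot be handed to a char * constructor in that locale at all, which is the gap a path-based or wchar_t-based constructor closes.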

All Windows API functions have an ANSI version, including file system functions, despite NTFS having Unicode filenames. I do not know what happens when an ANSI function has to return some Korean/Japanese file name from the file system, on a computer with some Latin locale. Does anyone care to try?
When it converts to ANSI (that is, code-page encoding), you get non-round-trip substitutions and '?' characters, so you then can't open the file even though it claims to have found it in the directory. Or you use the short-name alias, which is always 8-character ASCII.
--John

On 02/08/10 12:46, Timothy Madden wrote:
Hello
Surprisingly enough, C++ file-based streams can be opened with a char * string (for the filename) only, while modern computer systems have Unicode filenames.
All of them but Microsoft Windows support UTF-8.
Is there a (good) way to open a file with a wstring in boost ?
Boost.Filesystem has wide-character support, but I would advise avoiding wide characters entirely.

Mathias Gaunard wrote:
On 02/08/10 12:46, Timothy Madden wrote:
Hello
Surprisingly enough, C++ file-based streams can be opened with a char * string (for the filename) only, while modern computer systems have Unicode filenames.
All of them but Microsoft Windows support UTF-8.
How would I let the file-stream object know that the filename to be opened is encoded in UTF-8 ?
Is there a (good) way to open a file with a wstring in boost ?
Boost.Filesystem has wide-character support, but I would advise avoiding wide characters entirely.
How? If the user enters a Unicode filename (with Korean characters) for me to open, and the current locale is Latin 2, how would I open the file?
Thank you,
Timothy Madden

On 02/08/10 15:02, Timothy Madden wrote:
Mathias Gaunard wrote:
On 02/08/10 12:46, Timothy Madden wrote:
Hello
Surprisingly enough, C++ file-based streams can be opened with a char * string (for the filename) only, while modern computer systems have Unicode filenames.
All of them but Microsoft Windows support UTF-8.
How would I let the file-stream object know that the filename to be opened is encoded in UTF-8 ?
It is assumed to be in the locale of the system. Most POSIX systems use a UTF-8 locale these days, but if you really want to be portable, you should convert that.
How?
If the user enters a Unicode filename (with Korean characters) for me to open, and the current locale is Latin 2, how would I open the file?
On Windows, convert from UTF-8 to wide characters when calling system calls. On other operating systems, pass UTF-8 to the system calls, or convert to the locale's encoding if you care enough about non-UTF-8 locales.
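The Windows half of that advice needs a UTF-8 to UTF-16 conversion before the wide-character system call. A portable, hand-written sketch of that step is shown below. It is an illustration only: the name utf8_to_utf16 is made up, the result is a std::u16string rather than a wchar_t buffer, and real code on Windows would more likely call the platform's own conversion function.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Throws std::invalid_argument on truncated or malformed input. Overlong
// forms and encoded surrogates are not rejected; this is a sketch, not a
// full validator.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t code_point = 0;
        std::size_t extra = 0;   // number of continuation bytes expected
        if (lead < 0x80)                { code_point = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { code_point = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { code_point = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { code_point = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            code_point = static_cast<char32_t>((code_point << 6) | (cont & 0x3F));
        }
        i += extra + 1;

        if (code_point > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (code_point < 0x10000) {
            out.push_back(static_cast<char16_t>(code_point));
        } else {
            // Characters outside the Basic Multilingual Plane need a surrogate pair.
            const char32_t v = code_point - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    assert(i == utf8.size());
    return out;
}
```

On the other systems mentioned in the reply, no such step is needed when the locale is UTF-8: the bytes are passed to the system call unchanged.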
participants (6)
- Bo Jensen
- John Dlugosz
- Mathias Gaunard
- PB
- Rainer Deyke
- Timothy Madden