

Subject: Re: [boost] [nowide] Request for interest (nowide unicode support for windows)
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-06-16 15:50:19


> > Me too.
>
> I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
> as UTF-8 by default.  Berman said that it did but I beg to differ.  Here's
> what the comments say:
>
> //  For Windows, wchar_t strings do not undergo conversion. char strings
> //  are converted using the "ANSI" or "OEM" code pages, as determined by
> //  the AreFileApisANSI() function, or, if a conversion argument is given,
> //  using a conversion object modeled on std::wstring_convert.
>
> In other words "שלום.txt" would be interpreted as being in whatever
> encoding the local code page is set to and would, therefore, produce a path
> containing gibberish for most people.  This is standard Windows behaviour
> :P

This standard Windows behavior is exactly **the** problem.

To be honest, have you seen anybody using a "wide path" outside of the
Windows scope? Do you actually need such a "wide path" on POSIX platforms?

The answer is no. Actually, a POSIX OS does not care about the filename
charset, as I can create a file with

    std::ofstream f("\xf9\xec\xe5\xed.txt");

This name (שלום in ISO-8859-8) is invalid UTF-8, but it is still a valid
file name, even when the locale is a UTF-8 locale.

> Your problem is yet another step further than this.  Assuming fs3 correctly
> converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
> using this wide-char name?  Well, MSVC has wchar_t overloads so this works
> fine.  You're right about glibc++/MinGW though.  fs::fstream will fail
> there.  Rather than introducing a nowide library, why don't we just try to
> fix this in Boost.Filesystem?
>

I think that this can be fixed (the way I fixed it in nowide, by implementing
fstreambuf over stdio + _wfopen):

    http://art-blog.no-ip.info/files/nowide.zip

But this is one particular problem. There are more. What about
filesystem::remove and others? From what I see in the code, it supports only
path and not wpath.

---------------------

But this is a part of one bigger problem.
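To make the point about the ISO-8859-8 file name above concrete, here is a minimal sketch of a UTF-8 well-formedness check. The function name is_valid_utf8 is hypothetical (it is not part of nowide or Boost.Filesystem); it only illustrates why those four bytes are not UTF-8 even though POSIX accepts them as a file name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;       // bytes in this sequence
        unsigned long min_cp = 0;  // smallest code point allowed for `len`
        unsigned long cp = 0;      // decoded code point
        if (c < 0x80)                { len = 1; min_cp = 0x0;     cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; min_cp = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; min_cp = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; min_cp = 0x10000; cp = c & 0x07; }
        else return false;  // continuation byte or invalid lead byte (e.g. 0xF9)

        if (i + len > n) return false;  // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range
        i += len;
    }
    return true;
}
```

With this check, "\xf9\xec\xe5\xed.txt" is rejected at its first byte, while the UTF-8 spelling of the same name, "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d.txt", is accepted.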
When I develop cross-platform applications, I have the following options for
operating on files, for example when I want to remove, rename, or create a
file: writing in standard platform-independent C++, writing for POSIX
operating systems, or writing for MS Windows.

    OS \ Str | std::string | std::wstring
    ---------+-------------+--------------
    Std C++  | Ok          | Not Defined!
    POSIX    | Ok          | Not Defined!
    WinAPI   | Not UTF-8   | Ok

What I can see:

I either need to use wide strings, which work only on Windows and require me
to convert to another encoding for file operations everywhere else.

Or I may use normal strings, as the standard requires, and have problems on
Windows, as it is not fully supported.

Or I need to write two kinds of code:

- One for Windows, using "wide" strings
- One for anything else, using normal strings

because Windows does not support a UTF-8 code page.

So why? Why do you need all this if you can just create a tiny layer that
makes Windows support a UTF-8 code page by converting std::string to
std::wstring and calling the appropriate API?

My Opinion:
-----------

- There is neither use nor need for "wide" strings for file system
  operations on any platform but Windows.
- Introducing boost::filesystem::wpath does not help, as it is meaningless
  on other OSes.
- Using wide strings is extremely error prone in cross-platform
  applications, as on Windows they are UTF-16 encoded and on POSIX they are
  UTF-32 encoded.

Wide path support just makes our applications more complicated and error
prone.

So... Just create an API that is friendly to UTF-8 strings and forget about
this hell.

-------------

But from what I see this will never happen in Boost, as it is too Windows
centric, and Windows is too ignorant of the basic needs of programmers who
want to write portable programs.

Regards.

  Artyom

P.S.: The title of this mail is "request for interest". It is ok not to have
one.
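For illustration, the "tiny layer" idea described above might look roughly like the sketch below. This is not the actual nowide code: the names to_utf16, fopen_utf8 and remove_utf8 are hypothetical, and the conversion is hand-written so that it does not depend on any platform API. On Windows the UTF-8 name is converted to UTF-16 and passed to the CRT wide functions _wfopen and _wremove; everywhere else the narrow string is passed through unchanged.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <wchar.h>  // _wfopen, _wremove
#endif

// Hypothetical helper: decode UTF-8 and re-encode as UTF-16.
// Throws std::invalid_argument on malformed input.
std::u16string to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len; char32_t cp; char32_t min_cp;
        if (c < 0x80)                { len = 1; cp = c;        min_cp = 0; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

#ifdef _WIN32
// wchar_t is 16 bits wide on Windows, so UTF-16 code units copy over directly.
static std::wstring widen(const std::string& s) {
    const std::u16string u = to_utf16(s);
    return std::wstring(u.begin(), u.end());
}
#endif

// Hypothetical wrappers: both take UTF-8 on every platform.
std::FILE* fopen_utf8(const std::string& name, const std::string& mode) {
#ifdef _WIN32
    return ::_wfopen(widen(name).c_str(), widen(mode).c_str());
#else
    return std::fopen(name.c_str(), mode.c_str());
#endif
}

int remove_utf8(const std::string& name) {
#ifdef _WIN32
    return ::_wremove(widen(name).c_str());
#else
    return std::remove(name.c_str());
#endif
}
```

Application code would then call fopen_utf8 and remove_utf8 with std::string everywhere and would never see a wide string; only the two #ifdef branches know about the platform difference. A stream class like the fstreambuf mentioned earlier could presumably be built on top of fopen_utf8 in the same spirit.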


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk