
Boost Users :

From: Mark Holloway (Mark.Holloway_at_[hidden])
Date: 2005-03-16 11:03:33


> I'd like to run a test here to be sure; could you please supply an example
> of <internationalized chars> that is causing the problem?

On my system, a filename containing a beta character ("â") causes GetFileAttributesA to fail to find the specified file. It's possible that with a different codepage this would not be a problem.

 

> I'm not at all surprised you are having problems; it isn't at all clear to
> me that it is possible to do reliable processing on Windows using the A
> (narrow) variants if Unicode characters are present.
In fact, after experimenting with the alleged MBCS support on Windows, I've discovered that it really is only skin deep. I've not been able to get any char* function to take a UTF-8 string, even those that claim to be affected by the current multi-byte codepage. Anyone who has managed this, please feel free to correct me! I've drilled down into the Rtl routines that seem to deal with multi-byte characters, but there just doesn't seem to be a way to affect the bits that matter.

 

> I've got the internationalized revision of the Filesystem library running
> on Windows; it seems to be handling wide characters with ease.

I've done the same thing myself (more or less, I guess). I initially attempted it without changing the API, by converting the wide result to UTF-8, mostly to help keep a common code base with our other platforms (Mac, Sparc, Linux, etc., some of which are wide-character challenged), before realizing the full extent of the MBCS issues outlined above. After all, if I can't construct a std::fstream with a UTF-8 argument...

 

> One fix for your problem is to switch to the internationalized version and
> use wchar_t based paths. That will use the W Windows variants, and will
> also work well on POSIX systems with internationalized file or directory
> names.
I'd be happy to try that version (and swap it for my own). Where can I find it? My own hacked version is switchable, even on Windows, so that Win95 is supported without installing the Unicode layer (sigh).

 

> Another possibility is that we can work on the narrow functions to make
> them function better in the face of internationalized names. But I'd need a
> lot of help on that to develop test cases and strategies to handle them.
Since the underlying ANSI API variants don't support UTF-8, I just can't see how this example could ever be portable in the presence of Unicode without wide chars:

std::ofstream file( boost::filesystem::current_path().native_file_string().c_str() );

 

It will eventually call down to the ANSI function CreateFileA, which just won't be able to deal with it.
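
To make the wide-character route concrete, here is a minimal sketch of what I mean, not a proposal for the library itself. It assumes the path is held as UTF-8 in a std::string, decodes it to UTF-16 by hand, and on Windows passes the result to GetFileAttributesW instead of the A variant. The names utf8_to_utf16 and file_exists_utf8 are hypothetical helpers made up for illustration. Validation is deliberately minimal: overlong encodings and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of Boost.Filesystem): decode a UTF-8 byte
// string into UTF-16 code units. Throws std::invalid_argument on bad lead
// bytes, bad continuation bytes, truncated sequences, or values > U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

#ifdef _WIN32
// Hypothetical helper: existence check that goes through the W variant,
// so the result does not depend on the current ANSI codepage.
bool file_exists_utf8(const std::string& utf8_path)
{
    const std::u16string units = utf8_to_utf16(utf8_path);
    // wchar_t is 16 bits on Windows, so the code units copy over one to one.
    const std::wstring wide(units.begin(), units.end());
    return ::GetFileAttributesW(wide.c_str()) != INVALID_FILE_ATTRIBUTES;
}
#endif
```

On Windows the same conversion could presumably be done with MultiByteToWideChar and CP_UTF8 rather than by hand; the hand-written decoder just keeps the sketch compilable on the other platforms too.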

 

Then there is the performance penalty of ANSI<->Unicode conversion under Windows NT.

 

Thanks for your time (and your work!)

Mark

 

P.S. One other minor suggestion while I'm in the filesystem area (and I really don't want to (re)open a can of worms here): it would be useful to be able to choose the default path check at compile time.



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net