From: Beman Dawes (bdawes_at_[hidden])
Date: 2004-11-15 10:14:43
At 07:13 AM 11/15/2004, Peter Dimov wrote:
>Peter Dimov wrote:
>> Choosing the wrong native character type causes redundant roundtrip
>> conversions, one in Boost.Filesystem, one in the OS.
>
>Let me expand on that a little.
>
>It is _fundamentally wrong_ to assume that all present and future OS APIs
>have a single native character type.
The actual wording of PJP's paper was that for paths (not the entire OS
API), one type could be considered "fundamental".
>Consider a case where a dual API OS has access to two logical volumes C:
>and D:, where the file system on C: stores the filenames as 16 bit UTF-16,
>and the file system on D: uses narrow characters.
That happens all the time on Windows. Often the A: drive is a narrow
character FAT filesystem.
>Now the behavior of the calls is as follows:
>
>CreateFileA( "C:/foo.txt" ); // char -> wchar_t OS conversion
>CreateFileW( L"C:/foo.txt" ); // no OS conversion
>CreateFileA( "D:/foo.txt" ); // no OS conversion
>CreateFileW( L"D:/foo.txt" ); // wchar_t -> char OS conversion
Yes, that's my understanding too.
>Furthermore, consider a typical scenario where the application has its own
>"native" character type, app_char_t. In a design that enforces a single
>"native" character type boost_fs_char_t ("native" is a deceptive term due
>to the above scenario), there are potentially redundant (and not
>necessarily preserving) conversions from app_char_t to boost_fs_char_t
>and then from boost_fs_char_t to the filesystem character type.
Yes. Note that even if a dual scheme is used, that same situation might
arise:
  if ( fs::exists( "c:foo" ) ) ...
  if ( fs::exists( L"d:foo" ) ) ...
Notice that a narrow character path was given for the wide-character
filesystem and a wide character path given for the narrow-character file
system. If the type of the user supplied path is what determines the API to
use, the O/S may still have to do conversions when there is a mismatch with
the file system.
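A minimal sketch of that mismatch, not Boost.Filesystem code. The overload chosen by the argument type stands in for picking the narrow or wide OS API, and a hypothetical function says which character width each volume stores. All names here (`volume_width`, `api_used`, `os_converts`) are made up for illustration, and the C:/D: assignment simply follows Peter's example above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical model of the example in this thread:
// C: stores wide (UTF-16) names, every other volume stores narrow names.
enum class width { narrow, wide };

inline width volume_width(char drive_letter)
{
    return (std::toupper(static_cast<unsigned char>(drive_letter)) == 'C')
        ? width::wide : width::narrow;
}

// The overload selected at compile time stands in for the choice between
// a narrow ("A") and a wide ("W") OS entry point.
inline width api_used(const std::string&)  { return width::narrow; }
inline width api_used(const std::wstring&) { return width::wide; }

// True when the OS would have to convert the path to match the volume.
inline bool os_converts(const std::string& p)
{
    assert(!p.empty());
    return api_used(p) != volume_width(p[0]);
}

inline bool os_converts(const std::wstring& p)
{
    assert(!p.empty());
    return api_used(p) != volume_width(static_cast<char>(p[0]));
}
```

Under this model, `os_converts("c:foo")` and `os_converts(L"d:foo")` are both true, matching the two calls above, while `os_converts(L"c:foo")` and `os_converts("d:foo")` are false.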
Do you see any alternative? If the library queried the O/S about the path
(which I'm not sure is always possible) to see if the filesystem was wide
or narrow, a conversion would still have to be done if the user supplied
path used the other char type. That saves nothing and adds the cost of the
query.
>In my opinion, the Boost filesystem library should pass the application
>characters _exactly as-is_ to the underlying OS API, whenever possible. It
>should not impose its own "native character" ideas upon the user nor upon
>the OS.
Your strongest argument IMO is the point about conversions not necessarily
being value preserving. (I guess we could tell Windows users that they
should not expect such conversions to work unless supported by the
applicable codepage. But that seems like spin rather than a real solution.)
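To make the value-preservation point concrete, here is a small sketch. It assumes a hypothetical codepage that can represent only 7-bit ASCII and substitutes '?' for everything else. `to_narrow`, `to_wide`, and `round_trips` are illustrative helpers, not OS or Boost functions. Real codepages differ in detail but fail in the same way for characters they lack.

```cpp
#include <cassert>
#include <string>

// Hypothetical wide -> narrow conversion for an ASCII-only codepage.
// Characters outside the codepage are replaced by '?'.
inline std::string to_narrow(const std::wstring& w)
{
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w)
        out += (static_cast<unsigned long>(c) < 0x80UL)
            ? static_cast<char>(c) : '?';
    return out;
}

// Narrow -> wide widening; each char maps to the code point of equal value.
inline std::wstring to_wide(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char c : s)
        out += static_cast<wchar_t>(static_cast<unsigned char>(c));
    return out;
}

// True when a name survives the wide -> narrow -> wide round trip.
inline bool round_trips(const std::wstring& w)
{
    return to_wide(to_narrow(w)) == w;
}
```

A name such as `L"foo.txt"` round-trips, but `L"caf\u00e9.txt"` comes back with '?' in place of the accented letter, so a file created through one API could not be reopened by that name through the other.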
The efficiency argument is certainly real, but I don't see it as being
quite as strong. (It will be important for some users, however. Think of
very small or embedded systems.)
If the rule is that there is some type (char or wchar_t) associated with
each path, and the library will always use the native API of that type if
available, then it seems to me that the arguments in favor of a single path
class weaken considerably. Sure the library can keep track at runtime of
whether a particular path is wide or narrow, but it is much more normal in
C++ to distinguish at compile time. In other words, separate path and wpath
classes.
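A rough sketch of what the compile-time distinction might look like. The `sketch` namespace, the class bodies, and `api_for` are hypothetical and are not the Boost.Filesystem interface. The point is only that the character type is part of the path's static type, so the choice of API needs no runtime flag.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical names, for illustration only

// Narrow path: holds the user's characters exactly as supplied.
class path {
public:
    path(const char* s) : str_(s) {}
    const std::string& string() const { return str_; }
private:
    std::string str_;
};

// Wide path: a distinct type, so the width is known at compile time.
class wpath {
public:
    wpath(const wchar_t* s) : str_(s) {}
    const std::wstring& string() const { return str_; }
private:
    std::wstring str_;
};

enum class api { narrow, wide };

// Overload resolution picks the API; nothing is stored or checked at runtime.
// A real implementation would call the corresponding OS function here.
inline api api_for(const path&)  { return api::narrow; }
inline api api_for(const wpath&) { return api::wide; }

} // namespace sketch
```

With these definitions, `sketch::api_for("c:foo")` selects the narrow overload and `sketch::api_for(L"d:foo")` the wide one, since each literal converts to only one of the two classes.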
In discussion on the C++ committee's library reflector, there wasn't demand
for a templatized basic_path type. AFAICS, a templatized basic_path type
could be added later if demand arose.
--Beman