From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-11-12 14:26:20


Beman Dawes wrote:
> At 08:36 PM 11/11/2004, Peter Dimov wrote:
>> Beman Dawes wrote:
>>> At 12:41 PM 11/11/2004, Peter Dimov wrote:
>>>> In particular, the user cannot define the conversions
>>>> between the different path types, because they are implementation
>>>> defined.
>>>
>>> The default conversion is implementation defined, but users can
>>> supply their own conversion. One use case I have in mind is a
>>> character based O/S which uses some MBCS encoding of paths that
>>> isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding.
>>> The user should be able to
>>> provide such a conversion function, overriding the implementation
>>> defined default.
>>
>> I see no need for custom conversions in this case. The user can just
>> supply the appropriate narrow UTF-8 path directly.
>
> I considered that. For most users, who will rarely if ever need a
> custom conversion, it would be fine to require them to do any custom
> conversion themselves before constructing a path. But a few users
> will need to do custom conversions for virtually every use of the
> library (probably because their O/S just traffics in raw chars, yet
> they need a wide character encoding.) These users would be helped a
> great deal by custom conversions.

I still don't get it. I guess that we need code. Either way, it is the user
doing the conversion. They aren't helped one bit.

We have two OSes in use today. Windows, which takes either path or wpath,
and POSIX et al, which takes only a path. If the user wants to use something
that is neither a path nor a wpath, he must convert it to one of those. There
is nothing the library can do, and providing smoke and mirrors just to make
it _seem_ that other paths are supported, when in reality they simply are
not, is both a disservice and a needless complication. IMO.

> One case where a custom conversion is required is for a user defined
> string type. There isn't any default; the user has to supply the
> conversion.

The user needs to convert the user-defined string type to either path or
wpath. This is not something that the filesystem library can, or should, do
for him; a function can handle this conversion easily. This use case does
not imply that there must exist a basic_path for every basic_string, because
such a basic_path is not a filesystem path. The filesystem simply does not
take user-defined strings, never will, and no amount of traitification can
change that.

> I know you don't believe in the usefulness of such user defined string
> types, but I'd be surprised if the committee would accept elimination
> of user defined string types.

There is no such elimination. basic_string works exactly as before, and OSes
work exactly as before.

> Also, remember that basic_string<char16_t> and basic_string<char32_t>
> may well be mandated in the fairly close future.

The filesystem library provides an interface to the native OS filesystem
API. If that API can take char32_t (which is not the case today on any
platform AFAIK), then the library needs to be able to take char32_t. On such
a platform wchar_t will probably be char32_t, so a wpath can be used as-is.
This is similar to the current status quo on Windows, where the UTF-16 wpath
encoding dictates that wchar_t is char16_t.

Custom conversions don't help, because only the system knows how a char32_t
name maps to a char name.

Anyway, here's a summary of my position (assuming two path types):

    void fs_function( path const & p );
    void fs_function( wpath const & p );

Windows ("dual") implementation:

    First overload calls FsFunctionA, second FsFunctionW.
    No conversion is done by the library, because the assumption
    is that only the system can do it right.

POSIX ("single") implementation:

    First overload calls ::fs_function, second does a library-supplied
    conversion and invokes the first.
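
A minimal sketch of the "single" case, to make the shape concrete. The `path` and `wpath` structs, `native_calls()` and `to_narrow()` are hypothetical stand-ins, not the library's types or functions, and the conversion is assumed to be `std::wcstombs` under the current C locale. A "dual" implementation would drop `to_narrow()` and forward each overload to the matching native call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two path types.
struct path  { std::string  str; };
struct wpath { std::wstring str; };

// Stand-in for the native API: records what the "system call" received.
inline std::vector<std::string>& native_calls() {
    static std::vector<std::string> calls;
    return calls;
}

// Library-supplied conversion (assumption: wcstombs in the current C locale).
inline path to_narrow(wpath const& p) {
    std::size_t const max_bytes = static_cast<std::size_t>(MB_CUR_MAX);
    std::vector<char> buf(p.str.size() * max_bytes + 1);
    std::size_t n = std::wcstombs(buf.data(), p.str.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("wpath not representable in the current locale");
    assert(n < buf.size());  // the buffer was sized for the worst case
    return path{std::string(buf.data(), n)};
}

// First overload: goes straight to the native call.
inline void fs_function(path const& p) { native_calls().push_back(p.str); }

// Second overload: converts, then invokes the first.
inline void fs_function(wpath const& p) { fs_function(to_narrow(p)); }
```

The point of the sketch is that user code never sees the conversion: it calls `fs_function` with whichever path type it has.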

User needs to use basic_string<UDT>:

    wpath path_from_UDT( basic_string<UDT> const & s );
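
As an illustration only, here is what such a user-written function might look like. `char32_t` plays the role of UDT so that the sketch needs no custom `char_traits`, the `wpath` struct is a hypothetical stand-in, and the assumption is that the UDT string holds Unicode code points and that a 16-bit `wchar_t` means UTF-16.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the wide path type.
struct wpath { std::wstring str; };

// Assumption: the "user-defined" character type holds Unicode code points.
typedef char32_t UDT;

// User-supplied conversion; the filesystem library knows nothing about it.
inline wpath path_from_UDT(std::basic_string<UDT> const& s) {
    std::wstring out;
    for (UDT c : s) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (sizeof(wchar_t) >= 4 || c < 0x10000) {
            // Fits in a single wchar_t.
            out.push_back(static_cast<wchar_t>(c));
        } else {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            UDT v = c - 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    assert(out.size() >= s.size());  // every code point yields at least one unit
    return wpath{out};
}
```

The result is an ordinary `wpath`, so everything downstream works without any traits or extra path types.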

This covers the filesystem part. I suspect that what you want is to provide
the generic path grammar part, templated on arbitrary character types. (A
native grammar probably won't work for characters that aren't native.) That
may be nice, and in fact I remember suggesting that before :-) but I'm
really not sure whether this outweighs the fact that the filesystem-specific
part of the design is encumbered with supporting the kitchen sink, because I
don't recall ever needing path manipulation for something that is not char
or wchar_t.

One way to provide the necessary functionality is to expose a collection of
algorithms that allow the user to do path manipulations on arbitrary
character ranges. ;-)
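
A rough sketch of what one such algorithm could look like. The name `last_element_begin` is hypothetical, and the generic grammar is assumed to use '/' as the only separator; the function works on any forward range whose value type can be constructed from and compared with a `char`.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Returns an iterator to the start of the last element of the generic
// path in [first, last). If the range has no '/', returns first; if it
// ends in '/', returns last (an empty element).
template<class It>
It last_element_begin(It first, It last) {
    typedef typename std::iterator_traits<It>::value_type char_type;
    char_type const slash = static_cast<char_type>('/');
    It result = first;
    for (It it = first; it != last; ++it) {
        if (*it == slash) {
            result = it;
            ++result;  // the element starts just past the separator
        }
    }
    return result;
}

// Usage on two different character types, with no path class involved.
inline void example() {
    std::string  n = "a/b/c.txt";
    std::wstring w = L"dir/file";
    assert(std::string(last_element_begin(n.begin(), n.end()), n.end()) == "c.txt");
    assert(std::wstring(last_element_begin(w.begin(), w.end()), w.end()) == L"file");
}
```

Nothing here touches the filesystem, which is why it can be templated freely without complicating the `path`/`wpath` overloads.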

Either way, we need examples before we can move the discussion forward.

