From: Vladimir Prus (ghost_at_[hidden])
Date: 2005-03-31 04:22:32


On Wednesday 23 March 2005 18:07, Beman Dawes wrote:
> CVS now contains a branch "i18n" of the filesystem directories:
>
> * Class templates basic_path, basic_directory_iterator, etc, support
> narrow, wide, and user-defined path types. Typedefs path,
> directory_iterator, etc, are provided, so most existing code continues to
> work.

I recall we had a long discussion about basic_path vs. a single path type.
I don't think the results of that discussion are reflected in i18n.html --
essentially, there's no rationale given for going with basic_path.

There were several distinct issues. The first is that if you have a single
path type that stores Unicode, then

  exists(path("foo"))

will perform a char -> wchar_t conversion inside the path constructor, and
that conversion might not be exactly the one the OS would have performed. One
issue is that the program might not have initialized the global locale with
locale(""). Another is that the conversion performed by the OS might differ
from that of locale(""). I must admit I don't know when that might be the case
on Windows (and POSIX doesn't do such conversions). So I'd really like to know
about real use cases. After all, QFile + QString works on Windows. See the docs
at http://doc.trolltech.com/3.3/qfile.html
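
To make the locale dependence concrete, here is a minimal sketch of a widening step driven by an explicit locale. The name widen_with is hypothetical and is not part of Boost.Filesystem. It uses ctype<wchar_t>::widen, which maps one char at a time; a real multibyte conversion would go through a codecvt facet instead, so treat this only as an illustration of where the locale enters.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: widen a narrow string using the ctype facet of the
// given locale. The result depends on which locale is passed in, which is
// exactly the concern when a path constructor converts implicitly.
std::wstring widen_with(const std::string& in, const std::locale& loc)
{
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);

    std::wstring out(in.size(), L'\0');
    if (!in.empty())
        facet.widen(in.data(), in.data() + in.size(), &out[0]);

    // widen() is a one-to-one mapping, so the length never changes.
    assert(out.size() == in.size());
    return out;
}
```

Passing std::locale() picks up whatever global locale the program has installed. If the program never called std::locale::global(std::locale("")), that is the classic "C" locale, not the user's environment.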

The second issue, relevant only if the first one is real, is mixing different
types of path. With a single path type:

   path p("a"), p2(L"b");
   p /= p2; // must do conversion, might not do what's desired

With basic_path:

   path p("a");
   wpath p2(L"b");
   p /= p2; // won't compile
   p /= path(p2); // explicit conversion is clearly seen.

This again relies on the assumption that the char to wchar_t conversion might
not do exactly what the OS conversion would do.
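
The mixing rules can be sketched with a toy class template. Everything here (toy_path, convert, str) is hypothetical and is not the interface of the i18n branch; the conversion is a simple per-character narrow/widen through the global locale, chosen only to keep the example short.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical conversion helpers, using the global locale's ctype facet.
inline void convert(const std::wstring& from, std::string& to)
{
    const std::ctype<wchar_t>& f =
        std::use_facet<std::ctype<wchar_t> >(std::locale());
    to.resize(from.size());
    if (!from.empty())
        f.narrow(from.data(), from.data() + from.size(), '?', &to[0]);
}

inline void convert(const std::string& from, std::wstring& to)
{
    const std::ctype<wchar_t>& f =
        std::use_facet<std::ctype<wchar_t> >(std::locale());
    to.resize(from.size());
    if (!from.empty())
        f.widen(from.data(), from.data() + from.size(), &to[0]);
}

// Toy stand-in for basic_path: just enough to show the mixing rules.
template <class Char>
class toy_path {
public:
    typedef std::basic_string<Char> string_type;

    toy_path() {}
    toy_path(const Char* s) : s_(s) {}

    // Cross-type construction is allowed, but only when spelled out.
    template <class Other>
    explicit toy_path(const toy_path<Other>& other)
    {
        convert(other.str(), s_);
    }

    // Appending accepts only the same character type; because the
    // converting constructor is explicit, a toy_path<wchar_t> argument
    // does not compile here.
    toy_path& operator/=(const toy_path& rhs)
    {
        if (!s_.empty() && !rhs.s_.empty())
            s_ += Char('/');
        s_ += rhs.s_;
        return *this;
    }

    const string_type& str() const { return s_; }

private:
    string_type s_;
};
```

With this sketch, p /= p2 for a toy_path<char> p and a toy_path<wchar_t> p2 is rejected by the compiler, while p /= toy_path<char>(p2) compiles and makes the conversion visible at the call site.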

The third issue is that I don't like the templated implementation of all
functions. There's already a compiled library, so why not move all the code
there? For example:

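// Sketch: one non-template path type, with the narrow/wide dispatch done
// inside the compiled library. SomeOSFunctionA/W are placeholders.
class common_path {
public:
    char* data;     // narrow or wide characters, depending on is_wide
    bool is_wide;
};

bool exists(const common_path& p)
{
    if (p.is_wide)
        return SomeOSFunctionW(reinterpret_cast<const wchar_t*>(p.data));
    else
        return SomeOSFunctionA(p.data);
}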

Also, I note that there's no conversion from basic_path<char> to
basic_path<wchar_t> or vice versa, as far as I can tell. To recall my argument
for conversion: say I have a library that exposes paths in its interface.
Should I use path or wpath in it? If I use path then, because the conversion
is missing, the library is unusable with other code that uses wpath. So I
need to use wpath. And so, basically, all libraries need to use wpath
everywhere. So why do you need path at all?

> * The POSIX wpath implementation assumes that UTF-8 is always the operating
> system's preferred external path encoding. If any Boost users are concerned
> about other encodings, please let me know.

I certainly am. The standard encoding for Russian on Linux is KOI8-R.
Probably, we need to use the conversion facet that's part of the global
locale. Qt uses
 
   char *charset = nl_langinfo (CODESET);

and the values of the LC_ALL, LC_CTYPE and LANG variables. But then it
contains its own translation tables. So using locale("") is the best guess, I
think.
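
For reference, the nl_langinfo query can be sketched as below, assuming a POSIX system. The name environment_codeset is hypothetical, and this is not how Qt or the i18n branch is actually written; it only shows how the environment's encoding can be read instead of assuming UTF-8.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>   // POSIX only

// Hypothetical helper: report the character encoding named by the
// environment's locale (LC_ALL, LC_CTYPE, LANG), for example "KOI8-R"
// or "UTF-8".
std::string environment_codeset()
{
    // Adopt the environment's locale for the LC_CTYPE category only.
    // This changes process-wide state. If the environment names an
    // unknown locale, setlocale returns a null pointer and the previous
    // locale stays in effect.
    std::setlocale(LC_CTYPE, "");

    const char* cs = nl_langinfo(CODESET);
    return cs ? std::string(cs) : std::string();
}
```

On a system configured for Russian with koi8-r, this would be expected to report a KOI8-R codeset rather than UTF-8, which is why a hard-wired UTF-8 assumption is a problem there.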

- Volodya

