From: Bennett, Patrick (Patrick.Bennett_at_[hidden])
Date: 2004-08-24 11:42:15


> -----Original Message-----
> On Behalf Of Ferdinand Prantl
> Sent: Tuesday, August 24, 2004 4:30 AM
> To: boost_at_[hidden]
> Subject: Re: [boost] Re: Re: New design proposal for boost::filesystem
>
> I have nothing against usage of UTF-8 if it suits the scenario well. I
> just say that it is not an encoding for all purposes. It is a multibyte
> one and so extremely inefficient for getting size, searching, etc.

[Bennett, Patrick] I fail to see how this is the case. Right now,
today, filesystem supports *only* ASCII. If you continued to use ASCII,
nothing would change. There is zero speed penalty for calculating the
number of *bytes* in a UTF-8 string. Determining the number of
characters does cost a scan, but only if you're actually working on a
Unicode string. There is no getting around this for an international
application, no matter what encoding is used.
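
To make the bytes-versus-characters distinction concrete, here is a
minimal sketch. It assumes the input is well-formed UTF-8, and the
function names are illustrative only; nothing like them exists in
boost::filesystem.

```cpp
#include <cstddef>
#include <string>

// Byte length: constant time, and identical for ASCII and UTF-8 text.
std::size_t byte_length(const std::string& s) {
    return s.size();
}

// Code point count: a linear scan over the bytes.
// Assumes well-formed UTF-8. Continuation bytes look like 10xxxxxx;
// every other byte starts a new code point.
std::size_t code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For a pure-ASCII name the two functions return the same value, so the
scan is only needed when the caller actually cares about characters
rather than storage size.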

> Why prescribe it for all boost::filesystem users and force them to put
> recoding into their sources,

[Bennett, Patrick] Absolutely no recoding would be necessary for current
users of boost::filesystem. boost::filesystem has no support for
Unicode today, so why would they have to recode anything?

> when it can be achieved inside boost::filesystem, as it is done
> in std::streams?
> I would like to have the boost filesystem as flexible as
> possible. Someone can work with filenames in std::string in the current
> locale, in UTF-8 or a different locale, someone can use std::wstring
> with UCS-2 or UTF-16, etc. The question is whether such flexibility is
> needed so rarely that it rather spoils the interface. I don't think so.
>
> > Win32 doesn't support UTF-8 filenames natively. That's why
> > boost::filesystem would have to convert to/from UCS-2 along Win32
> > interface boundaries. If you're concerned about other platforms, you
> > shouldn't be. boost::filesystem currently works only with latin
> > encodings in ascii strings so no functionality would be taken away.
>
> UTF-8 is not identical with the complete iso-8859-1 (latin1) codepage.
> Some code could be broken by accepting UTF-8 in the new version.

[Bennett, Patrick] Hmmm, good point, but... would it break for any of
the characters that are valid characters for a path or filename on an
8859-1 system? No, not that I can think of.
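
One way to test this for a given name is a structural UTF-8 check. The
sketch below is a simplification: it only verifies lead and continuation
bytes and does not reject overlong forms or surrogate code points. The
function name is hypothetical. Names restricted to ASCII pass unchanged,
while a Latin-1 name containing a byte above 0x7F (for example 0xE9, the
Latin-1 encoding of e-acute) does not form a valid UTF-8 sequence on its
own.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check (simplified, see the note above).
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                   // continuation bytes expected
        if (c < 0x80)                extra = 0;  // 0xxxxxxx: ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // stray continuation or bad lead

        if (s.size() - i - 1 < extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80) return false;    // not 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

So the compatibility question comes down to whether existing callers
pass any bytes above 0x7F in their path strings.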

> Linux can be configured to support UTF-8 natively. However, it is not
> necessary and depends on your locale installation and configuration.
>
> By imbuing I meant the conversion "application filenames encoding" ->
> "machine filenames encoding". Instead of putting platform-dependent
> translation code into conditions, one could simply say "I am running
> in UTF-8, boost::filesystem, please understand it and do the system
> translation for me". Exceptions could sort out incompatibilities.
>
> boost::filesystem::imbue("UTF-8"); // more abstract than codecvt pseudocode :-)
>
> In this example an internal conversion into UCS-2 would be done on
> Windows, on Linux it would depend on the configured locale, and on
> other systems, which could support ASCII only, it would convert into
> ASCII only. However, it does not constrain the application from
> running wholly in wchar_t (e.g. UCS-2) or char (UTF-8 or something
> else), nor does it force the user to write extra code for character
> conversion if it is not necessary.
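
The translation step described in the quoted proposal could look roughly
like the sketch below. Everything here is an assumption made for
illustration: NameEncoding and to_utf8 are hypothetical names, not part
of boost::filesystem, the "machine" encoding is assumed to be UTF-8, and
only a Latin-1 source encoding is handled.

```cpp
#include <string>

// Hypothetical: the encoding the application declares for its filenames.
enum class NameEncoding { Latin1, Utf8 };

// Convert an application-side name to UTF-8 (the assumed native encoding).
std::string to_utf8(const std::string& name, NameEncoding from) {
    if (from == NameEncoding::Utf8) {
        return name;  // already in the target encoding
    }
    std::string out;
    for (unsigned char c : name) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII maps to itself
        } else {
            // Latin-1 bytes 0x80..0xFF are code points U+0080..U+00FF,
            // which take two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

The application states its encoding once and the conversion happens at
the library boundary; a real design would also need the reverse
direction and an error path for names the target cannot represent.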

[Bennett, Patrick] If you can think of a good way of handling this that
doesn't involve a mess of codepages, locales, and facets, then I'm all
for it. Frankly I think C++'s 'built-in' internationalization support
is a nightmare, but that's probably just me. My (intentionally) limited
exposure to them probably hasn't helped. It's hard to beat having a
'single' encoding like UTF-8 that can handle all defined characters.
Unicode is definitely the way to go IMO.

My real issue with boost::filesystem is that as currently defined, it's
unusable in an application that will be used around the world. My
initial response to this whole thread was just to point out to David
that there *are* issues preventing people from using the library. He
didn't think there were any, so I was compelled to point out what one of
the issues was for me at least.

At the company where I work we're currently just pursuing our own
wrappers for what filesystem provides. I originally tried using
filesystem, but once I saw that its handling of internationalization
was absent, I had no choice but to dump it. I certainly have an
interest in it being improved, and I could see looking at it again, but
someone will have to spearhead that initiative. The fact that this
hasn't really been brought up before tells me that people either aren't
using the library, or simply don't care about internationalization
(probably the latter). I, unfortunately, don't have that luxury.

Cheers...
Patrick Bennett

