From: Ferdinand Prantl (ferdipr_at_[hidden])
Date: 2004-08-24 04:29:53


Hi Patrick,

> Patrick Bennett wrote:
> > Ferdinand Prantl wrote:

> > What do you think about an imbuing, codecvt-like approach in
> > boost::filesystem for the names of the files?

> I don't (personally) care for it. More and more libraries and standards
> are using UTF-8 (for good reason) these days. It's a nice, simple, and
> flexible encoding.

I have nothing against using UTF-8 where it suits the scenario. I am just
saying that it is not an encoding for all purposes. It is a multibyte
encoding and so inefficient for operations such as getting the length in
characters, searching, etc. Why prescribe it for all boost::filesystem
users and force them to put recoding into their sources, when it can be
achieved inside boost::filesystem, as is done in the standard streams? I
would like boost::filesystem to be as flexible as possible. Someone can
work with filenames in std::string in the current locale, in UTF-8 or in a
different locale; someone else can use std::wstring with UCS-2 or UTF-16,
etc. The question is whether the need for such flexibility is so rare that
it would rather spoil the interface. I don't think so.
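
To illustrate the point about size: with UTF-8 in a std::string, size()
gives the number of bytes, while the number of characters needs a pass over
the whole string. A minimal sketch, assuming well-formed input; the name
utf8_length is made up for this example and is not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Every byte that is not a continuation byte (10xxxxxx) starts a new
// code point. Assumes the input is well-formed UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    // There can never be more code points than bytes.
    assert(count <= s.size());
    return count;
}
```

For the UTF-8 bytes of a three-letter name with two accented letters, size()
reports 5 while utf8_length reports 3, and the latter costs a linear scan.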

> Win32 doesn't support UTF-8 filenames natively. That's why
> boost::filesystem would have to convert to/from UCS-2 along Win32
> interface boundaries. If you're concerned about other platforms, you
> shouldn't be. boost::filesystem currently works only with latin encodings
> in ascii strings so no functionality would be taken away.

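UTF-8 is not identical to the complete ISO-8859-1 (latin1) codepage; the two
agree only on the ASCII range. Code that passes latin1 filenames with
characters above 0x7F today could be broken if the new version interpreted
them as UTF-8.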
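
A small sketch of that incompatibility, with hypothetical helper names
(looks_like_utf8 and latin1_to_utf8 are made up here). The check is only
structural and, as a simplification, does not reject overlong forms or
surrogates:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified structural check: each lead byte must be followed by the
// right number of continuation bytes (10xxxxxx).
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                          // stray continuation or invalid lead byte
        if (extra > s.size() - i - 1) return false; // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Re-encodes an ISO-8859-1 string as UTF-8. Latin-1 byte values equal
// their Unicode code points, so bytes >= 0x80 become two-byte sequences.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    assert(looks_like_utf8(out)); // the result is structurally valid UTF-8
    return out;
}
```

A latin1 name such as "caf" followed by the single byte 0xE9 fails the
check, because 0xE9 reads as the lead byte of a three-byte sequence that
never arrives; after latin1_to_utf8 the same name ends in 0xC3 0xA9.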

> The UTF-8 representation of ascii strings is identical, so if you already
> use ascii strings, nothing will change, and nothing will break. If you
> want your application to be runnable in multiple countries though, an
> operating system which boost::filesystem has translations defined for
> would be required. Linux is UTF-8 natively (assuming the right environment
> variable is set), so boost::filesystem would just pass everything through
> as-is. The Win32 port would have to make some simple conversions (Windows
> even has built-in functions to perform this conversion). Other platforms
> might have to make other conversions to/from UTF-8, but assuming that
> platform supports Unicode at all, this is a no-brainer.

Linux can be configured to support UTF-8 natively. However, that is not
guaranteed; it depends on your locale installation and configuration.

By imbuing I meant the conversion "application filename encoding" ->
"machine filename encoding". Instead of wrapping platform-dependent
translation code in conditionals, one could simply say "I am running in
UTF-8, boost::filesystem, please understand it and do the system
translation for me". Exceptions could sort out incompatibilities.

boost::filesystem::imbue("UTF-8"); // more abstract than codecvt pseudocode :-)

In this example, an internal conversion to UCS-2 would be done on Windows;
on Linux it would depend on the configured locale; and on other systems
that support only ASCII, it would convert to ASCII. However, this neither
constrains the application from running wholly in wchar_t (e.g. UCS-2) or
char (UTF-8 or something else), nor forces the user to write extra code for
character conversion when it is not necessary.
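
To make the idea more concrete, here is a rough sketch of the kind of
conversions such an imbue could delegate to when the application side is
UTF-8. None of this is existing boost::filesystem API; decode_utf8,
to_utf16 and to_ascii are hypothetical names, and the sketch produces
UTF-16 (with surrogate pairs) rather than plain UCS-2. Incompatibilities
are reported through exceptions:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into code points, throwing
// std::invalid_argument on malformed input. Simplified: overlong forms
// and encoded surrogates are not rejected.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (extra > s.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");
        out += cp;
        i += extra + 1;
    }
    return out;
}

// What a Windows back end might do: UTF-8 name -> UTF-16 code units.
std::u16string to_utf16(const std::string& utf8_name) {
    std::u16string out;
    for (char32_t cp : decode_utf8(utf8_name)) {
        assert(cp <= 0x10FFFF); // guaranteed by decode_utf8
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;      // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// What an ASCII-only system might do: pass ASCII through, and throw
// std::range_error for any name it cannot represent.
std::string to_ascii(const std::string& utf8_name) {
    std::string out;
    for (char32_t cp : decode_utf8(utf8_name)) {
        if (cp > 0x7F)
            throw std::range_error("file name is not representable in ASCII");
        out += static_cast<char>(cp);
    }
    return out;
}
```

The application would keep passing UTF-8 names everywhere; only the choice
between to_utf16, to_ascii or a plain pass-through would differ per
platform, hidden behind the library.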

Ferda

> Patrick Bennett

