From: Ferdinand Prantl (ferdipr_at_[hidden])
Date: 2004-08-25 11:05:07


Hi,

I will be brief, as I have already answered some of these points.

> [mailto:boost-bounces_at_[hidden]] On Behalf Of Bennett, Patrick
>
> [Bennett, Patrick] Yes, you could pass UTF-8 through as-is in
> Linux. If your application was running in Japan on Windows,
> boost::filesystem would not work at all. So, yes, you can
> pass all sorts of extended characters to Windows in an 8-bit
> string (the majority of which aren't really valid for
> filenames), but that does nothing for Unicode support.
> It simply won't work. For Unicode support the application
> must call the xxxxxW versions of the windows API's and pass
> in a UCS-2 string.

It will run as long as you use only ASCII and Japanese characters from an
ANSI code page, for example SJIS. Yes, the application will not be capable
of full Unicode support, just as ANSI C and the STL are not. See my last
e-mail for some discussion.

> > What you can't do is use the same encoding on all platforms, because
> > the underlying platform API's won't understand them.
>
> [Bennett, Patrick] Correct, this was my whole point. If I
> can't write a portable (to at least Linux and Windows),
> internationalizable application with boost::filesystem then
> it's of no use to (at least) me.

:-) Workarounds are being built all the time in multiplatform applications.
You have to work around ANSI C for that, and the STL too. The question is
whether to work around boost::filesystem as well, given that Windows (only?)
provides Unicode methods.

> This is why I suggested that UTF-8 be used as
> boost::filesystem's encoding. It can be passed through as-is
> or converted as needed on each platform and it would be
> backwards compatible with existing boost::filesystem code.

It will not be compatible. Forcing UTF-8 is not the way, because it breaks
the current support, which handles the local encoding only. Moreover, it
would force everybody to recode to UTF-8 even if they do not need it. That
is why std::streams use imbuing: no reader is forced to accept UTF-8 only.
See my last e-mail.
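
For illustration, a minimal sketch of what imbuing means for streams. The
helper name write_text is hypothetical and not part of any library; the
encoding of the file is whatever the codecvt facet of the supplied locale
produces, so the caller chooses it rather than the stream class.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write wide text to a file, letting the caller pick
// the external encoding through the locale that is imbued on the stream.
bool write_text(const char* filename, const std::wstring& text,
                const std::locale& loc)
{
    std::wofstream out;
    out.imbue(loc);      // imbue before opening, so the facet applies from the start
    out.open(filename);
    if (!out)
        return false;    // could not open the file
    out << text;
    out.flush();         // push the data through the codecvt facet now
    return out.good();   // false if a character could not be converted
}
```

With std::locale::classic() only plain ASCII text is certain to convert;
a locale carrying a UTF-8 or SJIS codecvt facet would change the bytes on
disk without changing the calling code.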

Note: here I am mixing filenames with file content. With filenames it is
not so painful, but still...

> I'll probably not talk about this topic too much more. I
> think I've pretty much said everything I'm going to say, so
> if no one else agrees, I'll just move along. ;)

I do not agree with your suggestion because it breaks compatibility with
the current interface and would force me to recode into UTF-8. I sketched a
solution which is non-intrusive to the current interface and whose
underlying implementation can use XxxW for Unicode input.

However, I still think it is a cannon for a sparrow...

What do the others think about this extension:

* provide methods for std::wstring and wchar_t (or generalize them for
basic_string<T>)
* implement the <wchar_t> methods to support Unicode by imbuing
* provide the most common imbuing objects in boost::filesystem (at least the
null one - see 1)

1) unicode <-> local: wcstombs and mbstowcs to support the current locale
functionality (imbue local)
2) unicode <-> unicode: UTF-8 conversion from boost::xxx to support UTF-8
encoding on UN*Xes and _wxxx or XxxW on Windows to support Unicode on
Windows (imbue unicode)

3) external: with externally implemented codecvt-like objects (using ICU,
libiconv, Win32...), any conversion from the application's model into the
filesystem code page can be achieved
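
A minimal sketch of conversion 1) above, assuming the wide string holds
Unicode and the narrow string uses the encoding of the current C locale.
The names to_local and from_local are hypothetical and are not part of
boost::filesystem; only std::wcstombs and std::mbstowcs are real library
calls.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: wide string -> multibyte string in the current C locale.
// Returns false if some character has no representation in that locale.
bool to_local(const std::wstring& wide, std::string& narrow)
{
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;
    narrow.assign(buf.data(), n);
    return true;
}

// Hypothetical helper: multibyte string in the current C locale -> wide string.
// Returns false if the input is not valid in that locale's encoding.
bool from_local(const std::string& narrow, std::wstring& wide)
{
    // The result never has more wide characters than the input has bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;
    wide.assign(buf.data(), n);
    return true;
}
```

A wchar_t overload of a filesystem function could call to_local and then
forward to the existing char-based code, which keeps the current interface
untouched. Conversions 2) and 3) would replace these two functions with
another converter object.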

Ferda

> Cheers...
> Patrick Bennett

