From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2019-09-17 16:14:02
On Tue, Sep 17, 2019 at 8:17 AM Peter Dimov via Boost <boost_at_[hidden]>
wrote:
> Rainer Deyke wrote:
> > Or the user could be running a non-UTF-8 locale, but accessing a
> > filesystem created by somebody who was using UTF-8 - in which case any
> > filenames should be in UTF-8, even if the user's locale disagrees.
> >
> > It is because of this last possibility that I recommend treating all
> > command-line arguments as UTF-8 on Unix systems, even if running a
> > non-UTF-8 locale, for all cases where treating them as binary blobs is
> > impractical. Unix filenames are binary blobs, but the de-facto standard
> > for interpreting these binary blobs as text is to use UTF-8. [...]
>
> How does any of this affect the library? It just gives you whatever you
> passed as `argv`, without needing to interpret it.
>
> Windows is a different story.
>
Indeed, you can just use UTF-8 (as long as you document this!) for
everything except Windows. With Windows, you need to provide a
wchar_t/UTF-16 overload for every char/UTF-8 overload in your lib.
If you want 100% correctness, you are not allowed to arbitrarily convert
the wchar_t strings. In particular, you are not allowed to convert them to
UTF-8, because it is possible that one of them is a filename, and it is
possible to construct filenames on the Windows platform that are not
properly UTF-16-encoded. This means that the UTF-16 -> UTF-8 conversion is
lossy if you follow the Unicode guidelines for that conversion: you should
produce a replacement character (U+FFFD) wherever you encounter broken
UTF-16, such as an unpaired surrogate.
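
To make the lossiness concrete, here is a minimal sketch of such a conversion. The names `append_utf8` and `utf16_to_utf8_lossy` are made up for illustration and are not part of any Boost library, and the sketch uses `std::u16string` rather than `wchar_t` so that it behaves the same on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one Unicode scalar value to `out`.
// (Hypothetical helper, for illustration only.)
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, emitting U+FFFD for every unpaired surrogate.
inline std::string utf16_to_utf8_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        bool is_high = u >= 0xD800 && u <= 0xDBFF;
        bool is_low = u >= 0xDC00 && u <= 0xDFFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: combine into one code point.
            char32_t lo = in[i + 1];
            append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
            ++i;
        } else if (is_high || is_low) {
            // Unpaired surrogate: the original code unit is lost here.
            append_utf8(out, 0xFFFD);
        } else {
            append_utf8(out, u);
        }
    }
    return out;
}
```

Two different broken names, say `a`, 0xD800, `b` and `a`, 0xDC00, `b`, both come out as `a`, U+FFFD, `b`, so the original filename cannot be recovered from the UTF-8 result.
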
Though such broken-UTF-16-named files are possible to create, they almost
never come up in practice. So, if you don't care about this case that
prevents 100% correctness, just provide wchar_t overloads, implement each
one by converting to UTF-8 and calling your UTF-8 overload, and only define
the wchar_t overloads when building on Windows.
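
A rough sketch of that pattern follows. `describe_arg` is a hypothetical stand-in for one of your library's functions, and the conversion uses the Win32 `WideCharToMultiByte` call with `CP_UTF8`. As far as I know, passing 0 for the flags makes it substitute U+FFFD for ill-formed input on Windows Vista and later, but treat that as an assumption to check against the documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical library function: the char/UTF-8 overload does the real work.
inline std::string describe_arg(const std::string& utf8_arg) {
    return "arg: " + utf8_arg;
}

#ifdef _WIN32
// wchar_t/UTF-16 overload, defined only when building on Windows.
// It converts to UTF-8 and forwards to the UTF-8 overload.
inline std::string describe_arg(const std::wstring& wide_arg) {
    if (wide_arg.empty()) {
        return describe_arg(std::string());
    }
    const int wide_len = static_cast<int>(wide_arg.size());
    // First call: ask how many UTF-8 bytes the result needs.
    const int utf8_len = WideCharToMultiByte(
        CP_UTF8, 0, wide_arg.data(), wide_len, nullptr, 0, nullptr, nullptr);
    assert(utf8_len > 0);
    std::string utf8(static_cast<std::size_t>(utf8_len), '\0');
    // Second call: perform the conversion into the sized buffer.
    WideCharToMultiByte(
        CP_UTF8, 0, wide_arg.data(), wide_len, &utf8[0], utf8_len,
        nullptr, nullptr);
    return describe_arg(utf8);
}
#endif
```

On non-Windows builds only the `std::string` overload exists, so callers there pass `argv` entries through unchanged.
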
Zach