Subject: Re: [boost] wfopen
From: Ulrich Eckhardt (doomster_at_[hidden])
Date: 2010-12-13 16:16:49


On Monday 13 December 2010 06:11:53 Chad Nelson wrote:
> On Sun, 12 Dec 2010 20:58:06 -0500
>
> Christian Henning <chhenning_at_[hidden]> wrote:
> > Hi there, one of the requirements for me to fix before I include
> > io_new to boost is to support opening files with unicode filenames. As
> > far as I can see there is _wfopen on Windows platforms. What's the one
> > for Linux and co? [...]

You can use mbstowcs/wcstombs and hope that the environment's locale is set up
correctly. I'd suggest looking at Boost.Filesystem though, even if only for
inspiration.
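
To make the wcstombs route concrete, here is a minimal sketch. The helper
names to_locale_bytes and open_wide_path are made up for illustration; they are
not from Boost or any standard library. It assumes the program has already
called setlocale(LC_ALL, "") so that the conversion uses the user's locale
rather than the default "C" locale, and it simply gives up when a character has
no representation in that locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the current C locale. Returns false if some character cannot be encoded.
bool to_locale_bytes(const std::wstring& in, std::string& out)
{
    // MB_CUR_MAX is the worst-case number of bytes per wide character
    // in the current locale; +1 leaves room for the terminator.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);

    // wcstombs returns (size_t)-1 when a character has no representation.
    const std::size_t n = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1) || n >= buf.size())
        return false;

    out.assign(buf.data(), n);
    return true;
}

// Hypothetical helper: open a file named by a wide string using plain
// fopen(). Returns a null pointer if conversion or opening fails.
std::FILE* open_wide_path(const std::wstring& path, const char* mode)
{
    std::string bytes;
    if (!to_locale_bytes(path, bytes))
        return nullptr;
    return std::fopen(bytes.c_str(), mode);
}
```

Note that this only works if the locale's encoding happens to match the
encoding the file name was created with, which is exactly the "hope" part.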

> So far as I know, there isn't one. Linux encodes all Unicode filenames
> in UTF-8, then uses the standard fopen function.

This is dangerous half-knowledge. Typical OSs based on Linux are configured to
use UTF-8 as encoding for their filesystem. However, if you throw in an old CD
burnt with a different encoding, that part of the filesystem will have a
different encoding. Also, you can happily create files with any encoding
there, you don't have to use UTF-8. The only properties of a path that you
can rely on for Linux (and probably other POSIX platforms) are that it is
zero-terminated (null byte at the end) and that the individual segments are
separated by slashes. In between, you can have anything -- control characters,
newlines, spaces, or text in any encoding.
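
Since a file name is just bytes, code that wants to treat it as UTF-8 has to
check first. The following is only a sketch of such a check; looks_like_utf8 is
a made-up name, and a real program would probably use a library for this. It
rejects stray continuation bytes, truncated sequences, overlong forms,
surrogates and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `name` is a well-formed UTF-8 byte sequence.
// A "false" result does not mean the name is invalid as a file name, only
// that it cannot be decoded as UTF-8.
bool looks_like_utf8(const std::string& name)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = name.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;            // stray continuation or invalid lead byte

        if (i + len > n)
            return false;             // sequence cut off at end of name

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(name[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;         // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min_cp[len]) return false;                  // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;      // surrogate
        if (cp > 0x10FFFF) return false;                     // out of range
        i += len;
    }
    return true;
}
```

A name that fails this check can still be opened with fopen() as long as you
pass the original bytes through untouched; what you can't do is safely display
it or convert it to wide characters.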

Apple OS X mandates UTF-8, but IIRC it fails when presented with a medium
whose encoding it doesn't recognize.

MS Windows requires people to use UTF-16. However, it doesn't enforce that
either; you can still use broken surrogate sequences to some extent. Media
mounted with an unknown encoding are mapped by the OS to something that can be
handled somehow. I'm not 100% sure about the mechanics (or when it fails).
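
To illustrate what "broken surrogate sequences" means, here is a small sketch
that scans a sequence of 16-bit units for surrogates that aren't properly
paired. The function name has_unpaired_surrogate is made up, and I'm using
std::u16string (C++11) so that the example doesn't depend on wchar_t being
16 bits wide, as it is on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` contains a high surrogate that is not
// followed by a low surrogate, or a low surrogate that is not preceded by
// a high one. Such a string is not valid UTF-16.
bool has_unpaired_surrogate(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return true;
            ++i;  // skip the low half of a valid pair
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            // Low surrogate without a preceding high surrogate.
            return true;
        }
    }
    return false;
}
```

A file name for which this returns true cannot be converted to UTF-8 and back
without loss, so code that round-trips names through UTF-8 would need to
decide how to handle it.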

Good luck!

Uli


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk