Subject: Re: [boost] [filesystem] Mac OS default codecvt facet
From: Beman Dawes (bdawes_at_[hidden])
Date: 2010-02-15 16:36:38


On Sun, Feb 14, 2010 at 8:22 PM, Peter Dimov <pdimov_at_[hidden]> wrote:
> Beman Dawes wrote:
>>
>> On Sun, Feb 14, 2010 at 3:53 PM, Peter Dimov <pdimov_at_[hidden]>
>> wrote:
>>>
>>> Beman Dawes wrote:
>>>
>>>> * Is UTF-8 OK with Mac OS users as the Boost.Filesystem default?
>>>
>>> UTF-8 is not merely a default on Mac OS X. It's _the_ encoding used
>>> by the OS.
>>
>> Do you have a link for that?
>
> The most authoritative one is probably
>
> http://developer.apple.com/mac/library/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html
>
> "All BSD system functions expect their string parameters to be in UTF-8
> encoding and nothing else. Code that calls BSD system routines should ensure
> that the contents of all const *char parameters are in canonical UTF-8
> encoding. In a canonical UTF-8 string, all decomposable characters are
> decomposed; for example, é (0x00E9) is represented as e (0x0065) + ´
> (0x0301). To put things into a canonical UTF-8 encoding, use the
> "file-system representation" interfaces defined in Cocoa and Carbon
> (including Core Foundation)."
>
> I think that in practice the OS will take any valid UTF-8 and normalize it
> internally, so it's not necessary to decompose it.
>
> http://lists.apple.com/archives/unix-porting/2007/Sep/msg00023.html
>
> "The kernel will reject any filename that is not a valid UTF-8 string, and
> it will even be normalized (to Unicode NFD) before stored on disk, at least
> when using HFS. The right way to deal with it would be to always convert the
> filename to UTF-8 before trying to open/create a file."
>
> http://lists.apple.com/archives/applescript-users/2002/Sep/msg00319.html
>
> "How a file name looks at the API level depends on the API. Current Carbon
> APIs handle file names as an array of UTF-16 characters; POSIX ones handle
> them as an array of UTF-8, which is why UTF-8 works well in Terminal. How
> it's stored on disk depends on the disk format; HFS+ uses UTF-16, but that's
> not important in most cases."
>
> http://developer.apple.com/mac/library/qa/qa2001/qa1173.html
>
> "In Mac OS X's VFS API file names are, by definition, canonically decomposed
> Unicode, encoded using UTF-8. This raises a number of interesting issues."
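
To make the decomposition example in the first Apple quote concrete, here is a
minimal C++ sketch. The helper to_utf8 is hypothetical, written only for this
illustration; it is not part of Boost.Filesystem or of any Apple API, and it
does no normalization. It only shows that the precomposed and the decomposed
spellings of é are different UTF-8 byte sequences, which is why it matters
whether the OS normalizes names for us.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode Unicode code points as UTF-8.
// Sketch only: no checks for surrogates or values above U+10FFFF.
std::string to_utf8(const std::vector<std::uint32_t>& code_points) {
    std::string out;
    for (std::uint32_t cp : code_points) {
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Precomposed é (U+00E9) encodes as the bytes C3 A9.
// Decomposed e + combining acute (U+0065 U+0301) encodes as 65 CC 81.
// A byte-wise comparison of the two file names therefore fails unless
// one side is normalized first.
```

If Peter's reading is right and the OS normalizes on the way in, a path
created from either spelling should refer to the same file on HFS+, but a
name read back from the file system may not be byte-identical to the name
that was passed in.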

Those are great references. Many thanks!

I've updated trunk accordingly, and closed #3928.

--Beman

