From: Sean Parent (sparent_at_[hidden])
Date: 2003-01-06 19:13:01


Beman,
Thanks for the clarifications - I thought I had followed the whole thread,
but I had misunderstood and thought the thrust was to create a new path
representation.

on 1/6/03 1:51 PM, Beman Dawes at bdawes_at_[hidden] wrote:

> The syntax, semantics, and about everything else about paths that operating
> system functions traffic in was defined years ago for each operating
> system, standardized or not. Those native path formats aren't something we
> can change.

At this level - the solution becomes very platform specific. Some platforms
have adopted a UTF or UCS mapping; some always assume the "current locale"
without conversion. Any simple mappings that I've seen mentioned here will
fail to give an expected name, even if they do round-trip correctly.

My recommendation would be to provide facilities to map to an appropriate
space, perhaps providing some common ones such as UTF-8, UTF-16, and UCS-2
(although I'm not sure what you do for characters outside UCS-2).
The only safe thing I can imagine for characters that can't be
represented in the target space is to map them to UTF-7 (not UTF-8; UTF-8 on
many double-byte systems will give you no end of headaches). UTF-7 is a good,
lowest-common-denominator form.
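
To make the UCS-2 caveat concrete, here is a minimal C++ sketch. It is not
from the original message, and the names to_utf16 and fits_in_ucs2 are
hypothetical. It only illustrates that a code point above U+FFFF needs a
surrogate pair in UTF-16 and has no representation in UCS-2, which is the
case where some fallback mapping would be needed.

```cpp
#include <cassert>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// Returns an empty vector if cp is not a valid Unicode scalar value
// (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF).
std::vector<char16_t> to_utf16(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {};
    if (cp < 0x10000) return { static_cast<char16_t>(cp) };
    cp -= 0x10000;
    assert(cp <= 0xFFFFF);  // 20 bits remain: 10 per surrogate
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),      // high surrogate
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };  // low surrogate
}

// UCS-2 has no surrogate mechanism, so only code points that take a
// single 16-bit unit can be represented in it.
bool fits_in_ucs2(char32_t cp) {
    return to_utf16(cp).size() == 1;
}
```

For example, U+0041 fits in UCS-2, while U+1F600 becomes the two units
0xD83D 0xDE00 in UTF-16 and does not fit in UCS-2 at all.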

> For boost::filesystem::path, or any other path handling facility, the need
> arises to convert a path between narrow and wide character strings. For
> example, the operating system may use narrow character paths but the
> program traffics in wstrings.

A program traffics in strings in some encoding - be it wstrings or strings
of Shift-JIS characters doesn't really matter. Wide vs. narrow isn't an issue
(UTF-8 poses as many problems as UTF-16 or UTF-32 do; even my e-mail reader
gives me three options for Western European encodings). So long as you have
a path from the encoding to UTF and then to the platform file system
encoding you're doing about as well as can be expected. The toughest part is
making sure you can recognize any escaped characters and that they are
unlikely to appear accidentally. You may also need to escape characters that
are in the character set but not allowed in a file name on a particular
platform (null characters, line breaks, etc.).
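
As a rough sketch of the escaping idea (not the filesystem library's
behavior), the C++ below uses a hypothetical scheme: '%' followed by two
uppercase hex digits. The names escape_name and unescape_name and the
"forbidden" parameter are assumptions made for illustration. Because the
escape character itself is always escaped, a '%' in the output can only be
the start of an escape, so escapes stay recognizable.

```cpp
#include <cassert>
#include <string>

// Value of an uppercase hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reverse escape_name: turn each "%XX" back into the byte it encodes.
// A '%' that is not followed by two hex digits is copied unchanged.
std::string unescape_name(const std::string& escaped) {
    std::string out;
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        if (escaped[i] == '%' && i + 2 < escaped.size()) {
            int hi = hex_value(escaped[i + 1]);
            int lo = hex_value(escaped[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += escaped[i];
    }
    return out;
}

// Escape '%', control characters (including null and line breaks), and
// any byte listed in `forbidden` (the platform-specific part).
std::string escape_name(const std::string& name, const std::string& forbidden) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : name) {
        bool escape = c == '%' || c < 0x20 ||
                      forbidden.find(static_cast<char>(c)) != std::string::npos;
        if (escape) {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        } else {
            out += static_cast<char>(c);
        }
    }
    assert(unescape_name(out) == name);  // debug check: must round-trip
    return out;
}
```

With ':' passed as the forbidden set, "a:b" becomes "a%3Ab" and
unescape_name restores "a:b". Whether the escaped form is a name a user
would expect to see is a separate question, as noted earlier.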

The other part of interest is meta-information that gets encoded into a path
name. Path separators are just one example; others are notions like "if the
first character is a '.' then it is invisible" on one platform but "if the
first character is a '.' then it is a driver" on another. File extensions
denoting a file type are another example. I haven't looked at how the
filesystem library deals with this level of meta-information.

> That causes a need for conversions, and if I
> understand correctly, there are a number of ways (all conforming to one
> standard or another) to do that conversion, and it is really messy because
> of locale issues. PJP is well aware of those standards; indeed he wrote
> some of them, and IIRC has been to Japan and other Asian countries more
> than twenty times dealing with internationalization issues.

Whatever the solution - it is going to have to be somewhat
platform/filesystem specific. I'd recommend a good system for providing
platform solutions with a reasonable fallback mechanism.
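
One possible shape for "platform solutions with a fallback" is sketched
below in C++. This is only an illustration under stated assumptions:
ConverterRegistry, NameConverter, and the string keys used to identify a
filesystem are all hypothetical and are not part of boost::filesystem.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// A converter maps a name in the program's encoding to the form a
// particular filesystem expects.
using NameConverter = std::function<std::string(const std::string&)>;

class ConverterRegistry {
public:
    // The fallback is used whenever no specific converter is registered.
    explicit ConverterRegistry(NameConverter fallback)
        : fallback_(std::move(fallback)) {}

    // Register (or replace) the converter for one filesystem key.
    void add(const std::string& filesystem, NameConverter converter) {
        converters_[filesystem] = std::move(converter);
    }

    // Use the filesystem-specific converter if present, else the fallback.
    std::string convert(const std::string& filesystem,
                        const std::string& name) const {
        auto it = converters_.find(filesystem);
        const NameConverter& c =
            (it != converters_.end()) ? it->second : fallback_;
        return c(name);
    }

private:
    std::map<std::string, NameConverter> converters_;
    NameConverter fallback_;
};
```

A caller would register one converter per filesystem it knows about and
rely on the fallback (for instance, an escaping scheme) everywhere else.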

-- 
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
sparent_at_[hidden]
