Boost :

From: Ulrich Eckhardt (doomster_at_[hidden])
Date: 2008-05-26 16:06:52


On Tuesday 20 May 2008 15:13:02 Beman Dawes wrote:
[issues using path and wpath in parallel]
> * As a further indication that there should be a single path type that
> can handle both wide and narrow types. Peter Dimov suggested that once,
> and I keep turning over in my mind how it could be accomplished. With
> C++0x supplying two new character types, the need becomes more pressing.
> I still don't want to invent a solution that applies only to filesystem
> paths; there is a general need for strings that can cope with multiple
> character types and encodings.

I'm currently working on an application where paths are represented as an evil
mixture of std::string and std::wstring, with the ensuing confusion about
which encoding a given std::string has and how to represent a mixture of,
e.g., Latin and Cyrillic in the same path when passing a char* to
std::fstream. For now, I have resorted to a helper structure that exposes
this interface:

#include <string>

struct path {
  // OS-native narrow string, for digestion by std::fstream;
  // throws to signal conversion failure
  char const* c_str() const;

  // convert to a Unicode string
  std::wstring to_string() const;

  // append an element to the path
  path& operator/=(std::wstring const& s);

  // ... further methods ...
private:
  // ... data ...
};

The interface is built on Unicode strings (on the assumption that
std::wstring is suitable). Internally, the path is stored as a UTF-16 wchar_t
string on MS Windows (because that is its native encoding on all systems
relevant to me) and as a UTF-8 char string on POSIX. However, it could also be
stored in any other container, i.e. whatever best suits the system. In all
cases, when the path is modified, the arguments are converted to the
OS-native representation, and an exception is thrown in case of unsupported
uses.
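
To illustrate the kind of conversion described above, here is a minimal
sketch of turning a std::wstring into a UTF-8 std::string and throwing on
input that cannot be represented. The function name wide_to_utf8 and the
choice of std::range_error are assumptions made for this example; they do not
come from the post or from Boost.Filesystem.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): encode a wide string
// as UTF-8. Works for 16-bit wchar_t holding UTF-16 (e.g. MS Windows) and
// for 32-bit wchar_t holding code points; surrogate pairs are combined
// wherever they appear. Throws std::range_error on invalid input.
std::string wide_to_utf8(std::wstring const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::range_error("unpaired high surrogate");
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::range_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::range_error("unpaired low surrogate");
        } else if (cp > 0x10FFFF) {
            throw std::range_error("code point out of range");
        }

        // Emit one to four UTF-8 bytes depending on the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On a POSIX system, something like this could sit behind operator/=, so that
the stored UTF-8 string is always valid and c_str() never has to convert.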

The main difference from the current Boost.Filesystem is that a path is not
treated as a string. Rather, it is considered something OS-dependent that is
related to strings or uses strings in its interface. In other words, it is an
implementation-specific subset of all strings. Further, it is convertible to a
string (sometimes).
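
The "subset of all strings" idea can be sketched as a type that rejects
strings which are not valid path elements instead of storing them blindly.
The class name posix_like_path, the member native(), and the validity rules
(non-empty, no '/', no embedded NUL) are assumptions chosen for illustration;
the example also uses std::string rather than std::wstring to stay short.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical illustration (not from the original post): a path that only
// accepts elements that are valid under assumed POSIX-like rules.
class posix_like_path {
public:
    // Append one element; throws std::invalid_argument if the string is
    // not a member of the "valid path element" subset.
    posix_like_path& operator/=(std::string const& element)
    {
        if (element.empty()
            || element.find('/') != std::string::npos
            || element.find('\0') != std::string::npos)
            throw std::invalid_argument("not a valid path element");
        if (!native_.empty())
            native_ += '/';
        native_ += element;
        return *this;
    }

    // Explicit access only: the path is not implicitly a string.
    std::string const& native() const { return native_; }

private:
    std::string native_;
};
```

Because there is no implicit conversion in either direction, every string has
to pass through operator/= and is checked on the way in.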

Note that I'm still experimenting with this, but I can keep you updated if
the above interface and implementation turn out to be useful beyond this
particular application.

Uli


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk