Subject: Re: [boost] Making Boost.Filesystem work with GENERAL filenames with g++ in Windows (a solution)
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-10-28 06:02:03


On Thu, Oct 27, 2011 at 22:26, Beman Dawes <bdawes_at_[hidden]> wrote:

> On Thu, Oct 27, 2011 at 3:04 PM, Yakov Galka <ybungalobill_at_[hidden]>
> wrote:
> > On Wed, Oct 26, 2011 at 22:47, Beman Dawes <bdawes_at_[hidden]> wrote:
> >
> >> On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill_at_[hidden]>
> >> wrote:
> >>
> >> > Personally I think that boost::filesystem::paths are a sad joke, it's a
> >> > pity they're heading to the standard. Although the OS-part is definitely
> >> > good, the way the path class is designed isn't suitable for paths outside
> >> > the unix world.
> >>
> >> Could you explain that a bit further? Since class path is used all the
> >> time for paths outside the Unix world, I'm curious to know what your
> >> concerns are.
> >>
> >
> > Give me some time to write a constructive criticism. Some issues I raise
> > below.
> >
> >> > you still cannot use long paths on windows (longer than MAX_PATH),
> >> > although they are supported by the OS.
> >>
> >> There is one ticket outstanding,
> >> http://svn.boost.org/trac/boost/ticket/5448, that is somewhat related
> >> to MAX_PATH limitations. The objective is to support any path that is
> >> acceptable to the operating system, and that includes better support
> >> for long paths.
> >>
> >
> > I know this ticket. Your objective is useless. It seems Bjarne is right
> > about the composition of the committee. Have you done a survey among the
> > users of the library?
>
> I hear from users of the library regularly. They mirror the rest of
> the Boost population; very roughly one third use Unix-like systems and
> two thirds use Windows. Once Microsoft ships the Dinkumware
> implementation of the library (based on V2) with VC++ 11, the
> percentage of Windows users will presumably increase.
>

... and one tenth of them write portable code, 1/200th are aware of the
MAX_PATH limitation, and 1/4000th really care about it. Btw, V2 was somewhat
better in Unicode handling, since theoretically I could use narrow strings for
UTF-8 (see [1] for why not). Given the current specification of the library,
it seems impossible that Microsoft will work around the MAX_PATH limitation.

> > What I do expect is that calling path.native() will return a
> > string ready to be passed to CreateFileW.
>
> Yes, that's what it does.

Excuse me, but it's not:

path p = "C:\\"; // assume this is the root of some project
path q;
for(int i = 0; i < 100; i++) q /= "a\\..";
q /= "text.txt"; // and this is a long relative path to a file within the project

CreateFileW((p/q).c_str(), ...); // fails: p/q is over 500 characters, longer than MAX_PATH

Does path.native() always return a string that can be passed to CreateFileW?
NO!
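To make the failure concrete, the sketch below rebuilds the string that p/q produces using plain std::string and measures it. It assumes path::operator/= inserts exactly one '\\' between components, as on Windows; build_example_path and lexically_collapse are hypothetical helpers for this illustration, not Boost.Filesystem functions. The result is 511 characters against a MAX_PATH of 260, although lexically it names only C:\text.txt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// MAX_PATH as defined by the Windows headers.
constexpr std::size_t kMaxPath = 260;

// Hypothetical helper: builds the same string as p/q in the example, assuming
// operator/= inserts exactly one '\\' between components.
std::string build_example_path() {
    const std::string root = "C:\\";          // p, already ends with a separator
    std::string rel;                          // q
    for (int i = 0; i < 100; ++i) {
        if (!rel.empty()) rel += '\\';
        rel += "a\\..";
    }
    rel += "\\text.txt";
    return root + rel;                        // no extra separator after "C:\\"
}

// Hypothetical helper: drops "." and resolves ".." lexically in a drive path
// such as "C:\\a\\..\\text.txt". The first component (the drive) is never popped.
std::string lexically_collapse(const std::string& path) {
    std::vector<std::string> parts;
    std::string cur;
    auto flush = [&] {
        if (cur == "..") {
            if (parts.size() > 1) parts.pop_back();
        } else if (!cur.empty() && cur != ".") {
            parts.push_back(cur);
        }
        cur.clear();
    };
    for (char c : path) {
        if (c == '\\' || c == '/') flush();
        else cur += c;
    }
    flush();
    if (parts.empty()) return std::string();
    std::string out = parts[0] + "\\";        // "C:\\"
    for (std::size_t i = 1; i < parts.size(); ++i) {
        if (i > 1) out += '\\';
        out += parts[i];
    }
    return out;
}
```

So the 511-character string handed to CreateFileW names the same file as the 11-character C:\text.txt.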

> Regardless of the operating system, the
> contents of path.native(), or more to the point, path.c_str(), is
> exactly as was passed into the path originally.

We don't need this requirement. Just drop it. Even if you want the path
class to store the original string, it doesn't rule out returning an
adjusted copy for native().

> That's important in
> case it is one of the implementation defined strings

I don't understand. Can you give an example please?

> - it isn't the
> job of path to adjust the string.
>

Wrong. Taking this as an axiom, we get:

1) path is a "smart" string that converts "transparently" between narrow and
wide chars.
2) path is a "smart" string that adds '/' or '\\' on concatenation (when
needed).
3) path has a set of convenient observers (iterate through components,
extension(), stem() blah blah).

First, observe that 1 is orthogonal to 2 and 3. Furthermore, let's compare
them with what we have now (in a world without boost::path):

1) Isn't fundamentally bad, but we can argue against it just as against
any string class that converts between different encodings transparently.
Anyway, it's not the job of the path class. In any case, standardizing on
UTF-8 eliminates the need for any conversions in the interface of boost::path
(see [1] for details); it's the job of native() to return the native string
for the given path. Currently I can use:

std::string myPath = get_utf_8_path();
CreateFileW(native(myPath), ...); // as simple as with boost::path

2) Nice. But this feature is not the one that decides whether I use the lib
or not. It is easily solved by following the convention that all directories
end with '/' or '\\':

std::string myPath = get_user_home_dir() + "Documents\\";
CreateFileW(native(myPath + "a.txt"), ...); // simpler than with boost::path:
// no need to worry about MAX_PATH 'overflow', native() takes care of it.

3) Good, this is the no. 2 reason people use the library (no. 1 is the
"Operational Functions", which can be used without the boost::path class in
user code).
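Items 1 and 2 assume a free function native() that takes a UTF-8 std::string. A minimal sketch of such a function, under stated assumptions: it decodes UTF-8 into UTF-16 (the encoding CreateFileW expects) and maps '/' to '\\'. The name and signature are hypothetical, not a Boost or Windows API; it returns std::u16string so that it compiles off Windows, it does not reject overlong encodings, and it does not add the \\?\ prefix that long absolute paths would also need.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical native(): UTF-8 in, UTF-16 out, '/' mapped to '\\'.
// Malformed sequences become U+FFFD; overlong forms and encoded surrogates
// are not rejected (a real implementation should reject them).
std::u16string native(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else { out += u'\uFFFD'; ++i; continue; }    // invalid lead byte

        bool ok = i + len <= utf8.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;       // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out += u'\uFFFD'; ++i; continue; }
        i += len;

        if (cp == U'/') cp = U'\\';                   // user separator -> system separator
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                      // encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the result can be copied into a std::wstring (std::wstring w(s.begin(), s.end());) and passed as w.c_str() to CreateFileW.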

> > No worries about long paths,
> > slashes or backslash, relative paths or other per-platform quirks.
>
> Right.
>

What's right? boost::path doesn't work around this at all! Each platform
quirk bubbles up to the interface.

> > Currently I must write something like this:
> >
> > path p = get_some_path();
> > p = system_complete(p); // according to MSDN, not guaranteed to work for
> >                         // long paths. Fortunately it does in practice.
> > std::wstring q = p.native();
> > if(starts_with(q, L"\\\\"))
> >     q = L"\\\\?\\UNC\\" + q.substr(2);
> > else
> >     q = L"\\\\?\\" + q;
> > // doesn't handle \\.\....
> > CreateFileW(q.c_str(), ...);
>
> If you have to write any of that, it is a bug in the library
> implementation. That's the point of
> http://svn.boost.org/trac/boost/ticket/5448 - to be sure that the
> odd-ball, implementation defined syntaxes work. Not just for Windows,
> but for POSIX too.
>
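The quoted workaround can be isolated into a self-contained function over std::wstring, with no Windows headers. A minimal sketch: the name to_extended_length is hypothetical, and the input is assumed to be absolute and already normalized, as the quoted code obtains from system_complete(). The \\?\ prefix turns off Win32's handling of '/' and '..', so it cannot be applied to raw user input. Unlike the quoted code, it passes \\?\ and \\.\ paths through unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps an absolute, normalized Win32 path to the
// extended-length "\\?\" form that is not limited by MAX_PATH.
std::wstring to_extended_length(const std::wstring& p) {
    // Already an extended-length or device path: leave it alone.
    if (p.rfind(L"\\\\?\\", 0) == 0 || p.rfind(L"\\\\.\\", 0) == 0)
        return p;
    // UNC path \\server\share\... becomes \\?\UNC\server\share\...
    if (p.rfind(L"\\\\", 0) == 0)
        return L"\\\\?\\UNC\\" + p.substr(2);
    // Drive path C:\... becomes \\?\C:\...
    return L"\\\\?\\" + p;
}
```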

Please, I sincerely entreat you to write an example of how you envision this
code would look, given the following constraints:
===================

path p = get_some_path_1();
path q = get_some_path_2();
path r = p/q;

// MAGIC

#ifdef WINDOWS
CreateFileW(r.c_str(), ...);
#else
open(r.c_str(), ...);
#endif

1) get_some_path_1/2() might return arbitrary valid paths, possibly read from
different places in the configuration/user input; therefore, although each of
them is valid (< MAX_PATH), their concatenation might exceed MAX_PATH.
2) The code creates a file at this path using CreateFileW on windows and
e.g. open() on Linux.
3) If there is a way to create the file at the specified path, it must do
it.
4) It's portable: no #ifdefs or whatever except for the CreateFile/open()
part.

===================

> > The problem is more serious when we observe that the \\?\ and \\.\ syntax
> > is a detail of implementation. For example we DON'T want the user to be
> > able to input a \\.\ path. We also prefer the user not to see \\?\ at all.
>
> Who is the "we" here? There was a time very early on in V1 when I
> thought the Filesystem library path should only accept "approved"
> forms. Users, and a lot of them at that, let me know loud and clear
> that they didn't want to be nannied.

"Users" here were the end-users who use the software that builds on top of
boost::path. And any user of the app who's not a programmer, even if she
has a technical background, should not see \\?\ in the output or need to
prepend \\?\ to the input to use a long path.

> If they passed in a given native
> string, that's what they wanted to get passed to the operating system.
> If they passed in a given generic string, that's what they wanted to
> get passed to the operating system, modulo only any changes absolutely
> required by the O/S (which means none on either POSIX or Windows).
>

Again, this is a false assumption. You just refuse to accept that prepending
\\?\ and converting '/' to '\\' is sometimes absolutely needed. I expect that
your answer to the small 'exercise' above will be the proof of why I'm wrong.

> > In fact there are two path syntaxes on windows:
> > 1) User paths: can have slashes and backslashes. They start with "\\x",
> > "\x", "x:" or "x" (more or less).
> > 2) System paths: only backslashes supported in general, can't be relative,
> > may start with "\\?\" and "\\.\".
>
> Yes, and there is no intention for the Filesystem library to intervene
> if the user gets it wrong, such as by exceeding a maximum length
> mandated by the operating system or file system.
>

The user in this context is the end-user. And you're again shifting the
burden onto the application writer.
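The two syntaxes in the quoted list can be told apart by their leading characters alone. A minimal sketch, assuming only the forms named there and only the backslash spellings of \\?\ and \\.\ (Win32 accepts more variants than this); the enum and function names are hypothetical:

```cpp
#include <cassert>
#include <string>

// Forms named in the quoted list; hypothetical names for illustration.
enum class win_path_kind {
    system_extended,  // \\?\...  (system syntax)
    system_device,    // \\.\...  (system syntax)
    user_unc,         // \\server\share or //server/share
    user_root,        // \dir, rooted on the current drive
    user_drive,       // x:\dir (absolute) or x:dir (drive-relative)
    user_relative     // dir\file
};

static bool is_sep(wchar_t c) { return c == L'\\' || c == L'/'; }

win_path_kind classify(const std::wstring& p) {
    if (p.rfind(L"\\\\?\\", 0) == 0) return win_path_kind::system_extended;
    if (p.rfind(L"\\\\.\\", 0) == 0) return win_path_kind::system_device;
    if (p.size() >= 2 && is_sep(p[0]) && is_sep(p[1])) return win_path_kind::user_unc;
    if (!p.empty() && is_sep(p[0])) return win_path_kind::user_root;
    if (p.size() >= 2 && p[1] == L':') return win_path_kind::user_drive;
    return win_path_kind::user_relative;
}
```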

> >> > Moreover, judging by the last fixes to the library, it looks like Beman
> >> > wants to shift the burden of this onto the user of the library, instead
> >> > of implementing something that works transparently.
> >>
> >> Which fixes are bothering you:-?
> >>
> >
> > I was talking about revision 71157.
>
> Ah! So you want to hide the various implementation defined formats?
>

Not exactly. I think that a design based on various path parsers and
formatters is more appropriate. native() will always return the system syntax.
Other functions will return the user syntax. The syntax used internally by the
path class is an implementation detail. By default, the path is constructed
from user syntax; it can be constructed from system syntax by giving the
system parser to the constructor.

Such an approach would make it possible to work even with Windows paths on
Linux, or to extend the concepts to non-filesystem paths.
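A minimal sketch of this parser/formatter design, under heavy assumptions: the class sketch::path and the tags user_syntax/system_syntax are hypothetical names, narrow strings are taken to be UTF-8 as argued in [1], and only drive-letter paths are handled (no UNC, no error handling). Components are stored once; the user parser accepts '/' and '\\', the system parser strips \\?\ and accepts only '\\', string() formats user syntax, and native() formats system syntax, resolving ".." lexically because the \\?\ form does not. Being plain std::string code, it runs on Linux too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Tags selecting which parser the constructor uses (hypothetical names).
struct user_syntax_t {};
struct system_syntax_t {};
constexpr user_syntax_t user_syntax{};
constexpr system_syntax_t system_syntax{};

class path {
public:
    // Default: parse user syntax, where '/' and '\\' are both separators.
    path(const char* s) : path(std::string(s), user_syntax) {}
    path(const std::string& s, user_syntax_t = user_syntax) { split(s, "/\\"); }
    // System syntax: optional "\\?\" prefix, '\\' is the only separator.
    path(const std::string& s, system_syntax_t) {
        const std::string prefix = "\\\\?\\";
        split(s.rfind(prefix, 0) == 0 ? s.substr(prefix.size()) : s, "\\");
    }

    path& operator/=(const path& rhs) {
        parts_.insert(parts_.end(), rhs.parts_.begin(), rhs.parts_.end());
        return *this;
    }

    // User-syntax formatter: components as given, joined with '/'.
    std::string string() const { return join(parts_, "/"); }

    // System-syntax formatter: ".." resolved lexically, components joined
    // with '\\', "\\?\" prepended. Assumes the first component is a drive.
    std::string native() const {
        assert(!parts_.empty());
        std::vector<std::string> resolved;
        for (const std::string& c : parts_) {
            if (c != "..") resolved.push_back(c);
            else if (resolved.size() > 1) resolved.pop_back();  // keep the drive
        }
        return "\\\\?\\" + join(resolved, "\\");
    }

private:
    std::vector<std::string> parts_;  // internal representation

    // Splits on any character in seps, dropping empty and "." components.
    void split(const std::string& s, const std::string& seps) {
        std::string cur;
        for (char c : s) {
            if (seps.find(c) == std::string::npos) { cur += c; continue; }
            if (!cur.empty() && cur != ".") parts_.push_back(cur);
            cur.clear();
        }
        if (!cur.empty() && cur != ".") parts_.push_back(cur);
    }

    static std::string join(const std::vector<std::string>& v, const std::string& sep) {
        std::string out;
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (i != 0) out += sep;
            out += v[i];
        }
        return out;
    }
};

}  // namespace sketch
```

For example, sketch::path("C:/project") /= sketch::path("a/../docs\\readme.txt") keeps "C:/project/a/../docs/readme.txt" as its user-syntax form, while native() yields \\?\C:\project\docs\readme.txt.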

[1] -- http://permalink.gmane.org/gmane.comp.lib.boost.devel/225036

Sincerely,

-- 
Yakov

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk