From: Carlo Wood (carlo_at_[hidden])
Date: 2004-08-22 06:01:21


On Sun, Aug 22, 2004 at 09:14:34AM +0100, Reece Dunn wrote:
> A complete path is one that is fully qualified, thus:
>
> fs::path absolute = fs::path( "d:/devel/libraries/boost/1.31.0/libs" );
>
> whereas a relative path is one that is "relative" to another. With a
> relative path, you must have an absolute path as the basis, e.g.:
>
> fs::path relative = fs::path( "../boost/mpl" );
>
> and can turn the relative path into an absolute one by doing:
>
> fs::path qualified = absolute / relative;
>
> so where is the problem?

The problem is not with relative paths, I think that boost::filesystem
currently supports relative paths rather well - provided that you
only use relative paths ;).

The problem is that boost::filesystem does not help the developer
write a portable program: on none of the OSes he might develop on
does it warn him that 'absolute' is to be treated as a native path.
The pit one can fall into is therefore that you can use a value for
'absolute' (because fs::path merely stores values - it lacks the
notion that relative and complete paths are essentially different)
that is a complete path on one OS but not on another.

For example,

  // On GNU/Linux, a developer might do:
  fs::path absolute = fs::path("/devel/libraries/boost/1.31.0/libs");

  fs::path relative = fs::path( "../boost/mpl" );
  fs::path qualified = absolute / relative;

  if (fs::exists(qualified))
  {
    // We get here on linux. Without any warning.
  }

Obviously, this code won't work on windows, and it won't work on
cygwin when compiled with BOOST_WINDOWS. Now, 'doesn't work' should
mean that it throws, or that an assertion fails - but no, it simply
says that `qualified' doesn't exist.

I agree that I have a very hard time convincing anyone that this
is 'flawed' ;). It is so easy to say that the program example is
simply wrong ;).

Nevertheless, I think that using two different types for the two
kinds of paths will greatly improve the robustness of the code.
See below - where I'll give the same code using my proposed API.

> >The most problematic difference is that it is possible to even store
> >a third possibility (one that is neither complete nor relative).
>
> ??? Do you mean one that isn't relative because it is not based from an
> absolute path, or that the path does not exist?

No, I mean that for a fs::path ph, ph.is_complete() returns false and
ph.has_root_directory() returns true :). For example the
"/devel/libraries/boost/1.31.0/libs" used above is complete on
linux, but is not complete on windows or on cygwin(!) (compiled with
BOOST_WINDOWS), and of course it also isn't relative.
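
To make this third state concrete, here is a minimal sketch of how the
same string classifies differently under POSIX and Windows rules. The
names platform, path_kind and classify are made up for this illustration
and are not part of boost::filesystem; UNC names and other root forms
are deliberately ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper types, not part of Boost.Filesystem.
enum class platform { posix, windows };
enum class path_kind { relative, rooted_not_complete, complete };

// True if s starts with a drive prefix such as "c:".
inline bool has_drive(const std::string& s)
{
    return s.size() >= 2
        && std::isalpha(static_cast<unsigned char>(s[0]))
        && s[1] == ':';
}

// Classify a generic-format path string (forward slashes only)
// under the rules of the given platform.
inline path_kind classify(const std::string& s, platform p)
{
    if (p == platform::posix)
        return (!s.empty() && s[0] == '/') ? path_kind::complete
                                           : path_kind::relative;

    // Windows rules: complete requires a drive AND a root directory.
    const bool drive = has_drive(s);
    const std::size_t root_pos = drive ? 2 : 0;
    const bool root_dir = s.size() > root_pos && s[root_pos] == '/';

    if (drive && root_dir) return path_kind::complete;
    // Only one of the two root parts is present ("/foo" or "c:foo").
    if (drive || root_dir) return path_kind::rooted_not_complete;
    return path_kind::relative;
}
```

With this, classify("/devel/libraries/boost/1.31.0/libs", platform::posix)
yields path_kind::complete, while the same string under platform::windows
yields path_kind::rooted_not_complete.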

> If so, you can do that in any file system, e.g.:
>
> fs::path foobar = fs::path( "../booster/foobar.foo" );
> fs::exists( absolute / foobar );
>
> A path will either (in absolute form) refer to a file, a directory or be
> invalid.

Here you assume that 'absolute' has the correct value. My point
is that this value has to be totally different on each supported OS.
That fact is not clear enough with the current implementation.
One possibility is to demand that 'absolute' be of a type
fs::native_path. On top of that, we can allow assigning an 'absolute'
field to a (normal, relative) fs::path (or even call that fs::relative_path).
In that case the needed completion is taken care of inside fs::exists.

Consider this code:

  fs::path foobar;
  
  if (something)
    foobar = fs::path( "../booster/foobar.foo" );
  else
    foobar = fs::path( "/devel/libraries/boost/1.31.0/booster/foobar.foo" );

which happens 'somewhere' in the code, outside the view of the code
that the developer is currently writing. Then basically
the developer doesn't know whether the path is relative or absolute,
and the 'absolute / foobar' simply doesn't work. Yet the current
boost::filesystem implementation allows it without problems.
I think that this is wrong: it should not be possible for one
variable to hold either a relative value or an absolute value - the
two are essentially different.

Ok, so - with the current implementation of boost::filesystem - one might
do:

  fs::path qualified = fs::complete(foobar, absolute);
  fs::exists(qualified);

because supposedly this would leave foobar alone when it is
already complete and prepend 'absolute' when it is not...
But no - this will fail. For example (and even one example should be
enough to show that there is a design problem, I hope), with
"/devel/libraries/boost/1.31.0/booster/foobar.foo" this will fail on
cygwin. In that case foobar isn't complete, but it DOES have a root
directory; therefore 'absolute' is rightfully ignored, but no root base
is prepended. The path "/devel/libraries/boost/1.31.0/booster/foobar.foo"
SHOULD have been expanded to "c:/cygwin/devel/libraries/boost/1.31.0/booster/foobar.foo"
(when boost was compiled with BOOST_WINDOWS and cygwin was installed in C:\cygwin\).
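
The expansion asked for here can be sketched as follows. This is only an
illustration of the desired behaviour, not what fs::complete does:
expand_rooted and its install_root parameter are hypothetical names, and
the drive test is a simplification.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a Boost.Filesystem function.
// 'install_root' is the native location that the emulated "/" maps to,
// e.g. "c:/cygwin"; it must not end in a slash.
inline std::string expand_rooted(const std::string& ph,
                                 const std::string& install_root)
{
    const bool has_drive = ph.size() >= 2 && ph[1] == ':';
    const bool has_root_dir = !ph.empty() && ph[0] == '/';

    // Has a drive, or is plain relative: leave untouched.
    if (has_drive || !has_root_dir)
        return ph;

    // Rooted but without a drive: prepend the native root.
    return install_root + ph;
}
```

Called with "/devel/libraries/boost/1.31.0/booster/foobar.foo" and
"c:/cygwin", this returns the "c:/cygwin/devel/..." path given above;
a relative path such as "../booster/foobar.foo" is returned unchanged
and would still need to be completed against 'absolute'.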

> The native you are referring to is an "absolute" path, correct?

By native I mean intrinsically "not portable". Currently that is only
the case for a path with weird characters, or for an explicit windows
path with a ':' in it, for example. Imho, every "complete" path is native
in the sense that it is not portable by definition: every significantly
different OS uses a different way to specify its root part.

> Boost.Filesystem stores the paths in a generic form.

Hardly - after doing some checking - it just stores the paths almost literally
as a string in memory. At most it converts a backslash into a slash.
Certainly a windows root part with a ':' (like "C:\") is still stored
as "C:/" internally. The information that this makes the object 'native'
(or 'not portable') is lost.

> Thus, if your
> filesystem uses ':' as a directory separator, the library will map this
> correctly from its internal state. If you enforce that all "native"
> Windows paths must have a directory,

You are turning things around. I said I wanted to introduce a new state,
or even type, that stores the fact that a path is 'native'. And I said
that every complete path should be marked with this new notion.
Now you say it the other way around: every native path must 'have a
directory' (== complete? (because "/foo" has a directory, but is not
complete on cygwin, currently)).

> >> > I propose two design changes:
> >> >
> >> > 1) 'native' is now not only a representation, but an *internal state*
> >> > of fs::path. (this has no effect on the representation as returned
> >> > by fs::path::string()).
>
> This would make things too restrictive.

How can *adding* a flag to the class be too restrictive? So far, the above
doesn't say that anything else has to change. It could be that I want to
propose a 'bool is_native()' at most. In that case nothing would be restricted,
or changed when it comes to the current API part.
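
A minimal sketch of what such a flag could look like, assuming a plain
wrapper around a string. The class flagged_path is hypothetical;
boost::filesystem has no such class and no is_native() member. Whether
the value is complete is passed in by the caller, for example from
fs::path::is_complete().

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper; Boost.Filesystem has no such class or flag.
class flagged_path
{
public:
    // 'complete' is decided by the caller for the current platform.
    flagged_path(std::string value, bool complete, bool native = false)
        : value_(std::move(value)),
          native_(native || complete)   // complete paths are always native
    {}

    // The stored representation is not affected by the flag.
    const std::string& string() const { return value_; }

    bool is_native() const { return native_; }

private:
    std::string value_;
    bool native_;
};
```

Nothing in the existing interface changes here: string() returns what it
returned before, and the flag is only extra information that later
checks could use.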

> I think you are confusing "native" with "absolute" paths, but how do you
> validate an absolute path?

Heh - no, I am not "confusing" the two! I am very well aware of what I am
saying when I propose to ADD a notion of "nativeness" that is ENFORCED for
"absolute" paths (complete paths that is). This notion is needed... see
all the examples above. I don't think I can give more examples without
it becoming pointless.

> Is there a
> Windows and a POSIX function to validate an absolute path (e.g.
> IsPathAbsoute)? It would then be possible to have a fs::is_absolute( ... )
> function.

boost::fs already defines that: fs::path::is_complete

> >> > 2) All 'complete' paths are automatically marked as 'native'.
>
> Likewise, how do you validate a complete path? What about a URL? If you
> have the URL "http://www.boost.org/people" and are using a system whereby
> the native directory separator is ':', the URL will be mapped to
> "http:::www.boost.org:people", corrupting the URL.

A URL is not a FILESYSTEM path. The current implementation of boost::fs
also doesn't support URLs in any way (it talks about 'has_root_name',
'has_root_directory' etc.). The fact that you can store URLs in a fs::path
with the current implementation shows how little it is aware of what it
is doing, and therefore how little it protects you as a developer against
abusing it. In order to support URLs, you need yet another type: fs::url_path.
Then, of course, it could be allowed to use an fs::url_path as 'root'
for a relative path. Now you will say: but then I can't write a generic
piece of code that accepts either files or http:// URLs. So, ok - you
can make a fs::native_path accept fs::url_paths too:

  fs::native_path my_working_directory = ... // Some local directory.

  fs::url_path url("http://www.boost.org/people/"); // URL, not a relative path in a directory 'http:', which would be legal on UNIX

  fs::relative_path rel(my_working_directory); // Make 'rel' relative to my_working_directory.
  rel = "carlo/design.html";

  fs::native_path result;
  if (here)
    result = rel;
  else
    result = url / rel;

Imho, code like this is type-safe. It makes it clear what you are doing
and protects one against stupid abuse. Note how the root-handling of
different supported OS is *completely* hidden by 'my_working_directory'.
It is no longer the concern of the developer.
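
For illustration, the two central types could be sketched roughly like
this. Everything here is hypothetical (none of these types exist in
Boost), paths are held as generic-format strings, and fs::url_path is
left out; its role as a root is played by an ordinary native_path.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical types sketching the proposal; none of them exist in Boost.

// Joins two generic-format pieces with exactly one '/' between them.
inline std::string join_generic(const std::string& base, const std::string& rel)
{
    if (base.empty() || base[base.size() - 1] == '/')
        return base + rel;
    return base + "/" + rel;
}

class native_path
{
public:
    native_path() = default;
    explicit native_path(std::string s) : s_(std::move(s)) {}
    const std::string& string() const { return s_; }
private:
    std::string s_;
};

class relative_path
{
public:
    // A relative path is always bound to the native path it is relative to.
    explicit relative_path(native_path base) : base_(std::move(base)) {}

    relative_path& operator=(const std::string& rel) { rel_ = rel; return *this; }

    const std::string& string() const { return rel_; }

    // Completion against the bound base happens here, in one place.
    operator native_path() const
    {
        return native_path(join_generic(base_.string(), rel_));
    }

private:
    native_path base_;
    std::string rel_;
};

// Re-roots the relative part under a different root.
inline native_path operator/(const native_path& root, const relative_path& rel)
{
    return native_path(join_generic(root.string(), rel.string()));
}
```

Code that receives a native_path then no longer has to ask whether the
value still needs completing; that decision was made where the
relative_path was bound to its base.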

> >> If I understand you correctly, you are suggesting that error checking is
> >> turned off for native paths, I would support that, but other than that I
> >> don't see how it differs.
> >
> >Heh - the current design does NOT have an internal state that marks
> >a path as native or not. Hence, it cannot do sensible error checks
> >that need that knowledge. Isn't that difference enough?
>
> Native - as Boost.FileSystem uses it - is used to mark a path as using the
> native OS syntax for specifying paths. This is so you can do:
>
> fs::system_complete( fs::path( argv[1], fs::native ) );

I know that.

> and use the library on Windows, *nix, OpenVMS, MacOS, etc. without having
> to worry about the differences between how they specify their paths and
> thus what regular expressions need to be used on the string. The example

while importing/constructing the path. And immediately after that it 'forgets'
that this path has a native value.

[..rip.. (this is getting too long already...
          has taken me a full hour to write this) ]

-- 
Carlo Wood <carlo_at_[hidden]>
