Boost logo

Boost :

Subject: Re: [boost] boost::filesystem::path frustration
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2013-01-25 16:55:01


On Fri, Jan 25, 2013 at 3:56 AM, Dave Abrahams <dave_at_[hidden]> wrote:

>
> I'm finding that boost::filesystem::path seems to be a strange mix of
> different beasts, unlike any entity we have in the STL.

I'm glad seeing that I'm not the only one thinking this way... this is one
of the reasons I don't use it (I got *very* frustrated each time I used
this library).

It is a mix of high level and low level concepts, i.e. of conceptual paths
(sequences of some path elements) and strings (sequences of characters).
One does not need a strong typedef for paths! If I wanted to work with
paths as with strings, I could just use strings (and this is what I usually
do). Nor I want a "system encoding" string that behaves differently on
different platforms (I use UTF-8 std::strings everywhere). Instead, a path
class shall provide a higher level abstraction.

First, what *is* a path? One can think of it as "a string naming some
resource in the file system", but this definition is low-level, treating
the path as a cookie, and thus not useful for characterizing path specific
semantics.

Definition (path): A path is a sequence of *instructions* (aka path
elements) for locating a resource. Following the instruction to locate the
resource is called "resolving the path".

Really, this is how the OS understands paths, therefore I do not even
consider this open to debate. Paths is so a generic concept that it can be
applied to much more than just FS paths (even for treasure hunting :)).

Definition (equivalence): Paths x and y are equivalent iff, for any system
state, resolving x and y yields the same resource, assuming the resolution
is successful.

operator== should test for equivalence. The equivalent(x,y) function is
currently misleading. It should instead be replaced with resolve(x) that
would return a key that can be used to test "equivalence" (so that one
could use it in associative containers, which is currently not possible).
Now one could do resolve(x) == resolve(y) instead, and much more.

 For example,
> when you construct it from a pair of iterators, they're expected to be
> iterators over characters, but when you iterate over the path itself,
> you are iterating over strings of some kind (**). Even though, once
> constructed, this thing acts sort of like a container, it supports none
> of the usual container mutators (e.g. push_back, pop_back, erase) or
> even queries (e.g. size()), making it incompatible with generic
> algorithms and adaptors.
>

Let me add some other issue to the list.

operator /
------------------
operator / is defined syntactical (low level). This is wrong, and it does
not work well.

Expected definition: x/y returns a path z s.t. for any initial system
state, evaluation of z yield the same resource as the evaluation of
elements of x followed by elements of y, assuming all evaluations succeed.
In particular, for any initial system state, (current_path(x),
current_path(y), current_path()) shall resolve to the same resource as
(current_path(z), current_path()).

Differences:
"c:" / "b" gives "c:\b" in boost but "c:b" according to the above
definition. Why is it wrong? Thinking of paths rather than strings, when
concatenating two path elements, why would there suddenly appear a third
one? Furthermore, considering the current description of parent_path()
which is formulated using the wrong operator /=,
path("c:a/b").parent_path() should return "c:\a" according to the
documentation. I guess (and hope) this is not what the implementation
actually does. In fact, I believe that parent_path shall not exist. Instead
there shall be pop_back() that does that in-place, and an iterator
constructor that constructs a path "as if" by successively applying
operator /= on each path element.

Similarly "a" / "/b" gives "a/b" in boost but "/b" with the above
definition.

operator+=
----------------
This is a new surprise for me, as it wasn't there last time I expected the
library, Is there any rationale for this path-string duality syndrome? And
bravo for the inconsistency: we have operator += without a corresponding
operator + (yes, I understand the the later is dangerous, but so is the
first one).

size()
---------
The lack of it is not an issue. Path shall be a bidirectional range with
value_type of "path_element". The computation of size would not be only
inefficient, but also useless. What's the logic of counting the number of
path elements? And in case you do want it in some esoteric case, you can
use distance(begin, end).

assign/append/constructor operations taking ranges
----------------------------
These should accept ranges of path_elements and work as-if by applying each
element with /=.

absolute(p, base)
--------------------
The case when base has no root_directory makes no sense. Boost.FS gives
absolute("a:x") == "a:\z\x" if current_path() == "c:\z", but the more
logical thing is "a:\w\x" if the current directory of drive "a:"
(environment variable "=A:") is "a:\w". Another example, using paths from
real world: let p = "Australia, current city, King St., No. 10", base =
"Tel-Aviv, Menachem Begin St., No. 7". boost::absolute(p,base) will bring
you to: "Australia, Tel-Aviv, King St. No 10", while what you want is to go
to Australia, ask yourself "what city I'm currently in?" and if you landed
in Melbourne you'll get to "Australia, Melbourne, King St., No. 10".

Expected definition: Returns system_complete(base/p).

In particular, this comes up because I'm trying to find the greatest
> common prefix of two paths. I thought this would be easy; I'd just use
> std::mismatch. But even once I've found the mismatch I don't see any
> obvious way to chop off the non-matching parts of one of the paths. [...]
>

Question: why you need to do this? (I do not mean that it is not needed,
just curious.)

Why should paths be so different from everything else? I think, if the
> design is actually right, some rationale is sorely needed.
>

They shouldn't. I believe the design of Boost.FS is somewhat wrong, mixing
different concepts together.

> Also,
>
> * (**) the docs don't say what the value_type of path::iterator is. A
> string value? A range that becomes invalid when the path is
> destroyed? Ah!?! How surprising; inspecting the code shows it
> iterates over paths! A container whose element type is itself is very
> unusual!
>

Agree. This bothers me too. IMO the value_type of paths and iterators shall
be the same, and likely be either a string or a path_element class, which
will provide appropriate interface. This includes observers:

* to query the type of the element. It can be one of sub_directory,
parent_directory or current_directory to avoid comparison with ".." and "."
(on OpenVMS it uses other strings, right?) as well as some implementation
specific values to query the values that were already known to the parser
(for example: drive (both \\?\c: or c:) or a UNC (both \\?\UNC\x or \\x) on
windows).

* to query the stem and extension.

Cheers,

-- 
Yakov

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk