From: David Abrahams (dave_at_[hidden])
Date: 2003-08-10 18:39:03


Beman Dawes <bdawes_at_[hidden]> writes:

> At 08:06 PM 8/9/2003, David Abrahams wrote:
>
> >As a user of the filesystem library, I am having the experience that
> >obvious things are hard to find, and the docs are much harder to
> >understand than they ought to be. The use of creative naming really
> >gets in the way. For example, the term "complete" is never defined
> >anywhere.
>
> It is defined by the is_complete() returns clause.

I really think you have to do better than burying it in the returns
clause of a single function's docs. The term is used all over the
library docs.

> > The closest we come is in the following naming rationale.
> >
> > is_complete
> >
> > bool is_complete() const;
> >
> > Returns: For single-root operating systems,
> > has_root_directory(). For multi-root operating systems,
> > has_root_directory() && has_root_name().
> >
> > Naming rationale: The alternate name, is_absolute(), causes
> > confusion and controversy because on multi-root operating systems
> > some people believe root_name() should participate in
> > is_absolute(), and some don't.
> >
> >I'm sorry if this sounds harsh, but I think the cure for someone being
> >confused about the term "absolute" on multi-root OSes is to pick the
> >definition that allows the term to be meaningful (an absolute path
> >identifies a specific location, and so must include the root) and *add
> >a clarifying note or definition for the corner case*, not to pick some
> >new term which nobody knows about and makes the library hard to
> >approach.
>
> The library isn't all that large that people can't just read about
> each function.

That is a remarkably unsympathetic view towards busy programmers who
often just want to get a simple job done without reading about a
whole library in detail.

Anyway, my experience contradicts that. I did the reading, but the
details didn't sink in, and I found myself going back the next day in
frustration to review and try to get a picture of the organization. I
think this is in part because of the naming, and in part because of
the organization of the docs. There is a tutorial and examples on the
front page, and some definitions, all of which are really great to
have. There is crucial information missing, though. You need an
explanation of some organizing principles such as your
leaf/branch/root naming schema, and a description of "complete" and
how it relates to people's understanding of "absolute" (unless I can
convince you to use more traditional names).

Oh, and initial_directory points nowhere in operations.html: it's
"initial_path".

> There were lengthy discussions on the list of this and other naming
> issues during development, during review, and during the resolution
> of review issues. Many people had fairly strong views.

I could accept the idea that some of the naming choices were
necessary, but when you add the choice of "basename" into the mix,
which flatly contradicts existing practice, it gives the impression of
being arbitrarily inventive.

> IIRC, the idea that is_absolute( "/foo" ) was false on some
> operating systems was impeded by long-held beliefs.

Err, let's see... strings can be implicitly converted to paths, and
the implicit conversion treats the string as a "portable generic path
format", right? So when is "/foo" not absolute/complete? Is not foo
the name of the root? Oh, after much crawling backwards through the
docs, I see what has been done here... I suppose it's needless to
say, but I would've chosen a different approach. The idea that
is_complete(path(some_string)) returns different values on different
systems undermines the notion that paths are portable and generic.
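
To make the variability concrete, here is a minimal sketch of the rule
quoted in the Returns clause above. It is not the library's code:
path_parts, inspect() and the multi_root flag are hypothetical names
used only for illustration, and it assumes a root name is a single
character followed by ':'.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two properties the Returns clause consults.
struct path_parts
{
    bool has_root_name;      // e.g. "c:" (assumed form: one character plus ':')
    bool has_root_directory; // a "/" directly after the optional root name
};

// Splits a generic-format string just far enough to fill in path_parts.
path_parts inspect(const std::string& s)
{
    path_parts r = { false, false };
    std::string::size_type pos = 0;
    if (s.size() >= 2 && s[1] == ':')
    {
        r.has_root_name = true;
        pos = 2;
    }
    r.has_root_directory = pos < s.size() && s[pos] == '/';
    return r;
}

// The quoted rule: single-root systems need only a root directory,
// multi-root systems need a root directory and a root name.
bool is_complete(const std::string& s, bool multi_root)
{
    const path_parts p = inspect(s);
    return multi_root ? (p.has_root_directory && p.has_root_name)
                      : p.has_root_directory;
}

// The same string gives different answers depending on the platform flag.
void check_examples()
{
    assert(is_complete("/foo", false));  // single-root: complete
    assert(!is_complete("/foo", true));  // multi-root: not complete
    assert(is_complete("c:/foo", true)); // multi-root with root name: complete
}
```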

If I were king, the portable, generic version of windows-native
"c:/foo" would be "/c/foo" and the portable generic version of
windows-native "/foo" would be *current_path().begin()/"foo". Is
there a reason that approach was rejected?
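
A rough sketch of the mapping I have in mind, just to pin down the
idea. The function to_generic() and its current_drive parameter are
made up for this example (the latter stands in for the first element
of current_path()). It only handles forward slashes and leaves
anything else, including drive-relative forms like "c:foo", untouched.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a Windows-native string to a rooted generic form.
// current_drive stands in for the drive letter of the current path.
std::string to_generic(const std::string& native, char current_drive)
{
    // "c:/foo" becomes "/c/foo"
    if (native.size() >= 3 && native[1] == ':' && native[2] == '/')
        return std::string("/") + native[0] + native.substr(2);

    // "/foo" becomes "/<current drive>/foo"
    if (!native.empty() && native[0] == '/')
        return std::string("/") + current_drive + native;

    // Everything else passes through unchanged.
    return native;
}

void check_examples()
{
    assert(to_generic("c:/foo", 'd') == "/c/foo");
    assert(to_generic("/foo", 'd') == "/d/foo");
    assert(to_generic("foo/bar", 'd') == "foo/bar");
}
```

With a mapping like this, every rooted generic path names one specific
location, so the complete/absolute test would not need to vary by
platform.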

> By giving the function an unfamiliar name, people are forced to
> actually read the specs instead of just assuming what it does, and
> that ends up being a good thing, IMO.

Only if you put the definition of the term in an appropriate place.
If I'm reading
file:///C:/boost/libs/filesystem/doc/operations.htm#complete I don't
want to have to scratch my head about the meaning.

> I suppose if we were to discuss the names all over again we would come
> up with a different set of names. But unless the new names are
> markedly superior to the old names, it would just be churn to change
> them, and might be a real step backwards.

I don't want to come up with new names. The only names that I think
are superior to the ones already used are familiar old names that
everyone knows.

> >--- aside ---
> >Regarding complete paths, is there any guarantee that they are
> >canonical? Is foo/bar/../baz reduced to foo/baz?
>
> Yes. That is documented as a postcondition specifying canonical form
> for all the functions that modify a path. I've just double checked,
> and it doesn't look like any were missed, but let me know if you
> spot any way to alter path state that doesn't supply that
> postcondition.

complete and system_complete come to mind. It would probably be
better to prominently document that as an invariant of path rather
than try to sprinkle it throughout all the functions.
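
For reference, this is the kind of purely lexical reduction I mean by
"canonical" in the question above. It is a sketch under my own
assumptions, not a description of what the library does: normalize()
is a hypothetical name, it works on plain strings with '/' separators,
it keeps leading ".." elements it cannot resolve, and it ignores
symbolic links entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes "." elements and resolves "name/.." pairs
// lexically, without consulting the file system.
std::string normalize(const std::string& s)
{
    const bool rooted = !s.empty() && s[0] == '/';
    std::vector<std::string> parts;

    std::string::size_type begin = 0;
    while (begin < s.size())
    {
        std::string::size_type end = s.find('/', begin);
        if (end == std::string::npos)
            end = s.size();
        const std::string name = s.substr(begin, end - begin);
        begin = end + 1;

        if (name.empty() || name == ".")
            continue;                 // skip empty elements and "."
        if (name == ".." && !parts.empty() && parts.back() != "..")
            parts.pop_back();         // "x/.." cancels out
        else
            parts.push_back(name);    // ordinary name or unresolved ".."
    }

    // Rejoin the surviving elements with single separators.
    std::string out = rooted ? "/" : "";
    for (std::size_t k = 0; k < parts.size(); ++k)
    {
        if (k != 0)
            out += '/';
        out += parts[k];
    }
    return out;
}

void check_examples()
{
    assert(normalize("foo/bar/../baz") == "foo/baz");
    assert(normalize("/a/./b/../../c") == "/c");
    assert(normalize("../x") == "../x");
}
```

If every path constructor and modifier funnels through one step like
this, the canonical form becomes a class invariant that can be stated
once.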

> > See
> >http://java.sun.com/j2se/1.3/docs/api/java/io/File.html#getCanonicalPath()
> >for an example of the possible semantics. We could learn a lot
> >about what's useful and broadly implementable by studying the
> >libraries of Java and/or Python (yes, I realize that the
> >portability of Java ain't quite what it's cracked up to be).
>
> Yes, I often found other libraries helpful, although many of them
> offer syntactic portability rather than semantic portability.

What do you mean by semantic portability? Isn't it undermined by the
variability of path("/foo").is_complete()?

> The legacy operating system APIs are interesting because they
> sometimes take different approaches. Sometimes what we think of as a
> path is just a key used to find the actual path via some external
> mapping mechanism.

I don't think I understand what you're saying, sorry. Could you be
more specific?

> >The difference between is_empty(ph) and ph.empty() is too slight,
> >IMO, for their differing semantics. IMO it's not useful to have
> >one function which reports both empty files and empty directories
> >- the implications of the two are much too different.
>
> Early versions of the library did provide the finer granularity of
> is_empty_file(ph) and is_empty_directory(ph), but they didn't work
> out in practice, and we changed to a simpler set of non-compound
> functions. Remember that the library was in private use for quite a
> while before the public review, and we got to see what worked and
> what didn't. Compound conditional functions definitely fell in the
> "didn't" category.

It's easy for people that use a library in private for a long while as
it evolves to become comfortable with its conventions and
philosophy. That doesn't mean it will be approachable to people who
haven't seen it before.

Regardless, I still think the empty/is_empty thing is very confusable.

void f(path p)
{
   if (p.is_empty()) // whoops, syntax error -- how do I fix it?
   {
      ... // could be p.empty() or is_empty(p),
          // but the two predicates mean completely
          // different things
   }
}

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
