Subject: Re: [boost] [Filesystem V3] Filesystem Version 3 beta 1 available for download and comment
From: Gregory Peele ARA/CFD (gpeele_at_[hidden])
Date: 2010-02-18 14:54:25


Some comments below from a random C++ developer who has written multiple cross-platform filesystem libraries for US military use. Take them for whatever you want. Hopefully they're useful. :-)

Beman Dawes wrote:
> On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert
> <Robert.Stewart_at_[hidden]> wrote:
> >
> > "stem" means nothing to me. I followed the Reference link to the
> > path class documentation and find no description of it there.
>
> Added a direct link to the reference doc description. Added an example
> to the reference doc description.

Regrettably the terminology for this is remarkably nonstandard, but I've never heard of "stem" before and would not have had any idea what it returned. Typically I've seen this called "base_filename" or even just "filename". Where did "stem" come from? Is there a precedent I'm not aware of?

> -----Original Message-----
> From: boost-bounces_at_[hidden] [mailto:boost-
> bounces_at_[hidden]] On Behalf Of Stewart, Robert
> Sent: Thursday, February 18, 2010 2:27 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] [Filesystem V3] Filesystem Version 3 beta 1
> available for download and comment
>
> Peter Dimov wrote:
> > Beman Dawes wrote:
> > > On Thu, Feb 18, 2010 at 10:35 AM, Stewart, Robert
> > > <Robert.Stewart_at_[hidden]> wrote:
> >
> > >> You state that extension() returns the period to allow
> > >> distinguishing between an empty extension and no extension. That
> > >> seems wrong. Typical use cases for working with the extension will
> > >> require stripping the period before proceeding, so you push extra
> > >> work onto the client. Furthermore, I can't think of a case in which
> > >> extension processing code would work differently when there is no
> > >> extension and when the extension is empty. The extension is an
> > >> empty string in both cases. Since you already provide
> > >> has_extension() for distinguishing that there is one, extension()
> > >> should return an empty string when nothing follows the period.
> > >
> > > IIRC, that was Volodya's original design and I can't recall anyone
> > > ever complaining about it. True, we didn't have the has_extension()
> > > function, but still, I hate to break existing code. Does anyone else
> > > have a strong opinion?
> >
> > It is the right design to retain the period, IMO, and most
> > "get extension" functions do so, even on Windows, where there is no
> > difference between "foo" and "foo." when actually used to refer to
> > a file. See for example
> >
> > http://msdn.microsoft.com/en-us/library/e737s6tf%28VS.100%29.aspx
>
> That's an interesting precedent, but that strikes me as wrong, too!
>
> > On POSIX, it's even more important to retain the period,
> > because "foo" and "foo." refer to different files.
>
> I can see that creating "foo." from "foo" requires that one be able to
> set the extension as "." and that would require special case code.
> Perhaps the right solution is to prefix the argument with "." when
> omitted? That way, existing code, which provides the "." will continue
> to work, while code that has the extension, but no period, can work
> henceforth.

From what I've seen, either approach works and neither tends to impose significantly more work on the user. Most use cases I've had either maintain maps from extensions to some sort of class for processing files of that type, or recognize certain types of files in a directory, and both work either way. So my implementations tended to return the leading dot from extension() (to disambiguate the crazy POSIX case) and to accept the new extension with or without a leading dot in change_extension. If no leading dot is provided, one is automatically prepended. Giving the empty string to change_extension removed the extension, but I also had a special method for that. This worked well in practice for me. I do agree that "has_extension" is useful to have for clarity and for the rare use cases where only the presence of the extension matters and its contents do not.
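
A minimal sketch of that convention, to make the behavior concrete. These are hypothetical free functions on plain std::string file names (no directory part), not the Boost.Filesystem path interface, and treating a leading dot as part of the name is my own assumption:

```cpp
#include <string>

// Hypothetical helpers, not the Boost.Filesystem API. They assume a plain
// file name with no directory component.

// Returns the extension including its leading dot, or "" if there is none.
// "foo." yields "." so it can be told apart from "foo".
std::string extension(const std::string& filename) {
    const std::string::size_type dot = filename.rfind('.');
    // Assumption: a leading dot (".bashrc") is part of the name itself.
    if (dot == std::string::npos || dot == 0) return "";
    return filename.substr(dot);
}

// True when there is an extension, even an empty one such as in "foo.".
bool has_extension(const std::string& filename) {
    return !extension(filename).empty();
}

// Accepts "txt" or ".txt"; an empty new_ext removes the extension.
std::string change_extension(const std::string& filename, std::string new_ext) {
    const std::string stem =
        filename.substr(0, filename.size() - extension(filename).size());
    // Prepend the dot when the caller left it off.
    if (!new_ext.empty() && new_ext[0] != '.') new_ext = "." + new_ext;
    return stem + new_ext;
}
```

With this, change_extension("foo.txt", "md") and change_extension("foo.txt", ".md") both give "foo.md", and change_extension("foo", ".") is how you would get "foo." on POSIX.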

Note that if there were ever a system that by convention used something other than dot for an extension separator, requiring a leading dot could be a problem. I'm not aware of any such systems though.

One wrinkle that I never was able to decide how to handle was multiple extensions, like ".tar.gz". Some use cases would want ".tar.gz", some would just want ".gz", and a few would even want just ".tar". Does this library provide any direct support for managing chains of extensions like that?
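
To illustrate what I mean by a chain, here is a rough sketch of one possible approach. The names extension_chain and last_extensions are hypothetical, not anything in Boost.Filesystem, and it again assumes a plain file name with no directory part:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a plain file name into its dot-separated
// extensions, each keeping its leading dot. A leading dot (".bashrc") is
// assumed to be part of the name.
std::vector<std::string> extension_chain(const std::string& filename) {
    std::vector<std::string> chain;
    std::string::size_type start = filename.find('.', 1);  // skip a leading dot
    while (start != std::string::npos) {
        const std::string::size_type next = filename.find('.', start + 1);
        const std::string::size_type length =
            (next == std::string::npos) ? std::string::npos : next - start;
        chain.push_back(filename.substr(start, length));
        start = next;
    }
    return chain;
}

// Joins the last `count` extensions of the chain. If the chain is shorter
// than `count`, the whole chain is returned.
std::string last_extensions(const std::string& filename, std::size_t count) {
    const std::vector<std::string> chain = extension_chain(filename);
    const std::size_t first = chain.size() > count ? chain.size() - count : 0;
    std::string result;
    for (std::size_t i = first; i < chain.size(); ++i) result += chain[i];
    return result;
}
```

For "archive.tar.gz" the chain is ".tar" then ".gz", so the caller can pick ".gz", ".tar.gz", or just ".tar". The obvious weakness is that every dot counts, so a name like "foo-1.2.tar.gz" also reports ".2" as an extension, which is part of why I never settled on an answer.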

Hope that helped.

Gregory Peele, Jr.
Applied Research Associates, Inc.
        

