From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2004-02-01 17:51:25


Beman Dawes <bdawes_at_[hidden]> writes:

> [snip: path ordering]

> The primary use case I know of for operator<() is default ordering for
> paths used as keys in associative containers. I can't see that either
> approach is superior for this use, so unless someone comes up with a
> compelling argument, (1) will be used.

I would suggest lexicographical ordering of the components, i.e. option
(2). Ordering based on the "portable" path representation would, I
think, be confusing on platforms whose native path format is not
identical or very similar to the portable format. Furthermore, the
primary if not exclusive purpose of the portable path format is to
allow storing (relative) paths as string constants. Many users may not
need that functionality, and so will not be using the portable path
representation at all.
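
To make the difference concrete, here is a minimal sketch. It assumes
that option (1) amounts to comparing a string representation of the
whole path, which is only my reading of it; the helper names
(split_components, component_less, string_less) are hypothetical and
are not part of the library. For simplicity the sketch ignores the
root, so absolute and relative paths are not distinguished.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: split a '/'-separated path string into components.
// Empty components (leading or repeated '/') are dropped for simplicity.
std::vector<std::string> split_components(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Option (2): lexicographical ordering of the components.
bool component_less(const std::string& a, const std::string& b) {
    const std::vector<std::string> pa = split_components(a);
    const std::vector<std::string> pb = split_components(b);
    return std::lexicographical_compare(pa.begin(), pa.end(),
                                        pb.begin(), pb.end());
}

// For contrast (assumed reading of option (1)): order by the whole string.
bool string_less(const std::string& a, const std::string& b) {
    return a < b;
}
```

With these definitions, "a/b" sorts before "a.b" component-wise (the
first component "a" is a prefix of "a.b"), but after it as a plain
string, because '/' compares greater than '.' in ASCII.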

> Equivalence
> -----------

> Two paths will be considered equivalent if they resolve to the same physical
> directory or file.

> Question 1: What is a use case that requires this function? Verifying that
> source and target files are different before some modifying operation is the
> only one I've come up with. I guess that is sufficient to justify adding the
> function.

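Traversing directory trees is the common use case: to avoid loops, you
need to check whether a directory has already been visited. Of course,
without a reliable file identifier number, using this function for that
purpose would be highly inefficient, since each directory would have to
be compared against every directory already seen.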

A link_count function would also be useful for the logic involved in
making backup files, for example: move the file if it has only one
link, otherwise copy it.
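
A rough sketch of that rule follows. It uses C++17 std::filesystem as a
stand-in for the proposed link_count function, which is an assumption
on my part; BackupAction, choose_backup_action and make_backup are
hypothetical names.

```cpp
#include <cstdint>
#include <filesystem>

enum class BackupAction { Move, Copy };

// Decision rule only: a file with a single link can simply be renamed
// to its backup name; a file with several links is copied instead.
BackupAction choose_backup_action(std::uintmax_t link_count) {
    return link_count == 1 ? BackupAction::Move : BackupAction::Copy;
}

// Apply the rule to a real file. Throws std::filesystem::filesystem_error
// if the link count cannot be read or the move/copy fails.
void make_backup(const std::filesystem::path& file,
                 const std::filesystem::path& backup) {
    namespace fs = std::filesystem;
    if (choose_backup_action(fs::hard_link_count(file)) == BackupAction::Move) {
        fs::rename(file, backup);
    } else {
        fs::copy_file(file, backup, fs::copy_options::overwrite_existing);
    }
}
```

The reason for copying in the multi-link case is that a rename would
leave the other names attached to the backup rather than to the newly
written file.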

In addition, a useful facility that could be implemented at a later
point would be a unique file identifier object, which keeps an open
file handle/descriptor to ensure that the identifier remains valid.
The object could then be used as a key in associative containers, and
would allow for an efficient implementation of directory recursion.
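
Something along these lines is what I have in mind. This is only a
POSIX sketch under my own assumptions: the class name FileId and the
helper mark_visited are hypothetical, and the identifier is taken to be
the (st_dev, st_ino) pair obtained via fstat on a descriptor that stays
open for the lifetime of the object.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <set>
#include <stdexcept>
#include <string>
#include <tuple>

// Hypothetical identifier object: holds the descriptor open so that the
// (device, inode) pair cannot be reused while the object is alive.
class FileId {
public:
    explicit FileId(const std::string& path)
        : fd_(::open(path.c_str(), O_RDONLY)) {
        if (fd_ < 0) throw std::runtime_error("cannot open " + path);
        struct stat st;
        if (::fstat(fd_, &st) != 0) {
            ::close(fd_);
            throw std::runtime_error("cannot fstat " + path);
        }
        dev_ = st.st_dev;
        ino_ = st.st_ino;
    }
    FileId(FileId&& other) noexcept
        : fd_(other.fd_), dev_(other.dev_), ino_(other.ino_) {
        other.fd_ = -1;  // the moved-from object no longer owns the descriptor
    }
    FileId(const FileId&) = delete;
    FileId& operator=(const FileId&) = delete;
    ~FileId() {
        if (fd_ >= 0) ::close(fd_);
    }

    // Strict weak ordering, so the object can be a std::set / std::map key.
    friend bool operator<(const FileId& a, const FileId& b) {
        return std::tie(a.dev_, a.ino_) < std::tie(b.dev_, b.ino_);
    }

private:
    int fd_;
    dev_t dev_;
    ino_t ino_;
};

// Returns true the first time a given file or directory is seen.
bool mark_visited(std::set<FileId>& seen, const std::string& path) {
    return seen.insert(FileId(path)).second;
}
```

A directory walk would call mark_visited before descending and skip
any directory for which it returns false. One obvious cost of this
design is that it holds one descriptor per visited directory, so a
real implementation would need to bound that somehow.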

> Question 2: What if neither exist? Only one exists? My initial thought
> is that these are likely to be errors, so treat them as such. It could
> be argued that if either or both don't exist, they can't be
> equivalent, so return false.

I would suggest that the function throw an exception if either file
does not exist. The exception would allow the user to determine
exactly which paths exist or do not exist. Any other behavior, given
that the function can return only true or false, would in some
circumstances give the user less information than desired.
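
As an illustration of the behavior I am suggesting, here is a minimal
sketch built on C++17 std::filesystem. The names MissingPathError and
checked_equivalent are hypothetical, and the exception layout is just
one way of reporting which path is missing.

```cpp
#include <filesystem>
#include <stdexcept>

// Hypothetical exception type that records which of the two paths is missing.
class MissingPathError : public std::runtime_error {
public:
    MissingPathError(bool first_missing, bool second_missing)
        : std::runtime_error("equivalent: path does not exist"),
          first_missing_(first_missing),
          second_missing_(second_missing) {}
    bool first_missing() const { return first_missing_; }
    bool second_missing() const { return second_missing_; }

private:
    bool first_missing_;
    bool second_missing_;
};

// Throws MissingPathError if either path does not exist; otherwise reports
// whether both paths resolve to the same file or directory.
bool checked_equivalent(const std::filesystem::path& a,
                        const std::filesystem::path& b) {
    namespace fs = std::filesystem;
    const bool a_missing = !fs::exists(a);
    const bool b_missing = !fs::exists(b);
    if (a_missing || b_missing) throw MissingPathError(a_missing, b_missing);
    return fs::equivalent(a, b);
}
```

Note that the existence check and the comparison are separate steps
here, so a file removed in between would still surface as a
filesystem_error from the underlying call rather than as
MissingPathError.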

> Question 3: The implementation on Windows (see below) leaves a small
> hole in that duplicated media (such as two CD's) mounted on devices
> with the same device id on two different networked machines would be
> reported as equivalent.

Does Windows actually assign networked devices device ids which are also
used for local devices? If it does, then disregard comments below
about use of device id exclusively.

> POSIX requires that such networked devices have different device id's,
> avoiding the problem. Is the fact that Windows and POSIX
> implementations would perform slightly differently on this corner case
> a showstopper? I think not.

> Windows logic for path equivalent: same device id AND same media
> volume serial number AND same physical location on disk AND same
> creation time. This works even in degenerate cases like camera
> formatted FAT flash memory cards or floppy disks with volume serial
> numbers incorrectly initialized to 0.

Why not use the device id exclusively and ignore the media volume
serial number? Shouldn't that solve the problems? I wouldn't be too
worried about broken device ids, and I don't like the idea of using
hacks like file times. Before relying on creation or modification
time, it would be useful to determine whether there are versions of
Windows that sometimes give two devices the same device id (this
really does seem highly unlikely).

> POSIX logic: same device id AND same physical location on disk AND
> same modification time. The modification time is in theory redundant,
> but is an added protection in case the device id on networked devices
> failed to meet the POSIX specs.

As with Windows, do you know of any POSIX platforms that sometimes give
two devices the same device id?

Note: the sample code I posted incorrectly used stat(2) instead of
fstat(2) -- fstat should be used to ensure that the file identifier
remains valid, and that the file is not removed, changed, etc.

-- 
Jeremy Maitin-Shepard
