Subject: Re: [boost] [filesystem] proposal: treat reparse files as regular files
From: Paul Harris (harris.pc_at_[hidden])
Date: 2015-07-28 08:40:53
On 28 July 2015 at 19:07, Andrey Semashev <andrey.semashev_at_[hidden]> wrote:
> On 28.07.2015 04:33, Paul Harris wrote:
>> I think we are not on the same page. Let me try and refocus the discussion.
>> With symlinks, there is more than one access point to the same file
>> content (i.e. multiple file names for the identical content).
>> That makes symlinks fundamentally different from regular files, and it's
>> why they are treated differently, e.g. don't back up content twice.
>> Is that statement correct?
> As Niall already commented, that's not correct. What you described is more
> like a hardlink.
> You can easily spot the difference if you rename or delete the file the
> link points to. The symlink will keep pointing to the old file (thus being
> a dangling symlink) while the hardlink will still be pointing to the file
> content.
> A hardlink is actually no more special than a regular file. Put
> simply, from the filesystem perspective any file is a name pointing to the
> content. When you create a new file, there's only one such name. When you
> create a hardlink, you create another name pointing to the same content and
> increment the reference count to the content. The two names are equivalent,
> and the content exists as long as there are names referencing it.
>  https://en.wikipedia.org/wiki/Hard_link
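To make the rename/delete test concrete, here is a minimal sketch using C++17 `std::filesystem` (the standardized descendant of Boost.Filesystem), assuming a POSIX system where hard links and symlinks can be created in a scratch directory. `LinkObservation` and `observe_links` are hypothetical names made up for this illustration. It creates a file, a hard link and a symlink to it, deletes the original name, and reports what each link still sees.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// What is left after the original name has been removed.
struct LinkObservation {
    bool hardlink_readable = false;  // the hard link still reaches the content
    std::string hardlink_content;    // ...and this is what it reads
    bool symlink_exists = false;     // the symlink entry itself is still present
    bool symlink_dangling = false;   // ...but the path it stores no longer resolves
};

// Hypothetical demo: create a file plus a hard link and a symlink to it inside
// `dir`, delete the original name, then report what each link still sees.
LinkObservation observe_links(const fs::path& dir) {
    const fs::path original = dir / "original.txt";
    const fs::path hard = dir / "hard.txt";
    const fs::path soft = dir / "soft.txt";

    {
        std::ofstream out(original);  // first name for the content (link count 1)
        out << "hello";
    }
    fs::create_hard_link(original, hard);  // second name, same content (count 2)
    fs::create_symlink(original, soft);    // separate file that stores a path

    fs::remove(original);  // drop one name; the content survives through `hard`

    LinkObservation obs;
    std::ifstream in(hard);
    obs.hardlink_readable = static_cast<bool>(std::getline(in, obs.hardlink_content));
    obs.symlink_exists = fs::is_symlink(fs::symlink_status(soft));
    obs.symlink_dangling = !fs::exists(soft);  // exists() follows the symlink
    return obs;
}
```

On a typical Linux filesystem one would expect the hard link to still read "hello", while the symlink entry survives but dangles, which is exactly the distinction described above.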
I think my point is being missed... I am not debating symlinks or hardlinks.
I am _happy_ with the way hardlinks and symlinks are treated, on both POSIX
and Windows.
I am _happy_ with the way reparse-based symlinks and junctions are treated.
I _disagree_ with the way dedup'd files are currently treated as a
special file (as if they were a device, a character file, a FIFO or a
socket). Devices, sockets and FIFOs all need to be read in a special way, but
dedup'd files should be read as if they were plain files.
I _disagree_ that a dedup file should be treated as if it were a symlink.
This is because a dedup file does not point to another file (or inode) on
the file system, which is what characterizes a symlink or a hardlink. It
is basically just a compressed file. We don't treat NTFS-compressed files
differently from regular files, so why are we treating dedup'd files
differently?
Dedup files and symlink files on Windows both (unfortunately) use the same
mechanism: reparse points. But we should only treat symlink and junction
reparse points as symlinks. Anything else should be treated as a
regular file. That is how I read the MS docs, and it matches my
experience working with these filesystems.
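A minimal sketch of the proposed rule, assuming the caller has already determined that a non-directory file carries a reparse point and has read its tag (for example with `DeviceIoControl` and `FSCTL_GET_REPARSE_POINT`, not shown). The tag values are the ones defined in the Windows SDK's `winnt.h`, copied here so the sketch compiles without `<windows.h>`; `ReparseKind` and `classify_reparse_tag` are hypothetical names, not part of Boost.Filesystem.

```cpp
#include <cstdint>

// Tag values as defined in winnt.h (Windows SDK).
constexpr std::uint32_t kTagMountPoint = 0xA0000003;  // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kTagSymlink    = 0xA000000C;  // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup      = 0x80000013;  // IO_REPARSE_TAG_DEDUP

enum class ReparseKind { Symlink, RegularFile };

// Hypothetical policy from this proposal: only symlink and junction reparse
// points behave like links; any other tag (dedup included) is read as a
// regular file through its normal name.
constexpr ReparseKind classify_reparse_tag(std::uint32_t tag) {
    switch (tag) {
    case kTagSymlink:
    case kTagMountPoint:
        return ReparseKind::Symlink;
    default:
        return ReparseKind::RegularFile;
    }
}
```

Using an allow-list of link tags rather than a deny-list of data tags means that any other non-link reparse tag would also default to being read as a regular file, which is the behaviour argued for here.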
A simple example is a backup program for the files
in a _single directory_.
Let's say you want to store every file's content once.
When you find a directory, ignore it.
When you find an "other" file, ignore it (how would you back up a device,
a character file, etc.?)
When you find a symlink, you want to store just the link.
When you find a regular file, you want to store the contents.
When you find a reparse-point symlink, you want to store just the link
(like a POSIX symlink).
When you find a dedup'd file, you want to store the contents (like a POSIX
regular file).
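  for (directory_iterator ...)
  {
    // is_regular_file() follows symlinks, so test is_symlink() first and
    // use else-if, otherwise a symlink to a file would be backed up twice.
    if (is_symlink(fn)) backup_link(fn);
    else if (is_regular_file(fn)) backup_contents(fn);
    else if (is_directory(fn)) ignore(fn);
    else if (is_other(fn)) ignore(fn);
  }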
Currently, this pseudo code would fail to back up any automatically dedup'd
files (which means basically every file older than 3 days on some of my sites).
It fails because a dedup'd file is currently classified as "other".
If you treat a dedup'd file as a symlink, only the "link" will be backed up.
This link points to a magical place that cannot be read except by
simply reading "fn" itself.
So how would this simple program back up the dedup'd file contents?
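A sketch of the same backup dispatch in C++17 `std::filesystem`, assuming each entry is classified once from its own status with links not followed. `BackupAction`, `choose_action` and `plan_backup` are hypothetical names for illustration. If the library reports a dedup'd file as "other", as described above for Boost.Filesystem, it lands in the final `Ignore` branch; under the proposal it would be reported as a regular file and reach `StoreContents` instead.

```cpp
#include <filesystem>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

enum class BackupAction { StoreLink, StoreContents, Ignore };

// Map one entry's *own* status (links not followed) to an action.
BackupAction choose_action(const fs::file_status& st) {
    if (fs::is_symlink(st)) return BackupAction::StoreLink;          // keep just the link
    if (fs::is_regular_file(st)) return BackupAction::StoreContents;  // keep the bytes
    return BackupAction::Ignore;  // directories, devices, FIFOs, sockets, "other"
}

// Walk a single directory (non-recursive) and decide what to do with each entry.
std::vector<std::pair<fs::path, BackupAction>> plan_backup(const fs::path& dir) {
    std::vector<std::pair<fs::path, BackupAction>> plan;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        // symlink_status() reports a symlink as a symlink even when its target is
        // a regular file; status() would follow the link instead.
        plan.emplace_back(entry.path(), choose_action(entry.symlink_status()));
    }
    return plan;
}
```

Classifying from a single `file_status` keeps the branches mutually exclusive, so no entry can be both "link" and "contents".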
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk