Subject: Re: [boost] [filesystem] proposal: treat reparse files as regular files
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-07-28 22:06:03


On 28 Jul 2015 at 20:40, Paul Harris wrote:

> I _disagree_ with the way dedup'd files are currently treated as a
> special file (as if they were a device or a character file or a fifo or a
> socket). device/socket/fifos all need to be read in a special way, but
> dedup'd files should be read as if they were a plain file.
>
> I _disagree_ that a dedup file should be treated as if it were a symlink.
> This is because a dedup file does not point to another file (or inode) on
> the file system, which is a characteristic of a symlink or a hardlink. It
> is basically just a compressed file. We don't treat NTFS-compressed files
> differently from regular files, why are we treating dedup'd files
> differently?

NTFS compressed files act exactly like normal files. Reparse point
files do not and require significant additional processing to figure
out what kind they are. That's the difference.

From AFIO's perspective, when it calls NtQueryDirectoryFile() to fetch
metadata about a file entry, it can learn at zero cost whether the
entry is a reparse point by examining FileAttributes for the
FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of
reparse point it is without opening the file and asking.
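
For illustration, the cheap check amounts to a single bit test on the
attributes that come back with each directory entry. This is a minimal
sketch, not AFIO's code: the Windows constant is redefined locally so
it compiles anywhere, and is_reparse_point() is a hypothetical helper.

```cpp
#include <cstdint>

// FILE_ATTRIBUTE_REPARSE_POINT from the Windows SDK (winnt.h), redefined
// here so the sketch compiles on any platform.
constexpr std::uint32_t kFileAttributeReparsePoint = 0x00000400;

// The zero-cost check: the flag arrives with the FileAttributes of each
// entry returned by the directory enumeration, so no extra system call is
// needed. It says nothing about which kind of reparse point (symlink,
// junction, dedup, ...) the entry is; that needs the reparse tag.
constexpr bool is_reparse_point(std::uint32_t file_attributes) noexcept
{
  return (file_attributes & kFileAttributeReparsePoint) != 0;
}
```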

Windows' CreateFile() API is astonishingly slow. Having to call it,
then issue an additional NtQueryDirectoryFile() to fetch the
FILE_REPARSE_POINT_INFORMATION metadata, then close the handle (the
fastest way I know of to fetch the reparse point tag) would impose an
enormous performance penalty on every file entry marked with
FILE_ATTRIBUTE_REPARSE_POINT.
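
To put that cost in concrete terms: the attribute bit comes free with
the enumeration, but the tag does not. The sketch below (not AFIO's
code) resolves a tag for every entry and takes the slow path only for
flagged entries. slow_tag_query is a hypothetical callback standing in
for the open / query / close sequence, and attributes are passed as raw
FileAttributes words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// FILE_ATTRIBUTE_REPARSE_POINT from the Windows SDK, redefined for portability.
constexpr std::uint32_t kFileAttributeReparsePoint = 0x00000400;

// Hypothetical stand-in for the slow path: open the entry with CreateFile(),
// query its reparse point information, close the handle, return the tag.
using slow_tag_query = std::function<std::uint32_t(std::size_t entry_index)>;

// Returns one reparse tag per directory entry (0, a reserved tag value, for
// entries that are not reparse points). Unflagged entries are settled from
// the enumeration data alone; every flagged entry costs one slow_tag_query
// call.
std::vector<std::uint32_t> resolve_reparse_tags(
    const std::vector<std::uint32_t>& file_attributes,
    const slow_tag_query& query_tag)
{
  assert(query_tag && "a slow-path callback is required");
  std::vector<std::uint32_t> tags(file_attributes.size(), 0);
  for (std::size_t i = 0; i < file_attributes.size(); ++i)
  {
    if (file_attributes[i] & kFileAttributeReparsePoint)
      tags[i] = query_tag(i);  // one open/query/close round trip
  }
  return tags;
}
```

With four entries of which two are flagged, the callback runs twice. On
a volume where dedup has touched most files, nearly every entry would
pay that round trip.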

I appreciate you're saying the cost is worth it, but we have to think
about all Boost users here, not just the small minority on Windows
Server 2012 with dedup turned on.

> for (directory_iterator ...)
> {
> if (is_symlink(fn)) backup_link(fn);
> if (is_regular_file(fn)) backup_contents(fn);
> if (is_directory(fn)) ignore(fn);
> if (is_other(fn)) ignore(fn);
> }
>
> Currently, this pseudo code would fail to back up any automatically dedup'd
> files (which are basically any file older than 3 days on some of my sites).
> It fails because a dedup'd file is currently an "other".
>
> If you treat a dedup'd file as a symlink, only the "link" will be backed up.
> This link points to a magical place that is impossible to read other than
> simply reading "fn".
>
> So how does this simple program back up the dedup'd file contents?

I appreciate the problem with reporting a dedup file as a symlink,
but any attempt to retrieve the target of that "symlink" has to fail
with an error, because a target is meaningless for a dedup file.
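
Once the tag has been fetched, the Windows SDK's
IsReparseTagNameSurrogate() macro tests bit 29 of the tag, which marks
reparse points that stand in for another named file or directory, as
symlinks and junctions do. The dedup tag does not carry that bit, which
is one way of seeing why it has no target to read. A minimal sketch with
the SDK tag values redefined locally; is_name_surrogate() is a
hypothetical re-implementation of the macro's bit test.

```cpp
#include <cstdint>

// Reparse tag values from the Windows SDK (winnt.h), redefined here so the
// sketch compiles on any platform.
constexpr std::uint32_t kTagMountPoint = 0xA0000003;  // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kTagSymlink    = 0xA000000C;  // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup      = 0x80000013;  // IO_REPARSE_TAG_DEDUP

// Same bit test as IsReparseTagNameSurrogate(): bit 29 set means the
// reparse point is a surrogate for another named entity, i.e. link-like.
constexpr bool is_name_surrogate(std::uint32_t tag) noexcept
{
  return (tag & 0x20000000u) != 0;
}
```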

What seems to me the best route forward is for you to do something
like this:

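if (is_symlink(fn))
{
  error_code ec;
  read_symlink(fn, ec);  // fails for a dedup file: it has no target
  if (!ec)
    backup_link(fn);     // a genuine symlink: back up the link itself
  // otherwise fall through to is_regular_file() / is_directory() below
}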

Because is_regular_file() and is_directory() use status(), they
follow any symlink, so you can safely fall through to those.
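
To make the fall-through concrete, here is a sketch of your backup loop
rewritten along those lines. It uses C++17 std::filesystem as a
stand-in for Boost.Filesystem (its is_symlink(), read_symlink(path,
error_code&), is_regular_file() and directory_iterator follow Boost's
design), and it assumes the proposed semantics, in which a dedup file
reports is_symlink() true but read_symlink() fails. backup_link() and
friends are hypothetical, so the loop only logs what it would do. I
have also assumed a genuine symlink should have only its link backed
up, hence the continue. On a POSIX system there are no dedup files, so
only the genuine-symlink path gets exercised.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// (action, file name) pairs; recorded instead of performing a real backup.
using action_log = std::vector<std::pair<std::string, std::string>>;

// The backup loop under the proposed semantics (hypothetical actions only).
action_log plan_backup(const fs::path& dir)
{
  assert(fs::is_directory(dir) && "plan_backup expects a directory");
  action_log log;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir))
  {
    const fs::path& fn = entry.path();
    const std::string name = fn.filename().string();
    if (fs::is_symlink(fn))
    {
      std::error_code ec;
      fs::read_symlink(fn, ec);  // under the proposal, fails for dedup files
      if (!ec)
      {
        log.emplace_back("link", name);  // a genuine symlink: save the link
        continue;
      }
      // No meaningful target: fall through to the status()-based checks,
      // which follow the reparse point through to the file's contents.
    }
    if (fs::is_regular_file(fn))
      log.emplace_back("contents", name);
    else
      log.emplace_back("ignore", name);  // directories and "other" entries
  }
  return log;
}

// True if plan_backup() recorded `action` for the entry called `name`.
bool logged(const action_log& log, const std::string& action,
            const std::string& name)
{
  for (const auto& item : log)
    if (item.first == action && item.second == name)
      return true;
  return false;
}

// Usage helper: a file, a symlink to it and a subdirectory. Creating
// symlinks may need extra privileges on Windows.
void make_demo_tree(const fs::path& dir)
{
  fs::remove_all(dir);
  fs::create_directories(dir / "sub");
  std::ofstream out(dir / "a.txt");
  out << "data";
  fs::create_symlink("a.txt", dir / "link");
}
```

Directory iteration order is unspecified, which is why logged()
searches the log rather than relying on positions.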

Is this acceptable to you? If so, I'll update AFIO accordingly to
match these new semantics and add a note to the docs. I'm sure Beman
will consider something similar when he is less busy.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/
