
Subject: Re: [boost] [filesystem] proposal: treat reparse files as regular files
From: Paul Harris (harris.pc_at_[hidden])
Date: 2015-07-27 21:33:25


I think we are not on the same page. Let me try and refocus the
discussion...

With symlinks, there is more than one access point to the same file
content (i.e. multiple file names for identical content).

That makes symlinks fundamentally different from regular files, and it's why
they are treated differently: e.g. a backup tool shouldn't back up the same
content twice.

Is that statement correct?
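A minimal sketch of the "two names, one content" point. It assumes a POSIX system and uses C++17 std::filesystem, which was standardised from Boost.Filesystem. Boost.Filesystem offers the same create_symlink(), is_symlink() and equivalent() calls. The helper name and file names are hypothetical.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Creates dir/target.txt plus a symlink dir/alias.txt pointing at it, then
// checks that both names reach the same underlying file. equivalent()
// follows symlinks, so a backup tool can use it to avoid storing the same
// content twice under two names.
bool two_names_same_content(const fs::path& dir) {
    fs::create_directories(dir);
    const fs::path target = dir / "target.txt";
    const fs::path alias = dir / "alias.txt";
    {
        std::ofstream out(target);
        out << "payload\n";
    }                                   // stream closed before inspection
    fs::remove(alias);                  // lets the sketch be re-run
    fs::create_symlink(target, alias);  // alias -> target
    return fs::is_symlink(alias) && fs::equivalent(alias, target);
}
```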

Reparse point files (that are not junctions or symlinks) do not have an
alternate access point through the file system.

You cannot access the underlying data via another file name, e.g.
deduplicated (dedup) files.

Is that also correct?

Cheers,
Paul
 On 27 Jul 2015 8:42 pm, "Niall Douglas" <s_sourceforge_at_[hidden]> wrote:

> On 27 Jul 2015 at 10:55, Paul Harris wrote:
>
> > > However, they all still look like symlinks to me. Just because the OS
> > > magically replaces them with the real file on first access is
> > > immaterial - the same thing could happen on Linux. If you don't treat
> > > them as symlinks, there is no way of inspecting the link without
> > > causing it to be auto-downloaded which could be catastrophic in some
> > > use cases.
> > >
> > > I still vote for pseudo-symlinks to be reported by Filesystem as
> > > symlinks.
> > >
> >
> >
> > I did think about that, but the design of these reparse points intends for
> > these files to be treated as plain files by the client, as per MS
> > documents.
>
> This is like saying that POSIX symlinks are intended to be treated as
> their target, which is the whole point of using them.
>
> Reparse points are the *technology* by which Microsoft implemented
> symlinks in NTFS. They offer a *family* of symlink implementations,
> all with varying semantics. Some of that family bear strong
> resemblance to the much more limited POSIX symlink, others are quite
> different.
>
> If you aren't on NTFS, the technology used to implement symlinks is
> different. For example, the NT kernel provides its own non-persistent
> symlink implementation totally separate from NTFS.
>
> > Plus, as I understand it, the reparse buffer is entirely driver-specific,
> > so you can't expect Boost or any user program to be able to decode what
> > is inside the reparse buffer and do anything intelligent.
>
> Microsoft have published the structure for their reparse tag formats.
> Anyone can parse that structure (AFIO does).
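To illustrate parsing the published layout, here is a sketch that decodes the symbolic-link and mount-point (junction) variants of Microsoft's REPARSE_DATA_BUFFER from raw bytes. It is not AFIO's code. Reading the bytes from disk is left out so the sketch compiles anywhere (on Windows that is DeviceIoControl with FSCTL_GET_REPARSE_POINT). The struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Tag values as published in Microsoft's winnt.h, renamed so they do not
// clash with the real macros when this is compiled on Windows.
constexpr std::uint32_t kTagMountPoint = 0xA0000003u; // IO_REPARSE_TAG_MOUNT_POINT
constexpr std::uint32_t kTagSymlink    = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK

struct ParsedLink {
    std::uint32_t tag;
    std::u16string substitute_name; // path the I/O manager actually follows
    std::u16string print_name;      // path intended for display
};

// Little-endian readers over the raw reparse buffer.
static std::uint16_t rd16(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint16_t>(b[at] | (b[at + 1] << 8));
}
static std::uint32_t rd32(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint32_t>(rd16(b, at)) |
           (static_cast<std::uint32_t>(rd16(b, at + 2)) << 16);
}
static std::u16string rd_utf16(const std::vector<std::uint8_t>& b,
                               std::size_t at, std::size_t bytes) {
    std::u16string s;
    for (std::size_t i = 0; i + 1 < bytes; i += 2)
        s.push_back(static_cast<char16_t>(rd16(b, at + i)));
    return s;
}

// Decodes symlink and mount-point buffers; returns nullopt for any other
// tag (e.g. dedup) or for a buffer too short for the offsets it declares.
std::optional<ParsedLink> parse_link_reparse_buffer(const std::vector<std::uint8_t>& b) {
    if (b.size() < 16) return std::nullopt;
    const std::uint32_t tag = rd32(b, 0);
    std::size_t path_buffer;                         // start of PathBuffer
    if (tag == kTagSymlink) path_buffer = 20;        // 8-byte header + 4 name fields + Flags
    else if (tag == kTagMountPoint) path_buffer = 16; // same, without Flags
    else return std::nullopt;
    // Offsets and lengths are in bytes, relative to PathBuffer.
    const std::size_t sub_off = rd16(b, 8), sub_len = rd16(b, 10);
    const std::size_t prn_off = rd16(b, 12), prn_len = rd16(b, 14);
    if (path_buffer + sub_off + sub_len > b.size() ||
        path_buffer + prn_off + prn_len > b.size())
        return std::nullopt;
    return ParsedLink{tag,
                      rd_utf16(b, path_buffer + sub_off, sub_len),
                      rd_utf16(b, path_buffer + prn_off, prn_len)};
}
```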
>
> > AND the
> > resolution is done by the driver on the server side. Note that there are
> > probably a dozen products out there that use these reparse buffers for
> > their storage solution... it's not just Windows dedup.
>
> The resolution varies actually. For example junction points are
> resolved server side, symlinks are resolved client side.
>
> > So, I don't see how the client can do anything intelligent with symlink
> > knowledge, AND if Boost library users are forced to treat them as
> > symlinks, then you now have 2 kinds of symlinks:
> >
> > 1) standard symlinks, where you sometimes really want a shallow copy, and
> > you have to be careful of loops ( A -> B -> A )
> >
> > 2) reparse points (but not symlinks), which you cannot shallow-copy (as far
> > as I understand), and where loops are not possible.
>
> You can copy the standard Microsoft reparse points as those are
> documented.
>
> I see no reason why Filesystem's read_symlink(), create_symlink() and
> copy_symlink() couldn't all work just fine if upgraded to understand
> more reparse point types.
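For ordinary POSIX symlinks, the functions named above already give a shallow copy. Here is a minimal sketch with C++17 std::filesystem, whose read_symlink(), create_symlink() and copy_symlink() mirror Boost.Filesystem's. The helper name is hypothetical, and extending these calls to other reparse point types is not shown.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Shallow-copies a symlink: the new entry stores the same target path as the
// original, and no file content is read or duplicated. read_symlink()
// returns the stored target without following it, so dangling links work too.
fs::path shallow_copy_link(const fs::path& link, const fs::path& new_link) {
    fs::remove(new_link);             // allow re-running over an old copy
    fs::copy_symlink(link, new_link); // read_symlink(link), then create_symlink
    return fs::read_symlink(new_link);
}
```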
>
> > * My software doesn't want to follow links, but now the new version will
> > force me to specifically check if it's just a reparse file and then
> > follow it.
>
> No, that depends on whatever the OS does with the symlink. Ordinarily
> I would assume it dereferences the link unless you specifically ask
> for it not to, same as on POSIX, i.e. if you lstat() it, it returns
> the stat for the symlink; if you stat() it, it returns the stat for
> the target.
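The lstat()/stat() split maps directly onto symlink_status() and status(). Here is a small sketch using C++17 std::filesystem, assuming a POSIX system. The same two functions exist in Boost.Filesystem. The helper name is hypothetical.

```cpp
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Returns {type without following the link, type after following it}.
// symlink_status() behaves like lstat(): it describes the link itself.
// status() behaves like stat(): it resolves the link and describes the target.
std::pair<fs::file_type, fs::file_type> link_and_target_type(const fs::path& p) {
    return {fs::symlink_status(p).type(), fs::status(p).type()};
}
```

For a path that is not a link, both members of the pair are the same.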
>
> > * Whole-disk backup software doesn't follow symlinks because it assumes
> > it'll get the real file later. Reparse (non-symlink) files do not have
> > any other "real file", so those files are not being backed up at all right
> > now.
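Here is a sketch of the backup behaviour described here, assuming a POSIX system and C++17 std::filesystem. The walk skips anything reported as a symlink, so if a deduplicated file were reported as one, its content would never be copied. The helper names and the demo layout are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// Collects the files a simple whole-disk backup would copy. Entries reported
// as symlinks are skipped on the assumption that their content is reached
// under its real name elsewhere in the walk. recursive_directory_iterator
// does not descend into directory symlinks unless follow_directory_symlink
// is requested.
std::vector<fs::path> files_to_back_up(const fs::path& root) {
    std::vector<fs::path> out;
    for (const fs::directory_entry& e : fs::recursive_directory_iterator(root)) {
        if (e.is_symlink()) continue;   // checks the link itself, not its target
        if (e.is_regular_file()) out.push_back(e.path());
    }
    return out;
}

// Builds root/data.txt plus root/alias.txt -> data.txt for trying the walk.
void make_demo_tree(const fs::path& root) {
    fs::create_directories(root);
    {
        std::ofstream out(root / "data.txt");
        out << "payload\n";
    }
    fs::remove(root / "alias.txt");
    fs::create_symlink("data.txt", root / "alias.txt");
}
```

On the demo tree, only data.txt is collected. alias.txt is skipped, which is the intended result for a real symlink.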
> >
> > So treating them as symlinks causes more trouble than the one edge case
> > it helps.
> >
> > Non-symlink reparse files are such a specialised case that I'd personally
> > want a specialised get_reparse_info kind of function, so that if I really
> > need to care, I can find that information.
> >
> > Your thoughts?
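Here is one possible shape for such a query. It is a hedged sketch, not an existing Boost API. Microsoft's winnt.h defines IsReparseTagNameSurrogate(), which tests bit 0x20000000 of the reparse tag. The symlink and junction tags set that bit and IO_REPARSE_TAG_DEDUP does not. The bit is therefore one way to separate "another name for some other file" from "this file's own content, stored elsewhere". The enum and function names are hypothetical. The tag values are copied from winnt.h so the code compiles anywhere.

```cpp
#include <cstdint>

// Tag values as published in Microsoft's winnt.h, renamed to avoid clashing
// with the real macros when compiled on Windows.
constexpr std::uint32_t kTagMountPoint = 0xA0000003u; // IO_REPARSE_TAG_MOUNT_POINT
constexpr std::uint32_t kTagSymlink    = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup      = 0x80000013u; // IO_REPARSE_TAG_DEDUP

// Bits tested by winnt.h's IsReparseTagMicrosoft() and IsReparseTagNameSurrogate().
constexpr std::uint32_t kMicrosoftBit     = 0x80000000u;
constexpr std::uint32_t kNameSurrogateBit = 0x20000000u;

// Hypothetical classification a get_reparse_info()-style query might expose.
enum class reparse_kind {
    name_surrogate,   // refers to another named entity, e.g. symlink, junction
    microsoft_other,  // Microsoft-owned, not a surrogate, e.g. dedup
    third_party       // vendor-defined tag without the surrogate bit
};

constexpr reparse_kind classify_reparse_tag(std::uint32_t tag) {
    if (tag & kNameSurrogateBit) return reparse_kind::name_surrogate;
    if (tag & kMicrosoftBit) return reparse_kind::microsoft_other;
    return reparse_kind::third_party;
}

// Policy in the spirit of the proposal in the subject line: only name
// surrogates are reported as symlinks; everything else is a regular file.
constexpr bool report_as_symlink(std::uint32_t tag) {
    return classify_reparse_tag(tag) == reparse_kind::name_surrogate;
}
```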
>
> I think Filesystem should provide what POSIX provides. Where Windows
> provides behaviours close enough to POSIX, we should support those too.
>
> However, pages of special Windows support aren't what Boost usually
> does. Generally speaking, we're here to abstract out the commonalities.
>
> I agree Filesystem (and AFIO) should recognise deduped files as
> something valid that can be worked with. Anything past that is up to
> the end user.
>
> Niall
>
> --
> ned Productions Limited Consulting
> http://www.nedproductions.biz/
> http://ie.linkedin.com/in/nialldouglas/

