Subject: Re: [boost] [filesystem] How to remove specific files from a directory?
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2016-09-14 12:16:54


On 13 Sep 2016 at 22:18, Andrey Semashev wrote:

> > Some might ask why not immediately unlink it in RAM as Linux does?
> > Linux historically really didn't try hard to avoid data loss on
> > sudden power loss, and even today it uniquely requires programmers to
> > explicitly call fsync on containing directories in order to achieve
> > sudden power loss safety. NTFS and Windows try much harder: they
> > always try to keep the *metadata* the program sees via kernel
> > syscalls equal to what is on physical storage (actual file data is a
> > totally separate matter). This makes writing reliable filesystem
> > code much easier on Windows than on Linux, where it was traditionally
> > a real bear.
>
> I'm not sure I understand how the Windows behavior you described provides
> better protection against power loss. If the power is lost before
> metadata is flushed to media then the file stays present after reboot.
> The same happens in Linux, AFAICT, only you can influence the FS
> behavior with mount options.

You're thinking in terms of "potential loss of user data", and in
that sense you're right.

I'm referring to "writing multi-process concurrent filesystem code
which is algorithmically correct and won't lose data". In this
situation having the kernel only tell you what is actually physically
on disk makes life much easier when writing correct code. In Linux in
particular you have to spam fsync all over the place and pray the
user hasn't set "data=writeback" or turned barriers off etc., and
such design patterns are also inefficient because you end up doing
too many directory fsyncs.
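To make the "fsync the containing directory" point concrete, here is roughly what the classic durable replace pattern looks like on Linux. It is a minimal sketch assuming a POSIX system; durable_write is a hypothetical helper name, and mount options such as writeback journalling or disabled barriers can still weaken what it actually guarantees.

```cpp
// Sketch: crash-safe "write a temp file, then publish it by rename" on POSIX.
// Assumes Linux/POSIX; durable_write is a hypothetical helper, not a library API.
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Writes `data` to dir/name. After fsync of both the file and its containing
// directory, the new directory entry should survive sudden power loss.
bool durable_write(const std::string& dir, const std::string& name,
                   const std::string& data) {
    const std::string tmp = dir + "/" + name + ".tmp";
    const std::string dst = dir + "/" + name;

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    const char* p = data.data();
    size_t left = data.size();
    while (left > 0) {  // handle short writes and EINTR
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;
            ::close(fd);
            return false;
        }
        p += n;
        left -= static_cast<size_t>(n);
    }
    // 1. Flush the file's contents and inode to storage.
    if (::fsync(fd) != 0) { ::close(fd); return false; }
    if (::close(fd) != 0) return false;

    // 2. Atomically replace any existing file with the new one.
    if (::rename(tmp.c_str(), dst.c_str()) != 0) return false;

    // 3. Flush the containing directory so the rename itself is durable.
    //    This is the extra step Linux requires and that is easy to forget.
    int dfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (dfd < 0) return false;
    bool ok = ::fsync(dfd) == 0;
    ::close(dfd);
    return ok;
}
```

Every directory touched this way needs its own fsync, which is where the "too many directory fsyncs" cost comes from.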

During AFIO v1 I used to get very annoyed that metadata views on
Windows from other processes did not match the modifying process'
view until the updates reached physical storage, so process A could
extend a file and process B wouldn't see the extension for
potentially many seconds (the same goes for hard links, timestamps,
etc.). It seemed easier if every process saw the same thing and had a
sequentially consistent view. But now that I've got used to it, and
given that Linux (+ ext4) would appear to be the exceptional outlier
here, I can see it has a compelling logic, and it definitely can be
put to very good use when writing algorithmically correct filesystem
code.

> The irritating difference is that even though the file is deleted (by
> all means the application has to observe that), the OS still doesn't
> allow deleting the containing folder because it's not empty.

Ah but the file is not deleted, so refusing to delete the containing
folder is correct. It is "pending deletion" which means anything
still using it can continue to do so, but nothing new can use it [1].
You can, of course, also unmark a file marked for deletion in
Windows. Linux has a similar feature: it lets you create a directory
entry for an anonymous inode.

[1]: Also an opt-out Windows behaviour.
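On Linux, the nearest analogue to giving an unnamed file a name is O_TMPFILE combined with linkat(). A minimal sketch, assuming a filesystem that supports O_TMPFILE and a mounted /proc; write_then_link is a hypothetical name, and this is only an analogue, not an equivalent of unmarking a Windows pending delete.

```cpp
// Sketch: create an anonymous (unnamed) inode on Linux, then link it into
// the namespace. Linux-specific; write_then_link is a hypothetical helper.
#include <cassert>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Returns false if the filesystem or environment lacks support; on success
// the previously anonymous inode is visible at `path`.
bool write_then_link(const std::string& dir, const std::string& path,
                     const std::string& data) {
    // The file exists only as an open descriptor: no directory entry yet.
    int fd = ::open(dir.c_str(), O_TMPFILE | O_RDWR, 0600);
    if (fd < 0) return false;  // e.g. EOPNOTSUPP on unsupported filesystems
    if (::write(fd, data.data(), data.size()) !=
        static_cast<ssize_t>(data.size())) {
        ::close(fd);
        return false;
    }
    // Give the inode a name via /proc/self/fd, the unprivileged route
    // described in the Linux open(2) manual page.
    const std::string proc = "/proc/self/fd/" + std::to_string(fd);
    bool ok = ::linkat(AT_FDCWD, proc.c_str(), AT_FDCWD, path.c_str(),
                       AT_SYMLINK_FOLLOW) == 0;
    ::close(fd);
    return ok;
}
```

If the process dies before linkat(), the kernel simply reclaims the inode, so no half-written file is ever visible under a name.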

> I'm seeing
> this effect nearly every time I boot into Windows - when I delete the
> bin.v2 directory created by Boost.Build. There may be historical reasons
> to it, but seriously, if the OS tries to cheat and pretends the file is
> deleted then it should go the whole way and act as if it is.

Are you referring to Windows Explorer hiding stuff you delete with it
when it's not really deleted?

That's a relatively recent addition to Windows Explorer. It's very
annoying.

> Workarounds
> like rename+delete are a sorry excuse because it's really difficult to
> say where to rename the file in presence of reparse points, quotas and
> permissions. And most importantly - why should one jump through these
> hoops in one specific case, on Windows? The same goes for the inability to
> delete/move open files.

You can delete, rename and move open files just fine on Windows.
Indeed, an AFIO v1 unit test fires up a thread that randomly renames a
few dozen files and directories, and then ensures that a loop of
filesystem operations on the rapidly changing filesystem neither races
nor misoperates.

You are correct that you must opt in to being able to do this. The
Windows kernel folk correctly observed that most programmers, even
otherwise expert ones, consistently write unsafe filesystem code.
They therefore chose defaults that err heavily on the side of safety
(too heavily, I would agree, especially in making symbolic links
effectively an unusable feature).

Regarding an ideally efficient way of correctly deleting a directory
tree on Windows, AFIO v1 had an internal algorithm which, when faced
with pending-delete files during a directory tree deletion, would
probe around for suitable locations to rename them to, so that the
directory tree could be scrubbed immediately. It was pretty effective,
especially if %TEMP% is on the same volume, and the NT kernel API
makes figuring out what else is on your volume trivial compared to,
say, statfs() on Linux, which is awful. AFIO v2 will at some point
expose that algorithm as a generic templated algorithm in
afio::algorithm so anybody can use it.
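The rename-out-of-the-way idea can be illustrated portably with std::filesystem. This is not AFIO's algorithm, just a sketch under stated assumptions: a sibling directory (hypothetically named "<root>.graveyard") is assumed to be on the same volume so rename() is a cheap metadata operation, remove_tree_via_rename is a made-up name, and only top-level entries are moved (a fuller version would probably need to recurse, and would probe for a location rather than hard-coding one).

```cpp
// Sketch of rename-before-delete; not AFIO's implementation.
// Assumption: root has no trailing slash and its parent is on the same volume.
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Empties `root` by renaming every top-level entry into a graveyard
// directory, removes `root`, then deletes the graveyard best-effort. On
// Windows, a file still held open (with delete sharing) by another process
// would linger in the graveyard instead of keeping `root` non-empty.
bool remove_tree_via_rename(const fs::path& root) {
    std::error_code ec;
    const fs::path graveyard =
        root.parent_path() / (root.filename().string() + ".graveyard");
    fs::create_directory(graveyard, ec);  // an existing directory is not an error
    if (ec) return false;

    // Collect entries first rather than renaming while iterating.
    std::vector<fs::path> entries;
    for (fs::directory_iterator it(root, ec), end; !ec && it != end;
         it.increment(ec))
        entries.push_back(it->path());
    if (ec) return false;

    unsigned counter = 0;
    for (const auto& p : entries) {
        // Unique names avoid collisions inside the graveyard.
        fs::rename(p, graveyard / std::to_string(counter++), ec);
        if (ec) return false;
    }

    // `root` is now empty, so it can be removed straight away.
    fs::remove(root, ec);
    if (ec) return false;

    // Best effort: anything still pending deletion is left for later cleanup.
    fs::remove_all(graveyard, ec);
    return true;
}
```

On Linux the graveyard step is redundant because unlink takes effect immediately; it only earns its keep on platforms with pending-delete semantics.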

In the end, these platform specific differences are indeed annoying.
But that's the whole point of system libraries and abstraction
libraries like many of those in Boost, you write code once and it
works equally everywhere.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/
