Subject: Re: [boost] [filesystem] How to remove specific files from a directory?
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2016-09-13 14:51:32


On 13 Sep 2016 at 7:37, degski wrote:

> > ... Longer answer ...
> >
>
> Thanks for the write-up... It's a shame Windows doesn't do the VMS
> file-shredding though...

It would be hard to implement in NTFS. Each file is stored as a chain
of 64Kb extents. Modifying a segment is a read-copy-update operation
plus a relinking of the chain, so as a file is updated you are
basically leaking bits of old data all over the free space list over
time. Shredding on delete is therefore not particularly effective at
truly destroying the file contents on NTFS, and that's why running
their defrag API from a cronjob is a much better way of doing it (and,
I think, what the DoD C2 secure edition does).
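
To illustrate the point (a rough sketch only, plain Win32, not what
the defrag approach does): a naive overwrite-then-delete shredder
like the one below only scrubs the extents the file currently
occupies, and never touches the older extents already leaked into
free space by earlier modifications.

  #include <windows.h>
  #include <vector>

  // Naive shred-on-delete sketch: overwrite the current extents with
  // zeros, flush, then unlink. On NTFS the extents abandoned by
  // earlier read-copy-update modifications are already in the free
  // space list and are NOT reached by this.
  bool shred_and_delete(const wchar_t *path)
  {
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           OPEN_EXISTING, FILE_FLAG_WRITE_THROUGH, nullptr);
    if (h == INVALID_HANDLE_VALUE)
      return false;
    LARGE_INTEGER len;
    if (!GetFileSizeEx(h, &len)) { CloseHandle(h); return false; }
    std::vector<char> zeros(65536, 0);            // one 64Kb extent at a time
    for (LONGLONG done = 0; done < len.QuadPart;)
    {
      LONGLONG left = len.QuadPart - done;
      DWORD chunk = (left > 65536) ? 65536 : (DWORD) left;
      DWORD written = 0;
      if (!WriteFile(h, zeros.data(), chunk, &written, nullptr) || written == 0)
      { CloseHandle(h); return false; }
      done += written;
    }
    FlushFileBuffers(h);                          // force the overwrite onto media
    CloseHandle(h);
    return DeleteFileW(path) != 0;                // now remove the directory entry
  }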

I should apologise to the list for not actually explaining yesterday
why deleted files take a while to disappear on Windows. All I can say
is that things are very busy as Boost Summer of Code winds down and
CppCon nears, and it's too easy to brain dump. The historical reason
for that behaviour was explained, but not why it's still done today.
The reason is that NTFS and Windows really do care about your data,
and they force a metadata fsync to the journal on the containing
directory when you delete a file entry within it. Obviously that is
one journal write per file entry deleted, so if you're deleting, say,
1m file entries from a directory, that would mean 1m fsyncs.
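
To put some (purely hypothetical) code to that: a straight loop like
the one below would, without the batching described next, imply one
durable journal write per iteration.

  #include <filesystem>
  #include <string>
  #include <system_error>

  // Illustration only (file names are hypothetical): each remove() is
  // a metadata change to the containing directory which NTFS journals
  // durably, so a million iterations naively means a million fsyncs.
  void delete_many(const std::filesystem::path &dir, unsigned count)
  {
    for (unsigned n = 0; n < count; ++n)
    {
      std::error_code ec;
      std::filesystem::remove(dir / ("entry" + std::to_string(n) + ".tmp"), ec);
    }
  }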

To solve this, Windows actively avoids deleting files while the
filesystem is busy, even though all handles are closed and the file
was marked with the delete-on-close flag. I've seen delays of up to
two seconds in testing here locally. It then does a batch pass,
writing a new MFT record with all the deleted files removed and
fsyncing that, so instead of 1m fsyncs there is just one.
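
The practical consequence for code is that the name can stay visible
for a short while after the last handle closes. A sketch (Win32 only,
the ~2 second figure is from my testing above) of waiting for the
entry to actually go away:

  #include <windows.h>

  // Mark the file delete-on-close, close the last handle, then poll
  // until the directory entry really disappears. On a busy volume the
  // entry can linger for up to ~2 seconds before the batched MFT
  // update lands.
  bool delete_and_wait(const wchar_t *path)
  {
    HANDLE h = CreateFileW(path, DELETE, FILE_SHARE_DELETE, nullptr,
                           OPEN_EXISTING, FILE_FLAG_DELETE_ON_CLOSE, nullptr);
    if (h == INVALID_HANDLE_VALUE)
      return false;
    CloseHandle(h);                          // deletion now pending, not immediate
    for (int tries = 0; tries < 40; ++tries) // give it ~4 seconds
    {
      if (GetFileAttributesW(path) == INVALID_FILE_ATTRIBUTES
          && GetLastError() == ERROR_FILE_NOT_FOUND)
        return true;                         // entry is gone from the directory
      Sleep(100);                            // a delete-pending entry may still
    }                                        // report attributes or ACCESS_DENIED
    return false;
  }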

Some might ask why not immediately unlink it in RAM as Linux does?
Linux historically really didn't try hard to avoid data loss on
sudden power loss, and even today it uniquely requires programmers to
explicitly call fsync on the containing directory in order to achieve
sudden power loss safety. NTFS and Windows try much harder, and they
try to always keep the *metadata* the program sees via the kernel
syscalls equal to what is on physical storage (actual file data is a
totally separate matter). That makes programming reliable filesystem
code much easier on Windows than on Linux, which was traditionally a
real bear.
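
For reference, the extra step Linux needs and Windows does not, as a
minimal sketch with plain POSIX calls and error handling trimmed:

  #include <fcntl.h>
  #include <unistd.h>
  #include <cstdio>

  // After creating, renaming or unlinking an entry on Linux, fsync the
  // containing directory too, otherwise the metadata change may not
  // yet be durable across sudden power loss.
  bool durable_rename(const char *from, const char *to, const char *dir)
  {
    if (std::rename(from, to) != 0)
      return false;
    int dfd = ::open(dir, O_RDONLY | O_DIRECTORY);  // handle to containing directory
    if (dfd < 0)
      return false;
    bool ok = (::fsync(dfd) == 0);                  // flush the directory metadata
    ::close(dfd);
    return ok;
  }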

(ZFS on FreeBSD interestingly takes a middle approach between
Windows' and Linux's: it allows a maximum 5-second reordering window,
after which writes arrive on physical storage exactly in the order
issued. This lets the program get ahead of storage by up to 30
seconds or so, but because you get a fairly total, sequentially
consistent ordering, sudden power loss recovery is vastly easier: you
only need to scan +/- 5 seconds to recover a valid state.)

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/
