Boost logo

Boost :

From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2020-05-10 22:02:30


On 08/05/2020 03:06, Chris Glover via Boost wrote:

> Sure, here's a small repo with the same algorithm implemented three
> different ways: FindNextFile, recursive_directory_iterator,
> llfio::directory_handle. All they do is categorise every file on C:\, so
> you should be able to change "C:\\" to "/" for other systems.
>
> I lifted these from a larger system so they're doing some extra work and
> probably have bugs, but should be close enough.
>
> https://github.com/cdglove/fs-test

So this turned into a weekend long thing to figure out, and I thank you
for it, as we have a whole bunch of code here which assumes things which
it turns out are out of date on recent Windows. FYI I had to break out
the Windows Performance Recorder to figure out what the hell was going
on in the Windows kernel, more on that shortly.

Quick results for enumerating all of C:\ on my machine, warm cache:

```
G:\boostish\cdglove>x64\Release\llfio.exe
test-llfio found 1485232 files.
 13.872090s wall, 1.734375s user + 12.109375s system = 13.843750s CPU
(99.8%)

G:\boostish\cdglove>x64\Release\win32.exe
test-win32 found 1491222 files.
 14.713440s wall, 2.328125s user + 12.375000s system = 14.703125s CPU
(99.9%)

G:\boostish\cdglove>x64\Release\boost.exe
Exception thrown: filesystem::recursive_directory_iterator increment
error: The system cannot find the path specified
Up until fatal error test-boost_filesystem found 868657 files.
 66.506680s wall, 5.187500s user + 61.234375s system = 66.421875s CPU
(99.9%)

G:\boostish\cdglove>x64\Release\std.exe
Exception thrown: recursive_directory_iterator::operator++: The system
cannot find the path specified.
Up until fatal error test-std_filesystem found 870341 files.
 63.407547s wall, 2.921875s user + 60.375000s system = 63.296875s CPU
(99.8%)
```

I attach, as a ZIP archive, the code which made the programs above. I
annotated what I mainly changed in the LLFIO test to explain what you
were doing wrong. In case the ZIP archive doesn't come through, I also
uploaded it to
https://gist.github.com/ned14/9ed4b583b3064ad9f8bbb8e9fa2408b6

You will note that LLFIO is basically the same speed as Win32
FindFirstFile(), except that LLFIO gives you race free snapshots, which
is a very important guarantee under concurrent filesystem modification.
Both Boost and std's recursive_directory_iterator are orders of
magnitude slower, which cannot be helped due to their design.

I am very sure, by the way, that LLFIO can enumerate the entire disc
within five seconds on Windows, if you use a very different approach to
yours. See below.

---
To give a bit of the bigger picture, stuff seems to have changed in
Windows in the last few years.
Firstly, earlier Windows did not have a fast FindFirstFile()
implementation. LLFIO used to beat it by 2x or 3x using the same
technique which Windows Explorer used, because FindFirstFile() was that
slow, so Windows Explorer went straight to the NT kernel, as LLFIO does.
Now FindFirstFile() is competitive, which is a well done to Microsoft
for fixing their stuff.
Secondly, recent Windows 10 seems to have a much, much, much slower
implementation of ProbeForWrite() than it used to. Like, quadratic
complexity to input buffer size. LLFIO used to be super fast because
we'd feed the kernel a very large kernel buffer to fill, and it would
never exceed it. Unfortunately, ProbeForWrite() is now so slow that
large kernel buffers means ProbeForWrite() running for 90% of the time,
so that strategy no longer works.
I did some trial and error, and LLFIO now has a completely new
directory_handle::read() implementation. The new one iterates the
directory once, in pieces using a small kernel buffer, to calculate the
minimum size of kernel buffer needed for a snapshot. It then iterates
the directory a second time, as a snapshot. This ensures that the kernel
buffer is always minimally sized, and we still get our all important
race free snapshot semantic.
Despite performing 5x-6x more syscalls per directory enumeration, this
strategy is actually much better than the previous one, because it keeps
ProbeForWrite() from slowing everything down any more than is absolutely
required. And it's no slower than FindFirstFile(), but with much
stronger guarantees and full stat_t metadata returned per entry.
---
Back to how to achieve five second whole disc enumerations, the first
thing is stop performing a unicode transcode per file name. They are
really horrible on the CPU and its caches. Use a small buffer
optimisation to remove the dynamic memory allocation, for most file
names, and just memcpy() the filename.
The second thing is you should throw kernel threads at the problem.
Filesystems will embarrassingly parallelise up to hardware concurrency,
just make sure to never touch the same inode from more than one kernel
thread at a time.
Given one can enumerate the entire disc using a single kernel thread
within 14 seconds or so, with a well written breadth-first concurrent
algorithm, I see no reason at all why under 5 seconds for the entire
disc isn't very possible.
I honestly can no longer recommend LLFIO over Win32 any more for
directory enumeration, on Windows at least. I can still heartily
recommend LLFIO over the libc on POSIX, which we still beat handily.
Thanks for reporting this issue, and providing a repro, it was very
valuable.
Niall



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk