From: Chris Glover (c.d.glover_at_[hidden])
Date: 2020-05-07 02:07:27


On Tue, May 5, 2020 at 9:59 AM Niall Douglas via Boost <
boost_at_[hidden]> wrote:

>
> I cannot say for sure, but it was abandoned at around the same time as
> LLFIO demonstrated to Beman a way of enumerating directory contents,
> with complete stat_t per entry, @ > 4 million entries/sec/core on all
> the major platforms. That makes any notion of caching pointless, just
> enumerate the entire directory, always.
>
> I've also been arguing strenuously before WG21 to deprecate
> directory_iterator as fundamentally incorrect ASAP, and I don't think
> I've been unsuccessful. Recent papers to reach WG21 proposing sorely
> needed improvements to directory_iterator have all been shot down. The
> feeling I got in the room was the whole thing needs replacing. My
> current hope for proposing std::directory_handle for standardisation is
> early 2021.
>
> Niall
>

Interesting opinion.

Usually these sorts of things are a series of trade-offs: memory vs time,
latency vs throughput, convenience vs pick-your-favourite-metric. So saying
that one size would fit all is a bit dubious.

Nonetheless, I looked at your library and thought it might give me exactly
what I want because the API allows me to spend memory to save time.

But it turned out to be really slow when recursing. This makes sense:
it generates many small queries and, because it calls a low-level API
directly, the OS is unable to help with them.

Here's the call stack of the hot path, taken from a WPA trace.

Microsoft Windows Profiler
Line # Process Thread ID Stack Count Weight (in view) (ms)
12 | |- test.exe!llfio_v2_62985a1f::directory_handle::read  163634  19,004.423700
13 | | |- ntdll.dll!ZwQueryDirectoryFile  161368  18,737.389200
14 | | | |- ntoskrnl.exe!KiSystemServiceCopyEnd  161349  18,735.136900
15 | | | | |- ntoskrnl.exe!NtQueryDirectoryFile  161346  18,734.804000
16 | | | | | ntoskrnl.exe!NtQueryDirectoryFileEx  161346  18,734.804000
17 | | | | | |- ntoskrnl.exe!BuildQueryDirectoryIrp  160107  18,590.971100
18 | | | | | | |- ntoskrnl.exe!ProbeForWrite  160073  18,587.073100
19 | | | | | | | |- ntoskrnl.exe!KiPageFault  122698  14,246.119500
20 | | | | | | | | |- ntoskrnl.exe!MmAccessFault  79304  9,208.701200
21 | | | | | | | | | |- ntoskrnl.exe!MiDispatchFault  55441  6,439.250700
22 | | | | | | | | | | |- ntoskrnl.exe!MiResolveDemandZeroFault  51240  5,950.194000
23 | | | | | | | | | | | |- ntoskrnl.exe!MiResolvePrivateZeroFault  48839  5,670.213100
24 | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault  25331  2,938.969600
25 | | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault<itself>  13842  1,606.017700

I presume I am using the API correctly, but if not I'm happy to try
something else.

For reference, here are some rough timings from my test:

boost::recursive_directory_iterator: ~30 seconds
FindNextFile: ~13 seconds
llfio: ~980 seconds

This was reading the file size and modified date during iteration. If those
had been cached in recursive_directory_iterator, it would probably have come
close to the FindNextFile time, which would be ideal for me.
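
To make the access pattern concrete, here is a minimal sketch of the kind
of walk I mean. It is not my actual test code, and it uses C++17
std::filesystem rather than Boost.Filesystem. WalkTotals and walk_tree are
names made up for illustration. The size and modified time are read through
the directory_entry instead of through the path, so an implementation that
caches those attributes during iteration can answer without another query
per file. Whether anything is actually cached is implementation-dependent,
so treat this as an assumption to measure, not a guarantee.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Totals gathered during one recursive walk (hypothetical helper type).
struct WalkTotals {
    std::uintmax_t files = 0;  // regular files visited
    std::uintmax_t bytes = 0;  // sum of their sizes
    fs::file_time_type newest = fs::file_time_type::min();  // latest mtime seen
};

// Walks `root` recursively and reads size and modified time from each
// directory_entry. Entries that fail to report an attribute are skipped;
// an error while advancing the iterator ends the walk early.
inline WalkTotals walk_tree(const fs::path& root) {
    WalkTotals totals;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;

    while (!ec && it != end) {
        const fs::directory_entry& entry = *it;
        std::error_code entry_ec;
        if (entry.is_regular_file(entry_ec) && !entry_ec) {
            // These calls go through the entry, which may serve them from
            // attributes cached during iteration (implementation-dependent).
            const std::uintmax_t size = entry.file_size(entry_ec);
            if (!entry_ec) {
                const fs::file_time_type mtime =
                    entry.last_write_time(entry_ec);
                if (!entry_ec) {
                    ++totals.files;
                    totals.bytes += size;
                    if (mtime > totals.newest) {
                        totals.newest = mtime;
                    }
                }
            }
        }
        it.increment(ec);  // non-throwing advance
    }
    return totals;
}
```

Calling walk_tree on the root of the tree under test and timing it with
std::chrono::steady_clock gives a number that can be set against the
FindNextFile figure on the same machine.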

-- chris

