From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2020-05-07 08:49:34


On 07/05/2020 03:07, Chris Glover wrote:
>
> I cannot say for sure, but it was abandoned at around the same time as
> LLFIO demonstrated to Beman a way of enumerating directory contents,
> with complete stat_t per entry, at more than 4 million entries/sec/core on all
> the major platforms. That makes any notion of caching pointless, just
> enumerate the entire directory, always.
>
> I've also been arguing strenuously before WG21 to deprecate
> directory_iterator as fundamentally incorrect ASAP, and I don't think
> I've been unsuccessful. Recent papers to reach WG21 proposing sorely
> needed improvements to directory_iterator have all been shot down. The
> feeling I got in the room was the whole thing needs replacing. My
> current hope for proposing std::directory_handle for standardisation is
> early 2021.
>
> Interesting opinion.
>
> Usually these sorts of things are a series of trade offs; memory vs
> time, latency vs throughput; convenience vs pick-your-favourite-metric,
> so saying one size would fit all is a bit dubious.

It's more fundamental than that. The kernel API which enumerates
directories is quite like reading bytes from a file. Reading a file a
single byte at a time costs about the same per call as reading lots of
bytes at a time, because the overhead of calling any kernel API is
dominant relative to the operation itself. So you may as well ask for
as many entries as you can in each call.
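
To illustrate the file-reading analogy, here is a minimal sketch that
reads the same file with different chunk sizes and counts the calls made.
It assumes that disabling stdio buffering makes each fread() go more or
less straight to the kernel, which is typical but not guaranteed. The
names read_stats and read_in_chunks are hypothetical, chosen only for
this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper type: how many fread() calls were made and how many
// bytes they returned in total.
struct read_stats {
    std::size_t calls;
    std::size_t bytes;
};

// Reads the whole file at `path` in chunks of `chunk` bytes with stdio
// buffering disabled, so that every fread() is (typically) one kernel call.
// Returns {0, 0} if the file cannot be opened.
read_stats read_in_chunks(const char* path, std::size_t chunk) {
    assert(chunk > 0 && "chunk size must be non-zero");
    read_stats stats{0, 0};
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return stats;
    }
    std::setvbuf(f, nullptr, _IONBF, 0);  // no user-space buffering
    std::vector<char> buf(chunk);
    for (;;) {
        const std::size_t n = std::fread(buf.data(), 1, buf.size(), f);
        ++stats.calls;
        stats.bytes += n;
        if (n < buf.size()) {
            break;  // short read: end of file (or an error)
        }
    }
    std::fclose(f);
    return stats;
}
```

For a 4096 byte file, read_in_chunks(path, 1) makes 4097 calls (the last
one reports end of file), while read_in_chunks(path, 65536) makes one.
If the per-call overhead dominates, the first costs roughly four thousand
times as much for the same data.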

> I presume I am using the API correctly, but if not I'm happy to try
> something else.
>
> For reference, here are some rough timings from my test:
> boost::recursive_directory_iterator: ~30seconds.
> FindNextFile: ~13seconds
> llfio: ~980 seconds

I am extremely surprised by these numbers. It surely must be the
case that you are calling the APIs wrong somehow.

Can you send me, off list, an example of the code you are running so I
can check it?

> This was reading file size and modified date during iteration, which if
> they had been cached in recursive_directory_iterator, probably would
> have made it close in time to FindNextFile, which would be ideal for me. 

On Windows the llfio::directory_entry gets its file size and modified
date filled in, as it comes for free on Windows during directory
enumeration.

Equally, during directory enumeration, you ought to ask only for the
metadata you need. When less is requested, LLFIO can sometimes use
tricks to greatly improve performance.

Niall

