From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2021-03-04 10:27:47


On 04/03/2021 09:41, Andrey Semashev via Boost wrote:
> On 3/3/21 11:00 PM, Nicholas Neumann via Boost wrote:
>> I recently moved a project to boost.log from a homemade logger. I had
>> something like text_multifile_backend, so finding that as a drop-in
>> replacement was awesome.
>>
>> Unfortunately, the performance when using text_multifile_backend on
>> windows is really bad, because the repeated file close operations (one
>> per log record) are unusually slow on windows. Repeatedly logging a
>> string to the same file via text_multifile_backend results in
>> throughput of about 200 log entries per second.
>>
>> Just to quickly prove it is unique to windows, I made a simple program
>> that just opens, appends a single line, and then closes, an ofstream
>> in a loop. On a high-end windows machine with nvme ssd, 1000
>> iterations takes 2600ms. On an older linux box with a sata ssd, the
>> same takes 16ms.
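
For reference, a minimal sketch of the kind of test program described in
the quote above. The original program was not posted, so the function
names (time_open_append_close, count_lines) and the line format are
assumptions made for illustration, and the timings it reports will vary
by machine and filesystem.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Opens `path` in append mode, writes one line, and closes it again,
// `iterations` times. Returns the elapsed wall-clock time in milliseconds.
// (Hypothetical helper; not the program from the original post.)
double time_open_append_close(const std::string& path, int iterations)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        std::ofstream out(path, std::ios::out | std::ios::app);
        out << "log line " << i << '\n';
        // `out` goes out of scope here, which flushes and closes the file.
        // This per-iteration close is the operation being measured.
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Counts the lines in `path`; returns 0 if the file cannot be opened.
std::size_t count_lines(const std::string& path)
{
    std::ifstream in(path);
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line))
        ++lines;
    return lines;
}
```

Calling time_open_append_close("bench.log", 1000) on each platform and
comparing the returned milliseconds should reproduce the shape of the
comparison, though not necessarily the exact figures quoted.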

Opening a file for read/write on Windows using the NT kernel API is
approx 76x slower than opening a file on Linux. Using the Win32 API is
approx 40% slower again (~106x slower). That's without any antivirus.

You're seeing things slower than that again, which is almost certainly
due to the file handle close. On Linux a close does essentially no work,
whereas on Windows it induces a blocking metadata flush plus a flush of
the containing directory, i.e. an fsync of metadata.

Windows is competitive with Linux for file i/o once the file is opened.
It is highly uncompetitive for file open/close. This is why compiling a
large codebase is always much slower on Windows than elsewhere: compiling
C++ involves opening an awful lot of files for read.

>> What do others think about adding a note in the documentation about this
>> performance issue? It's bad enough that I think anyone on windows would
>> want to avoid the backend. It's not the backend's "fault" at all; I could
>> see some options for improving performance of the backend on windows, but
>> they definitely complicate the simplicity of the current approach.
>
> I'm not sure I feel good about documenting it, as it doesn't really
> solve the problem. I suppose, I could add a cache of the least recently
> used files to keep them open, but that cache would be ineffective if
> exceeded, which can happen in some valid cases (e.g. writing logs
> pertaining to a given dynamically generated "id" to a separate log). And
> it would only be useful on Windows.

Windows, unlike POSIX, has no low soft limit on total open file
descriptors, so you do not need to worry about fire-and-forget HANDLE
allocation. You can open at least 16 million files per process on
Windows without issue.

Just make sure that when opening the HANDLE you do not exclude other
programs from also opening the file, or from deleting or renaming it. Be
aware that so long as the file is open, every directory in the hierarchy
above it is locked and cannot be renamed. Be aware also that if you map
the file, other programs will be denied many operations, such as
shrinking the file, which does not occur on POSIX.

A lot of people coming from POSIX don't realise this, but on Windows
opening a file with more permissions than you actually need is
expensive. For example, if you only need to atomically append to a file,
opening that file for append-only, with no ability to read, write, or
query metadata, is much quicker than opening it with additional privileges.
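
As a rough illustration of the two points above (permissive sharing and
append-only access), here is a minimal sketch. It assumes Win32
CreateFileA with FILE_APPEND_DATA as the only requested access right and
all three FILE_SHARE_* flags set; the POSIX branch uses O_WRONLY |
O_APPEND so the same code builds elsewhere. The helper names append_line
and read_all are hypothetical. It opens and closes on every call purely
to keep the example short, and it does not measure whether the reduced
access mask is actually faster.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <fcntl.h>
#  include <unistd.h>
#endif

// Appends `line` to `path` through a handle opened for appending only.
// Returns true if the whole line was written. (Hypothetical helper.)
bool append_line(const std::string& path, const std::string& line)
{
#ifdef _WIN32
    HANDLE h = ::CreateFileA(
        path.c_str(),
        FILE_APPEND_DATA,  // append only: no read or random-access write requested
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,  // do not lock others out
        nullptr,           // default security attributes
        OPEN_ALWAYS,       // create the file if it does not exist yet
        FILE_ATTRIBUTE_NORMAL,
        nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;
    DWORD written = 0;
    const BOOL ok = ::WriteFile(h, line.data(),
                                static_cast<DWORD>(line.size()), &written, nullptr);
    ::CloseHandle(h);
    return ok && written == line.size();
#else
    const int fd = ::open(path.c_str(), O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return false;
    const ssize_t n = ::write(fd, line.data(), line.size());
    ::close(fd);
    return n == static_cast<ssize_t>(line.size());
#endif
}

// Reads the whole file back through a separate stream, for checking only.
std::string read_all(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In a real logger the handle would be kept open between writes rather
than reopened each time, since the open/close is the expensive part.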

If you don't mind using NtCreateFile() instead of Win32 CreateFile(),
that's 40% quicker, as you save on the dynamic memory allocation and
Unicode path reencode that all the Win32 path APIs do.

In our custom DB at work, every new query opens several files on the
filesystem, and on Windows that is many times slower than on Linux.
However, overall benchmarks are within 15% of Linux, because the
hideously high cost of file open/close gets drowned out by other
operations. We also heavily cache open file handles on all platforms
(after raising the soft fd limit on POSIX to 1 million), so we avoid
file open/close as much as we can, which helps Windows particularly.
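
To make the handle caching idea concrete, here is a minimal sketch of a
bounded least-recently-used cache of open streams. It is not the code
from our DB, nor anything in Boost.Log; the class name open_file_cache
and its members are hypothetical, and it uses std::ofstream rather than
native handles for portability. It is single-threaded and does no error
recovery if an open fails.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Keeps up to `capacity` files open and evicts the least recently used
// one when a new file is needed. (Hypothetical illustration only.)
class open_file_cache
{
public:
    explicit open_file_cache(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Appends `line` plus a newline to `path`, reusing an open stream if cached.
    bool append(const std::string& path, const std::string& line)
    {
        const auto found = index_.find(path);
        if (found != index_.end())
        {
            // Cache hit: move the entry to the front (most recently used).
            files_.splice(files_.begin(), files_, found->second);
        }
        else
        {
            if (files_.size() >= capacity_)
            {
                // Evict the least recently used entry; destroying the
                // stream is what closes the underlying file.
                index_.erase(files_.back().first);
                files_.pop_back();
            }
            files_.emplace_front(
                path, std::ofstream(path, std::ios::out | std::ios::app));
            index_[path] = files_.begin();
            ++total_opens_;
        }
        std::ofstream& out = files_.front().second;
        out << line << '\n';
        out.flush();  // push the record to the OS without closing the file
        return static_cast<bool>(out);
    }

    std::size_t open_count() const { return files_.size(); }    // files open now
    std::size_t total_opens() const { return total_opens_; }    // opens performed so far

private:
    using entry = std::pair<std::string, std::ofstream>;

    std::size_t capacity_;
    std::size_t total_opens_ = 0;
    std::list<entry> files_;  // front = most recently used
    std::unordered_map<std::string, std::list<entry>::iterator> index_;
};
```

With a capacity larger than the number of distinct log files, each file
is opened once and never closed until the cache is destroyed. With a
smaller capacity, the cache degrades towards open/close per record,
which is the concern Andrey raised above.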

(All numbers claimed above come from LLFIO, which makes the Windows
filesystem about twice as fast, and should be considered anecdata.)

Niall

