From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2021-03-04 16:39:13


On 3/4/21 1:27 PM, Niall Douglas via Boost wrote:
>
> Opening a file for read/write on Windows using the NT kernel API is
> approx 76x slower than opening a file on Linux. Using the Win32 API is
> approx 40% slower again (~106x slower). That's without any antivirus.
>
> You're seeing things slower than that again, which is almost certainly
> due to the file handle close. On Linux this doesn't do work, whereas on
> Windows it induces a blocking metadata flush plus flush of the
> containing directory i.e. an fsync of metadata.
>
> Windows is competitive with Linux for file i/o once the file is opened.
> It is highly uncompetitive for file open/close. This is why compiling a
> large codebase is always much slower on Windows than elsewhere, because
> compiling C++ involves opening an awful lot of files for read.

Unfortunately, text_multifile_backend is supposed to open and close the
file for every log record, as the file name is generated from the log record.
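To illustrate the pattern being discussed, here is a minimal sketch of
"open, append, close" per record. It is not the actual Boost.Log code;
write_record and its parameters are hypothetical names, and the file name
is assumed to have been composed from the record already.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not Boost.Log code: appends one formatted record to
// the file whose name was derived from that record, then closes the file.
bool write_record(const std::string& file_name, const std::string& message)
{
    // The file is opened in append mode on every call. This open/close pair
    // is the part that is reported to be expensive on Windows.
    std::ofstream file(file_name, std::ios_base::out | std::ios_base::app);
    if (!file.is_open())
        return false;

    file << message << '\n';
    file.flush();
    return file.good();
    // The ofstream destructor closes the file when the function returns.
}
```

Every call pays for one open and one close, regardless of how many
records end up in the same file.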

> Windows, unlike POSIX, has no low soft limit on total open file
> descriptors such that you need to care about fire and forget HANDLE
> allocation. You can open at least 16 million files on Windows per
> process without issue.

I'm sure there are some resources associated with the file handle in the
kernel, so I definitely wouldn't want to have millions of open files in
the logging library.

> Just make sure that when opening the HANDLE you do not exclude other
> programs also opening the file, or deleting or renaming the file. Be
> aware that so long as the file is open, any directory in the hierarchy
> above is locked, and cannot be renamed. Be aware if you map the file,
> other programs will be denied many operations, such as shrinking the
> file, which does not occur on POSIX.
>
> A lot of people coming from POSIX don't realise this, but on Windows
> opening a file with more permissions than you actually need is
> expensive. For example, if you only need to atomically append to a file,
> opening that file for append-only, with no ability to read nor write nor
> query metadata, is much quicker than opening it with additional privileges.

I see, thanks for the information.

> If you don't mind using NtCreateFile() instead of Win32 CreateFile(),
> that's 40% quicker, as you save on the dynamic memory allocation and
> Unicode path reencode all the Win32 path APIs do.

I'm currently using C++ ofstream. I suppose I could use a lower-level
API if there were compelling benefits, but NtCreateFile in particular
seems too low-level, as it is considered a Windows kernel internal
and can be changed or removed in the future[1].

Since text_multifile_backend writes text, I would also have to
implement newline character translation, as I believe WinAPI doesn't do
that.
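For reference, the translation itself is small. The following is only a
sketch of what would have to be done by hand when bypassing the stream's
text mode; to_crlf is a hypothetical name, not an existing function in
Boost.Log or WinAPI.

```cpp
#include <string>

// Hypothetical helper: returns a copy of text in which every '\n' is
// expanded to "\r\n". It does not check whether a '\r' already precedes
// the '\n', so input that already uses "\r\n" would get a second '\r'.
std::string to_crlf(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (char c : text)
    {
        if (c == '\n')
            result += '\r';
        result += c;
    }
    return result;
}
```

The result would then be written through a binary-mode API, so that no
further translation is applied on top of it.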

> In our custom DB at work, in which every new query opens several files on
> the filesystem, on Windows it is many times slower than on Linux.
> However, overall benchmarks are within 15% of Linux, because the hideously
> high cost of file open/close gets drowned out by other operations. We also
> heavily cache open file handles on all platforms (after raising the soft
> fd limit on POSIX to 1 million), so we avoid file open/close as much as
> we can, which helps Windows particularly.
>
> (All number claims above come from LLFIO which makes the Windows
> filesystem about twice as fast, and should be considered anecdata)

Thanks for all the information you provided, Niall.

In any case, I've created a ticket to consider adding a cache of open files:

https://github.com/boostorg/log/issues/142
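
One possible shape for such a cache is a small LRU map from file name to
open stream. This is only a sketch under my own assumptions, not a design
taken from the ticket; file_cache, get and size are hypothetical names,
and the capacity limit is there to avoid keeping an unbounded number of
files open.

```cpp
#include <cstddef>
#include <fstream>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache of open output files; not part of Boost.Log.
class file_cache
{
public:
    explicit file_cache(std::size_t capacity) :
        m_capacity(capacity ? capacity : 1)
    {
    }

    // Returns a stream for file_name, reusing a cached one if present.
    // The caller should check is_open() and flush when needed.
    std::ofstream& get(const std::string& file_name)
    {
        auto it = m_index.find(file_name);
        if (it != m_index.end())
        {
            // Cache hit: mark the entry as most recently used.
            m_files.splice(m_files.begin(), m_files, it->second);
            return it->second->second;
        }

        // Cache miss: evict the least recently used file if full.
        if (m_files.size() >= m_capacity)
        {
            m_index.erase(m_files.back().first);
            m_files.pop_back(); // the ofstream destructor closes the file
        }

        m_files.emplace_front(file_name,
            std::ofstream(file_name, std::ios_base::out | std::ios_base::app));
        m_index[file_name] = m_files.begin();
        return m_files.front().second;
    }

    std::size_t size() const { return m_files.size(); }

private:
    using entry = std::pair<std::string, std::ofstream>;

    std::size_t m_capacity;
    std::list<entry> m_files; // front = most recently used
    std::unordered_map<std::string, std::list<entry>::iterator> m_index;
};
```

With this, consecutive records that map to the same file name reuse one
open stream, and only a change of file name beyond the capacity causes a
close and a reopen.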

Also, I've added a note to the docs.

[1]:
https://docs.microsoft.com/en-us/windows/win32/devnotes/calling-internal-apis

