Boost :

From: Nicholas Neumann (nick2002_at_[hidden])
Date: 2021-03-04 19:32:01

In reply to: Andrey Semashev: "Re: boost.log's text_multifile_backend is not performant on windows - perhaps it should be documented?"

On 3/4/21 03:35 AM, Andrey Semashev via Boost wrote:
>> On 3/3/21 11:28 PM, Peter Dimov via Boost wrote:
>> That's probably a Windows Defender (or another antivirus) "feature". Not that
>> this helps.
>
> It does help. Nicholas, could you verify this by adding the directory
> where log files are written to the excluded ones in the AV software you
> have? Or by temporarily disabling the software?

Nice catch, Peter. Simply turning off the built-in real-time AV takes me from
throughput on the order of 200-300 messages per second to 5700 messages per
second. So better, but still not great. The (imho) sad thing on Windows is
that disabling A/V has gotten progressively harder with each Windows 10
release, especially for the average user. Realistically, it's bad enough
that developers might as well assume it will always be on for end users,
and even for fellow developers.

On 3/4/21 03:42 AM, Andrey Semashev via Boost wrote:
> I'm not sure I feel good about documenting it, as it doesn't really
> solve the problem. I suppose, I could add a cache of the least recently
> used files to keep them open, but that cache would be ineffective if
> exceeded, which can happen in some valid cases (e.g. writing logs
> pertaining to a given dynamically generated "id" to a separate log). And
> it would only be useful on Windows.

Just to get a feel for the performance improvement, I quickly implemented
caching of the open files for all of the destination paths in an
unordered_map. Each consumed log record still does a flush. Throughput went
to about 53000 log records per second (with or without A/V). Quick profiling
of that shows the obvious bottlenecks are gone at this point; removing the
flush gets me to 91K log records per second (with or without A/V). Further
rate improvements could be made by optimizing my formatter or
file_name_composer, which are not the backend's concern.
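
For anyone following along, the experiment above can be sketched roughly
like this. It is a standard-library-only illustration, not my actual patch
and not Boost.Log code; the class name CachedFileWriter, its members and the
flush_each_record switch are all made up for the example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical unbounded cache of open files, keyed by destination path.
class CachedFileWriter {
public:
    explicit CachedFileWriter(bool flush_each_record = true)
        : flush_(flush_each_record) {}

    // Appends one record to `path`, opening the file only on first use.
    // Returns false if the stream is in a failed state.
    bool write(const std::string& path, const std::string& message) {
        auto it = files_.find(path);
        if (it == files_.end()) {
            // First record for this path: open once and keep the stream.
            it = files_.emplace(
                path, std::ofstream(path, std::ios::out | std::ios::app)).first;
        }
        std::ofstream& file = it->second;
        if (!file) return false;  // a failed open stays cached in this sketch
        file << message << '\n';
        if (flush_) file.flush();  // optional flush after every record
        return static_cast<bool>(file);
    }

    // Number of distinct paths currently held open.
    std::size_t open_file_count() const { return files_.size(); }

private:
    bool flush_;
    std::unordered_map<std::string, std::ofstream> files_;
};
```

Files stay open until the writer is destroyed, so this only makes sense when
the number of distinct paths is small.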

On 3/4/21 10:39 AM, Andrey Semashev via Boost wrote:
> Unfortunately, text_multifile_backend is supposed to open and close file
> on every log record, as the file name is generated from the log record.

That makes sense. For my use case, I'm using text_multifile_backend to
write to different files based on the channel in the record, and I've got
on the order of 10-20 channels. I could do multiple regular streams with
appropriate filters, but that would require declaring the channels I'm
going to encounter ahead of time. Not having to do that is really nice. :-)
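
To make the per-record cost concrete, here is a minimal sketch of the
open/append/close pattern described in the quote, using only the standard
library. It is not the Boost.Log implementation; compose_file_name and
write_record_uncached are hypothetical names, and the "channel_<name>.log"
naming scheme is just an assumption for the example.

```cpp
#include <fstream>
#include <string>

// Hypothetical stand-in for a file name composer: the path is derived from
// the record's channel, so it is only known once the record arrives.
std::string compose_file_name(const std::string& channel) {
    return "channel_" + channel + ".log";
}

// Baseline behaviour: open in append mode, write one record, close again.
// Returns false if the file could not be opened or written.
bool write_record_uncached(const std::string& channel,
                           const std::string& message) {
    std::ofstream file(compose_file_name(channel),
                       std::ios::out | std::ios::app);
    if (!file) return false;
    file << message << '\n';
    return static_cast<bool>(file.flush());
}  // the stream's destructor closes the file here, once per record
```

Every call pays for one open and one close, which is where the A/V scanning
seems to hurt.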

I could see a lot of folks using text_multifile_backend like this (where
there is a reasonable limit on how many distinct paths are actually created
by the backend), where having a cache (with or even without flush) would be
fine. For cases where there is no such limit, it makes less sense. And of
course if every log record is going to a different file, no amount of
caching is going to help with the Windows file open/close slowness.
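
For the case with no natural limit on paths, a bounded cache of the least
recently used files might look something like the sketch below. Again this
is standard library only and purely illustrative: LruFileCache, its members
and the eviction counter are invented names, and nothing here reflects how
Boost.Log would actually implement it.

```cpp
#include <cstddef>
#include <fstream>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache: keeps at most `capacity` files open and closes
// the least recently used one when a new path arrives.
class LruFileCache {
public:
    explicit LruFileCache(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    bool write(const std::string& path, const std::string& message) {
        auto found = index_.find(path);
        if (found != index_.end()) {
            // Cache hit: move the entry to the front (most recently used).
            entries_.splice(entries_.begin(), entries_, found->second);
        } else {
            if (entries_.size() >= capacity_) {
                // Cache full: drop the entry at the back. Destroying the
                // ofstream flushes and closes the file.
                index_.erase(entries_.back().first);
                entries_.pop_back();
                ++evictions_;
            }
            entries_.emplace_front(
                path, std::ofstream(path, std::ios::out | std::ios::app));
            index_[path] = entries_.begin();
        }
        std::ofstream& file = entries_.front().second;
        file << message << '\n';
        return static_cast<bool>(file);
    }

    std::size_t open_file_count() const { return entries_.size(); }
    std::size_t evictions() const { return evictions_; }

private:
    using Entry = std::pair<std::string, std::ofstream>;
    std::size_t capacity_;
    std::size_t evictions_ = 0;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

If the set of paths in active use is larger than the capacity, nearly every
write evicts and reopens a file, so this degrades to the open/close per
record behaviour, which is the limitation Andrey pointed out.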

On 3/4/21 10:39 AM, Andrey Semashev via Boost wrote:
> In any case, I've created a ticket to consider adding a cache of open
> files:
>
> https://github.com/boostorg/log/issues/142
>
> Also, I've added a note to the docs.

Thanks so much. The note is excellent. The workarounds you mentioned make a
lot of sense... multiple regular streams with filters would work, albeit a
bit more awkward for a use case like mine. And I imagine an asynchronous
frontend would help, although thinking about the CPU cycles wasted is still
a little painful. ;-)

I think an unbounded cache would be a good option for use cases like mine.
Happy to help with any feedback/benchmarking/contributing if it is useful.
