Boost logo

Boost :

Subject: Re: [boost] [log] Patch: rotating file collector
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2013-05-01 08:53:20


Ok, I finally got to this, sorry for the delay.

On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
>
> > I'm prepared to rename the current file::make_collector method to better
> > describe the current behavior (I'm open to suggestions on the new name).
> > The new collector should be created with a different function, also
> > appropriately named (file::make_stacking_collector?). It's also probably
> > a good idea to extract collectors and the base file::collector interface
> > to separate headers.
> I see 2 options here.
> First, have 2 functions: make_collector, make_rotating_collector.
> Second, have:
> a) enum rotation_method { rotate_sequential, rotate_??? }
> (enum collection_method ?)
> and
> b) pass additional parameter rotation_method to make_collector function
>
> Personally I would prefer second approach.

I would rather prefer different factory functions for different collectors so
that the parameters are independent and have the appropriate meaning for each
collector type.

I would also prefer another name for the new collector since I'm afraid
"rotation" term would become too overloaded since it is used not with regard
to rotating files in the sink backend. "Stacking" collector maybe? "Rolling"
collector?

> > 2. The new collector should also be kept in the collectors repository.
> > This
> > way it is possible to create multiple sinks collecting files in the common
> > target directory, without different collectors interferring with each
> > other. Basically, you'll have to move file_collector_hook and
> > enable_shared_from_this< file_collector > base classes to
> > file_collector_base, as well as m_pRepository.
> >
> > This introduces an additional possible issue if the user tries to create
> > different sinks with different collectors targeting the same directory.
> > This case should be detected (e.g. by checking the type of the collector
> > in file_collector_repository::get_collector) and handle it with an
> > exception (log::setup_error).
>
> The reason I haven't added rotating collector to repository
> is the "possible issue" you are describing:
> several collectors targeting one directory but different files.
> In current implementation,
> it's impossible to have more than 1 collector for one directory.
> It is okay when the collector responsibilities are:
> * files copying to target directory
> * old files removal (according to max size and/or min free space setup)
> ...but does not work for rotating collector case.
> To elaborate more, consider use case:
> * /var/log - is a directory where all the system log files reside
> * foo.log - foo service log file
> * bar.log - bar service log file
> Both foo and bar log files have to be rotated, e. g.:
> foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest).
> Thus, user have to setup 2 collectors and will got log::setup_error.
>
> The easiest way to make described use case working -
> do not store rotating collectors in repository.
> The more sophisticated approach would be:
> use additional criteria for collector identification (file_name pattern?).
>
> Please, comment on this.

Well, the rule of the thumb the library follows is that only one process is
managing the target directory. This means:

a. If some other process manages the storage, you just don't set up the
collector and leave it to this process to collect the rotated log files from
the sink. In this case we do nothing in Boost.Log to help it.

b. If Boost.Log is managing the storage, you set up the collector and mostly
rely on the fact that noone else is messing around with the storage. This
means that we don't try to detect sudden file deletions or additions to the
storage, although should that happen the collector should handle the situation
gracefully. This means that if there are two services (foo and bar) running,
they should be collecting their log files to separate directories. This is
fairly common in Linux, at least, since many services would just create
directories within /var/log and put their logs there. If this is not
acceptable, then go for case (a).

Now, in (b) you still have to implement Boost.Log so that it is possible to
create multiple sinks collecting to the same target directory, and this is
solved by the collectors repository. The problem of having different
collectors targeting the same directory can be solved by simply prohibiting
such configuration, because the desired behavior is not clear in this case. So
if you put the new collector to the repository and add a check for its type
before adding, you should be fine.

> > 4. I think, it should be possible to avoid the do_rollover method.
> > file_collector_base::store_file should be moved to file_collector, and
> > there should be a separate rotating_file_collector::store_file method as
> > well. Any common subroutines (like checking available free space, erasing
> > old files, etc.) can be extracted as distinct functions into
> > file_collector_base and called from store_file methods of the
> > derivatives.
>
> I don't see any advantages of eliminating do_rollover method
> in favour of common subroutine methods (as well as vice versa ;) ).
> I don't mind to implement approach you are describing,
> but maybe you can provide additional arguments why do we need this change?

This is a design clarity question. The do_rollover method is specific to the
new collector, and by exposing it to the base class you're leaking these
specifics to the base class, which is supposed to be oblivious to it. Also,
the method is virtual and you already have a virtual table dispatch when
store_file is called. Not that the performance really matters in this case,
but in general this counts as a design flaw to me.

> > 5. You don't seem to rename the collected files when the file pattern
> > contains date/time (see date_rollover_policy). Is that right? Obviously,
> > you will only have the date and time point of the rotation and not the
> > file creation.
> You're right that collected files are not renamed for date/time patterns.
> I don't have any strong opinion about whether file name should indicate
> file creation or file rotation time.
> However, gut feeling inclines me to the second case (file rotation time).

I think, most, if not all systems I had experience with (including proprietary
developments) put file creation time in the file name. And this is quite
expected, I think. I mean, if you're looking for a file that might contain
logs for a specific time point, you would typically assume the file name to
indicate the beginning of the time frame and not the end of it.

Of course, we could put both time stamps into the file name, but that's
another story.

> On the opposite, I have following real life use case.
> User logins to system and wants to monitor events for day 23.
> There are 2 log files in the log directory: file.log and
> file.log.2013-04-21. If filename indicates creation time, then it's hard to
> guess
> what file to look at: records could be found in both files.
> If 2013-04-21 indicates rotation time,
> then it's clear that the record for day 23 is in file.log.
>
> It will be interesting to hear opinions from you.

If I have a login date (20013-04-23), I would look into file.log.2013-04-21
first. Call it whatever you like, but my intuition tells me that the file name
indicates the date of the file creation. :) Also, the last modification time
might give you a hint on when the file ends, although it depends on your file
manager how apparent this attribute is.

You can avoid the confusion if you name the active file with its creation time
as well (which is what Boost.Log does, currently). So if you have
file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which
file contains the records you seek.

> >> Also, I would like to discuss following question with you :
> >> > rotate_file invocation in ~text_file_backend -
> >> > I would prefer to have this call during text_file_backend construction
> >> > phase.
> >>
> >> I'll have to take a closer look later to answer that but it seems odd to
> >> have to rotate something when nothing have been written yet. The
> >> rotate_file call is intended to place the written file to the storage
> >> governed by the collector, and at construction this file hasn't been
> >> created yet. Could you describe why you need this?
> >
> > This question remains actual.
>
> I'll try to explain why I'm requesting this.
> Point 1 - when rotating collector is used,
> file name that is currently being written is somewhat constant.
> But when application shut down, it becames something like
> file_name.log + ".1" (counter name pattern case).

This behavior is chosen to guarantee there are no files piling up in the
directory where the sink backend writes them. Imagine there are timestamps in
the file name, so the next time you start your application the file names will
be different and the old files will be left unmanaged. So the files are
collected when the sink is destroyed and then managed by the collector after
the application restarts and runs scan_for_files.

Another problem can happen even with file counters only, if file appending is
not enabled (the default). After scan_for_files doesn't find the last file's
counter, the sink will reconstruct the file name that is the same as it was in
the previous run and will overwrite the old file. This kind of gotcha I would
like to avoid.

> I would like to have file name the same for both cases:
> when service is running and when not running.
> Also, if user collects log files to e.g. /var/log/archive folder but
> the log file itself resides in /var/log directory,
> then latest application log should be seeked in 2 place which is annoying.

Well, right now I don't have ideas on how to support this without interferring
with my previous points. You see, when file collectors are used, the file that
the sink writes is a temporary, really.

Actually, I think if you open it on Windows while the sink attempts to rotate
it, you can screw up the rotation and loose some records while the sink tries
to recreate the file. I haven't tried this but if I'm right then you're
basically not recommended to open the file while it's being written.

> Point 2 - user changed collector configuration (decreased min free space).
> After starting application he would expect that some files will be removed,
> but this will occur only on next rotation phase.
> So, I would re-phrase:
> * it's preferable to *try* rotate_file
> during text_file_backend construction phase

You're not talking about file rotation here but rather about checking the
target storage constraints when initializing the sink. This might be
worthwhile although, as I said before, we don't expect someone else doing
something with the storage behind the scene. Besides, the next time a file is
stored, the collector will check the constraints anyway _before_ storing the
file, so the additional check on the sink construction isn't really needed.

> * there is no need to rotate_file during text_file_backend destruction

See my previous points.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk