Boost logo

Boost :

Subject: Re: [boost] [AFIO] Review (or lack of it)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-29 10:29:32


On 29 Aug 2015 at 15:36, Andrey Semashev wrote:

> > If AFIO is fatal exiting the process, you are in an unrecoverable
> > cannot continue situation. I mainly have those traps in there for
> > handling memory corruption which is not uncommon when writing file
> > system applications. They are *extremely* useful for catching
> > inadvertent memory corruption.
> >
> > And they should never trip unless memory corruption has occurred or
> > you have a serious error in how you are using AFIO.
>
> What kind of memory corruption do you mean and how do you detect it?

If an impossible situation has occurred i.e. if one bit of data in
location A says one thing and another bit of data in location B says
the opposite, it is assumed to be memory corruption. It should never,
ever occur in any situation other than memory corruption.

> Also how terminating the process helps in fixing those errors?

If memory corruption has occurred, then known program state has been
lost and the loss of other people's data is a high likelihood.
Terminating the process is the only sane thing to do now.

> > Fatal exit happens when AFIO cannot continue without losing or
> > corrupting other people's data. I make no apology for that - it is
> > the right thing to do.
>
> Why is this even possible with the library?

I always code in detection of impossible states and fatal exit in any
of my code and I have done so for years. It has proven its worth time
and time again. I obviously don't go overboard with O(N^2) proof of
good state checks in release builds, but if detection of impossible
states has a low overhead, it's worth doing as a matter of routine.

Think of it being a manually written undefined behaviour sanitiser.
It's no different.

> And why not report an error in a usual way?

To be honest it's never come up. In use cases without memory
corruption you never see this code in action. On the other hand, when
someone logs a bug which mentions all this stuff printed to
std::cerr, it's an excellent sign they have memory corruption
problems.

> > Deadlocks are handled by continuing to deadlock in debug builds,
> > though with plenty of help and backtraces etc printed to std::cerr.
> > This lets you interrupt in a debugger and try and figure out the
> > cause. In release builds, after a reasonably long timeout you get a
> > terse single line message to std::cerr and it breaks the deadlock and
> > mops up the orphaned state as best it can before continuing.
>
> In production there are release builds, not debug. We don't even use debug
> builds while testing because they don't really add anything but the
> possibility for the error to slip unnoticed in release builds. You may say
> this is an unusual policy but an least at two jobs I worked at practiced this.
>
> Altering the state before exiting is also something that doesn't look right to
> me. Having the latest state of the crashed process is sometimes helpful in
> debugging. Along with console output and unexpected program termination this
> is one of the "never do that" things.

My rationale is that the deadlock detection occurs in a destructor
and that is almost always being called in a process shutdown
scenario. I figured a default timeout and mop up wasn't a bad choice
as I can't throw an exception.

The deadlock printing code is already macro enabled but for other
reasons. I have logged an issue to add an explicit macro at
https://github.com/BoostGSoC13/boost.afio/issues/108.

> I think you're trying to be too smart in your library. Don't. You will never
> cover everything users want. There will always be someone like me who wants a
> different behavior than you envisioned.

I am not being too smart in my library. I am choosing sensible
defaults which are useful in most use cases to most people. All this
is really quality of implementation detail anyway.

Issues have been logged for controlling the printing to std::cerr and
enabling or disabling the deadlock debug printing code. If at a later
stage people find they need additional behaviours in their real world
use cases I have no problem adding solutions for them as well.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk