From: David Abrahams (dave_at_[hidden])
Date: 2003-09-25 15:05:08


"Gennadiy Rozental" <gennadiy.rozental_at_[hidden]> writes:

>> The execution_monitor and family seem a useful facility. I can see three
>> phases where it can be useful:
>>
>> 1- Development
>> 2- Testing
>> 3- Release (field deployment)
>
> OK. Let's slow down a little and return to the basics, because I may be
> missing some important points. Could we answer the following questions
> with regard to the 3 scenarios above (by development I mean running
> under a debugger, while testing is a standalone test run)? Please give
> 3 separate answers to every question but the first:
>
> 1. Is SEH an async exception?

A structured exception is "asynchronous" in the sense that it may
occur in response to an event that is not expected to throw, from a
region of code whose correctness relies on the fact that it doesn't
throw. In other words, nobody expects their code to dereference a
NULL pointer in the course of normal operation, so with respect to
program correctness it is as though some other thread or process
decided to "inject" an exception into the program's execution
"asynchronously" (on a hypothetical system which supports such
injection, cf. Java thread cancellation).

> 2. Does SEH always signal a nonrecoverable error?

I'm not certain but I *think* the answer is no, because you can
explicitly raise an SEH. And of course you can get lucky, and have an
SEH raised somewhere that a regular C++ exception could normally be
thrown, in response to a condition whose significance disappears
during unwinding (e.g. a bad non-owning pointer in an object on the
stack which gets destroyed during unwinding anyway).

But fundamentally, unless you're explicitly raising an SEH, it signals
that there's something seriously wrong with your program - something
which may be the result of any unpredictable kind of corruption.
Since you don't know what caused the problem (or surely you'd have
eliminated the bug), you have to assume that it's done some damage
and is not something you can recover from.

[I believe there are specific exceptions to this rule for certain
 kinds of floating point errors, *if* properly and very explicitly
 managed. See the _fpieee_flt documentation. Eric Niebler knows more
 about this than I do].

I can give an example, in fact. I've been using GNU emacs on NT from
the CVS, and it's been crashing on me for months, intermittently. It
crashes deep in emacs' win32-specific display code for BDF fonts.
Well, as it turns out, I've never used a BDF font - I don't even know
what one is. Somewhere the bit which says "this is a BDF font" gets
incorrectly set, and the program goes careening off into never-never
land, executing lots of code before it actually crashes. Now, if
I get this crash under the debugger I can force emacs to "unwind" out
of the problem (by changing the PC), but things are unrecoverably
messed up and the program just crashes again.

Now, you might say, "why not just *try* to continue? You might get
lucky". That's probably a bad strategy for a testing program, because
you also might get _unlucky_. If a bug has caused an SEH (crash), the
rest of the results are unreliable anyway.

> 3. What is wrong with catching? At least for the purpose of reporting and
> invoking the usual shutdown mechanisms

a. It causes unwinding code such as destructors to execute before the
   "usual shutdown mechanisms" get a chance to run. That might just
   cause another crash, which might even cause the shutdown mechanisms
   to be bypassed.

b. Perhaps more importantly, it causes unwinding code such as
   destructors to execute before JIT debugging is invoked, which
   interferes with the programmers' ability to inspect the stack trace
   and program state in its condition at the point of the crash.
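Points (a) and (b) both rest on the fact that unwinding finishes
before the body of the catching handler runs. A minimal, portable
sketch of that ordering follows. It uses an ordinary C++ exception
rather than an SEH, and event_log, Guard, faulty_operation and
run_monitored are hypothetical names invented for the illustration:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log, used only to make the ordering visible.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Stands in for any stack object whose destructor runs during unwinding.
struct Guard {
    ~Guard() { event_log().push_back("destructor ran"); }
};

void faulty_operation() {
    Guard g;  // lives in the frame where the failure happens
    event_log().push_back("failure detected");
    throw std::runtime_error("simulated crash");
}

// Outermost handler, analogous to a monitor that catches everything.
void run_monitored() {
    try {
        faulty_operation();
    } catch (const std::exception&) {
        // By the time control arrives here, the frame of
        // faulty_operation() and its local objects are already gone.
        event_log().push_back("outer handler reached");
    }
}
```

After calling run_monitored(), the log reads "failure detected",
"destructor ran", "outer handler reached", in that order. Anything a
debugger or reporting routine could have inspected in the faulting
frame has been destroyed before the outer handler gets control.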

> 4. How does the technique described in Dave A.'s article help to resolve
> the problem discussed in item 3?

It causes JIT debugging to be invoked at the point of the crash,
instead of in the outermost catch block which catches and rethrows the
SEH.

> 5. What would be an ideal behavior?

For people running batch regression tests, report the crash *at the
point of the crash* (e.g. in the SE translator), possibly print a
symbolic stack backtrace, and exit.
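A rough sketch of the batch-testing case, assuming MSVC with the
/EHa option, might look like the following. _set_se_translator is the
real MSVC runtime hook (it is installed per thread, and the translator
is only invoked when a C++ exception frame is on the stack);
describe_se_code, report_and_exit and install_crash_reporter are
hypothetical helper names, and no backtrace is printed here. On other
compilers the installer does nothing:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

#ifdef _MSC_VER
#include <eh.h>  // _set_se_translator; compile with /EHa
#endif

// Maps a few well-known structured exception codes to readable names.
std::string describe_se_code(unsigned int code) {
    switch (code) {
        case 0xC0000005u: return "access violation";
        case 0xC0000094u: return "integer divide by zero";
        case 0xC00000FDu: return "stack overflow";
        default:          return "unknown structured exception";
    }
}

#ifdef _MSC_VER
// Called by the runtime while the faulting frames are still intact.
void report_and_exit(unsigned int code, struct _EXCEPTION_POINTERS*) {
    std::fprintf(stderr, "crash: %s (0x%08X)\n",
                 describe_se_code(code).c_str(), code);
    std::_Exit(EXIT_FAILURE);  // leave without unwinding or atexit handlers
}
#endif

// Installs the reporter for the calling thread.
// Returns false where no SE translator is available.
bool install_crash_reporter() {
#ifdef _MSC_VER
    _set_se_translator(report_and_exit);
    return true;
#else
    return false;
#endif
}
```

A test driver would call install_crash_reporter() once at startup, on
each thread that runs test code. Because the translator exits instead
of throwing, no destructors run after the fault is reported.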

For people debugging their programs, use the technique in my article
and invoke JIT debugging at the point of the crash.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com