Subject: Re: [boost] Looking for thoughts on a new smart pointer:shared_ptr_nonnull
From: Gavin Lambert (gavinl_at_[hidden])
Date: 2013-10-09 19:23:56
On 10/10/2013 7:58 AM, Quoth Nevin Liber:
> When I say one should assert on a detected precondition failure, it's
> really shorthand for putting the system into a known state. A mission
> critical system, for instance, might do something vastly more complicated
> than aborting.
> I just don't see how throwing, which normally involves running code
> dependent on state, accomplishes this.
Throwing unwinds out of the current call stack to some higher level
point of code that has declared that it is prepared to handle errors in
some meaningful way.
Imagine a server process which has a task loop to process a list of
pending operations (each of which are completely independent, or can
gracefully fail if they depend on some previous operation that failed --
eg. a webserver or database). Inside the loop it has a try/catch around
the call to actually process the operation.
Somewhere in the depths of the operation processing, it hits a code path
with a bug and a null pointer is encountered where the code did not
adequately check for it. One of these things can happen:
1. None of the code paths check for it and the null pointer is actually
dereferenced. Depending on the platform, this may immediately terminate
the server (bad), perform dodgy things to memory (extremely bad), or
throw an exception (which will terminate the server if uncaught, or
offer some possible means for recovery). Tracing a callstack from this
exception would reveal the place where the pointer is used, which is
*not* the site of the actual code bug. (Although it does allow you to
guess that in most cases.)
2. The assignment to the "this pointer is never null" pointer did
actually check that it wasn't null and throws an exception. This
eliminates the possibility of dereferencing this pointer later (since it
never really existed), and tracing the callstack reveals the place where
the pointer is constructed, which *is* the site of the bug.
If we assume that an exception was thrown, then the catch handler in the
operation loop can mark that operation as failed (and intervening catch
handlers or destructors could have rolled back anything they needed to).
Subsequently it would be able to continue running operations.
Obviously any dependent on the failed operation would also fail, but
that can be dealt with too.
As for what happens next, it depends on where the bug actually was and
how critical that null pointer is. Maybe it was a temporary failure in
some local state, so would not affect any other operations at all -- in
this case taking down the entire server is obviously overkill. Or maybe
it was a more serious failure in shared state and all (or most) future
operations would also fail. In this case it should become apparent
quickly to the user and they should be able to do a controlled restart
of the server and contact someone to fix the underlying problem. Either
way there should be a log of why the operation failed that will help to
discover the source of the bug.
And note that stack unwinding all the way up is still well defined at
this point -- some people have argued that "oh, you're already in an
undefined state because you've violated the precondition!" but that's
circular logic. If you check for it and throw, it's no longer a
precondition, and so nothing has been violated and no UB has occurred
yet. Throwing prevents UB from occurring later as a result of
"successfully" constructing an object with an invalid invariant.
I'm not saying that preconditions are bad. Everything has an implicit
precondition of "no other code has trampled over my memory" and "the CPU
and RAM actually work properly" after all. But a precondition
represents a case where invalid input is not tested, which allows bugs
to propagate. Sometimes this is not avoidable (as I said in an earlier
post, it's essentially impossible to verify that a pointer isn't
dangling). But a null check is quick and easy. Leaving specific inputs
as undefined behaviour, when it is cheap to specify that behaviour, is
just being lazy.
Obviously throwing and catching exceptions doesn't magically fix bugs in
the code. (And if done badly, it can make things significantly worse as
bugs that have already caused damage to shared state are ignored.) But
as long as you do it properly, it does allow you to limit the damage
caused by bugs in the code if the "this is possible but not valid, and
would leave my object in an unhinged state if I accepted it" cases were
allowed to continue unchecked.
I'm all for early termination when the developer is around to sort it
out immediately. But that's easy too -- just configure a debugger to
intercept exceptions immediately when thrown. But instant termination
is not a good user experience.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk