Boost :
From: Trevor Taylor (xju_at_[hidden])
Date: 2003-03-28 04:55:24
(long post)
After some thought, I'll broaden my response by (kind of) developing an
exception use strategy that gives me what I want and addresses (a
little) the "user messages vs debugging" question.
Let's start with a "use-case" model of a system (system might be an
application, server, library - whatever). As background let's say the
use-case model describes the system by describing how it meets a set of
goals. Each use-case shows how the system attempts to meet one goal.
Each use-case describes "success" scenarios (where the goal is met) and
"failure" scenarios (where the goal is not met). The failure scenarios
state what the system does instead of meeting the goal.
Ideally, we would enumerate all possible failure scenarios and say how
the system behaves. Sometimes we might let the system try another
strategy to meet the goal; other times we might give up (i.e. the system
tells the user it can't meet the goal). Let's call these enumerated
scenarios "specific" failure scenarios.
In practice it's usually not feasible to enumerate all the cases, so we
have a catch-all "other" failure scenario (with some if not all use cases).
How does the system handle an "other" failure scenario?
Theoretically, the system should do nothing more than involve a person
(because the system shouldn't make any assumptions about the cause of
the failure, and hence about its own state, which could be anything).
Let's call "involving a person" post-mortem debugging.
The first person to get involved might be a user (who has application
domain knowledge but little else), followed by a support person (who has
application environment and behaviour knowledge but not internals
knowledge), and lastly a developer (who has internals knowledge). (In
practice, some or all of these people might be missing from the chain or
there might be more, but let's just assume these three for now).
To do post-mortem, the system must provide lots of information: what was
the reason for the failure, what was the context of the failure, what
happened before the failure. Different subsets of the information will
be useful depending on the person (our user only knows about domain
concepts, while the developer knows about internal concepts). So to be
useful to all, we need to capture all these levels of information.
How might we capture it?
A core dump is typically very good for the developer, but not so good
for the user. A stack trace might be slightly better for the user, but
not so good for the developer (it will typically omit parameter values
and object states). Neither of these is acceptable in all environments.
"Trace" messages might be somewhat useful to the developer and perhaps
useful for the user: they at least show some history (except when
they're turned off :-). Again they're not acceptable in all environments.
A strategy that I've seen work very well is to map failure scenarios to
(C++) exceptions, and collect context information as these exceptions
propagate up the call chain. The context information collected consists
of function descriptions augmented with parameter values (including *this
state for methods) and file-and-line references (i.e. references back to
the source code).
The software can then use whatever is appropriate to get the information
to the people (in some environments it might not even bother).
How to map this to C++?
Create a class to represent "other" failure scenarios that can collect
context. e.g.
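#include <string>
#include <vector>

class Exception
{
public:
  explicit Exception(std::string failureReason) throw();
  std::string getFailureReason() const throw();
  void addContext(std::string context) throw();
  std::vector<std::string> getContext() const throw();
private:
  std::string failureReason_;
  std::vector<std::string> context_; // innermost context first
};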
(We'll come back to this class' limitations below.)
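One way the declaration above might be filled in is sketched below. It is a minimal sketch under stated assumptions: it targets current C++ (the dynamic exception specifications `throw()` and `throw(Exception)` used in this post are not accepted by C++20, so `noexcept` is used only where a function genuinely cannot throw), and `getContext` returns a const reference rather than a copy. Context strings are appended in the order the catch blocks run, so the innermost description comes first.

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch of the "other failure" exception (assumes C++11 or later).
// Context is appended as the exception propagates outward, so
// getContext().front() is the innermost (lowest-level) description
// and getContext().back() is the outermost (highest-level) one.
class Exception
{
public:
    explicit Exception(std::string failureReason)
        : failureReason_(std::move(failureReason)) {}

    virtual ~Exception() = default;

    const std::string& getFailureReason() const noexcept
    {
        return failureReason_;
    }

    // Deliberately not noexcept: push_back may throw std::bad_alloc,
    // which is the "heap use" limitation discussed below.
    void addContext(std::string context)
    {
        context_.push_back(std::move(context));
    }

    const std::vector<std::string>& getContext() const noexcept
    {
        return context_;
    }

private:
    std::string failureReason_;
    std::vector<std::string> context_;
};
```

Leaving `addContext` without `noexcept` keeps the sketch honest: a function that allocates cannot promise not to throw.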
Put each function's implementation in a try/catch block that adds
context. e.g.
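void Doc::save(const std::string& fileName) throw(Exception)
{
  try
  {
    // ... implementation details, in particular calls to
    // ... functions that only throw Exception or a class derived from it
  }
  catch(Exception& e)
  {
    // Describe what this function was trying to do, then rethrow the
    // same object so that callers can add their own context.
    e.addContext("save document " + title_ + " to file " + fileName);
    throw;
  }
}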
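The try/catch pattern above can be seen end to end in the small sketch below. `Doc`, `writeHeader` and `writeBytes` are hypothetical stand-ins (`writeBytes` always fails, to force propagation), and `Exception` is a trimmed copy of the interface declared above, written for current C++, so that the block compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Trimmed stand-in for the Exception class declared above.
class Exception
{
public:
    explicit Exception(std::string reason) : reason_(std::move(reason)) {}
    const std::string& getFailureReason() const noexcept { return reason_; }
    void addContext(std::string c) { context_.push_back(std::move(c)); }
    const std::vector<std::string>& getContext() const noexcept { return context_; }
private:
    std::string reason_;
    std::vector<std::string> context_;  // innermost context first
};

// Hypothetical lowest-level call: it always fails, to show propagation.
void writeBytes(const std::string& /*fileName*/, std::size_t /*count*/)
{
    throw Exception("no space left on device");
}

class Doc
{
public:
    explicit Doc(std::string title) : title_(std::move(title)) {}

    void save(const std::string& fileName) const
    {
        try
        {
            writeHeader(fileName);
        }
        catch (Exception& e)
        {
            e.addContext("save document " + title_ + " to file " + fileName);
            throw;  // rethrow the same object, now carrying more context
        }
    }

private:
    void writeHeader(const std::string& fileName) const
    {
        try
        {
            writeBytes(fileName, 512);
        }
        catch (Exception& e)
        {
            e.addContext("write header of document " + title_);
            throw;
        }
    }

    std::string title_;
};
```

After `Doc("Notes").save("notes.txt")` fails, the caught exception's context holds "write header of document Notes" followed by "save document Notes to file notes.txt", and its failure reason is "no space left on device".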
In main() (or elsewhere, if you want to gamble and try to recover), catch
Exception and form an error message like:
  Failed to *(e.getContext().end()-1) because
  failed to *(e.getContext().end()-2) because
  ...
  failed to *(e.getContext().begin()) because
  e.getFailureReason().
(Notice that for what I'd call a well-designed system, the top lines of
the message will generally mention domain terms. These will gradually
give way to "internals" terms as we go down. So our user might be able
to post-mortem debug using the top few lines (and the failure reason)
while the support and developer will probably be more interested in the
last parts of the message.)
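A sketch of the message-forming step follows, again with a trimmed copy of `Exception` so it is self-contained. `formatFailure` is a hypothetical helper name; it walks the context from the outermost entry (back) to the innermost (front) and finishes with the failure reason, producing the layout shown above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Trimmed stand-in for the Exception class declared above.
class Exception
{
public:
    explicit Exception(std::string reason) : reason_(std::move(reason)) {}
    const std::string& getFailureReason() const noexcept { return reason_; }
    void addContext(std::string c) { context_.push_back(std::move(c)); }
    const std::vector<std::string>& getContext() const noexcept { return context_; }
private:
    std::string reason_;
    std::vector<std::string> context_;  // innermost context first
};

// Hypothetical helper: builds the "Failed to ... because" chain.
std::string formatFailure(const Exception& e)
{
    const std::vector<std::string>& context = e.getContext();
    std::string message;
    // Outermost context (domain terms) first, innermost (internals) last.
    for (auto it = context.rbegin(); it != context.rend(); ++it)
    {
        message += message.empty() ? "Failed to " : "failed to ";
        message += *it;
        message += " because\n";
    }
    message += e.getFailureReason();
    message += ".";
    return message;
}
```

In main(), the catch-all handler could then simply write `formatFailure(e)` to `std::cerr` and return a non-zero exit status.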
For each "specific" failure scenario, use a separate C++ exception class
that carries all the information needed to take action programmatically.
For robustness and convenience, derive these classes from Exception.
(Robust, because if specific failures are added to a low-level class
(e.g. a library) they are automatically treated as "other" failures by
existing callers; convenient, because a caller only needs to catch the
"specific" exceptions it is interested in and can let its "other"
failure handler catch everything else.)
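The derivation scheme can be sketched as follows. `FileExists`, `createFile` and `tryCreate` are hypothetical names chosen for illustration, and the boolean parameters merely simulate the two kinds of failure. The caller handles the one specific scenario it can act on and lets everything else reach its "other" failure handler, adding context on the way.

```cpp
#include <string>
#include <utility>
#include <vector>

// Trimmed stand-in for the Exception class declared above.
class Exception
{
public:
    explicit Exception(std::string reason) : reason_(std::move(reason)) {}
    virtual ~Exception() = default;
    const std::string& getFailureReason() const noexcept { return reason_; }
    void addContext(std::string c) { context_.push_back(std::move(c)); }
    const std::vector<std::string>& getContext() const noexcept { return context_; }
private:
    std::string reason_;
    std::vector<std::string> context_;  // innermost context first
};

// Hypothetical "specific" failure carrying data a caller can act on.
class FileExists : public Exception
{
public:
    explicit FileExists(std::string fileName)
        : Exception("file " + fileName + " already exists"),  // base built first
          fileName_(std::move(fileName)) {}
    const std::string& fileName() const noexcept { return fileName_; }
private:
    std::string fileName_;
};

// Hypothetical low-level call; the flags simulate the two failure kinds.
void createFile(const std::string& fileName, bool exists, bool diskFull)
{
    if (exists) throw FileExists(fileName);
    if (diskFull) throw Exception("no space left on device");
}

// Handles the specific scenario it understands; anything else is an
// "other" failure and propagates with added context.
bool tryCreate(const std::string& fileName, bool exists, bool diskFull)
{
    try
    {
        createFile(fileName, exists, diskFull);
        return true;
    }
    catch (FileExists&)
    {
        return false;  // e.g. the caller can pick another name
    }
    catch (Exception& e)
    {
        e.addContext("create file " + fileName);
        throw;
    }
}
```

The handler for `FileExists` must come before the one for `Exception`: handlers are tried in order, so a base-class handler placed first would catch both.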
Limitations of the above Exception class:
- Internationalisation
The above example C++ code uses English strings: fine for some
applications, but not for all, and certainly not for a library like
Boost. How could we make this language-neutral without overly burdening
users who only require one language?
- heap use
The above class uses heap storage to collect the context. What do we do
if we run out of heap while collecting the context? How could we make
the class usable in environments where running out of heap at this stage
is not an option? (In many cases, I'd say a core dump would be
appropriate. But not in all cases.)
How can we address these issues (or indeed, adjust the strategy) to
encourage collection of context information without mandating it?
That's it so far - what do you all think?
Trevor
P.S. I'm sorry I didn't reply to individual posts. There seemed to be a
slight divide between wanting "user messages" and "stack traces". I've
kind of tried to bring them together by saying they're all trying to
meet the same goal (post mortem debugging). I've always wanted all this
to happen automatically (and where possible I've tried to convince
people that core dumps are good), but I've been happier with the
manual solution, since it allowed me to make judgements on how much
object state was appropriate to capture (it also allowed me to satisfy
those who just wouldn't come at a core dump).
There are lots of posts everywhere (in this topic; in one referenced by
Thorsten Ottosen; a topic called "improved assertions", around July 2002
- thanks Russel) about getting stack traces. That to me says that it
would be good to put together a library for doing it on all Boost
platforms. But stack traces don't really cut it for me (as I've said).
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk