Boost :
From: Beman Dawes (bdawes_at_[hidden])
Date: 2002-08-03 16:05:04
At 12:23 AM 8/3/2002, David Abrahams wrote:
>From: "Beman Dawes" <bdawes_at_[hidden]>
>
>> Again, the function remove_all() is provided to meet the need for a
>> function which does not throw an exception if the target doesn't exist.
>>
>> It seems to me people are advocating something that is already
>> provided. Am I missing something?
>
>Yes. Using the other function is dangerous. It provides a false sense of
>security by working in all testcases, then throws an exception in the
>field when some other process gets in and deletes the file before we get
>a chance.
My experience (and I'm talking a lot of years here working on systems with
a lot of money riding on them) is that remove() errors do show up a great
deal in testing, particularly early in a new system's life-cycle.
It is definitely a pain to analyze these errors during programming (or
during early testing, if the analysis wasn't done during programming) and
figure out which of them are serious.
The payoff comes later, when the system goes into production in the field
and there is a serious screw up, like the one you describe where someone
runs another process in a directory they shouldn't be in. I've been
rewarded enough times for having included remove() error checking that it
has stuck in my mind as a worthwhile practice. It seemed like the remove()
check was often the first sign of impending disaster. Since even on
today's processors these jobs take 12-24 hours, the operations people love
it when you can detect an error early rather than late.
> Precondition checking should usually not be done with exceptions in
>the first place, and I think this is a particularly bad use for it.
But as James Dennett points out, the error conditions the library throws
really aren't precondition failures; they are failures the operating
system is reporting.
I guess you could view the file not existing, or being read-only, or your
process not having the right permissions, etc., as precondition failures,
but because of the shared-resource, non-atomic nature of filesystem
operations, I don't think that is the best approach.
>I used to work on a project that manipulated timestamped events. Events
>were always marked with positive times. The functions for erasing events
>in a timerange accepted signed numbers denoting times, but would assert
>if the start time was ever negative, since no events were allowed to
>have a negative time (see: "you're not allowed to delete a file that
>doesn't exist"). If the time range to be deleted was the result of a
>computation (a very common case), you had to remember to pin the start
>time at zero or you might get an assertion. It was a crapshoot whether a
>failure to do so would turn up during testing. It usually would, though,
>and you could easily fix the problem by inserting a stupid call to
>max(start, 0) in order to keep the assertion brasswork shiny. It was
>immensely frustrating to have to use this boilerplate all over, but at
>least there was a formula to make things work.
>
>This example is much worse, since the problem will almost never show up
>during testing,
That's contrary to a lot of real-world experience I've had. In fact, the
one argument that resonates with me in favor of ignoring a return code of
"couldn't find the file you asked the OS to delete" is that otherwise
remove() generates a lot of false positives, particularly during early
testing.
> and even if it did, there's absolutely no test you can make
>to check whether the file will actually exist by the time you try to
>delete it.
The general solution is to go ahead and try the operation, and see if the
operating system reports success or not.
--Beman
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk