Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2008-11-26 14:58:27


John Maddock wrote:
> Thijs van den Berg wrote:
>>>> That's what the existing distributions do. In fact we could omit
>>>> most of the subsequent parameter checking code if we could figure
>>>> out whether the error handlers will throw or not on error (in fact
>>>> we *can* get this information at compile time and make the
>>>> subsequent checks a no-op if we know that the constructor would
>>>> have thrown on error... we just ran out of time on that refinement).
>>>>
>>> I don't understand this, it has to do with my lack of knowledge on
>>> this... If you ensure that the parameters get checked in the
>>> constructor, why would that check *not* throw an error when needed?
>
> Correct, the constructor might not throw if the parameters are
> invalid, *and* the current policy for handling domain errors is
> something other than throwing an exception. Of course exception
> throwing is the default, and highly recommended, but there are some
> situations where exceptions aren't allowed, and returning a NaN from a
> function that uses the distribution is the correct thing to do. In
> fact custom error handlers can return a *user-defined error-value*
> which should be propagated back to the caller of the non-member
> functions if the parameters to the distribution are invalid.
>
> The reference for error handling policies is here:
> http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_toolkit/policy/pol_ref/error_handling_policies.html,
> but best to read the tutorial
> http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_toolkit/policy/pol_tutorial.html
> first as that gives an end user perspective.
>
>>> Compile time might be tricky depending on the complexity of the
>>> parameter validation code, but simple range checks on the parameters
>>> could be done at compile time. What mechanism are you thinking about
>>> regarding compile-time checking, e.g. that scale>0?
>
> Ah, I don't mean compile time checking of parameters, I mean:
>
> If the current policy in effect (which *is* known at compile time),
> mandates throwing on a domain error, then we know for sure that the
> constructor would have thrown if the parameters were invalid. In that
> case *only* we can omit checking the parameters again in the body of
> the non-member functions as we know they must be OK.
>
very clear, thanks!
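
To make the idea concrete, here is a minimal sketch of skipping the repeated check when the policy is known at compile time to throw. It does not use Boost.Math's real policy classes: `throw_on_domain_error`, `return_nan_on_domain_error`, `throws_on_error`, `raise_domain_error` and `toy_dist` are all hypothetical names invented for this illustration, and the density is the textbook Laplace formula.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical policy tags (NOT Boost.Math's real policy classes).
struct throw_on_domain_error {};
struct return_nan_on_domain_error {};

// Compile-time answer to "does this policy throw on a domain error?"
template <class Policy>
struct throws_on_error : std::false_type {};
template <>
struct throws_on_error<throw_on_domain_error> : std::true_type {};

// Report a domain error the way the policy asks for.
inline double raise_domain_error(throw_on_domain_error) {
    throw std::domain_error("scale must be > 0");
}
inline double raise_domain_error(return_nan_on_domain_error) {
    return std::numeric_limits<double>::quiet_NaN();
}

template <class Policy>
class toy_dist {
public:
    toy_dist(double location, double scale)
        : location_(location), scale_(scale) {
        // Throws only if the policy says so; otherwise the object is
        // constructed with invalid parameters.
        if (!(scale_ > 0)) raise_domain_error(Policy());
    }
    double location() const { return location_; }
    double scale() const { return scale_; }
private:
    double location_;
    double scale_;
};

// Non-member function: re-check the parameters only when the constructor
// could have let invalid ones through. For a throwing policy the condition
// is a compile-time constant and the check can be optimized away.
template <class Policy>
double pdf(const toy_dist<Policy>& d, double x) {
    if (!throws_on_error<Policy>::value) {
        if (!(d.scale() > 0)) return raise_domain_error(Policy());
    }
    return std::exp(-std::abs(x - d.location()) / d.scale()) / (2 * d.scale());
}
```

With `toy_dist<throw_on_domain_error>` an invalid scale never survives construction, so `pdf` needs no second check; with `toy_dist<return_nan_on_domain_error>` the object exists and `pdf` returns the NaN error value to its caller.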

>>>>>> I'll work out the parameter idea in the Laplace distribution
>>>>>> code...
>>>>
>>>> OK good.
>>>>
>>> John,
>>> I got a bit of Laplace code to share!
>
> Cool :-)
>
>>> I still need to test the numerical results, but it compiles without
>>> errors/warnings, and it throws errors when parameters are invalid.
>>>
>>> Do I need to put the code somewhere? I've attached it to this mail...
>
> Go ahead and commit to the sandbox version of Boost.Math, if you let
> me know when you think it's release ready (or not), and I'll know it's
> OK for that addition to be merged to the Trunk then.
>
OK, I'll fix the code a bit more, especially the numerical validation.
Once it's in reasonable shape I'll put it in the sandbox.
>>> I have 3 ideas in the code I'd like to discuss.
>>> * a public member function "check_parameters" in the distribution
>>> class
>
> Looks fine, but I would make it const so that it can be called on
> const-qualified distributions. If check_parameters needs to
> cache/change something, then that can always be declared mutable as a
> last resort.
>
Hahaha, the compiler pointed that out! The non-member functions get a
"const distribution" passed, so it needs to be const indeed!
I think a good place for a validation caching mechanism could be in the
constructor. The constructor can then set a private bool, and that can
be accessed via "bool is_valid() const {..};"

This would imply a small change of error reporting.
E.g. pdf(dist, x) will have two checks
1) is dist valid?
2) is x valid?

The first check can throw a domain_error saying that the distribution
has (some combination of) invalid parameters. The throwing can be done
in the pdf function or in the is_valid() member function. If we put it
in the is_valid() member function, then it should maybe be called
check_validity(), to make it clearer that it performs an action. That's
probably the best interface. check_validity() can either check
all parameters, or look up the cached result that was stored
during the initial check done in the constructor.
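
A minimal sketch of this caching idea, under some assumptions: the class name `laplace_sketch` is made up, the parameters are plain `double` rather than a `RealType` template parameter, and the constructor only records the result instead of reporting an error itself. The density is the textbook Laplace formula, not the attached code.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical sketch; names and layout are illustrative only.
class laplace_sketch {
public:
    laplace_sketch(double location, double scale)
        : location_(location), scale_(scale),
          // Validate once and cache the outcome.
          valid_(std::isfinite(location) && std::isfinite(scale) && scale > 0) {}

    double location() const { return location_; }
    double scale() const { return scale_; }

    // Pure query: reports the cached result, never throws.
    bool is_valid() const { return valid_; }

    // Action: throws if the cached result says the parameters are bad.
    void check_validity() const {
        if (!valid_)
            throw std::domain_error("laplace_sketch: invalid location or scale");
    }

private:
    double location_;
    double scale_;
    bool valid_;
};

double pdf(const laplace_sketch& dist, double x) {
    dist.check_validity();                         // 1) is dist valid?
    if (std::isnan(x))                             // 2) is x valid?
        throw std::domain_error("pdf: x is NaN");
    return std::exp(-std::abs(x - dist.location()) / dist.scale())
           / (2 * dist.scale());
}
```

Both members are const, so they can be called on the const reference that the non-member functions receive, and the per-call cost of the first check is a single bool test.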

>>> * a public member function operator() that allows run-time changing
>>> of
>>> dist parameters. I know that's a big change... I myself could use
>>> something like this. E.g. I have some code that calibrates a
>>> stochastic
>>> model based on time series data & stores the estimated distribution
>>> parameters in a file. Another program will read the distribution
>>> parameters from that file, create distribution objects, and do
>>> probability calculations with them. I can only do that when I can set
>>> the distribution parameters at *runtime*.
>
> *If* we support changing the parameters then IMO it shouldn't be an
> operator(): that's reserved for function like objects, and that's not
> what we have here.
>
> The thing is, there are some distributions where the valid range of
> one parameter may depend upon the values of others, so I'm not so keen
> on setting one parameter at a time (although it could clearly be done
> in this case). So what's wrong with:
>
> mydist d(1, 2);
> // do something
> d = mydist(3, 4);
> // do something else
>
> Currently all the distributions are assignable and cheap to copy, is
> that likely to change? We could insist that all distributions are
> cheap to copy, by using the PIMPL technique and copy-on-write for
> distros with lots of data. Otherwise let's add a reset() member
> function to set all the parameters.
>
That was indeed a useless idea, don't know what I was thinking! :) I was
too involved (in other code) with template parameters, and was
thinking mydist<3, 4>() instead of mydist(3, 4);
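
For the calibration use case, plain construction plus assignment already covers setting the parameters at runtime. A small sketch, assuming a made-up `mydist` class with location and scale parameters and a hypothetical helper `read_dist` that parses two numbers from a stream (the file format is an assumption too):

```cpp
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for a distribution class: cheap to copy, assignable.
class mydist {
public:
    mydist(double location, double scale)
        : location_(location), scale_(scale) {
        // All parameters are validated together, in one place.
        if (!(scale_ > 0)) throw std::domain_error("mydist: scale must be > 0");
    }
    double location() const { return location_; }
    double scale() const { return scale_; }
private:
    double location_;
    double scale_;
};

// Hypothetical helper: read "location scale" from any input stream
// (a std::ifstream for a parameter file, for instance).
mydist read_dist(std::istream& in) {
    double location = 0;
    double scale = 0;
    if (!(in >> location >> scale))
        throw std::runtime_error("read_dist: cannot read parameters");
    return mydist(location, scale);
}
```

Typical use would be `mydist d(1, 2); d = read_dist(file);`. If the stored parameters are invalid, the constructor throws before the assignment happens, so `d` keeps its previous, valid state.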
>>> * no more checking for distribution parameters in the non-member
>>> functions. Checking is only done when the distribution parameters get
>>> set or get changed. But as said before, I have no good grasp on the
>>> subtle issues with that. You said "if we could figure out whether the
>>> error handlers will throw or not", implying that there are
>>> complexities
>>> with this.
>
> Yep: see above.
>
>>> At the moment, I just have the code. If you think the code is ok,
>>> then
>>> how would I go about with documentation & testing? Do you have some
>>> structure in place for that? I've seen quite some code in the
>>> sandbox/math, ...concept etc...
>
> The best thing is to see the tests for the other distributions as
> examples. We try and obtain independent test data for all the
> distributions - even if it's of limited precision - to sanity check
> our implementations.
>
> In this case, since we're trivially calling std lib functions, there
> shouldn't be any need to generate high precision test data for
> accuracy testing, just make sure you test all the corner cases, and
> error handling.
>
Yes I will.
> For the docs, if you take something like the docs for the normal or
> exponential as a starting point that should get you going?
>
> Re the code:
>
> PDF: looks like the sign of the value passed to exp() is the wrong way
> around (could be wrong about that). Sign in CDF might be suspect too.
>
> CDF: 1-exp(x) should probably be -expm1(x) for accuracy.
>
> Quantile: not sure about the formulae here, will look again when I
> have more time.
>
John, I need to validate the code before you spend your time on it.
I'm currently collecting benchmark values & writing a test file. That
should get rid of all the bugs. Paul gave me some good help with going
that way (with the tests).
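
For reference while validating, a sketch of the textbook Laplace formulas with location mu and scale b, plus the expm1 point. This is not the attached code; the function names `laplace_pdf`, `laplace_cdf`, `one_minus_exp_naive` and `one_minus_exp` are made up for illustration, and parameter checking is left out.

```cpp
#include <cmath>

// Textbook Laplace density: exp(-|x - mu| / b) / (2b).
// The argument of exp() is never positive.
double laplace_pdf(double x, double mu, double b) {
    return std::exp(-std::abs(x - mu) / b) / (2 * b);
}

// Textbook Laplace CDF:
//   x <  mu : 0.5 * exp((x - mu) / b)
//   x >= mu : 1 - 0.5 * exp(-(x - mu) / b)
// The upper branch is written as 0.5 - 0.5 * expm1(-z), which is
// algebraically the same expression.
double laplace_cdf(double x, double mu, double b) {
    const double z = (x - mu) / b;
    if (z < 0) return 0.5 * std::exp(z);
    return 0.5 - 0.5 * std::expm1(-z);
}

// Why expm1 matters: for x close to zero, exp(x) rounds to 1 and the
// subtraction loses all significant digits.
double one_minus_exp_naive(double x) { return 1 - std::exp(x); }
double one_minus_exp(double x) { return -std::expm1(x); }
```

For example, with x = -1e-20 the naive form evaluates to exactly 0 in double precision, while the expm1 form keeps a result close to 1e-20.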
> HTH, John.

-- 
SITMO Quantitative Financial Consultancy - Software Development
M.A. (Thijs) van den Berg
Tel.+31 (0)6 2411 0061
Fax.+31 (0)15 285 1984
thijs_at_[hidden] <mailto:thijs_at_[hidden]> - www.sitmo.com 
<http://www.sitmo.com>

