Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2008-11-26 14:58:27

Next message: Roman Yakovenko: "Re: [boost] [regex] Matching space - help needed"
Previous message: Daniel James: "Re: [boost] [unordered] Buffered functions?"
In reply to: John Maddock: "Re: [boost] [math distributions] where to check for validity of distribution variables?"
Next in thread: John Maddock: "Re: [boost] [math distributions] where to check for validity of distribution variables?"
Reply: John Maddock: "Re: [boost] [math distributions] where to check for validity of distribution variables?"

John Maddock wrote:
> Thijs van den Berg wrote:
>>>> That's what the existing distributions do. In fact we could omit
>>>> most of the subsequent parameter checking code if we could figure
>>>> out whether the error handlers will throw or not on error (in fact
>>>> we *can* get this information at compile time and make the
>>>> subsequent checks a no-op if we know that the constructor would
>>>> have thrown on error... we just ran out of time on that refinement).
>>>>
>>> I don't understand this, it has to do with my lack of knowledge on
>>> this... If you ensure that the parameters get checked in the
>>> constructor, why would that check *not* throw an error when needed?
>
> Correct, the constructor might not throw if the parameters are
> invalid, *and* the current policy for handling domain errors is
> something other than throwing an exception. Of course exception
> throwing is the default, and highly recommended, but there are some
> situations where exceptions aren't allowed, and returning a NaN from a
> function that uses the distribution is the correct thing to do. In
> fact custom error handlers can return a *user-defined error-value*
> which should be propagated back to the caller of the non-member
> functions if the parameters to the distribution are invalid.
>
> The reference for error handling policies is here:
> http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_toolkit/policy/pol_ref/error_handling_policies.html,
> but best to read the tutorial
> http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_toolkit/policy/pol_tutorial.html
> first as that gives an end user perspective.
>
>>> Compile time might be tricky depending on the complexity of the
>>> parameter validation code, but a simple range check on the parameters
>>> could be done at compile time. What mechanism are you thinking about
>>> regarding compile-time checking, e.g. that scale>0?
>
> Ah, I don't mean compile time checking of parameters, I mean:
>
> If the current policy in effect (which *is* known at compile time),
> mandates throwing on a domain error, then we know for sure that the
> constructor would have thrown if the parameters were invalid. In that
> case *only* we can omit checking the parameters again in the body of
> the non-member functions as we know they must be OK.
>
very clear, thanks!
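
The idea John describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the policy tags, the throws_on_domain_error trait, raise_domain_error, toy_laplace and parameters_ok are hypothetical names invented for this sketch, not the actual Boost.Math policy machinery. The point is that the policy type is known at compile time, so the re-check inside the non-member function can collapse to a no-op when the policy mandates throwing.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical policy tags (stand-ins, not the real Boost.Math policies).
struct throw_on_domain_error {};
struct return_nan_on_domain_error {};

// Compile-time trait: true only if the policy guarantees an exception.
template <class Policy>
struct throws_on_domain_error : std::false_type {};
template <>
struct throws_on_domain_error<throw_on_domain_error> : std::true_type {};

// Error handlers, selected by tag dispatch on the policy.
inline double raise_domain_error(const char* msg, throw_on_domain_error) {
    throw std::domain_error(msg);
}
inline double raise_domain_error(const char*, return_nan_on_domain_error) {
    return std::numeric_limits<double>::quiet_NaN();
}

template <class Policy = throw_on_domain_error>
class toy_laplace {
public:
    toy_laplace(double location, double scale)
        : location_(location), scale_(scale) {
        if (!(scale_ > 0))  // the negated test also rejects NaN
            raise_domain_error("scale must be > 0", Policy());
    }
    double location() const { return location_; }
    double scale() const { return scale_; }
private:
    double location_, scale_;
};

// Throwing policy: the constructor already rejected bad parameters,
// so there is nothing left to check.
template <class Policy>
bool parameters_ok(const toy_laplace<Policy>&, std::true_type) { return true; }

// Non-throwing policy: an invalid object can exist, so check again.
template <class Policy>
bool parameters_ok(const toy_laplace<Policy>& d, std::false_type) {
    return d.scale() > 0;
}

template <class Policy>
double pdf(const toy_laplace<Policy>& d, double x) {
    if (!parameters_ok(d, throws_on_domain_error<Policy>()))
        return raise_domain_error("invalid parameters", Policy());
    return std::exp(-std::fabs(x - d.location()) / d.scale()) / (2 * d.scale());
}

// Usage: with the NaN policy the error value propagates to the caller.
inline void usage_example() {
    toy_laplace<return_nan_on_domain_error> bad(0.0, -1.0);  // does not throw
    assert(std::isnan(pdf(bad, 0.5)));
    toy_laplace<> good(0.0, 1.0);
    assert(std::fabs(pdf(good, 0.0) - 0.5) < 1e-12);
}
```

With the default (throwing) policy, the first parameters_ok overload is chosen at compile time and the check disappears; only the non-throwing policy pays for the second check.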

>>>>>> I'll work out the parameter idea in the Laplace distribution
>>>>>> code...
>>>>
>>>> OK good.
>>>>
>>> John,
>>> I got a bit of Laplace code to share!
>
> Cool :-)
>
>>> I still need to test the numerical results, but it compiles without
>>> errors/warnings, and it throws errors when parameters are invalid.
>>>
>>> Do I need to put the code somewhere? I've attached it to this mail...
>
> Go ahead and commit to the sandbox version of Boost.Math, if you let
> me know when you think it's release ready (or not), and I'll know it's
> OK for that addition to be merged to the Trunk then.
>
Ok, I'll fix the code a bit more, especially the numerical validation. When
it is in reasonable shape I'll put it in the sandbox.
>>> I have 3 ideas in the code I'd like to discuss.
>>> * a public member function "check_parameters" in the distribution
>>> class
>
> Looks fine, but I would make it const so that it can be called on
> const-qualified distributions. If check_parameters needs to
> cache/change something, then that can always be declared mutable as a
> last resort.
>
Hahaha, the compiler pointed that out! The non-member functions get a
"const distribution" passed, so it needs to be const indeed!
I think a good place for a validation caching mechanism could be in the
constructor. The constructor can then set a private bool, and that can
be accessed a la "bool is_valid() const {..};"

This would imply a small change of error reporting.
E.g. pdf(dist, x) will have two checks
1) is dist valid?
2) is x valid?

The first check can throw a domain_error telling that the distribution
has (some combination of) invalid parameters. The throwing can be done
in the pdf function or in the is_valid() member function. If we put it
in the is_valid() member function, then it should maybe be called
check_validity(), to make it clearer that it's doing some action. That's
probably the best interface. check_validity() can either do a check on
all parameters, or look up the cached checking result that was stored
during the initial checking done in the constructor.
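
A minimal sketch of that interface, assuming a Laplace-like distribution with a location and a scale. The class name laplace_sketch and the exact validity rules are hypothetical. In this sketch the constructor only records the result and does not throw, which is the case where a cached flag is actually useful:

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

class laplace_sketch {
public:
    // The parameters are checked once, here, and the result is cached.
    laplace_sketch(double location, double scale)
        : location_(location), scale_(scale),
          valid_(std::isfinite(location) && std::isfinite(scale) && scale > 0) {}

    // Pure query: reports the cached result, never throws.
    bool is_valid() const { return valid_; }

    // Action: throws if the cached result says the parameters are invalid.
    void check_validity() const {
        if (!valid_)
            throw std::domain_error("laplace_sketch: invalid parameters");
    }

    double location() const { return location_; }
    double scale() const { return scale_; }

private:
    double location_, scale_;
    bool valid_;
};

inline double pdf(const laplace_sketch& dist, double x) {
    dist.check_validity();                        // check 1: is dist valid?
    if (std::isnan(x))                            // check 2: is x valid?
        throw std::domain_error("pdf: x must not be NaN");
    return std::exp(-std::fabs(x - dist.location()) / dist.scale())
           / (2 * dist.scale());
}

inline void usage_example() {
    laplace_sketch bad(0.0, -1.0);  // constructing does not throw in this sketch
    assert(!bad.is_valid());
    bool threw = false;
    try { pdf(bad, 0.5); } catch (const std::domain_error&) { threw = true; }
    assert(threw);
}
```

Both member functions are const, so they can be called from non-member functions that take the distribution by const reference.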

>>> * a public member function operator() that allows run-time changing
>>> of
>>> dist parameters. I know that's a big change... I myself could use
>>> something like this. E.g. I have some code that calibrates a
>>> stochastic
>>> model based on time series data & stores the estimated distribution
>>> parameters in a file. Another program will read the distribution
>>> parameters from that file, create distribution objects, and do
>>> probability calculations with that. I can only do that when I can set
>>> the distribution parameters at *runtime*.
>
> *If* we support changing the parameters then IMO it shouldn't be an
> operator(): that's reserved for function like objects, and that's not
> what we have here.
>
> The thing is, there are some distributions where the valid range of
> one parameter may depend upon the values of others, so I'm not so keen
> on setting one parameter at a time (although it could clearly be done
> in this case). So what's wrong with:
>
> mydist d(1, 2);
> // do something
> d = mydist(3, 4);
> // do something else
>
> Currently all the distributions are assignable and cheap to copy, is
> that likely to change? We could insist that all distributions are
> cheap to copy, by using the PIMPL technique and copy-on-write for
> distros with lots of data. Otherwise let's add a reset() member
> function to set all the parameters.
>
That was indeed a useless idea, I don't know what I was thinking! :) I was
too involved (in other code) with template parameters, and was
thinking mydist<3, 4>() instead of mydist(3, 4);
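
For illustration, the two options John mentions (assigning a freshly constructed distribution, or a reset() member that sets all parameters together) might look like the sketch below. The class mydist here is a hypothetical stand-in whose two parameters constrain each other (lower < upper), chosen to show why setting one parameter at a time is awkward.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical distribution with interdependent parameters: lower < upper.
class mydist {
public:
    mydist(double lower, double upper) : lower_(lower), upper_(upper) {
        validate(lower, upper);
    }

    // Hypothetical reset(): validates the new parameter set as a whole
    // before storing it, so a failed reset leaves the object unchanged.
    void reset(double lower, double upper) {
        validate(lower, upper);
        lower_ = lower;
        upper_ = upper;
    }

    double lower() const { return lower_; }
    double upper() const { return upper_; }

private:
    static void validate(double lower, double upper) {
        if (!(lower < upper))
            throw std::domain_error("mydist: need lower < upper");
    }
    double lower_, upper_;
};

inline mydist replace_parameters() {
    mydist d(1, 2);
    // A one-at-a-time setter would have to reject lower = 3 here
    // (3 < 2 is false), although the final pair (3, 4) is valid.
    d = mydist(3, 4);   // option 1: assign a new, already validated object
    assert(d.upper() == 4);
    d.reset(5, 6);      // option 2: set all parameters in one call
    return d;
}
```

Values read from a file at runtime can be passed to either form in the same way.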
>>> * no more checking for distribution parameters in the non-member
>>> functions. Checking is only done when the distribution parameters get
>>> set or get changed. But as said before, I have no good grasp on the
>>> subtle issues with that. You said "if we could figure out whether the
>>> error handlers will throw or not", implying that there are
>>> complexities
>>> with this.
>
> Yep: see above.
>
>>> At the moment, I just have the code. If you think the code is ok,
>>> then
>>> how would I go about with documentation & testing? Do you have some
>>> structure in place for that? I've seen quite some code in the
>>> sandbox/math, ...concept etc...
>
> The best thing is to see the tests for the other distributions as
> examples. We try and obtain independent test data for all the
> distributions - even if it's of limited precision - to sanity check
> our implementations.
>
> In this case, since we're trivially calling std lib functions, there
> shouldn't be any need to generate high precision test data for
> accuracy testing, just make sure you test all the corner cases, and
> error handling.
>
Yes I will.
> For the docs, if you take something like the docs for the normal or
> exponential as a starting point that should get you going?
>
> Re the code:
>
> PDF: looks like the sign of the value passed to exp() is the wrong way
> around (could be wrong about that). Sign in CDF might be suspect too.
>
> CDF: 1-exp(x) should probably be -expm1(x) for accuracy.
>
> Quantile: not sure about the formulae here, will look again when I
> have more time.
>
John, I need to validate the code before you can waste your time on it.
I'm currently collecting benchmark values & writing a test file. That
should get rid of all the bugs. Paul gave me some good help with going
that way (with the tests).
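
For reference while validating, here is a sketch of the textbook Laplace formulas with location mu and scale b > 0. It is not the attached code (which is not reproduced in this message); the function names are hypothetical and parameter checking is left out. It shows the sign convention for exp() and the expm1 rewrite John suggests:

```cpp
#include <cassert>
#include <cmath>

// pdf(x) = exp(-|x - mu| / b) / (2b); note the negative exponent.
inline double laplace_pdf(double x, double mu, double b) {
    return std::exp(-std::fabs(x - mu) / b) / (2 * b);
}

// cdf(x) = 0.5 * exp(z)        for z < 0
//        = 1 - 0.5 * exp(-z)   for z >= 0,   where z = (x - mu) / b.
// The upper branch is written as 0.5 - 0.5 * expm1(-z), which is the
// same value but avoids cancellation in 1 - exp(-z) when z is small.
inline double laplace_cdf(double x, double mu, double b) {
    const double z = (x - mu) / b;
    if (z < 0) return 0.5 * std::exp(z);
    return 0.5 - 0.5 * std::expm1(-z);
}

// Inverse of the cdf, for p in [0, 1].
inline double laplace_quantile(double p, double mu, double b) {
    if (p < 0.5) return mu + b * std::log(2 * p);
    return mu - b * std::log(2 * (1 - p));
}

// Simple sanity checks of the kind a test file could start from.
inline void sanity_checks() {
    assert(laplace_cdf(0.0, 0.0, 1.0) == 0.5);
    assert(std::fabs(laplace_pdf(0.0, 0.0, 2.0) - 0.25) < 1e-12);
    assert(std::fabs(laplace_cdf(laplace_quantile(0.9, 1.0, 2.0), 1.0, 2.0) - 0.9) < 1e-12);
}
```

Round-trip checks such as cdf(quantile(p)) == p only test internal consistency, so independent benchmark values are still needed.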
> HTH, John.
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost

-- 
SITMO Quantitative Financial Consultancy - Software Development
M.A. (Thijs) van den Berg
Tel.+31 (0)6 2411 0061
Fax.+31 (0)15 285 1984
thijs_at_[hidden] - www.sitmo.com


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk