Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
From: Thijs van den Berg (thijs_at_[hidden])
Date: 2008-11-24 07:31:33


Paul A. Bristow wrote:
>> -----Original Message-----
>> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]] On Behalf Of Thijs van den Berg
>> Sent: 23 November 2008 20:11
>> To: boost_at_[hidden]
>> Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
>>
>> Paul A. Bristow wrote:
>>
>>>> -----Original Message-----
>>>> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]] On Behalf Of Thijs van den Berg
>>>> Sent: 22 November 2008 14:48
>>>> To: boost_at_[hidden]
>>>> Subject: Re: [boost] [math distributions] where to check for validity of distribution variables?
>>>>
>>>> John Maddock wrote:
>>>>
>>>>
>>>>> Thijs van den Berg wrote:
>>>>>
>>>>>
>>>>>>>>>> What do you think? We might turn "having valid parameters"
>>>>>>>>>> into a property of *all* distributions. As an alternative, we
>>>>>>>>>> might add a non-member function bool valid<distributionType...
>>>>>>>>>> but that wouldn't allow for caching validation in e.g. a
>>>>>>>>>> constructor
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> Sounds fine to me.
>>>>>>>>
>>>>>>>>
>>>>>>> That's great! What's your opinion on the fact that you can only set
>>>>>>> parameters in the constructor?
>>>>>>> E.g. the normal distribution does a parameter check in the
>>>>>>> constructor, and those parameters can't change after that.
>>>>>>>
>>>>>>>
>>>>> That's what the existing distributions do. In fact we could omit
>>>>> most of the subsequent parameter checking code if we could figure
>>>>> out whether the error handlers will throw or not on error (in fact
>>>>> we *can* get this information at compile time and make the
>>>>> subsequent checks a no-op if we know that the constructor would
>>>>> have thrown on error... we just ran out of time on that refinement).
>>>>>
>>>>>
>>>>>
>>>> I don't understand this; it has to do with my lack of knowledge on
>>>> this... If you ensure that the parameters get checked in the
>>>> constructor, why would that check *not* throw an error when needed?
>>>>
>>> Often you just want to return a NaN, infinity or a 'best guess'.
>>>
>>> So John devised the rather complicated - but very useful - policies.
>>>
>>> Most importantly, they are needed to provide the C-style error
>>> behaviour of the C++ Standard Library.
>>>
>>> enum error_policy_type
>>> {
>>>    throw_on_error = 0, // throw an exception.
>>>    errno_on_error = 1, // set ::errno & return 0, NaN, infinity or best guess.
>>>    ignore_error = 2,   // return 0, NaN, infinity or best guess.
>>>    user_error = 3      // call a user-defined error handler.
>>> };
>>>
>> Hi Paul,
>> thanks for the info!
>> I'll have to delve into those concepts a bit more, I see.
>> Regarding the checking in non-member functions for the validity of the
>> distribution: would it be possible for the distribution constructor to
>> fail at runtime (before the distribution parameters can be validated)?
>> Would it be safe for me to assume that
>> * if a distribution validates its parameters in the constructor, and
>> * if the constructor doesn't throw an error, then
>> * there is no need to check the distribution parameters again after
>>   construction, e.g. in a non-member function?
>> If so, then I would send in new distributions with only checks in the
>> constructor.
>> Another option is to check parameters (and throw errors) in the
>> distribution parameter access member functions like
>> "RealType location() const {return m_location;}". A possible drawback
>> of that is that sometimes it is a *combination* of parameters that is
>> valid or not.
>>
>
> As I recall, because the chosen policy for the constructor might not cause
> it to throw, we decided on 'belt and braces' repeated checks, even if it
> proved redundant (because the check is cheap). If there are other
> combinations that might cause trouble, this means this is even more
> sensible.
>
>
Ah! You're saying you can have one type of policy for the distribution
(constructor) and another policy type in some non-member function like pdf.
That explains the things I'm seeing!
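
A minimal sketch of the 'belt and braces' idea, to make it concrete. This is
not Boost.Math code: toy_normal, check_scale and the two-value enum below are
made-up names, and only two of the four quoted policies are modelled. It shows
that a constructor check under a non-throwing policy lets an invalid object
exist, so the non-member function has to check again.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Toy subset of the policy enum quoted above (assumption: only two modelled).
enum error_policy_type { throw_on_error = 0, ignore_error = 2 };

// Hypothetical helper: returns true if sd is a valid scale. On failure it
// throws under throw_on_error, otherwise stores NaN in *result.
template <error_policy_type Policy>
bool check_scale(double sd, double* result)
{
    if (std::isfinite(sd) && sd > 0) return true;
    if (Policy == throw_on_error)
        throw std::domain_error("scale must be finite and > 0");
    *result = std::numeric_limits<double>::quiet_NaN();
    return false;
}

template <error_policy_type Policy = throw_on_error>
class toy_normal
{
public:
    toy_normal(double mean, double sd) : m_mean(mean), m_sd(sd)
    {
        double ignored = 0;
        check_scale<Policy>(m_sd, &ignored); // throws only under throw_on_error
    }
    double mean() const { return m_mean; }
    double standard_deviation() const { return m_sd; }
private:
    double m_mean;
    double m_sd;
};

// Non-member pdf: repeats the cheap check because the constructor may not
// have thrown.
template <error_policy_type Policy>
double pdf(const toy_normal<Policy>& d, double x)
{
    double result = 0;
    if (!check_scale<Policy>(d.standard_deviation(), &result))
        return result; // NaN under ignore_error
    const double pi = 3.14159265358979323846;
    const double z = (x - d.mean()) / d.standard_deviation();
    return std::exp(-0.5 * z * z) / (d.standard_deviation() * std::sqrt(2 * pi));
}

// Usage: the invalid object is constructed, and pdf catches it afterwards.
inline void example_redundant_check()
{
    toy_normal<ignore_error> d(2.0, 0.0); // does not throw
    assert(d.mean() == 2.0);
    assert(std::isnan(pdf(d, 1.0)));      // second check returns NaN
}
```

With throw_on_error the second check is redundant, which matches the remark
that it could be made a no-op when the policy is known at compile time.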

A final question regarding the error checking is this:
Suppose a distribution has a mix of valid and invalid parameters, e.g.
normal(2,0), which has a valid mean=2 and an invalid std=0. Formally that
would make the distribution object invalid... There are at least two
possible views on what to do in the non-member functions:
1) Make *all* of them return NaN because the distribution is invalid.
This is a mathematical interpretation,
or
2) (current implementation) try to give an answer when possible. This is
a "can we calculate the result?" interpretation. In this case we can
calculate the mean (it's 2), but we can't calculate the pdf because
that would give a divide by zero.

I'm asking this because I'd like to stick to your approach with new
code, and *not* because I want to discuss a preference for either of
the two... :)
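
To pin down the two views, here is a small sketch for the normal(2,0) case.
It is a toy, not library code: normal_params, params_valid, mean_strict,
mean_lenient and pdf_lenient are hypothetical names, and returning NaN stands
in for whatever the chosen error policy would do.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct normal_params { double mean; double sd; };

inline double toy_nan() { return std::numeric_limits<double>::quiet_NaN(); }

// The object as a whole is valid only if every parameter is.
inline bool params_valid(const normal_params& p)
{
    return std::isfinite(p.mean) && std::isfinite(p.sd) && p.sd > 0;
}

// View 1: an invalid distribution answers NaN to everything.
inline double mean_strict(const normal_params& p)
{
    return params_valid(p) ? p.mean : toy_nan();
}

// View 2: answer whenever the particular result can be computed.
inline double mean_lenient(const normal_params& p)
{
    return p.mean; // the mean does not depend on sd
}

inline double pdf_lenient(const normal_params& p, double x)
{
    if (!params_valid(p)) return toy_nan(); // would divide by sd == 0
    const double pi = 3.14159265358979323846;
    const double z = (x - p.mean) / p.sd;
    return std::exp(-0.5 * z * z) / (p.sd * std::sqrt(2 * pi));
}

inline void example_two_views()
{
    normal_params p = {2.0, 0.0};           // valid mean, invalid sd
    assert(std::isnan(mean_strict(p)));     // view 1
    assert(mean_lenient(p) == 2.0);         // view 2: mean is computable
    assert(std::isnan(pdf_lenient(p, 2.0))); // view 2: pdf is not
}
```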
>>>> Compile time might be tricky depending on the complexity of the
>>>> parameter validation code, but a simple range check on the parameters
>>>> could be done at compile time. What mechanism are you thinking about
>>>> regarding compile-time checking, e.g. that scale>0?
>>>>
>>> The complexity of policy options makes it much simpler to do a run-time
>>> check.
>>>
>>> You'd save a tiny bit on run-time - but probably pay in compile time?
>>>
>>> Paul
>>>
>>>
>>>
>> I think the same about that. Runtime is good enough, and even
>> unavoidable if you would allow distribution parameters to be set at
>> runtime. Btw, why isn't that implemented (allowing distribution
>> parameters to be set at run-time)? Lack of implementation time
>> (postponed to future versions), or is it a design choice?
>>
>
> As I recall, because construction (and destruction) is cheap (compared to
> a cdf, pdf etc.), it is simplest to make users construct a new distribution.
>
>
Yes, I agree, that's a good way to solve the problem I described
with the current interface! Works fine!
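
For the record, a sketch of what "construct a new distribution" looks like
in user code. fixed_normal and with_standard_deviation are made-up names for
illustration, and the constructor simply throws on a bad scale. Because
there are no setters, every parameter change goes back through the validating
constructor.

```cpp
#include <cassert>
#include <stdexcept>

// Toy distribution whose parameters can only be set at construction.
class fixed_normal
{
public:
    fixed_normal(double mean, double sd) : m_mean(mean), m_sd(sd)
    {
        if (!(sd > 0)) // also rejects NaN
            throw std::domain_error("standard deviation must be > 0");
    }
    double mean() const { return m_mean; }
    double standard_deviation() const { return m_sd; }
private:
    double m_mean;
    double m_sd;
};

// "Changing" a parameter builds a new object, so validation runs again.
inline fixed_normal with_standard_deviation(const fixed_normal& d, double sd)
{
    return fixed_normal(d.mean(), sd);
}

inline void example_reconstruct()
{
    fixed_normal a(0.0, 1.0);
    fixed_normal b = with_standard_deviation(a, 2.0);
    assert(a.standard_deviation() == 1.0); // original is untouched
    assert(b.standard_deviation() == 2.0);
    a = b;                                 // or assign a freshly built object
    assert(a.standard_deviation() == 2.0);
}
```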

> Paul
>
> ---
> Paul A. Bristow
> Prizet Farmhouse
> Kendal, UK LA8 8AB
> +44 1539 561830, mobile +44 7714330204
> pbristow_at_[hidden]

-- 
SITMO Quantitative Financial Consultancy - Software Development
M.A. (Thijs) van den Berg
Tel.+31 (0)6 2411 0061
Fax.+31 (0)15 285 1984
thijs_at_[hidden] - www.sitmo.com

