
Subject: Re: [boost] a safe integer library
From: Robert Ramey (ramey_at_[hidden])
Date: 2015-12-12 13:36:14


On 12/12/15 9:11 AM, Phil Endecott wrote:
> Hi Robert,
>
> Robert Ramey wrote:
>> creating types which carry their ranges around with them through
>> expressions.
>
> I'm curious to know how well this works in practice, i.e. how much
> of the time do you need run-time checking in real programs.

That makes two of us.

> Here are a few examples that have occurred to me:
>
> 1. Arguments to functions
>
> void f(safe<int> i)
> {
> int8_t j = i; // Needs a run-time check unless we know i<128
> }
Correct.

> f(3); // Run-time check happens, even though value is known.

The constructor of safe<int> doesn't do a runtime check when it's being
initialized by an int.

The assignment would require a runtime check.
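
To make the distinction concrete, here is a minimal sketch (assuming
the default policies):

safe<int> i = 3;   // construction from int: no runtime check -
                   // every int value is a valid safe<int>
int8_t j = i;      // narrowing: a runtime check is required unless
                   // the value is known to fit in [-128, 127]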

> Even though the body of f is visible at the point of the call,

In general this is not true. f could well be compiled separately.

> I don't think an approach based on types can avoid the run-time
> check unless the function args are templates:
>
> template <typename RANGE_TYPE>
> void f(RANGE_TYPE i)
> {
> int8_t j = i;
> }
>
> f(3);

Here is a simpler way:

void f(safe<int> i){
     int j = i; // no runtime check - int can hold any safe<int> value
}

Also:

void f(safe<int> i){
     auto j = i; // no runtime check - j is a safe<int>, a copy of i
}

Also:

void f(safe<int8_t> i){
     int8_t j = i;
}

But then f(3) requires a runtime check.

UNLESS we do

constexpr void f(safe<int8_t> i){
     int8_t j = i;
}

in which case f(3) occurs at compile time - truly zero runtime cost!!!
The whole statement goes away!!!

> Here you can presumably arrange things so that the type of i inside
> f includes the range information and the run-time check can be
> avoided. But this does rely on the body of f being visible at the
> point of the call.

Nope - it relies upon picking types which are large enough and carry
their range information with them. The "automatic" promotion policy can
do this automatically.
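
For instance, a sketch - assuming the promotion policy is passed as a
second template parameter, as with the custom "phil" policy discussed
further below:

template<typename T>
using auto_safe = safe<T, automatic>;

void f(auto_safe<int8_t> i){
     auto j = i + 1; // the result type is automatically widened to
                     // hold any int8_t plus 1 - no runtime check
}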

> The verbosity of the function declaration should
> improve a bit with concepts, but it's not really a drop-in replacement.

This is true - used as a drop-in replacement, one will get more runtime
checking.

>
> Related to this is function return types:
>
> safe<int> incr(safe<int> i)
> {
> return i+1;
> }
>
> int8_t j = incr(42); // Run-time check or not?

No runtime check when the safe<int> is constructed on the stack.

Yes, a runtime check when calculating the result of i + 1.

No runtime check when returning the result.

Since C++ expression rules mean that the result of i + 1 will be an int,
no check is required to construct the returned safe<int> from that int.

In fact, probably zero overhead due to NRVO.

Yes, a runtime check on the assignment int8_t j = incr(42).

Unless incr is made constexpr, at which point everything moves to
compile time: zero runtime checking - actually zero execution for the
whole thing!
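
A sketch of that, assuming (as claimed above) that the safe types are
constexpr-friendly:

constexpr safe<int> incr(safe<int> i){
     return i + 1;
}

constexpr int8_t j = incr(42); // the addition, the narrowing and the
                               // range check all happen at compile time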

> Maybe auto return types help a bit but what about
>
> return (i%2==0) ? i/2 : 3*i+1;

What about it?
>
>
> 2. Merging constraints
>
> Example:
>
> void g(int j);
>
> void f(safe<int> i)
> {
> for (int a = 0; a < 1000000; ++a) {
> g(i+a);
> }
> }
>
> Does that generate a million run-time checks?
Yep. (Note: a itself will overflow on machines where int is only 16 bits.)

> If my expectation
> is that i will be "small", most likely I would be happy with just
> one check at the start of f:
>
> assert(i < maxint - 1000000);

Of course. But maybe you'd consider the following equivalent:

void f(safe_integer_range<0, maxint> i){
     safe_integer_range<0, 1000000> a;
     for (a = 0; a < 1000000; ++a) {
       g(i+a);
     }
}

If maxint is less than 2^32 - 1000000 then no runtime checking will
occur in the function. Note how this differs from the usage of assert.
With assert we check during the debug build; but when we ship, we throw
away the seat belts. With safe types in this case, we get zero runtime
overhead and we're guaranteed not to crash at runtime.

At this point we're beyond the "drop-in" replacement stage. But we're
also getting safety and performance guarantees we've never been able to
get before.

> 3. Implied constraints:
>
> vector<S> v;
> f(v); // fills v
> safe<size_t> bytes = v.size() * sizeof(S);
>
> I know that the multiplication can't overflow, because if v.size()
> was large enough that it would overflow then I would have run out
> of memory already.

Hmmm - I don't actually know that for a fact. I've always found the
usage of the size type confusing. I don't really know that it's not
possible to allocate a vector whose total space occupies more than 2^32
bytes (or is it 2^64).

But in any case, constructing a safe<size_t> from a variable of type
size_t doesn't need a check, so there's nothing to check here anyway.
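
Note, though, that in the snippet above the multiplication itself is
done in built-in size_t arithmetic before the safe type ever sees it.
A sketch of how to get the multiplication checked as well (same
variables as in your example):

safe<std::size_t> n = v.size();          // construction: no check
safe<std::size_t> bytes = n * sizeof(S); // multiplication now goes
                                         // through the safe type and
                                         // is checked for overflow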

> So as I say, I'm curious to hear whether you've applied your safe<int>
> to any real applications and measured how many of the potential run-
> time checks are avoidable.

Only my tests and examples. I'm hoping someone will create some
performance tests.

> My fear is that it could be fewer than
> you hope, and that the checks remain inside loops rather than being
> hoisted outside them.

We'll just have to see.

> Another reservation about this is the question of what should
> happen when an overflow is detected. Exceptions are not great,
> because you have to worry about what you'll do when you get one.

The library has the concept of an exception policy. Throwing an
exception is the default, but you can select a different one or make
your own. It's not hard. But you really have to spend time thinking
about what you want to do.

One thing you might want to look at is the usage of automatic type
promotion (another policy - the default is native). When using this
policy, types returned from binary expressions are almost always large
enough to hold the expression results - so we're back to zero runtime
checking.
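
A sketch of the effect, assuming (as above) that the policy is given as
the second template parameter:

safe<std::int8_t, automatic> a = 100;
safe<std::int8_t, automatic> b = 100;
auto c = a + b; // no runtime check: the result type of the binary
                // expression can hold any value in [-256, 254]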

Another thing you might look into is safe_integer_range, which does the
opposite: it attaches range information to your variables at compile
time so that runtime checking can be skipped when it's known not to be
necessary.
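
For example, reusing the safe_integer_range template from above:

safe_integer_range<0, 100> x = 42; // the range is part of the type
int8_t y = x;  // [0, 100] is known to fit in int8_t,
               // so the runtime check is elided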

Then there is the fact that all this is constexpr-friendly, which can
have a huge impact on performance.

> One ambitious idea is to find a way to fall back to arbitrary
> precision arithmetic:
>
> template <typename INT>
> void f(INT i) {
> INT i2 = i*i;
> INT j = (i2 + 4096) << 2;
> g(j%7);
> }
>
> try {
> f<int>(n);
> }
> catch (overflow&) {
> f<arbitrary_precision_int>(n);
> }
>
> How much syntactic sugar can we hide that in?

This is too ambitious for me. But you could do it with your own custom
promotion policy. The requirements for such a policy are in the
documentation and you can look how the "automatic" promotion policy does
it. It specifies the return types from binary operations dependent on
the types of the operands. It does it in such a way as to minimize the
need for runtime checking. If you made your own policy and named it
"phil", then using it would be:

template<typename T>
using safe_t = safe<T, phil>;

Then just use safe_t<int> everywhere int is used. You'd have your own
drop-in replacement.

Note that I've made this sound significantly easier than it actually is.
Take a look at the "automatic" type promotion policy.

> I recently experimented with LLVM's thread safety checking (see
> http://clang.llvm.org/docs/ThreadSafetyAnalysis.html ); it lets
> you annotate variables to indicate that locks must be held when
> they are accessed:
>
> mutex m;
> int i GUARDED_BY(m);
>
> I've been impressed by that, and I wonder if attributes defining
> expected ranges and static analysis might be a more practical
> way to find potential overflows.

Using safe_integer_range<min, max> seems equivalent to me.

I'm now trying to get the pièce de résistance working - an error
trapping policy which uses static_assert so that, for things like
embedded systems, one could get a compile time error anywhere something
could possibly overflow - no runtime checking and no exception handling
required. I'm having difficulty with this as we speak.

My expectation is to see the library used in the following manner:

a) Some safety critical program has a bug which is suspected to be
addressable by safe types.

b) In desperation, someone just changes all the int, ... types via an
alias to safe equivalents.

c) The program is compiled and run - and a bug is found. Hooray.

d) The alias is redefined to the built-in equivalents.

e) But wait - some idiot consults a lawyer. He says: well, now you know
how to guarantee that the program has no bugs (remember, he's a lawyer),
so you'd better make sure you include this facility in case we get sued.
We have to be able to prove we did everything humanly possible to avoid
bugs.

f) Someone complains that the program is going to be too slow. But under
pressure from the authorities they produce a benchmark and it IS slower.
But not by quite as much as was feared.

g) So now someone says: what would it take to make it faster? At this
point some changes are made so that we're not doing as much shrinking of
types. Perhaps the library's "automatic" policy is used, which expands
intermediate and auto variables to be big enough to hold the results.
Perhaps safe_integer_range is used to specify real ranges. This
eliminates the performance bottleneck. But we're no longer using safe
types as "drop-in" replacements. We've created new types selected to
(almost) never trap.

It's just a speculative scenario. But it's not unbelievable.

I hope it can be seen that I've spent considerable time thinking about
this. Now I'm afraid to get on an airplane or get an MRI. It's that bad.

Robert Ramey

