# Boost :

From: Douglas Gregor (gregod_at_[hidden])
Date: 2002-09-06 13:11:25

On Friday 06 September 2002 01:30 pm, Fernando Cacciola wrote:
> If the expression (x<y) is of type bool, then the static analysis of a
> program containing it can 'trace' either of two execution branches.
> If the expression (x<y) if of type tribool, then the static analysis of a
> program containing it can 'trace' both branches simultaneously. I
> understand that such an analysis is very useful, for instance, for
> optimizers (is this what you are actually doing?)
>
> But, is then tribool useful *only* for the static analysis of the program?

I think you've been snagged by the linguistic trap that hides near static
analysis discussions :) We're talking about two different programs here: (1)
the user program which uses only integers and booleans and (2) the static
analyzer program.

This is a user program:

1: int x, y;
2: // x and y get their values here
3: if (x < y)
4: foo();
5: else
6: bar();

No tribools, intervals, etc. in the program. When it executes the 'x < y', we
will get either true or false, because integer comparisons in the user
program only use 2-state logic.

Now, consider writing the static analyzer. Because the program's input changes
every time it is run, we have to approximate the values of program variables
(e.g., x, y, and the result of 'x < y') in a way that covers _all_ executions
of the code. One common way to represent the values that an integer variable
(in the user program) could take is via intervals. For instance, in some
simple user code like this:

1: int x;
2: if (some_condition)
3: x = 3;
4: else
5: x = 5;
6: // ...

When we get to line 6 when the user program is running (really running, not in
the analysis), the value of 'x' will be either 3 or 5. However, when we
analyze all paths, we end up following the code like this:

at line 1, x is somewhere in the interval [-Infinity, Infinity]
at line 2, we don't know what the condition does, so follow both 'if' and
'else' simultaneously
at line 3: x is in the interval [3:3]
at line 5: x is in the interval [5:5]
at line 6, x might be in the interval [3,3] (if some_condition) or it might be
in the interval [5,5] (if !some_condition), so we approximate by taking the
union of its possible values, so we know only that x is in the interval
[3,5].

The same example works for boolean user variables with the 'tribool' class:

1: bool b;
2: if (some_condition)
3: b = true;
4: else
5: b = false;
6: // ...

When the static analysis _symbolically_ runs the program, at line 6 it only
knows that b might be true, or it might be false; so it needs the value of b
to be the indeterminate state. Just like we approximate integer values by the
intervals that are bounded in, we approximate boolean values using this
indeterminate state.

Of course, as soon as we approximate, we lose some information. Back to
'if (x < y)' example: when the program actually runs (not symbolically!), x
and y have definite values so 'x < y' gives a boolean result; but when we
analyze the program (running it symbolically), the inaccuracies of the
interval bounding representation of 'x' and 'y' make it possible that we
don't know the exact relationship between 'x' and 'y' (remember: this
relationship could change because of different inputs to the program when it
actually runs), thus the result of comparison is any of 'true for every
program input', 'false for every program input', or 'can't determine if it is
true or false for all program inputs'.

I'm not actually working on optimizers at the moment (that's "future work"),
but in static checking (they often use similar types of analysis).

Doug