Boost logo

Boost :

Subject: [boost] [constrained_value] Constrained Value review results
From: Gordon Woodhull (gordon_at_[hidden])
Date: 2010-09-11 15:47:27


Greetings,

Substituting for Jeff Garland, I am pleased to present the results from the constrained_value review.

The library is available at http://rk.dl.pl/f/constrained_value.zip

The documentation is at http://rk.dl.pl/r/constrained_value

Enjoy!
Gordon

1. History

The idea for a constrained value library goes back to at least 2004; Christopher Diggins proposed a mini-library at that time.

In 2007 Joe Gottman suggested that Jeff Garland's constrained_value.hpp in date_time could be a top-level library.

Concurrently, Robert Kawulak (henceforth RK) was working on his own library for the purpose, which he also started in 2004, unaware of the others.

RK announced his library on November 11, 2007. Discussion continued into 2008; on June 19, RK announced his library was ready for review and it was put in the queue.

The review began on December 1 2008 and continued until around December 15 (with some trailing discussion of the definition of invariant and why exceptions should not be used to check invariants).

Gordon Woodhull reviewed the thread's 260 messages in the summer of 2010 and produced this report, dividing the issues raised between those where there was consensus on a change, those where no resolution was found, and those where the author and requesters (mostly) agreed that no change was needed.

2. Votes

Eight people voted for acceptance of the library; there were no "no" votes. However, many people stated conditions on acceptance.

Thanks to Chris (indy271828_at_[hidden]), Vicente Botet, Zach Laine, John Maddock, Thorsten Ottosen, Stjepan Rajko, Daryle Walker, and Gordon Woodhull for providing reviews. (In addition, Paul Bristow provided a review which looks like a yes but does not explicitly say so.)

Thanks also to David Abrahams, Kim Barrett, Neal Becker, Paul Bristow, Edward Diener, Jeff Flinn, Joe Gottman, Mathias Guanard, Mika Heiskanen, Giovanni Pierro Deretta, Thomas Klimpel, Michael Marcin, OvermindDL1, John Phillips, raindog, Ravi (lists_ravi_at_[hidden]), Johan Råde, Sebastian Redl, Peter Simons, Steven Watanabe, Utku Yaman, and Deane Yang for participating in a very active discussion.

3. Issues to be Addressed

Floating point

By far the most contentious issue in the review was the statement in the library documentation that floating point is not supported, and a few reviewers made fixing this a condition on acceptance.

The issue is that the x87 math instructions (which are the default for 32-bit Intel compiles) utilize 80-bit registers which are then rounded down to 64 bits when a double value is written to memory. This can cause two values which were not equal to become equal later, with fairly unpredictable behavior depending on optimization and what code intervenes between calculating a value and comparing it.

One solution is simply not to use the x87 math instructions by enabling the SSE2 math instructions instead. However, users may want the extra precision, and there is no way for a library to control this option.

A few reviewers confused this issue with the frequent need for fuzzy comparison with epsilon; it was shown that the bug can still appear with epsilon-based comparisons.

Jeff Garland points out that the Boost Numeric Interval library,
http://www.boost.org/doc/libs/1_42_0/libs/numeric/interval/doc/interval.htm
has also dealt with the x87 and NaN problems. It uses the round-trip-to-memory approach for x87. There is potential to create constraints for constrained_value using numeric::interval, and perhaps any other fixes for floating point belong in that library.

Many solutions were proposed; after debate there were four left standing:

* Force the new value to be written to memory

Kim Barrett suggested the following function, noting that it should not force a memory write unless the value is in a register.
inline double exact(double const& x) {
    return *const_cast<double const volatile*>(&x);
}

Gordon Woodhull has tested this against example provided by Thomas Klimpel in a message titled "floating point FUD" of January 23, 2010 (http://article.gmane.org/gmane.comp.lib.boost.devel/198817) and it does work.

However, many people objected to this solution because of efficiency concerns: it requires a round-trip to memory (or at least to cache) for every assignment.

* Only allow >= or <= for floating point types, and make sure the bounds are truncated

80- to 64-bit truncation can cause two values to become equal, but if one of them is already truncated, it cannot cause them to swap order. Since it is likely that the value will be changed much more frequently than the bound, it would suffice to apply truncation just to the bounds when they are set, e.g. with the exact() function above.

However it does not seem that this solution can work if the bounds are dynamically calculated; the library supports bounds being returned by an inlined function.

Johan Råde also suggested that it might be possible to implement < and > in terms of <= and >= and the std::nextafter function.

* Force the processor to use 64-bit precision

Johan Råde mentions a floating point control register which could be used in http://article.gmane.org/gmane.comp.lib.boost.devel/183760

There were also mentions of possible assembly code solutions, but none were provided and this was generally unpopular because it requires someone very knowledgeable to maintain it.

John Phillips argued that reducing the precision will "lose significance faster than expected".

* Another interesting solution was suggested by Stjepan Rajko: that the test might not be the same as the invariant. This is what testing with epsilon would guarantee. If the value passes the test it is guaranteed to satisfy the invariant, but only the test and not the invariant is in code. So some values might be rejected which are okay, but no bad value is accepted.

RK felt that any floating point solution should be implemented separately from constrained_value, because he does not have the expertise. But he should be willing to document the scope of the problem better, including the SSE2 compiler flags that avoid the problem altogether. The first solution (truncation of the value) does not seem to require any changes to the library. Truncating the bounds might require some extra customization points.

NaN

Another problem with floating point is NaN, which compares false with everything. It might be okay if the bounds inclusion test always returned false. But the closed range is implemented as !(x < lower) && !(upper < x). Stjepan Rajko suggested that it should always be compare(lower, x) && compare(x, upper) where compare is < for excluded bounds and <= for included. It seems like this solution would work well for floats but not necessarily for user-defined types, which might only define operator<().

Substitutability with the underlying type

A few people requested the ability to set a compiler flag in order to substitute the constrained_value with the underlying type. This is not exactly possible, unless the user is willing to explicitly extract the value using static_cast<T>(cv). The library does provide unconstrained<> which in most cases optimizes away, but the interface is not perfectly compatible with the underlying type.

Constraints not copied when operators used

The use of operators invokes implicit conversion to the underlying type, which means that in a statement like
        constrained_value x(10, constraint(...)), y(-x);
y ends up with uninitialized constraints. Steven Watanabe suggested that the operators could be overloaded, but RK argued that the behavior would be application-specific: "For some applications returning a constrained value would be OK (but again there is the problem that the constraint may reject the modified value) while for others it's better to return the underlying type."

Constrained strings

Jeff Garland offered that he has written constrained_string classes for allowing values from a set of strings, and for validation against a regex. RK said he might add a constraint type which validates against set<T>, and affirmed that a regex validator would be nice too.

Bounding on only one side

Utku Yaman and others requested a way to bound integers on only one side. RK noted that you could set one of the bounds to the maximum or minimum possible integer, but this is verbose. Default arguments would be confusing, and RK thought maybe he would supply lower_saturating_int and upper_saturating_int. Note that there is a more general problem, especially if floating point is allowed. (Also consider e.g. fixed point, rational.)

Recoverable error policy - error policy changing the value

There were some objections to the error policy taking three parameters - the new value, the member value, and the constraint. Does it really need to change the member value? Yes, RK explained, because this is a recoverable error policy. Vicente Botet asked that the concept be called RecoverableErrorPolicy.

Serialization

Neal Becker requested support for serialization. RK agreed this is worth looking into and pointed out a couple of difficulties. One, a way to not serialize empty (or static) policies is needed. Two, non-default constructible types such as odd_number might require use of load_construct_data() / save_construct_data() instead of serialize().

Rename bounded_int

This class name does not convey the fact that the bounds are specified at compile time. Neal Becker suggested compile_time_bounded or static_bounded. RK says the first name is too long, and the second might imply that the bounds are static members (which is also a legitimate use case).

sizeof static constrained types == sizeof underlying type

Assuming that the empty base class optimization is being applied correctly, which seems to be the case, if the constraint and the error policy are empty, the size of the constrained value should be the same as the size of the underlying type. However, this was not the case on all platforms, at least at the time of the review. The library should include sizeof tests in its test suite so that potential users know if there is unexpected overhead on their platform.

Specify static bounds as open<0>, closed<100> for bounded<> and bounded_int<>

Requested by Vicente Botet. If possible, this would reduce the number of template arguments and would be more clear than the current bounds exclusion boolean parameters.

Some default constructions may throw

If zero does not satisfy the constraint, then a default-constructed constrained value may throw an exception. (There are other examples, like non-empty string.) Many reviewers requested a means to provide a default value, which can be done by wrapping the underlying type, but RK felt a default value class was out of scope. At the very least this should be documented and probably an example given.

Examples

Vicente Botet requested:
* an example showing how to create a constrained class by derivation and/or containment which adds/removes methods from the constrained value interface
* a date example where changing the month affects the bounds for the day.
* example of static bounding on only one side
* example of bounds calculated at runtime by a function ("runtime dynamic bounding")
Others:
* example of constraints with static members - allows shared mutable constraints with no overhead per instance.

Tests!

Many reviewers were offended that there were no tests in the review version. RK said he would write some.

Documentation

* document the concepts and provide tests with concept archetypes. It turns out that these are documented briefly in the doxygen documentation of constrained<> at http://tinyurl.com/39v8o6r, but many reviewers were expecting a section in the main document.
* Stjepan Rajko pointed out that the constraint can't depend on mutable members of the underlying type, and wondered if there were other implicit requirements, such as the way copy construction or swapping works. He requested these be documented.
* Gordon Woodhull suggests a section on space efficiency and which configurations should allow the size to be optimized to the same size as the underlying type, and spelling out in each example what the cost in space is.
* Thorsten Ottosen requested benchmarks of non-checking unconstrained against just using the underlying type. RK offered that he had looked at the generated assembly on some platforms.

4. Unresolved debates

Error policy is allowed to change constraint

Vicente Botet objected to the "bounded object with memory" example, which tracks extremal values by setting the bounds to include any new values. And Thorsten Ottosen objected to the inefficiency of change_constraint using construct-and-swap: it's not necessary to swap the constraint unless it might be changed by the error policy. This seems to be an example of a monitored value, which RK does not want to write himself, so does it belong in this library? RK argued that it does no harm, so why not allow?

Double-checking and disabling assertions

A number of people were concerned with the inefficiency of first checking the constraint, and then also asserting that the constraint is satisfied after the assignment has happened or the error handler has been called. But RK argued that it was necessary to check the invariant to make sure that the assignment operator and the error handler are doing their job. Besides, it is possible to disable the asserts with either BOOST_DISABLE_ASSERTS or NDEBUG. There was some discussion of adding an assert policy or a specific macro (with no clear conclusion).

Disabling checks

There is unconstrained, which offers a way to replace one constrained type with an unconstrained type, but Chris (indy271828_at_[hidden]) requested a way to disable all checks with a single macro. It might not be good enough to use a single error policy and select between throwing version and an assign-anyway version, because the compiler won't always optimize this away.

More members, less free functions

Thorsten Ottosen thought that any functions which change the state of the object should be members, not free functions. In particular, change_constraint might be less efficient than possible because the construct-and-swap requires copying the constraint. This does seem to be partly an ideological debate.

Specific classes instead of a general super-template

Vicente Botet requested that the interface use specific classes instead of typedefs: currently many of the specializations, like bounded<>, are metafunctions that apply some parameters to the generic constrained_value and return a new type, and then the specific operations are free functions. Instead these could be full classes with member functions. RK argued that this would result in unnecessary duplication.

Way to disable the arithmetic operators

Vicente Botet requested a solution in code or in documentation for users who don't want the arithmetic operators defined.

5. Ideas not requiring any action

Monitored Values

Stjepan Rajko suggested (and Vicente Botet expanded on the idea) that constrained value is really a special case of a more general "monitored value" class, in which a monitor policy decides how to handle any assignment. So constrained value would be a monitored value where the monitor policy checks a constraint, possibly calls an error policy, and then asserts the constraint. RK did not want to implement this and resisted suggestions to allow the assertion to be disabled in order to allow this use case, because without an invariant, it's not really a constrained value. But he suggested that monitored value could be implemented by someone else and constrained value could be re-implemented in terms of it. "Monitored value" might not be the best name for the general class, because "monitor" suggests something passive, rather than something which controls the outcome of assignment.

A way to run a function on a constrained value

Stjepan Rajko requested a way to make a copy of the value, run a function on it, and then reassign back, preferring the syntax call_using_copy(f, x) to x = f(x). But he notes this could be written as a free function.

Multiple ranges

Edward Diener requested a way to supply multiple ranges, but agreed that the syntax would be unattractive.

Assertions instead of exceptions

Thorsten Ottosen requested a way to have only assertions and no exceptions, and RK pointed out that this can be done either by specifying an error policy which asserts, or even with an assign-anyway error policy, which leaves the invariant checking in place.

Exclude bounds parameter

Vicente Botet found it confusing that bounded_int takes false to mean include bounds and true to mean exclude the bounds. RK explained that this is an artifact of the way within_bounds is templated on a type which can be either a compile-time or a runtime value, e.g. mpl::true_ or bool; this technique means there can be no default argument to the constructor and thus the more common option must be false, the default constructed value of bool.

Constraints between several variables.

Vicente Botet suggested a sort of constrained_tuple where if you set one field it would affect the constraints for the other fields. E.g. a date object where the month and year would affect the valid range for the day. With the current interface you'd have to set the whole date at once.

Constrained values as sets of instances

Exploring the problem domain, Vicente Botet suggested another possible implementation of constrained value where the constrained value type keeps track of which instances satisfy the constraint. This class would offer a split operation which moves any instances which satisfy a more specific constraint into the new set. Changing the constraint would similarly return the set of instances which are now excluded.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk