Boost logo

Boost :

From: Douglas Gregor (gregod_at_[hidden])
Date: 2002-06-11 22:40:51


On Tuesday 11 June 2002 11:25 am, Itay Maman wrote:
> It's nice to know I am not the only one who thinks a
> type-list based union is a good idea :)

Actually, I'm not convinced that type lists are the best choice for
implementing a discriminated union type. I'll explain this after I've given
some more interface opinions.

> > Here's the current requirements list from the Wiki
>
> page:
> > * Interface should support construction and
>
> assignment from any of the
>
> > types that the variant can hold.
>
> What about automatic casting from non-specified types?
> (For example: initializing a union of <float, string>
> with a double value). Personally, I think this
> behavioral detail should be specified by a policy.

I agree that we need implicit conversions to types stored in the union (e.g.,
your variant<float,string> implicitly constructed from a double example), but
I don't agree that this should be a policy. The reason will be discussed
below where it comes up again.

> In general, I see four separate, policy-controlled,
> aspects:

I try to avoid policy-based designs when possible, because they introduce a
large amount of variance in the semantics of a single component. I'll discuss
each of the four potential policies below:

> 1) Storage: (i) boost::any (ii) on-stack

When does one prefer the boost::any storage method instead of the on-stack
method? The on-stack approach is a lightweight approach that closely
resembles C-style unions; in the cases where variant objects are too large to
put on the stack, the variant objects can easily be allocated on the heap
(just like we would do with a C-style union). The only cases where on-stack
allocation isn't feasible occur with recursive or incomplete types (discussed
later...). So, I would say that we should only use on-stack allocation but
allow an 'escape' to heap allocation for recursion and incomplete types.

> 2) Automatic casting: (i) never, (ii) cast to the
> first type possible, (iii) cast if only one possible
> type found

How about:
  (iv) use overload resolution to choose the best type, or fail if no best
type exists

Again, I don't see a need for a policy here. Overload resolution is a C++
feature that every C++ programmer is aware of and it seems the natural
solution for this component. Here are the reasons why I reject the other
three:

  (i) implicit conversion is a feature of C++. Suppressing implicit
conversions should be performed by user types, not by the variant, because
the set of implicit conversions for a type is a property of the type itself,
not a container of that type.
  (ii) this is very unnatural. You could end up with something like:
variant<int, float> v;
v = 3.14159; // v stores the integer 3
  (iii) this is reasonable, but again: overload resolution is already
understood as a mechanism for finding the best alternative, so why not reuse
it?

Side note: the desire for overload resolution is why I am unsure of a purely
typelist-based approach. Getting all of the right overload candidates there
given just a typelist is horrific.
  
> 3) Assignmenmt error: (i) Compile-time error, (ii)
> initialize with default value of first type

I don't understand why (ii) would ever be desirable. We should be checking for
errors as early as possible.

> The fourth aspect relates to the issue of variant to
> variant assigmnet:
> Variant<List1> u1;
> Variant<List2> u2;
> ...
> u1 = u2; //Problem(?): u2's held type may be
> invalid
> // for assignment into u1
>
>
> 4) Variant to variant assigmnet error: (i)
> Compile-time error if List2 is not a sub set of List1.
> (ii) Run-time error (iii) assign the default value of
> the first type on List1

How about:
  (iv) compile-time error if for any of the types of the source variant there
is no 'best candidate' type in the target variant. So something like this
should be fine:

  variant<int, float, double> v1;
  variant<long, double> v2;
  v1 = v2; // okay
  
Again, I can't see a reason for using (ii) or (iii) because we should be
checking for errors as soon as possible. The behavior of (iii) could drive
one positively mad during a late-night debugging session. (i) is okay, but
overly strict if we want implicit conversions (I do).

>
> [snip]
>
> > * Recursive types should be allowed, i.e., a
>
> variant can hold a value of a
>
> > type that is constructed using the variant type
> > * Incomplete types should be allowed.
>
> Can you clarify on these two points?

I'll try to clarify by example. The canonical example for the use of recursive
types in variants is an expression tree. In an expression tree, each
expression node can be, for instance, a literal value (e.g., '2'), a variable
(e.g., 'x'), or an operation on other expressions (e.g., 'expression +
expression'). The last of these requires the expression type to be recursive,
because the types it can store depend on the expression type itself. If we
were using C-style unions, we might represent an expression like this:

struct expr {
  enum { et_literal, et_variable, et_plus, et_negate } type;
  union {
    int literal;
    variable_t* variable;
    std::pair<expr*, expr*> plus;
    expr* negate;
  } value;
};

We'd like to do the same with our generic discriminated union type. I
personally aspire to have a variant type that is similar in expressive power
to that of ML, where the above might be written as:

type expr =
  | Int of int
  | Var of variable
  | Plus of expr * expr
  | Negate of expr
  
>From the implementation point of view, recursive types aren't hard to deal
with: just don't try to allocate them on the stack. The complexity is stating
the recursion when using the variant type. My personal preference is to use a
pseudo-keyword 'rec', e.g.,

  typedef variant<int, variable_t*, Plus<rec, rec>, Negate<rec> > expr;

As for incomplete types, refer back to the C-style union version of 'expr'.
The variable_t type doesn't need to be defined because we're only holding a
pointer to it. If we could specify that a particular type is incomplete
(therefore requesting heap allocation), then we could easily use incomplete
types with the variant. It might look like this:

  typedef variant<int, incomplete<variable_t>, Plus<rec, rec>, Negate<rec> >
expr;

> > - The visitor is fine, but there are simpler
>
> syntactic constructs that could
>
> > be used. For instance, instead of 'visit_at' just
>
> use the function call
>
> > operator.
>
> The function call operator cannot be static, which is
> somewhat restrictive.

How so? Instead of a static function call operator, just default-construct an
empty class.

> > Also, the 'Visitor' class can be an implementation
>
> detail with a
>
> > (more natural?) syntax like:
> > Super_union<...> u0;
> > // ...
> > u0.visit(size_visit());
>
> This syntax is indeed more natural, when the visit
> policy is stateless. However, if we make go() to
> return *this, we end up with a very clean way for
> applying the operation to multiple objects:
>
> typedef Super_union<The_list> Concrete_union;
> Concrete_union::Visitor<size_visit>::type
> size_visitor;
>
> // ...
>
> int result =
> size_visitor.go(u0).go(u1).go(u2).total_;

Have you needed this type of behavior before? I can't come up with any
scenarios where it would help. Does it make the code shorter or more obvious?
Just the calls to 'go' don't tell the whole story, because there is a
significant amount of overhead (in C++ code, not at run-time) with this style
of visitor: the visitor's 'visit_at' routine must be templated over the
variant type and the 'Visitor' type generator must be used. Does the benefit
of easy application to multiple variant objects outweigh the more complicated
interface? If yes, there is a second part: does this benefit also outweigh
the cost of create a new type of interface (visit_at and the Visitor type
generator) instead of an interface based on a well-understood concept (a
function object)?

Following is a partial sketch of the interface I have in mind. I've omitted
details that we haven't really discussed yet or that I haven't thought about
long enough to have a strong opinion on (tagged types, recursive types,
stating type incompleteness, etc.).

        Doug

template<typename T1, typename T2 = unused2, ..., typename TN = unusedN>
class variant
{
public:
  variant(); // default-construct value of type T1

  variant(const T1& t1); // assign value t1
  variant(const T2& t2); // assign value t2
  // ...
  variant(const TN& tN); // assign value tN

  ~variant();

  void swap(variant& other); // swap values

  variant& operator=(const T1& t1); // assign value t1
  variant& operator=(const T1& t1); // assign value t1
  // ...
  variant& operator=(const TN& tN); // assign value tN

  variant& operator=(const variant& other); // assign value from other
  
  // semantics described above
  template<typename OT1, typename OT2, ..., typename OTN>
  variant& operator=(const variant<OT1, OT2, ..., OTN>& other);

  const std::type_info& type() const; // typeid(type of current value)
  bool empty() const { return false; } // boost::any compatibility
};

struct bad_variant_cast : public std::runtime_error { /* ... */ };

// cast to type T if current value is of type T; otherwise, throw
// bad_variant_cast
template<typename T, typename T1, typename T2, ..., typename TN>
  T& variant_cast(variant<T1, T2, ..., TN>& v);

template<typename T, typename T1, typename T2, ..., typename TN>
  const T& variant_cast(const variant<T1, T2, ..., TN>& v);


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk