Boost logo

Boost :

From: Vesa Karvonen (vesa_karvonen_at_[hidden])
Date: 2002-04-02 13:44:13

From: Douglas Gregor <gregod_at_[hidden]>
>On Tuesday 26 March 2002 04:26 am, you wrote:
> > I also need to be able to build recursive variant types.
> > Most of the variant type proposals I have seen do not support this.
>... yet this may very well be the most important feature of variants,

Yes, recursive variants are common. Non-recursive variants are not uncommon

>and Ocaml is probably the gold standard for variant types.

Many (modern) FP languages have rather similar variant type mechanisms, and
context free grammars can also be rather nicely mapped to variant types and
vice versa, so I suppose the concept has been rather thoroughly tested.

> > Does someone have an idea on how to support recursive variant types?
>I think we can make the placeholders work, though perhaps under a
>different syntax that above, e.g.:
>typedef variant<int, float, string, rec1<aggr, variant_self> > value;

If I recall correctly, the above has the problem that it is not portable
when used with standard containers, because a standard container template
may have extra defaulted template parameters. This breaks the template
template parameter match.

> // rec2, rec3, etc. should be pretty obvious

Hopefully there is a way to avoid using names that include the number of
parameters. I don't know if an ideal solution exists within the current C++
core language. Member templates would probably mean more work for users.

>The non-optimality comes from the fact that some recursive types in a
>variant will be no larger than a variant, and therefore don't have to
>be stored on the heap. Shifting the example to something that more
>obviously has this trait, consider:
>type expr =
> | Int of int
> | Float of float
> | Negation of expr
>So the negation of an expression is just an expression - the type is no
>larger than, and should have the same alignment as, an expression; yet
>the approach will allocate it on the heap.

hmm... I think that you forgot the space required by the constructor tags:
Int, Float & Negation. Consider the following type:

  type zeros_and_ones =
    | End
    | Zero of zeros_and_ones
    | One of zeros_and_ones

As you can see, in the above more general case, it is impossible to not
allocate the recursive zeros_and_ones from the heap - otherwise we would be
able to store an infinite amount of information in finite space.

Ocaml and other functional programming languages typically represent variant
types using a tag-value pair. The tag is simply an integer that tells which
constructor is being used (Int,Float or Negation). The value, if any, is a
single slot that contains the value of the constructor (int, float or expr).
The heap allocation of recursive types is not such a problem in FP languages
due to GC. Heap allocation can be as cheap as pointer increment.

>I can see two approaches to solving this:
> 1) Perform a 'test instantiation' on recursive types, using a
>variant that assumes all recursive types need to be allocated on the
>heap. If the size and alignment of the object are <= the size and
>alignment of the non-recursive types in the variant, then this
>recursive type need not be allocated on the heap. Other recursive
>types will, of course, have to be allocated on the heap to avoid that
>pesky limitation that object sizes be finite :)

That is an interesting approach. Are you sure that it is always possible to
perform such a test instantiation? The test instantiation would have to be
made during the instantiation of the variant<> template. Anyway, it would
probably not lead to optimal variant types. Consider the following snippet:

  typedef variant< void*, rec2<pair,self*,self*> > heap;

Depending on the application, it might be beneficial to store the pair
in-place. What we really want to know is that will the recursive template
contain the variant or just a reference/pointer(s) to it (or perhaps not
even that).

It might be beneficial to make it possible for the user to explicitly force
in-place allocation:

  typedef variant< self*, inplace_rec2<pair,self*,self*> > inplace;

> 2) Use (and write) that small_string_optimization class template we
>talked about for Boost.Function a while back. The
>small_string_optimization class template might look like this:
> This handles recursive types automatically, and will also let
>Boost.Function and Boost.Any include a general version of the small
>string optimization without much hassle.

Something like this might be a useful part of the variant implementation.

I don't see how this could easily be used in Boost.Any. The boost::any
implementation relies on a pointer to an abstract base class. You would need
at least one extra bit to (portably) tell whether a pointer is being used or
not. Currently boost::any only contains a single pointer.

But let's turn to other issues.

It seems that most variant type proposals seem to silently omit the variant
type constructors (or tags) used in FP languages. Even the C++ union allows
multiple fields with the same type:

    union xyz
    { int x;
      int y;
      int z;

The above union contains either x, y or z. Now consider the following Ocaml
type definition:

    type expr =
      | Add_expr of expr * expr
      | Sub_expr of expr * expr
      | Mul_expr of expr * expr
      | Div_expr of expr * expr
      | Neg_expr of expr
      | Int_expr of int

           ^ ^
       tags/ctors value types

Note that in the above variant type, the same value type 'expr * expr' is
actually used for multiple tags or constructors. The tags are an important
feature of variant types in FP languages.

Perhaps we should also have explicit tags in C++. We could do something like

    struct add_expr {};
    struct sub_expr {};
    struct mul_expr {};
    struct div_expr {};
    struct neg_expr {};
    struct int_expr {};

      < add_expr, rec< tuple, self, self>
      , sub_expr, rec< tuple, self, self>
      , mul_expr, rec< tuple, self, self>
      , div_expr, rec< tuple, self, self>
      , neg_expr, self
      , int_expr, int
> expr;

Alternatively we could use integral constant expressions as tags:

    enum expr_tags
    { add_expr
    , sub_expr
    , mul_expr
    , div_expr
    , neg_expr
    , int_expr

At this point it is difficult to see any major advantages/disadvantages in
either approach, but using types seems safer.

We could also use explicit tag & value pairs:

      < ctor< add_expr, rec< tuple, self, self> >
      , ctor< sub_expr, rec< tuple, self, self> >
      , ctor< mul_expr, rec< tuple, self, self> >
      , ctor< div_expr, rec< tuple, self, self> >
      , ctor< neg_expr, self >
      , ctor< int_expr, int >
> expr;

This would also make it possible to specify tags that do not have a value.
Consider the following Ocaml type:

    type value =
      | Unit_value
      | Int_value of int
      | Float_value of float
      | String_value of string

As you can see, the Unit_value constructor does not take a value. The above
could be done in C++ like this:

    struct unit_value {};
    struct int_value {};
    struct float_value {};
    struct string_value {};

      < ctor<unit_value>
      , ctor<int_value, int>
      , ctor<float_value, float>
      , ctor<string_value, string>
> value;

Actually it is possible to make a variant type which does not have a single
constructor taking a value.

In FP languages, variants are constructed by using the constructors of the

    Mul_expr ( Int_expr 1, Int_expr 2 )

In C++ we would also specify both the tag and the value. Something like this


In FP languages, variant types are usually immutable. In other words, you
can not change the tag of a variant after it has been constructed. In C++,
we would probably want to be able to do just that. In such an assignment, we
would have to pass both the tag and the value:


Copy assignment would look like an ordinary C++ assignment:

    e = other_e;

Variants would have value semantics.

In FP languages, probably the most important operation on variant types is
pattern matching. Consider the following Ocaml function:

    let rec eval expr =
      match expr with
       | Add_expr (lhs,rhs) -> (eval lhs) + (eval rhs)
       | Sub_expr (lhs,rhs) -> (eval lhs) - (eval rhs)
       | Mul_expr (lhs,rhs) -> (eval lhs) * (eval rhs)
       | Div_expr (lhs,rhs) -> (eval lhs) / (eval rhs)
       | Neg_expr expr -> - (eval expr)
       | Int_expr value -> value

The above could be done in C++, using the lambda library, something like

  int eval(expr& e)
  { return
        ( with<add_expr>(
            bind(eval, bind( get_0, _1)) + bind(eval, bind(get_1, _1)) )
        | with<sub_expr>(
            bind(eval, bind( get_0, _1)) - bind(eval, bind(get_1, _1)) )
        | with<mul_expr>(
            bind(eval, bind( get_0, _1)) * bind(eval, bind(get_1, _1)) )
        | with<div_expr>(
            bind(eval, bind( get_0, _1)) / bind(eval, bind(get_1, _1)) )
        | with<neg_expr>(
            - bind(eval, _1) )
        | with<int_expr>(
            (0, _1) )

Serious type deduction is needed for the above to compile. It would probably
be a good idea to make it possible to explicitly specify the return type.

We would also want to be able to match any tag:

    bool is_int(value& v)
    { return
          ( with<int_value>( constant(true) ) // doesn't work currently
          | with<> ( constant(false) ) // nullary function

It would also be nice to be able to test for a specific constructor (this
does not invalidate the usefulness of being able to match any tag):

    if (<int_value>()) {...} else {...}

And to cast using a specific constructor:

    int i = v.cast<int_value>()

Are there any other operations that we should support or could we remove
support for some operations?

Get your FREE download of MSN Explorer at

Boost list run by bdawes at, gregod at, cpdaniel at, john at