Encoding and decoding templates
Different kinds of template parameters or polymorphism with macros
Handling unused sequence elements
This document describes the internals of the TYPEOF macro implementation. It covers the so-called “compliant” implementation, which uses partial template specializations to encode and decode types. It should not be confused with the other two implementations that currently exist (or will soon exist) under the umbrella of the proposed BOOST_TYPEOF macro: Peder Holt’s “vintage” implementation, which trades partial template specialization for function overloading and compile-time constants, and the MSVC-specific typeof trick recently invented by Igor Chesnokov.
The code in this document is provided for explanatory purposes only. While it reflects the actual code fairly closely, it differs in a number of ways. First, the BOOST_TYPEOF prefix has been omitted from all the macros to make the code smaller. The namespaces have been omitted for the same reason. Second, the code fragments were entered by hand and were not compiled, so I apologize in advance for any typos. I hope they will not prevent the reader from understanding the material, and I would be happy to correct them as they are found and reported.
It has to be stressed that the idea of breaking a type into multiple compile-time integers by using partial template specializations is not new, and belongs, to the best of my knowledge, to Steve Dewhurst, who described it in his famous CUJ article “A BIT-Wise Typeof Operator”. The idea of applying MPL to this problem belongs to David Abrahams, see http://thread.gmane.org/gmane.comp.lib.boost.devel/76208.
The main thing that distinguishes this implementation from others available is the ease of definition of new specializations for complicated templates. For example:
template<class T, int n, template<class, unsigned int> class Tpl>
class foo; /* a template with rather involved template id */
REGISTER_TEMPLATE(foo, (class)(int)(TEMPLATE((class)(unsigned int)))) /* now foo can be handled by TYPEOF */
The implementation of this REGISTER_TEMPLATE macro, as well as many other useful specializations (for functions, arrays, etc.), has become possible because of extensive usage of the Boost Preprocessor Library.
Let’s say we have an expression “expr”. The first step would be to pass it to a function template, thus utilizing the built-in type deduction capabilities:
template<class T>
unspecified foo(const T&);
foo(expr);
Inside foo() the type of the expression is known (T), so the return type can be constructed in such a way that its size depends on the type T. One possible way of doing this is to return a reference to a character array:
template<class T>
char(& foo(const T&) )[
integral-const-depends-on-T
];
sizeof( foo(expr) );
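The trick can be sketched in a few lines. This is only an illustration: the type_tag trait and its values are invented here and are not part of the library, and static_assert assumes a C++11 compiler. The function is declared but never defined, because sizeof does not evaluate its operand.

```cpp
#include <cstddef>

// Hypothetical trait: maps a type to a small positive integer.
// Zero is avoided because an array cannot have zero elements.
template<class T> struct type_tag;
template<> struct type_tag<int>    { static const std::size_t value = 1; };
template<> struct type_tag<double> { static const std::size_t value = 2; };

// Declaration only: sizeof never evaluates the call, so no definition is needed.
template<class T>
char (&foo(const T&))[type_tag<T>::value];

// The deduced type comes back as a compile-time integer.
static_assert(sizeof(foo(42)) == 1, "int is tagged 1");
static_assert(sizeof(foo(3.14)) == 2, "double is tagged 2");
```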
Now let’s assume that a type can be encoded into a sequence of integers. We will explore how to do this later. For now, let’s just say that it can be done, and looks like the following:
template<class T>
struct encode_type
{
typedef unspecified type; // sequence of integer numbers
};
Since sizeof(foo(expr)) is just one integer, we cannot return the whole sequence at once. Let’s instead return the Nth element of the sequence. Accordingly, we add a parameter to “foo”, and rename it to the more descriptive “at”:
template<class T, class N>
char(& at(const T&, const N&) )[
mpl::at<typename encode_type<T>::type,
N>::type::value
];
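As a rough, self-contained sketch of the same idea without MPL, the following uses hand-written stand-ins (int_seq, int_ and at_c are assumptions made for this example, written with C++11 variadic templates) and two hard-coded encodings with arbitrarily chosen IDs (pointer = 1, int = 4):

```cpp
#include <cstddef>

// Stand-ins for mpl::vector and mpl::int_ (assumptions of this sketch).
template<int... Is> struct int_seq {};
template<int N> struct int_ {};

// Nth element of an int_seq; plays the role of mpl::at.
template<class Seq, int N> struct at_c;
template<int Head, int... Tail, int N>
struct at_c<int_seq<Head, Tail...>, N> : at_c<int_seq<Tail...>, N - 1> {};
template<int Head, int... Tail>
struct at_c<int_seq<Head, Tail...>, 0> { static const int value = Head; };

// Hand-written encodings for two types only.
template<class T> struct encode_type;
template<> struct encode_type<int>  { typedef int_seq<4> type; };
template<> struct encode_type<int*> { typedef int_seq<1, 4> type; };

// Returns a reference to a char array whose size is the Nth encoded element.
template<class T, int N>
char (&at(const T&, int_<N>))[at_c<typename encode_type<T>::type, N>::value];

int* const example_ptr = 0;
static_assert(sizeof(at(example_ptr, int_<0>())) == 1, "first element: pointer");
static_assert(sizeof(at(example_ptr, int_<1>())) == 4, "second element: int");
```

Each call to at() yields exactly one element, which is why the sequence has to be rebuilt one sizeof at a time.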
We can now reconstruct the sequence like this:
mpl::vector<
mpl::int_<sizeof(at(expr, mpl::int_< 0 >()))>,
mpl::int_<sizeof(at(expr, mpl::int_< 1 >()))>,
mpl::int_<sizeof(at(expr, mpl::int_< 2 >()))>,
…
mpl::int_<sizeof(at(expr, mpl::int_< N >()))>
>
If we take a big enough N, we can hope that our type will fit. We will also leave aside for now the issue of how unused elements are handled.
Assuming now that it’s possible to decode this sequence back into the original type, we can write:
#define TYPEOF(expr)\
decode_type<mpl::vector<\
mpl::int_<sizeof(at(expr, mpl::int_<0>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<1>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<2>()))>,\
…\
mpl::int_<sizeof(at(expr, mpl::int_<N>()))>\
> >::type
Let’s take stock of where we are. We have just implemented a simplified typeof facility, assuming the following:
1. It’s possible to encode a type into a compile-time sequence of integer numbers;
2. It’s possible to decode it back;
3. It’s possible to gracefully handle the unused elements of the sequence.
Let’s now explore these three issues in more detail.
Let’s consider the following type:
const std::pair<int*, std::string>*
This type can be represented as a tree where each node is either a type or a template or a modifier of the original type:
                                +-- pointer -- int
pointer -- const -- std::pair --+
                                +-- std::string
Let’s assign unique integer identifiers as follows:
pointer 1
const 2
std::pair 3
int 4
std::string 5
Now the above type can be encoded as:
1 2 3 1 4 5
Once identifiers are assigned, any type containing these items can be encoded, such as:
std::pair<std::string, const std::string*> | 3 5 1 2 5
const std::string* const | 2 1 2 5
std::pair<std::pair<int, int>, std::pair<std::string, std::string> > | 3 3 4 4 3 5 5
Decoding is also simple. Let’s decode the following sequence: 1 3 4 1 2 5
decode(1 3 4 1 2 5)
The first item, 1, tells us that this is a pointer:
decode(3 4 1 2 5)*
3 is std::pair, a template with two parameters, so two types must be decoded:
std::pair<decode-2(4 1 2 5)>*
4 is int:
std::pair<int, decode(1 2 5)>*
1 is a pointer:
std::pair<int, decode(2 5)*>*
2 is const:
std::pair<int, const decode(5)*>*
5 is std::string:
std::pair<int, const std::string*>*
We are done.
Having figured out how types can be encoded into a sequence of integers, and then decoded back, let’s now see how this all can be implemented.
The described type encoding can be implemented with partial template specialization. For now let’s ignore the issue of generating unique identifiers. Let’s assume we have a UNIQUE_ID() macro that does the job. Also, from the compile-time performance point of view, it makes sense to append the encoding to a given sequence (which we’ll denote by “V” since this is an mpl::vector):
template<class V, class T>
struct encode_type; //not implemented
We can encode a type, for instance int, with the following specialization:
template<class V>
struct encode_type<V, int> : mpl::push_back<
V,
mpl::int_<4>
>
{};
When decoding a type, we will accept an iterator into the original sequence, extract the first identifier, use it to select a partial template specialization, and forward the rest of the sequence to this specialization:
template<class Iter>
struct decode_type : decode_type_impl<
typename mpl::deref<Iter>::type,
typename mpl::next<Iter>::type
>
{};
template<class ID, class Iter>
struct decode_type_impl; //not implemented
The implementation will return the decoded type and the position in the original sequence where the decoding stopped. Again, for int, it will look like this:
template<class Iter>
struct decode_type_impl<mpl::int_<4>, Iter>
{
typedef int type;
typedef Iter iter;
};
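The round trip for int can be checked with a small standalone sketch. It does not use MPL: int_seq and push_back below are hand-written stand-ins (assumptions of this example, C++11), and the “iterator” is simply the remainder of the sequence rather than a real mpl iterator.

```cpp
#include <type_traits>

template<int... Is> struct int_seq {};   // stand-in for mpl::vector

// Appends one element; plays the role of mpl::push_back.
template<class V, int I> struct push_back;
template<int... Is, int I>
struct push_back<int_seq<Is...>, I> { typedef int_seq<Is..., I> type; };

// Encoding appends to an existing sequence V.
template<class V, class T> struct encode_type;   // not implemented
template<class V> struct encode_type<V, int> : push_back<V, 4> {};

// Decoding dispatches on the first element and passes on the rest.
template<int ID, class Iter> struct decode_type_impl;   // not implemented
template<class Iter> struct decode_type;
template<int Head, int... Tail>
struct decode_type<int_seq<Head, Tail...> >
    : decode_type_impl<Head, int_seq<Tail...> > {};

template<class Iter>
struct decode_type_impl<4, Iter> {
    typedef int type;    // the decoded type
    typedef Iter iter;   // where decoding stopped
};

static_assert(std::is_same<encode_type<int_seq<>, int>::type, int_seq<4> >::value,
              "int encodes as 4");
static_assert(std::is_same<encode_type<int_seq<1>, int>::type, int_seq<1, 4> >::value,
              "the ID is appended to what is already there");
static_assert(std::is_same<decode_type<int_seq<4, 9> >::type, int>::value,
              "4 decodes to int");
static_assert(std::is_same<decode_type<int_seq<4, 9> >::iter, int_seq<9> >::value,
              "decoding consumes exactly one element");
```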
Both specializations for the same type can be combined into a single macro:
#define REGISTER_TYPE_IMPL(Name, ID) \
template<class V> \
struct encode_type<V, Name> : mpl::push_back< \
V, \
mpl::int_<ID> \
> \
{}; \
template<class Iter> \
struct decode_type_impl<mpl::int_<ID>, Iter> \
{ \
typedef Name type; \
typedef Iter iter; \
};
#define REGISTER_TYPE(Name)\
REGISTER_TYPE_IMPL(Name, UNIQUE_ID())
REGISTER_TYPE(int)
REGISTER_TYPE(char)
REGISTER_TYPE(short)
REGISTER_TYPE(long)
...
Let’s consider the std::pair class template. Its encoding will put its ID, 3, into the vector, and then forward to the encoding of its first, and then its second, template parameter:
template<class V, class P0, class P1>
struct encode_type<V, std::pair<P0, P1> >
{
typedef typename mpl::push_back<
V,
mpl::int_<3>
>::type v0;
typedef typename encode_type<
v0,
P0
>::type v1;
typedef typename encode_type<
v1,
P1
>::type v2;
typedef v2 type;
};
Decoding will decode the parameters, and re-construct the pair:
template<class Iter>
struct decode_type_impl<mpl::int_<3>, Iter>
{
typedef decode_type<Iter> d0;
typedef decode_type<typename d0::iter> d1;
typedef std::pair<
typename d0::type,
typename d1::type
> type;
typedef typename d1::iter iter;
};
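To check the two worked examples from the beginning of this section, here is a standalone sketch that uses the same IDs (pointer 1, const 2, std::pair 3, int 4, std::string 5). The specializations for pointers and const are not shown in this document and are my own guess at a plausible shape; int_seq and push_back are hand-written C++11 stand-ins for the MPL facilities, and the “iterator” is just the rest of the sequence.

```cpp
#include <string>
#include <type_traits>
#include <utility>

template<int... Is> struct int_seq {};   // stand-in for mpl::vector
template<class V, int I> struct push_back;
template<int... Is, int I>
struct push_back<int_seq<Is...>, I> { typedef int_seq<Is..., I> type; };

template<class V, class T> struct encode_type;
template<class V> struct encode_type<V, int> : push_back<V, 4> {};
template<class V> struct encode_type<V, std::string> : push_back<V, 5> {};
template<class V, class T>   // assumed shape of the pointer encoder
struct encode_type<V, T*> : encode_type<typename push_back<V, 1>::type, T> {};
template<class V, class T>   // assumed shape of the const encoder
struct encode_type<V, const T> : encode_type<typename push_back<V, 2>::type, T> {};
template<class V, class P0, class P1>
struct encode_type<V, std::pair<P0, P1> > {
    typedef typename push_back<V, 3>::type v0;
    typedef typename encode_type<v0, P0>::type v1;
    typedef typename encode_type<v1, P1>::type type;
};

template<int ID, class Iter> struct decode_type_impl;
template<class Iter> struct decode_type;
template<int Head, int... Tail>
struct decode_type<int_seq<Head, Tail...> >
    : decode_type_impl<Head, int_seq<Tail...> > {};
template<class Iter> struct decode_type_impl<4, Iter> { typedef int type; typedef Iter iter; };
template<class Iter> struct decode_type_impl<5, Iter> { typedef std::string type; typedef Iter iter; };
template<class Iter> struct decode_type_impl<1, Iter> {
    typedef decode_type<Iter> d;
    typedef typename d::type* type;
    typedef typename d::iter iter;
};
template<class Iter> struct decode_type_impl<2, Iter> {
    typedef decode_type<Iter> d;
    typedef const typename d::type type;
    typedef typename d::iter iter;
};
template<class Iter> struct decode_type_impl<3, Iter> {
    typedef decode_type<Iter> d0;                    // first parameter
    typedef decode_type<typename d0::iter> d1;       // second parameter
    typedef std::pair<typename d0::type, typename d1::type> type;
    typedef typename d1::iter iter;
};

typedef const std::pair<int*, std::string>* example;
static_assert(std::is_same<encode_type<int_seq<>, example>::type,
                           int_seq<1, 2, 3, 1, 4, 5> >::value, "encoding example");
static_assert(std::is_same<decode_type<int_seq<1, 3, 4, 1, 2, 5> >::type,
                           std::pair<int, const std::string*>*>::value, "decoding example");
```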
With a little bit of preprocessor magic, the encoding and decoding specializations for std::pair can be generated by a single macro, used like this:
REGISTER_TEMPLATE(std::pair, 2)
This is all there is to say about templates as long as they have only type parameters. Things get more interesting, however, once we consider integral and template template parameters.
Let’s say we have the following class template:
template<class T, unsigned int n> class x;
First, how do we describe such a template to the preprocessor? This can be done with a preprocessor sequence:
REGISTER_TEMPLATE(x, (class)(unsigned int))
(Note that this is the same REGISTER_TEMPLATE macro, only now the second macro parameter describes what template parameters are used, rather than just providing their number. The macro is overloaded using some preprocessor magic.)
We already discussed how a type template parameter is encoded. Simplifying things for the purpose of clarity, we can assume that an integral template parameter is just placed as is into the vector, although this is not exactly true because the range of integers that can be returned via sizeof(character-array) is limited. This forces us to use two vector elements in some cases.
The encoding now might look like this (assuming ID of 21):
template<class V, class P0, unsigned int P1>
struct encode_type<V, x<P0, P1> >
{
typedef typename mpl::push_back<
V,
mpl::int_<21>
>::type v0;
typedef typename encode_type<
v0,
P0
>::type v1;
typedef typename mpl::push_back<
v1,
mpl::int_<P1>
>::type v2;
typedef v2 type;
};
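The simplified scheme (integral value stored as is, in a single element) can be tried out with the sketch below. As before, int_seq and push_back are hand-written C++11 stand-ins for mpl::vector and mpl::push_back, and the explicit cast to int is an addition of this sketch, needed only because the stand-in sequence stores plain ints.

```cpp
#include <type_traits>

template<int... Is> struct int_seq {};   // stand-in for mpl::vector
template<class V, int I> struct push_back;
template<int... Is, int I>
struct push_back<int_seq<Is...>, I> { typedef int_seq<Is..., I> type; };

template<class T, unsigned int n> class x;   // the template being registered

template<class V, class T> struct encode_type;
template<class V> struct encode_type<V, int> : push_back<V, 4> {};

template<class V, class P0, unsigned int P1>
struct encode_type<V, x<P0, P1> > {
    typedef typename push_back<V, 21>::type v0;                      // template ID
    typedef typename encode_type<v0, P0>::type v1;                   // type parameter
    typedef typename push_back<v1, static_cast<int>(P1)>::type v2;   // integral parameter
    typedef v2 type;
};

static_assert(std::is_same<encode_type<int_seq<>, x<int, 7> >::type,
                           int_seq<21, 4, 7> >::value, "x<int, 7> encodes as 21 4 7");
```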
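Type parameters and integral parameters are thus declared and encoded differently, and a macro that generates such specializations has to pick the right code for each parameter. This really begins to look like polymorphism! But first we need objects.
Objects are combinations of properties. In the preprocessor, we can represent them as sequences. Besides regular properties, objects need to carry type information, which can later be used for dispatching: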
#define TYPE_PARAM (TYPE_PARAM)
#define INTEGRAL_PARAM(Type)\
(INTEGRAL_PARAM)(Type)
Let’s now define “virtual functions”:
#define TYPE_PARAM_TYPE(This) class
#define TYPE_PARAM_ENCODE(This, n)\
typedef\
typename encode_type<v ## n, P ## n>::type\
BOOST_PP_CAT(v, BOOST_PP_INC(n));
#define INTEGRAL_PARAM_TYPE(This)\
BOOST_PP_SEQ_ELEM(1, This)
#define INTEGRAL_PARAM_ENCODE(This, n)\
typedef typename mpl::push_back<v ## n, mpl::int_<P ## n>\
>::type\
BOOST_PP_CAT(v, BOOST_PP_INC(n));
Now we need a way to call a virtual function:
#define VIRTUAL(Fname, This)\
BOOST_PP_SEQ_CAT((BOOST_PP_SEQ_HEAD(This))(_)(Fname))
As you can see, the head of the object (sequence) is used for dispatching.
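The dispatch can be tried out without Boost using the sketch below. The PP_ macros are minimal hand-written stand-ins for BOOST_PP_CAT, BOOST_PP_SEQ_HEAD and BOOST_PP_SEQ_ELEM(1, ...), written for this example only; they are not how the Boost Preprocessor Library implements them, and they assume a standard-conforming preprocessor. The probe template is likewise just a test vehicle.

```cpp
// Hand-written stand-ins for the Boost.Preprocessor macros used above.
#define PP_CAT(a, b) PP_CAT_I(a, b)
#define PP_CAT_I(a, b) a ## b
#define PP_SEQ_HEAD(seq) PP_SEQ_HEAD_I(PP_SEQ_HEAD_II seq)   // (a)(b) -> a
#define PP_SEQ_HEAD_I(x) PP_SEQ_HEAD_III(x)
#define PP_SEQ_HEAD_II(x) x, PP_NIL
#define PP_SEQ_HEAD_III(x, rest) x
#define PP_EAT(x)
#define PP_SEQ_SECOND(seq) PP_SEQ_HEAD(PP_EAT seq)           // (a)(b) -> b

// "Objects": the head of each sequence is the type tag.
#define TYPE_PARAM (TYPE_PARAM)
#define INTEGRAL_PARAM(Type) (INTEGRAL_PARAM)(Type)

// One "virtual function", TYPE, implemented for both kinds of object.
#define TYPE_PARAM_TYPE(This) class
#define INTEGRAL_PARAM_TYPE(This) PP_SEQ_SECOND(This)

// Dispatch: glue the tag, an underscore and the function name together.
#define VIRTUAL(Fname, This) PP_CAT(PP_CAT(PP_SEQ_HEAD(This), _), Fname)

// Expands to: template<class P0, unsigned int P1> struct probe ...
template<VIRTUAL(TYPE, TYPE_PARAM)(TYPE_PARAM) P0,
         VIRTUAL(TYPE, INTEGRAL_PARAM(unsigned int))(INTEGRAL_PARAM(unsigned int)) P1>
struct probe { static const unsigned int value = P1; };

static_assert(probe<int, 7u>::value == 7u, "P1 was declared as unsigned int");
```

Note that the object is passed twice: once to VIRTUAL, which selects the function, and once to the selected function itself as its “this” argument.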
Before we can finish conversion of our encode_type specialization, we need to transform
(class)(unsigned int)
into
(TYPE_PARAM)(INTEGRAL_PARAM(unsigned int))
Without going into too much detail, here is an example of transformation sequence:
unsigned int → PREFIX_unsigned int_SUFFIX → (unsigned)(int) → MACRO_unsigned_int → INTEGRAL_PARAM(unsigned int)
class → PREFIX_class_SUFFIX → (class) → MACRO_class → TYPE_PARAM
Assuming this transformation is done with the macro called TRANSFORM_PARAMS, we can define our encoding specialization like this:
#define REGISTER_TEMPLATE_PARAM_PAIR(z, n, elem) \
VIRTUAL(TYPE, elem)(elem) BOOST_PP_CAT(P, n)
#define REGISTER_TEMPLATE_ENCODE_PARAM(r, data, n, elem)\
VIRTUAL(ENCODE, elem)(elem, n)
#define REGISTER_TEMPLATE_IMPL(Name, ID, Params, Size)\
. . .
template<class V\
SEQ_ENUM_TRAILING(Params, REGISTER_TEMPLATE_PARAM_PAIR)\
>\
struct encode_type_impl<V, Name<BOOST_PP_ENUM_PARAMS(Size, P)> >\
{\
typedef typename mpl::push_back<V, mpl::int_<ID> >::type v0;\
BOOST_PP_SEQ_FOR_EACH_I(REGISTER_TEMPLATE_ENCODE_PARAM, ~, Params)\
typedef BOOST_PP_CAT(v, Size) type;\
};\
. . .
#define REGISTER_TEMPLATE(Name, Params)\
REGISTER_TEMPLATE_IMPL(\
Name,\
UNIQUE_ID(),\
TRANSFORM_PARAMS(Params),\
BOOST_PP_SEQ_SIZE(Params))
(SEQ_ENUM_TRAILING is our own macro with, hopefully, obvious meaning)
It’s worth noting here that we also support a third kind of template parameter, template template parameters. With three different kinds and half a dozen “virtual functions”, such a polymorphic approach really pays off.
Let’s revisit our TYPEOF macro implementation. We left it in the following state:
template<class T, class N>
char(& at(const T&, const N&) )[
mpl::at<typename encode_type<T>::type, N>::type::value
];
#define TYPEOF(expr)\
decode_type<mpl::vector<\
mpl::int_<sizeof(at(expr, mpl::int_<0>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<1>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<2>()))>,\
…\
mpl::int_<sizeof(at(expr, mpl::int_<N>()))>\
> >::type
Considering a few things discussed in the previous section, we should now rewrite it like this:
template<class T, class N>
char(& at(const T&, const N&) )[
mpl::at<typename encode_type<mpl::vector0<>, T>::type, N>::type::value
];
#define TYPEOF(expr)\
decode_type<mpl::begin<mpl::vector<\
mpl::int_<sizeof(at(expr, mpl::int_<0>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<1>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<2>()))>,\
…\
mpl::int_<sizeof(at(expr, mpl::int_<N>()))>\
> >::type>::type
We don’t want the function template at() to be instantiated for N at or beyond the size of the encoded vector, for at least two reasons:
1. Unnecessary template instantiations have a negative effect on compile-time performance;
2. mpl::at<> will fail.
So, let’s start by determining the size of the encoded vector:
template<class T>
char(& size(const T&) )[
mpl::size<typename encode_type<mpl::vector0<>, T>::type>::type::value
];
Now, for any index i that is not less than the size of the encoded vector, we simply substitute zero, thus reusing the instantiation of at() that returns the first element of the encoded sequence:
mpl::int_<sizeof(at(expr, mpl::int_<(i < sizeof(size(expr)) ? i : 0)>()))>
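The effect of this substitution can be observed in isolation with the following sketch. It again replaces MPL with hand-written C++11 stand-ins (int_seq, int_, at_c), hard-codes the encoding of int* as 1 4, and wraps the expression in a hypothetical ELEMENT macro:

```cpp
#include <cstddef>

template<int... Is> struct int_seq { static const std::size_t size = sizeof...(Is); };
template<int N> struct int_ {};

// Nth element; deliberately has no definition for an out-of-range index.
template<class Seq, int N> struct at_c;
template<int Head, int... Tail, int N>
struct at_c<int_seq<Head, Tail...>, N> : at_c<int_seq<Tail...>, N - 1> {};
template<int Head, int... Tail>
struct at_c<int_seq<Head, Tail...>, 0> { static const int value = Head; };

// Hard-coded encoding of int* (pointer = 1, int = 4), for illustration only.
template<class T> struct encode_type;
template<> struct encode_type<int*> { typedef int_seq<1, 4> type; };

template<class T, int N>
char (&at(const T&, int_<N>))[at_c<typename encode_type<T>::type, N>::value];
template<class T>
char (&size(const T&))[encode_type<T>::type::size];

// Element i of the encoding, or element 0 when i is out of range.
#define ELEMENT(expr, i) \
    sizeof(at(expr, int_<((i) < sizeof(size(expr)) ? (i) : 0)>()))

int* const example_ptr = 0;
static_assert(sizeof(size(example_ptr)) == 2, "the encoding has two elements");
static_assert(ELEMENT(example_ptr, 1) == 4, "in range: the real element");
static_assert(ELEMENT(example_ptr, 7) == 1, "out of range: falls back to element 0");
```

Without the conditional, ELEMENT(example_ptr, 7) would try to instantiate at_c past the end of the sequence and fail to compile.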
Let’s define the encoded vector size limit, and put everything together:
#ifndef BOOST_TYPEOF_LIMIT_SIZE
# define BOOST_TYPEOF_LIMIT_SIZE 50
#endif
template<class T, class N>
char(& at(const T&, const N&) )[
mpl::at<typename encode_type<mpl::vector0<>, T>::type, N>::type::value
];
template<class T>
char(& size(const T&) )[
mpl::size<typename encode_type<mpl::vector0<>, T>::type>::type::value
];
#define TYPEOF(expr)\
decode_type<mpl::begin<mpl::vector<\
mpl::int_<sizeof(at(expr, mpl::int_<(\
0 < sizeof(size(expr)) ? 0 : 0\
)>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<(\
1 < sizeof(size(expr)) ? 1 : 0\
)>()))>,\
mpl::int_<sizeof(at(expr, mpl::int_<(\
2 < sizeof(size(expr)) ? 2 : 0\
)>()))>,\
. . .\
mpl::int_<sizeof(at(expr, mpl::int_<(\
BOOST_TYPEOF_LIMIT_SIZE < sizeof(size(expr)) ?\
BOOST_TYPEOF_LIMIT_SIZE : 0\
)>()))>\
> >::type>::type
It’s now trivial for anybody familiar with the Boost Preprocessor Library to re-write this nicely, so let’s omit this. You can always see the result at boost/typeof/compliant/typeof_impl.hpp.
Looking at the resulting TYPEOF macro, it may seem that the type of our expression is encoded many times, since the functions size() and at() are mentioned BOOST_TYPEOF_LIMIT_SIZE times each. However, the template encode_type<mpl::vector0<>, T> is always the same for the same expression, so it is instantiated only once, and then just looked up. Hence, we can roughly state that the compile-time complexity of our TYPEOF is O(m), where m is the size of the encoded vector. In practice this means that TYPEOF compiles more slowly for complicated types than for simple ones.
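To tie the pieces together, here is a miniature version of the whole facility that compiles without Boost. It is a sketch under several assumptions, not the library code: MPL is replaced by hand-written C++11 stand-ins (int_seq, int_, push_back, at_c), only int, pointers and const are registered, the “iterator” is the rest of the sequence, and the size limit is four elements written out by hand in a macro named TYPEOF_ELEMENT that exists only in this example.

```cpp
#include <cstddef>
#include <type_traits>

template<int... Is> struct int_seq { static const std::size_t size = sizeof...(Is); };
template<int N> struct int_ {};
template<class V, int I> struct push_back;
template<int... Is, int I>
struct push_back<int_seq<Is...>, I> { typedef int_seq<Is..., I> type; };
template<class Seq, int N> struct at_c;
template<int Head, int... Tail, int N>
struct at_c<int_seq<Head, Tail...>, N> : at_c<int_seq<Tail...>, N - 1> {};
template<int Head, int... Tail>
struct at_c<int_seq<Head, Tail...>, 0> { static const int value = Head; };

// Encoders for pointer (1), const (2) and int (4).
template<class V, class T> struct encode_type;
template<class V> struct encode_type<V, int> : push_back<V, 4> {};
template<class V, class T>
struct encode_type<V, T*> : encode_type<typename push_back<V, 1>::type, T> {};
template<class V, class T>
struct encode_type<V, const T> : encode_type<typename push_back<V, 2>::type, T> {};

// Decoders: dispatch on the first element, pass on the rest.
template<int ID, class Iter> struct decode_type_impl;
template<class Iter> struct decode_type;
template<int Head, int... Tail>
struct decode_type<int_seq<Head, Tail...> >
    : decode_type_impl<Head, int_seq<Tail...> > {};
template<class Iter> struct decode_type_impl<4, Iter> { typedef int type; typedef Iter iter; };
template<class Iter> struct decode_type_impl<1, Iter> {
    typedef typename decode_type<Iter>::type* type;
    typedef typename decode_type<Iter>::iter iter;
};
template<class Iter> struct decode_type_impl<2, Iter> {
    typedef const typename decode_type<Iter>::type type;
    typedef typename decode_type<Iter>::iter iter;
};

template<class T, int N>
char (&at(const T&, int_<N>))[at_c<typename encode_type<int_seq<>, T>::type, N>::value];
template<class T>
char (&size(const T&))[encode_type<int_seq<>, T>::type::size];

// Out-of-range indices are redirected to element 0, as described above.
#define TYPEOF_ELEMENT(expr, i) \
    sizeof(at(expr, int_<((i) < sizeof(size(expr)) ? (i) : 0)>()))
#define TYPEOF(expr) \
    decode_type<int_seq< \
        TYPEOF_ELEMENT(expr, 0), TYPEOF_ELEMENT(expr, 1), \
        TYPEOF_ELEMENT(expr, 2), TYPEOF_ELEMENT(expr, 3)> >::type

const int* example_ptr = 0;
static_assert(std::is_same<TYPEOF(example_ptr), const int*>::value,
              "encoded as 1 2 4, padded to 1 2 4 1");
static_assert(std::is_same<TYPEOF(1 + 2), int>::value,
              "encoded as 4, padded to 4 4 4 4");
```

The padding elements are harmless because decoding stops as soon as a complete type has been reconstructed; whatever follows is never looked at.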
Copyright © Arkadiy Vertleyb, 2005