Subject: Re: [boost] [phoenix] compile-time performance
From: Thomas Heller (thom.heller_at_[hidden])
Date: 2011-09-14 06:32:16


On Wednesday, September 14, 2011 10:15:54 AM Beren Minor wrote:
> Hi,
>
> I'm interested in this compile-time issue because I'm facing the same
> kind of issue in some of my own projects.
> I'm extensively using Boost so I don't really know if it comes from it or
> if it comes from my code (which similarly uses a lot of templates).
>
> Anyway, could you share a little bit about how you find out what the
> compile-time hitters are? I've got some sources taking like 40s to
> build with gcc and it's getting really annoying. Are there any best
> practices when using template meta-programming? Some tricks to
> know about which patterns are slow and which are quicker for the compiler?
>
> For example -- as this is a phoenix thread -- could you share some
> examples of what took time in phoenix and how you fixed it?

Unfortunately I can't really give general advice on what to do. Here is what I
have done for parts of phoenix that brought down compile times on gcc and
clang; obviously this doesn't necessarily hold for MSVC.

1) Partially preprocessing headers that had preprocessor loops to emulate
variadic templates. This helped reduce the constant time needed for the
preprocessor to generate code. Compile times were reduced significantly, but
only in TUs that are not that big (i.e. if you have long expressions that take
ages to compile it is not really noticeable; the example here is spirit, which
rarely depends on the variadic emulation and spends most of its time
instantiating templates).
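
To make that concrete, here is a minimal sketch of the kind of preprocessor
loop this is about (the names MY_MAX_ARITY and my_tuple are made up, this is
not phoenix code). Partial preprocessing means running this loop once,
committing the expanded output to a pregenerated header, and including that
instead of re-running the loop in every TU:

    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/repetition/enum_params.hpp>
    #include <boost/preprocessor/repetition/repeat_from_to.hpp>

    // One class template per arity, generated by a preprocessor loop.
    // Every TU that includes this header pays the expansion cost again.
    #define MY_TUPLE_DEF(z, n, _)                       \
        template <BOOST_PP_ENUM_PARAMS(n, typename T)>  \
        struct BOOST_PP_CAT(my_tuple, n) {};

    #define MY_MAX_ARITY 10
    BOOST_PP_REPEAT_FROM_TO(1, MY_MAX_ARITY, MY_TUPLE_DEF, _)
    #undef MY_TUPLE_DEF
    #undef MY_MAX_ARITY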

2) Avoid mpl metafunctions like at, if_ etc. and use at_c, if_c instead. This
reduces the number of instantiated templates, because those metafunctions are
usually implemented in terms of their _c counterparts, for example:

    template <typename Sequence, typename N>
    struct at : at_c<Sequence, N::value> {};

Unfortunately this trick doesn't hold for fusion, as it is the other way around
there: the _c functions are implemented in terms of their non-_c counterparts.
(FWIW, I think a potential optimization opportunity for fusion lies here.)
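
A minimal sketch with if_/if_c (not code from phoenix) to make the difference
visible:

    #include <boost/mpl/bool.hpp>
    #include <boost/mpl/if.hpp>

    // Goes through mpl::if_, which is itself implemented on top of if_c,
    // so both (plus the bool_ wrapper) end up being instantiated.
    typedef boost::mpl::if_<boost::mpl::true_, int, long>::type via_if;

    // Instantiates only if_c<true, int, long>.
    typedef boost::mpl::if_c<true, int, long>::type via_if_c;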

3) Avoid full specializations. According to the standard, a template that is
fully specialized needs to be instantiated.
I used this technique for the various customization points in phoenix, for
example to register rules and actions. The definition of an action is:

    struct default_actions
    {
        template <typename Rule, typename Dummy = void>
        struct when;
    };

When registering an action for a certain rule you can write:

    template <typename Dummy>
    struct default_actions::when<your_rule, Dummy>
    {
        // ...
    };

This avoids the instantiation of that specific template when the header
containing it is included but the expression that would trigger the template
isn't actually used, thus saving time.
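
For contrast, a minimal sketch of the form this avoids, i.e. registering the
same rule through a full specialization (your_rule again stands for a
concrete rule type):

    // Full (explicit) specialization; per the argument above this gets
    // instantiated as soon as the header is included, whether or not
    // your_rule is ever used in an expression.
    template <>
    struct default_actions::when<your_rule, void>
    {
        // ...
    };
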
However, I tried to apply that trick to the various fusion extension points
and it didn't work.
One possible explanation for this behaviour is that the fusion extension
points are very lightweight structs themselves. That is, they contain a nested
struct which is itself a template and cannot be instantiated yet. Thus the
added complexity of the Dummy parameter led to more compile time instead of
decreasing it through fewer instantiations.
The nested when struct of the phoenix actions is different, as it is heavier
to instantiate (it is usually implemented to derive from phoenix::call, which
might be quite heavy on the compiler ... there might be another optimization
possibility here).

4) Avoid SFINAE. It doesn't scale. Consider a function that is overloaded with
different SFINAE conditions enabled. Upon a function call, the compiler puts
all of the overloads that have the same arity as the call into the candidate
set; after that, every SFINAE condition needs to be instantiated so it can be
decided whether the type expression is valid or not. Prefer tag dispatching
and boolean metafunctions to dispatch to the correct function.
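
A minimal sketch of the tag dispatching alternative (process/process_impl are
made-up names): a single entry point evaluates one boolean metafunction and
forwards to a plain overload, instead of every overload carrying its own
SFINAE condition:

    #include <boost/type_traits/integral_constant.hpp>
    #include <boost/type_traits/is_integral.hpp>

    // The implementations are selected by ordinary overloading on the tag
    // type; no SFINAE condition has to be evaluated for each overload.
    template <typename T>
    void process_impl(T const&, boost::true_type)  { /* integral case */ }

    template <typename T>
    void process_impl(T const&, boost::false_type) { /* everything else */ }

    template <typename T>
    void process(T const& x)
    {
        // is_integral<T>::type is boost::true_type or boost::false_type.
        process_impl(x, typename boost::is_integral<T>::type());
    }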

> I've tried Steven Watanabe's template profiler but got a lot of trouble with
> the STL complaining about its code being modified (which is true, as the
> template profiler adds some code to it).

I haven't really used it myself yet, as I don't find it easy to extract
valuable information from its output.
Additionally, the instantiation count alone isn't really meaningful, as can be
seen from the diverging compile times of gcc and msvc in the other posts. The
results of the various optimization attempts also show that it isn't enough.

I hope the lines I wrote above are enough to continue the discussion. Please
correct me if I am wrong; I would be happy if others could contribute their
experience too.

Joel Falcou also suggested compiling a list of the usual TMP scenarios with
different use-case patterns. Based on these small-scale examples we could
analyze certain optimization or pessimization techniques more efficiently.
I think we just don't fully understand the impact of TMP on compilers yet.
I would like to get opinions and insight from compiler vendors too, as they
are at the source of the "evil".

