From: Brian McNamara (lorgon_at_[hidden])
Date: 2003-09-01 14:03:09


Template libraries, especially those employing expression templates,
take a long time to compile. As an example, one of the example files
for FC++ (parser.cpp) takes about 10 minutes to compile on a blazingly
fast machine with tons of RAM.

I would like to reduce the compile time. I solicit any help/advice on
the topic; I am hoping some of the Boost contributors will have run into
this same problem with their own libraries, and have found some ways to
address it.

Here is what I have already figured out.

First off, in FC++, there are a number of templates whose sole purpose
is to provide better compiler diagnostics (along the same general lines
as concept_checks). I rewrote the library code so that these checks
are only enabled when a certain preprocessor flag is defined. Turning
off these checks reduced the compile time of parser.cpp from 10 minutes
to 8 minutes, a significant speedup of about 20%.

That was the most obvious piece of "low-hanging fruit"; since the code
to produce the compile-time diagnostics doesn't do anything at run-time,
it was straightforward to just have a switch to turn it on and off.
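
A switch of this kind might look like the following minimal sketch. It is not FC++'s actual code: the flag name MYLIB_ENABLE_DIAGNOSTICS, the macro MYLIB_CHECK_ARG, and the AddOne/apply1 names are all hypothetical, and static_assert (C++11) stands in for whatever checking templates the library really uses.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical flag name; the real FC++ flag is not given in the post.
// Build with -DMYLIB_ENABLE_DIAGNOSTICS to turn the checks on.
#ifdef MYLIB_ENABLE_DIAGNOSTICS
  // Checked build: instantiates an extra trait so that a bad argument type
  // produces one readable error instead of a long instantiation trace.
  #define MYLIB_CHECK_ARG(F, Arg)                                            \
      static_assert(                                                         \
          std::is_convertible<Arg, typename F::argument_type>::value,        \
          "argument is not convertible to the functoid's argument_type")
#else
  // Fast build: expands to a declaration that instantiates no templates.
  #define MYLIB_CHECK_ARG(F, Arg) static_assert(true, "")
#endif

// Hypothetical one-argument function object with nested typedefs.
struct AddOne {
    typedef int argument_type;
    typedef int result_type;
    int operator()(int x) const { return x + 1; }
};

// Library entry point: the check has no run-time effect either way.
template <class F, class Arg>
typename F::result_type apply1(const F& f, const Arg& a) {
    MYLIB_CHECK_ARG(F, Arg);
    return f(a);
}
```

The generated code is the same with or without the flag; only the amount of work the compiler does per instantiation of apply1 changes.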

I imagine there are other things I can do to rewrite some of the
library templates that are doing "real work" so that they compile
faster. Specifically, I imagine that some templates can be rewritten so
that they cause fewer auxiliary templates to be instantiated each time
the main template gets instantiated.
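
To make "fewer auxiliary templates" concrete, here is a small sketch in modern C++ (variadic templates, which postdate this post). It is not taken from FC++; length_rec and length_direct are hypothetical names. Both compute the same value, but the recursive form instantiates one helper per remaining element, while the direct form instantiates only itself. Whether that difference shortens compile times noticeably is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>

// Recursive formulation: length_rec<A, B, C> also instantiates
// length_rec<B, C> and length_rec<C>, and uses length_rec<>.
template <class... Ts> struct length_rec;

template <> struct length_rec<> {
    static const std::size_t value = 0;
};

template <class T, class... Ts>
struct length_rec<T, Ts...> {
    static const std::size_t value = 1 + length_rec<Ts...>::value;
};

// Direct formulation: a single instantiation per distinct type list.
template <class... Ts>
struct length_direct {
    static const std::size_t value = sizeof...(Ts);
};

static_assert(length_rec<int, char, double>::value == 3, "recursive form");
static_assert(length_direct<int, char, double>::value == 3, "direct form");
```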

However, there are two issues that make this hard to do:

  (1) Knowing which templates to focus on. That is, which templates
      are effectively the "inner loops" in the compilation process, and
      thus deserve the most attention when it comes to optimizing
      them?

  (2) Knowing how to rewrite templates to make them faster. I imagine
      that "fewer templates instantiated" will mean "faster compile
      times", but I don't actually know this for sure. I have no
      window into what the compiler is actually doing, to know what
      takes so long. Maybe it's the template instantiation process;
      maybe it's all the inlining; maybe it's the code generation for
      lots of tiny functions. I don't know.

I have made some headway with (1): the Unix utility "nm" lists all the
symbols compiled into an executable program, and by parsing the output,
I am able to determine which templates have been instantiated with the
largest number of different types. My little script yields output like
    ...
    313 boost::fcpp::lambda_impl::exp::Value
    314 boost::fcpp::lambda_impl::BracketCallable
    606 boost::fcpp::lambda_impl::exp::CONS
    609 boost::fcpp::full1
    610 boost::intrusive_ptr
    670 boost::fcpp::lambda_impl::exp::Call
which tells me that the "Call" class template has been instantiated 670
different ways in parser.cpp. This at least gives me some idea of
which classes to focus my optimizing attention on. However, a drawback
of the "nm" approach is that it only shows templates that leave symbols
(functions or static data) in the executable. There are tons of class
templates which contain nothing but typedefs, and I imagine they're being
instantiated lots of ways too, and I don't know whether that slows
things down significantly as well.
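
The counting step described above can be sketched roughly as follows. This is not the original script, which is not shown; it assumes demangled input as produced by "nm -C", and the function names first_instantiation and count_instantiations are made up for illustration. The parsing is deliberately crude: it takes the first balanced <...> in each symbol, so symbols involving operator<, operator-> or a templated return type will be misattributed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Returns the prefix of `symbol` up to the '>' that closes its first '<',
// e.g. "ns::Call<A<B> >::f()" -> "ns::Call<A<B> >". Empty if none is found.
std::string first_instantiation(const std::string& symbol) {
    std::size_t open = symbol.find('<');
    if (open == std::string::npos) return "";
    int depth = 0;
    for (std::size_t i = open; i < symbol.size(); ++i) {
        if (symbol[i] == '<') ++depth;
        else if (symbol[i] == '>' && --depth == 0)
            return symbol.substr(0, i + 1);
    }
    return "";  // unbalanced brackets: ignore this symbol
}

// Reads "nm -C" style lines ("[address] type name") and returns, for each
// template name, the number of distinct instantiations that were seen.
std::map<std::string, std::size_t> count_instantiations(std::istream& nm_output) {
    std::map<std::string, std::set<std::string> > seen;
    std::string line;
    while (std::getline(nm_output, line)) {
        std::istringstream fields(line);
        std::string token, name;
        fields >> token;                         // address, or type letter
        if (token.size() != 1) fields >> token;  // skip to the type letter
        std::getline(fields >> std::ws, name);   // rest of line is the symbol
        std::string inst = first_instantiation(name);
        if (!inst.empty())
            seen[inst.substr(0, inst.find('<'))].insert(inst);
    }
    std::map<std::string, std::size_t> counts;
    for (const auto& entry : seen)
        counts[entry.first] = entry.second.size();
    return counts;
}
```

Passing std::cin to count_instantiations and printing the resulting map sorted by count would give a listing in the same spirit as the one above.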

As to (2), I know nothing, other than the speculation that "fewer
instantiations is better".

So, that's where I am. Help! :)

-- 
-Brian McNamara (lorgon_at_[hidden])
