From: David Abrahams (dave_at_[hidden])
Date: 2003-09-01 16:06:56


Brian McNamara <lorgon_at_[hidden]> writes:

> Template libraries, especially those employing expression templates,
> take a long time to compile. As an example, one of the example files
> for FC++ (parser.cpp) takes about 10 minutes to compile on a blazingly
> fast machine with tons of RAM.

Which compiler? Have you seen
http://users.rcn.com/abrahams/instantiation_speed/index.html?

> I would like to reduce the compile time, and I welcome any help or
> advice on the topic.

Switching compilers may be your best bet.

> I am hoping some of the Boost contributors will have run into this
> same problem with their own libraries, and have found some ways to
> address it.

Generally speaking, the key is to reduce the number of template
instantiations, but it's also a good idea to eliminate unused
computations. Avoid "traits blob" templates: all of their nested
definitions have to be evaluated even if you only want one of them.
Use mpl::apply_if and the logical operators and_/or_ to avoid needless
evaluations.
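
The laziness described above can be sketched without Boost. The code
below is a minimal, hand-rolled analogue of mpl::apply_if (later
renamed mpl::eval_if) and mpl::and_, assuming a C++11 compiler; the
names eval_if, and_, identity, pointee, value_type_of and points_to_int
are illustrative and are not MPL's actual implementation. The point is
that only the selected branch's nested ::type is ever instantiated.

```cpp
#include <memory>
#include <type_traits>

// Hand-rolled analogue of mpl::apply_if: F1 and F2 are metafunctions, and
// only the chosen one's nested ::type is requested (and so instantiated).
template <bool C, class F1, class F2>
struct eval_if { typedef typename F1::type type; };

template <class F1, class F2>
struct eval_if<false, F1, F2> { typedef typename F2::type type; };

template <class T> struct identity { typedef T type; };

// Ill-formed unless T has a nested element_type (e.g. std::unique_ptr).
template <class T> struct pointee { typedef typename T::element_type type; };

// An eager version would NOT compile for T = int, because
// typename pointee<T>::type is computed before std::conditional picks:
//   typename std::conditional<std::is_class<T>::value,
//                             typename pointee<T>::type, T>::type
template <class T>
struct value_type_of
    : eval_if<std::is_class<T>::value, pointee<T>, identity<T> > {};

// Short-circuiting "and": F2 is only instantiated when F1::value is true.
template <class F1, class F2>
struct and_ : eval_if<F1::value, F2, std::false_type>::type {};

// Boolean metafunction that is ill-formed for types without element_type.
template <class T>
struct points_to_int : std::is_same<typename T::element_type, int> {};

// pointee<int> and points_to_int<int> are named but never instantiated.
static_assert(std::is_same<value_type_of<int>::type, int>::value, "");
static_assert(std::is_same<value_type_of<std::unique_ptr<double> >::type,
                           double>::value, "");
static_assert(!and_<std::is_class<int>, points_to_int<int> >::value, "");
```

Besides avoiding hard errors, the lazy form saves the compiler from
instantiating the branch that is thrown away.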

> Here is what I have already figured out.
>
> First off, in FC++, there are a number of templates whose sole
> purpose is to provide better compiler diagnostics (along the same
> general lines as concept_checks). I rewrote the library code so
> that these checks are only enabled when a certain preprocessor flag
> is defined. Turning off these checks reduced the compile-time of
> parser.cpp from 10 minutes to 8 minutes--a significant speedup.
>
> That was the most obvious piece of "low-hanging fruit"; since the
> code to produce the compile-time diagnostics doesn't do anything at
> run-time, it was straightforward to just have a switch to turn it on
> and off.
>
> I imagine there are other things I can do to rewrite some of the
> library templates that are doing "real work" so that they compile
> faster. Specifically, I imagine that some templates can be
> rewritten so that they cause fewer auxiliary templates to be
> instantiated each time the main template gets instantiated.
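
The preprocessor switch Brian describes for diagnostic-only templates
might look roughly like the sketch below. FCPP_ENABLE_DIAGNOSTICS,
callable_with_int, check_callable_with_int and apply_to_42 are all
hypothetical names (the message does not say what FC++ calls its flag),
and a C++11 static_assert stands in for the concept_check-style tricks
of the time. When the flag is off, the check collapses to an empty
function template and none of the detection machinery is instantiated.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical flag name; define it to turn the diagnostics on.
#ifdef FCPP_ENABLE_DIAGNOSTICS
// Detects whether an F can be called as f(int).
template <class F, class = void>
struct callable_with_int : std::false_type {};

template <class F>
struct callable_with_int<F, decltype(void(std::declval<F&>()(0)))>
    : std::true_type {};

// Diagnostic-only: turns a deep instantiation error into one readable line.
template <class F>
inline void check_callable_with_int() {
    static_assert(callable_with_int<F>::value,
                  "argument must be callable as f(int)");
}
#else
// Checks disabled: an empty function, so callable_with_int is never
// instantiated and the compiler does less work.
template <class F>
inline void check_callable_with_int() {}
#endif

template <class F>
int apply_to_42(F f) {
    check_callable_with_int<F>();  // no run-time effect either way
    return f(42);
}
```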

Good plan. On all but the most recent EDG compilers, the "nestedness"
of the generated symbol names may have a significant impact on compile
times.
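
One way to make a template instantiate fewer auxiliary templates is to
replace a chain of helper metafunctions with partial specializations of
the main template. The sketch below is illustrative only (arg_type_v1,
arg_type_v2, strip_ref and strip_const are hypothetical names, and only
const and lvalue references are handled); whether it measurably helps
depends on the compiler, so it is worth timing both forms.

```cpp
#include <type_traits>

// Before: each new argument type instantiates arg_type_v1 plus up to
// two helper templates.
template <class T> struct strip_ref { typedef T type; };
template <class T> struct strip_ref<T&> { typedef T type; };
template <class T> struct strip_const { typedef T type; };
template <class T> struct strip_const<const T> { typedef T type; };

template <class T>
struct arg_type_v1 {
    typedef typename strip_const<typename strip_ref<T>::type>::type type;
};

// After: partial specializations do the same job in a single
// instantiation per argument type. For const U&, the const T&
// specialization is more specialized than T&, so it is chosen.
template <class T> struct arg_type_v2 { typedef T type; };
template <class T> struct arg_type_v2<T&> { typedef T type; };
template <class T> struct arg_type_v2<const T> { typedef T type; };
template <class T> struct arg_type_v2<const T&> { typedef T type; };

static_assert(std::is_same<arg_type_v1<const int&>::type,
                           arg_type_v2<const int&>::type>::value, "");
```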

> However there are two issues that make this hard to do:
>
> (1) Knowing which templates to focus on. That is, which templates
> are effectively the "inner loops" in the compilation process, and
> thus deserve the most attention when it comes to optimizing
> them?

Heh. Welcome to the black box of C++ metaprogramming.

> (2) Knowing how to rewrite templates to make them faster. I imagine
> that "fewer templates instantiated" will mean "faster compile
> times", but I don't actually know this for sure. I have no
> window into what the compiler is actually doing, to know what
> takes so long. Maybe it's the template instantiation process;
> maybe it's all the inlining; maybe it's the code generation for
> lots of tiny functions. I don't know.

Yep, it's a nasty problem. I suggest some experimentation.

> I have made some headway with (1): the unix utility "nm" lists all the
> symbols compiled into an executable program, and by parsing the output,
> I am able to determine which templates have been instantiated with the
> largest number of different types. My little script yields output like
> ...
> 313 boost::fcpp::lambda_impl::exp::Value
> 314 boost::fcpp::lambda_impl::BracketCallable
> 606 boost::fcpp::lambda_impl::exp::CONS
> 609 boost::fcpp::full1
> 610 boost::intrusive_ptr
> 670 boost::fcpp::lambda_impl::exp::Call
> which tells me that the "Call" template class has been instantiated 670
> different ways in parser.cpp. This at least gives me some idea of
> which classes to focus my optimizing attention on. However a drawback
> of using the "nm" approach is that it only shows templates with
> run-time storage. There are tons of template classes which contain
> nothing but typedefs, and I imagine they're being instantiated in lots
> of ways too, but I don't know whether this slows things down significantly.
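
The nm-based counting above could be approximated as follows. This is
a sketch, not Brian's script: it assumes the input is already-demangled
symbol names (for example GNU nm -C output with the address and type
columns removed), and count_instantiations is a hypothetical helper. It
is deliberately naive: the first '<' is taken as the start of the
template argument list, so names such as operator<< or function
templates printed with a return type are miscounted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each template name to the number of distinct argument lists it
// appears with in the given demangled symbol names.
std::map<std::string, std::size_t>
count_instantiations(const std::vector<std::string>& symbols) {
    std::map<std::string, std::set<std::string> > ids;
    for (const std::string& sym : symbols) {
        std::size_t lt = sym.find('<');
        if (lt == std::string::npos) continue;  // not a template
        // Find the '>' that closes the first template argument list.
        int depth = 0;
        std::size_t end = std::string::npos;
        for (std::size_t i = lt; i < sym.size(); ++i) {
            if (sym[i] == '<') {
                ++depth;
            } else if (sym[i] == '>' && --depth == 0) {
                end = i;
                break;
            }
        }
        if (end == std::string::npos) continue;  // unbalanced; skip it
        // Key: template name. Value: set of distinct template-ids, so
        // several members of the same instantiation count only once.
        ids[sym.substr(0, lt)].insert(sym.substr(0, end + 1));
    }
    std::map<std::string, std::size_t> counts;
    for (const auto& entry : ids) counts[entry.first] = entry.second.size();
    return counts;
}
```

Sorting the resulting map by its values gives a table like the one
above.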

It does; see the link at the top of my reply.

Also, if you're using something called CONS you're probably also using
"hand-rolled" metaprograms. MPL contains some interesting techniques
designed to reduce the stress on compilers (e.g. compile-time
recursion unrolling, lazy evaluation); you might try using the
high-level interface of MPL to see if it improves things.
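
MPL's recursion unrolling is more elaborate than this, but the idea can
be sketched on a hand-rolled cons list (nil, cons, length1 and length4
are hypothetical names). Peeling several elements per recursive step
cuts both the number of instantiations and the instantiation depth
roughly by the unrolling factor.

```cpp
// A minimal hand-rolled type list.
struct nil {};
template <class H, class T> struct cons {};

// One-at-a-time recursion: a list of N elements instantiates length1
// N + 1 times, nested N + 1 levels deep.
template <class L> struct length1;
template <> struct length1<nil> { static const int value = 0; };
template <class H, class T>
struct length1<cons<H, T> > {
    static const int value = 1 + length1<T>::value;
};

// Unrolled recursion: peel four elements per step and fall back to
// length1 for the last 0-3 elements, so roughly N/4 instantiations of
// length4 plus at most four of length1.
template <class L>
struct length4 {
    static const int value = length1<L>::value;
};
template <class A, class B, class C, class D, class T>
struct length4<cons<A, cons<B, cons<C, cons<D, T> > > > > {
    static const int value = 4 + length4<T>::value;
};

typedef cons<int, cons<char, cons<long, cons<short, cons<float, nil> > > > > five;
static_assert(length1<five>::value == 5, "");
static_assert(length4<five>::value == 5, "");
```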

> As to (2), I know nothing, other than the speculation that "fewer
> instantiations is better".
>
> So, that's where I am. Help! :)

You're on the right track.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk