From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2007-08-11 14:48:41

Lassi Tuura wrote:

> Maybe I can help with that. For simple toys and well-contained
> programs, the compiler can do a lot to inline, as was kindly shown by
> someone. But take an application of ours as an example from somewhat
> more real software world.

Pure meta-programming has nothing to do with code. It only works with
types, so no code is generated at all.
There is also another kind of meta-programming -- maybe a better name
exists -- for which all the code should be inlined.
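
To make the first point concrete, here is a minimal sketch of purely
type-level meta-programming. It assumes a C++11 compiler (for
static_assert and <type_traits>), and the name remove_ptr is made up
for illustration. Everything is evaluated by the compiler, and nothing
in it defines a function or an object, so nothing here should end up in
a text segment.

```cpp
#include <type_traits>

// Hypothetical type-level "function": maps T* to T, and any other T to itself.
template <typename T>
struct remove_ptr { typedef T type; };

// Partial specialization selected when the argument is a pointer type.
template <typename T>
struct remove_ptr<T*> { typedef T type; };

// The "calls" are checked entirely at compile time.
static_assert(std::is_same<remove_ptr<int*>::type, int>::value,
              "pointer is stripped");
static_assert(std::is_same<remove_ptr<int>::type, int>::value,
              "non-pointer is left unchanged");
```

If either static_assert were false the program would simply not compile;
there is no run-time check involved.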

It seems your concern is more about code generation induced by usage of
template functions, or member functions of a class template.
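
As an illustration of that kind of code generation, here is a small
sketch of one common way to limit it: keep the type-independent work in
an ordinary function and leave only a thin wrapper as a template. The
names clamp_index and at_clamped are hypothetical, and this is not
something from your application, only an assumed example. Each distinct
T and N still produces its own at_clamped, but that body is tiny, while
clamp_index exists once.

```cpp
#include <cstddef>

// Hypothetical helper holding the type-independent logic.
// In a multi-file project only its declaration would go in the header,
// with this definition in a single .cpp file, so it is compiled once.
std::size_t clamp_index(std::size_t i, std::size_t n) {
    return i < n ? i : n - 1;  // n is an array size here, so never zero
}

// Thin template wrapper: the only per-type code is the array access.
template <typename T, std::size_t N>
T& at_clamped(T (&arr)[N], std::size_t i) {
    return arr[clamp_index(i, N)];
}
```

Whether this pays off depends on how much of the template body really
is independent of the template parameters.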

> An average run pulls in several hundred shared libraries and has a
> memory foot print of about 500 MB. Of the 500 MB, ~100 MB is for
> text segments (machine code). Of the 100 MB, several tens of
> megabytes is redundant duplicate symbols, for example template
> instantiations and out-of-line versions of inline functions. There's
> 1 MB of code for (numerous copies of) a single template function.
> For a more real measure of redundancy, you'd have to add on top code
> that was actually inlined.

I can only identify two real issues here:
- Functions which were only used internally (not exported by the shared
library) and always inlined didn't have their definition removed from
the library.
- Multiple occurrences of the same function definition (which arise from
the way most C++ implementations handle templates, but which should
then be elided by the linker) still exist.

If either of those two things is true, then there may be an issue with
your compiler or linker.

> Put another way, we estimate 10-25% of the entire program memory
> footprint is various degrees of _code_ redundancy.

I don't consider inlining functions to be code redundancy, but rather
code specialization.

> And surprise, some of the bigger bottlenecks plaguing our programs
> are L1 instruction cache pressure and ITLB misses. 40% of L2 cache
> accesses are for instructions -- and each and every L1 instruction
> cache miss stalls the CPU. Solving the code bloat is obviously
> making its way up on our performance optimisation priorities.
> Haphazard use of templates and inlining definitely make things
> worse. Hope this helps understand why.

It's indeed true that bad usage of templates and inlining results in code
bloat.

> While we all hope compiler technology to improve, the reality check
> is that the impact factor of 20 years of compiler research is
> trivially measured: it's the difference between -O0 vs. -O3. If your
> program falls into the class where that makes a world of difference,
> good for you. Most programs in this world are not in that class, and
> you start engineering and making trade-offs.

Caches are also getting bigger and bigger, which could reduce the issue.
