From: Sean Parent (sparent_at_[hidden])
Date: 2002-02-06 21:22:23


I spent several months a couple of years ago speeding up the build of Photoshop.
What fixed the build time was not factoring the header files by function, but
limiting the depth/breadth of the includes.

A typical file might include 5-10 other files, each including 5-10 more. The
tree doesn't have to get that deep before you find that the compiler is
chewing 30M lines of code when there are only 850K lines in the project.
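
To put rough numbers on that growth, here is a back-of-the-envelope sketch. The function name is made up for illustration, and it assumes every file includes the same number of distinct headers, so it is an upper bound: include guards collapse repeats within one translation unit, but each source file still pays for its own tree.

```cpp
#include <cstdint>

// Hypothetical helper: number of header inclusions reachable from one
// source file when every file includes `fanout` others, `depth` levels
// deep. Computes fanout + fanout^2 + ... + fanout^depth.
std::uint64_t headers_pulled_in(std::uint64_t fanout, unsigned depth) {
    std::uint64_t total = 0;
    std::uint64_t level = 1;
    for (unsigned d = 0; d < depth; ++d) {
        level *= fanout;   // headers first reached at this depth
        total += level;
    }
    return total;
}
```

With a fan-out of 10 and only two levels, headers_pulled_in(10, 2) is 110 header inclusions for a single source file; a third level brings it to 1110.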

The way I factored things was to break each header into two parts: one to be
included by other headers (containing forward declarations and typedefs), and
one to be included by source files, which contained full class definitions,
inlined functions, etc.

The goal was to make sure that the "header headers" didn't include anything
(or a minimal set of stable system includes that could be thrown into the
precompiled header).
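
A minimal sketch of that split, shown as one listing with comments marking where each file would begin. The file names and the Widget class are made up for illustration; they are not from the Photoshop sources.

```cpp
// ---- widget_fwd.h: the "header header", included by other headers ----
// Forward declarations and typedefs only; it includes nothing itself.
class Widget;
typedef Widget* WidgetPtr;

// ---- canvas.h: another header that only needs the name Widget ----
// It would #include "widget_fwd.h" rather than the full definition.
// Declaring a function that takes a reference to an incomplete type is fine.
int area_of(const Widget& w);

// ---- widget.h: included by source files only ----
// Full class definition and inlined functions live here.
class Widget {
public:
    Widget(int width, int height) : width_(width), height_(height) {}
    int width() const { return width_; }
    int height() const { return height_; }
private:
    int width_;
    int height_;
};

// ---- canvas.cpp: a source file, so it includes widget.h ----
int area_of(const Widget& w) { return w.width() * w.height(); }
```

The point of the arrangement is that canvas.h never drags widget.h (and whatever widget.h includes) into every file that includes canvas.h; only the source files that actually use Widget's members pay for the full definition.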

I cut the time of a full build on my 300MHz G3 from close to an hour to 13
minutes. Of course, once I left the team and nobody else was actively
maintaining this structure it zoomed back up (it takes 40 minutes now on my
500MHz G3 and there are about 1.2M lines of code now).

I find it much more useful to prune the tree rather than make smaller
branches.

Sean

on 2/6/02 4:30 PM, mfdylan at dylan_at_[hidden] wrote:

> --- In boost_at_y..., "vesa_karvonen" <vesa_karvonen_at_h...> wrote:
>> --- In boost_at_y..., "mfdylan" <dylan_at_m...> wrote:
>>
>>> Because this could cause a huge rebuild most programmers are loath
>>> to do it, and in my experience, tend to end up resorting to hacks to
>>> avoid a total rebuild (and yes I've done it myself!).
>>
>> Personally, I think that when most programmers make the choice of not
>> making mini restructurings of large systems to avoid situations like
>> this, they are reaching for a local optimum that is very very far
>> from the global optimum. In other words, they are betting on the
>> wrong horse in the long run. In plain English, restructuring the
>> large system would make their work more pleasant as recompiles would
>> eventually become orders of magnitude faster.
>
> I agree entirely. Getting project managers and other developers to
> see it that way is the hard part - it's far more a political problem
> than a technical problem. The fact that the problem mainly occurs
> close to deadlines is adequate proof of this.
>>
>> Perhaps. Have you considered the amount of data the make system would
>> need to store? Have you considered the amount of time the make system
>> would need to spend examining dependencies? Not that they would
>> necessarily be problems, but have you actually considered them? I'd
>> like to see an actual implementation of a smart make system rather
>> than base important software design decisions on vapour ware.
>>
> I've spent a bit of time toying with the idea, but nothing concrete.
> If you could integrate it into some sort of version-control software,
> determining what parts of a header file had changed wouldn't be so
> much of a problem. But there are systems that have smart recompiles,
> including VC supposedly (I haven't seen much evidence of it, but
> every now and then it seems to work out that a change, say adding a
> member function, doesn't affect any existing source files), and IBM's
> VisualAge - I'm curious how they actually work.
> One (fairly simple) possibility is that makedepend actually
> determines which lines of a header file a source file depends on.
> The makefile then uses a third tool that
>
> 1 checks for the existence of <header>.used
> 2 if it doesn't exist, copies <header> to <header>.used and compiles
> the source file.
> 3 if it does, then checks which lines have changed and only
> rebuilds the source file if one of the dependent lines has changed.
> A smarter version would ignore insertions of functions (but not
> member variables, as they can modify the size of a class).
>
> This only requires a single copy of every header file and reasonably
> simple tests - even detecting function vs variable additions could be
> done without having to parse the whole file - for instance just look
> for (...). It would break if someone did
>
> int variable; int function();
>
> But I would more than happily live with that!
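
The comparison in step 3 of the scheme quoted above could be sketched roughly as follows. This is only an illustration: needs_rebuild is a hypothetical name, the header and its .used snapshot are passed in as already-read lines rather than files, and the missing-snapshot case (steps 1 and 2) is left to the caller. It also assumes a dependency counts as unchanged if its exact text still appears somewhere in the header, so that pure insertions do not force a rebuild.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: decide whether a source file must be rebuilt.
//   used      - lines of the saved <header>.used snapshot
//   current   - lines of the header as it is now
//   dependent - indices into `used` that the source file depends on
bool needs_rebuild(const std::vector<std::string>& used,
                   const std::vector<std::string>& current,
                   const std::vector<std::size_t>& dependent) {
    // Set of every line present in the current header, for quick lookup.
    const std::set<std::string> present(current.begin(), current.end());
    for (std::size_t index : dependent) {
        if (index >= used.size()) {
            return true;  // dependency data does not match the snapshot: be safe
        }
        if (present.count(used[index]) == 0) {
            return true;  // a dependent line was changed or removed
        }
    }
    return false;  // every dependent line survives unchanged
}
```

Matching by line text rather than line number is a design choice of this sketch, since inserting a declaration shifts every later line number. It inherits the same blind spots as the quoted proposal, for example an inserted member variable that changes the size of a class.
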
>
>> Personally I think that long build times in C++ are primarily caused
>> by the design of C++ and secondarily caused by the design of large
>> systems. Since changing C++ is too difficult, programmers need to
>> design their systems better.
>>
> But even in a well designed system there can be widely used headers
> that are still "work in progress", meaning you're mainly adding to
> them, or occasionally making minor modifications. We have several of
> these in our current project that mean frequent rebuilds that can
> take upwards of 30 minutes even on the fastest systems. Nearly all
> the modifications however affect only a few files and shouldn't cause
> extensive recompilation.
>
>> There are many ways to improve build times. One of the most effective
>> ways is to use a programming language that does not use text based
>> headers.
>>
> Of course. But I don't believe it's impossible to create a smart
> build system even given that restriction.
>
>>
>>> With modern
>>> precompiled header support, for a library that isn't likely to
>>> change much, I'd always use a single header file. For our own
>>> libraries that are under constant development I'll #include as
>>> little as possible. This seems to be so patently common sense I
>>> can't imagine there'd be so much disagreement over it.
>>
>> The common sense is wrong if you happen to be mostly and continuously
>> developing all the libraries you are using. Most good large systems
>> are really collections of modules more or less like libraries.
>
> By "#include as little as possible" I meant using as few of the
> finely grained headers as necessary. Unfortunately even this isn't
> enough to prevent a lot of unnecessary recompiles with current
> systems.
>
> Dylan

-- 
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
sparent_at_[hidden]
