
Boost-Build :

From: Rene Rivera (grafik.list_at_[hidden])
Date: 2005-10-04 17:29:19

Alexey Pakhunov wrote:
> Reece Dunn wrote:
>>Is there a way that we can speed this up? Maybe preallocate an initial
>>reserve space for strings? Or cache the memory as linked lists to save
>>allocate+recopy time?
>>When this is sorted, the load time should improve. Especially if you have a
>>lot of configurations in user-config.jam and a large project base.
> Some profiling will definitely help. ;-)

And on the new bjam the -d+10 option gives you that memory profiling info.

But AFAICT the only way to speed things up at this point is to reduce
the memory use. This is because of the way strings are managed in bjam,
and since *everything* is a string, or related to a string, there are a
*lot* of strings floating around. A summary...

Bjam caches all strings, with some internal exceptions, into a hash
table. This is so it can reuse repeated strings to reduce memory use
(can't even imagine what the mem use would be without this).

* We generate many slightly different strings because of how the class
system is implemented.

* Bjam naturally generates many strings because of the variable
expansion semantics, i.e. how variables are exploded into permutations.

* Anything that processes string data, like header scanning in BBv2,
will produce many strings.

* Because of the large number of strings, the hash table caching them
consumes much memory for structure management.

Here's a short dump, from running Noel's test case, of the top memory
allocations:

20100 10264 420 0.020896 4892626 243
5050 7992 210 0.041584 5097820 1009
20385 210 120 0.005887 6205020 304
20100 260 190 0.009453 7073221 351
15100 7992 240 0.015894 7270594 481
15100 7992 110 0.007285 8864430 587
5100 950 460 0.090196 13794634 2704
5050 1922 511 0.101188 17555491 3476
5837 24524 340 0.058249 17563141 3008
10100 910 540 0.053465 20413007 2021
5050 1662 880 0.174257 30214826 5983
5000 702 390 0.078000 32308920 6461
5000 2373 611 0.122200 46334519 9266
56358664 11466 11466 0.000203 64638420 1 [OTHER]
0 30352 0 0.000000 384811960 384811960 [TOTAL]

(Sorry if that wraps)

In the above, [OTHER] is the memory the hash table itself is using. Mind
you, this is *not* the working set, but the total allocations. Some
percentage of the above gets deallocated shortly after it gets
allocated, but it's a *small* percentage.

And of course [TOTAL] is the sum of all memory use. As you can see, it's
a prodigious 350MB for a rather simple use case.

I don't have many ideas on how to improve the situation but here are two
that might be promising:

1. Create a native buffer facility whose memory can be manually
managed. This would let some strings move out of the cache and be
freed soon after they are no longer needed. The "print" functionality
would make use of this, as would any cat-like builtin. It might be
possible to make header scanning use this as well. This is not a real
reduction in total memory use, just a way to lower the impact on other
string use.

2. Reimplement the class system in some native form that doesn't require
importing new rules for the instance methods. This would be a _big_
win, as it would cut the number of rule names floating around in the
string table by a large factor. I haven't had time to think through
exactly how to implement this change, so I'm not sure it would even
work, or whether I'm even close in my understanding of how the class
system is implemented :-)

And one last note: the above memory use is somewhat less than it was
last week, as I made some minor changes to the class implementation code.

-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. -
-- rrivera/ - grafik/
-- 102708583/icq - grafikrobot/aim - Grafik/

Boost-Build list run by bdawes at, david.abrahams at, gregod at, cpdaniel at, john at