Subject: [Boost-build] Python Port Updates
From: Aaron Boman (aaron_at_[hidden])
Date: 2016-10-10 00:51:57


All,

I have pushed all of my changes to the Python port to a branch named
features/python-updates. This includes all of the changes I needed to
bring the Python port up to date with Jam as well as all of the
optimizations I've found within the last week.

Testing
=======
Running all of the tests against the Python port on the latest develop
branch with the MSVC toolset had a result of 47 PASSes and 82 FAILs.
With all of my changes, the summary changes to 113 PASSes and 16 FAILs.
I made fixes to the code to get as many of the tests to pass as I
could, altering only certain tests by adding the additional Python
module needed for importing. The remaining test failures are due to
the way the tests were written to specifically support Jam. For
example, a lot of the failures are due to importing modules like
"class", "modules", or "assert" (modules that don't need to exist in
the Python port). I wasn't sure what the best approach would be
to support tests for both Python and Jam. For our own tests, we run
the same tests against both Python and Jam. In the rare case where an
implementation is radically different (or not supported, as we are
deprecating Jam support), a special switch variable tells the test
whether it is being run for Python or Jam, at which point we can run
different code.

I think getting the testing back up to 100% PASSing is probably the
best step we can take at this point to ensure that nothing regresses
for any additional functionality moving forward. This would include
writing unit tests, which the Python port currently lacks. I don't
think it should be too much additional work to get the remaining
16 tests to PASS. I just need a good plan of attack so as to not
interfere with the stable Jam side.

Optimizations
=============
I spent a good week running several projects of ours through the Python
profiler and analyzing the statistics. As I alluded to in a previous
thread, the Features, Properties, and PropertySets were the biggest
offenders. On a fairly large project, removing the getters from the
Feature and Property classes in favor of direct attribute access shaved
about 5 seconds per attribute off the build. Overall it saved about 30
to 40 seconds for the setup phases. The biggest optimization to be had
was in the creation of the PropertySet instances. The
property_set.create() function (as well as any of its called functions)
was taking well over a minute. I reduced this time down to a few
seconds by memoizing the Property class (passing in the same set of
arguments will return the same Property instance). Since I am
guaranteed to get the same Property instance, I can then assign a
unique ID to each instance and use the collection of IDs to create a
key for the PropertySet cache. Using the integer IDs is much faster to
sort and hash than it is to sort and hash the Property
instance itself (which was sorting and hashing by strings). There were
several other optimizations. Most dealt with efficient string handling
or removing function calls when they weren't necessary. My test was
to run a large project's incremental build and see how long it took
to determine that nothing needed to be updated. The Jam side was taking
about 1 minute 10 seconds. Before all of the optimizations, Python was
taking about 4 minutes 30 seconds. After the optimizations, it's pretty
close to Jam at about 1 minute 15 seconds.
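
To illustrate the memoization described above, here is a minimal sketch.
It is not the code on the branch: the class layout, the `id` attribute,
the `_property_set_cache` dictionary and the simplified `create()`
signature are all assumptions made for the example.

```python
import itertools


class Property:
    """Hypothetical stand-in for the port's Property class."""

    _cache = {}                   # (feature, value) -> Property
    _next_id = itertools.count()  # source of unique integer IDs

    def __new__(cls, feature, value):
        key = (feature, value)
        instance = cls._cache.get(key)
        if instance is None:
            instance = super().__new__(cls)
            # Plain attributes, read directly instead of through getters.
            instance.feature = feature
            instance.value = value
            instance.id = next(cls._next_id)
            cls._cache[key] = instance
        return instance


class PropertySet:
    """Hypothetical stand-in holding an ordered list of properties."""

    def __init__(self, properties):
        self.properties = properties


_property_set_cache = {}  # tuple of sorted Property IDs -> PropertySet


def create(properties):
    """Return the cached PropertySet for this collection of properties."""
    # The key is built from integer IDs, so sorting and hashing never
    # touch the feature/value strings.
    key = tuple(sorted({p.id for p in properties}))
    result = _property_set_cache.get(key)
    if result is None:
        result = PropertySet(sorted(set(properties), key=lambda p: p.id))
        _property_set_cache[key] = result
    return result
```

With this sketch, Property("toolset", "msvc") returns the same object
every time it is called, and create() returns the same PropertySet for
the same properties regardless of the order they are passed in.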

Side thought: while I would rather see the Python port take off, it
seems like the Jam side could update the parser to directly convert
<feature>value to "property" instances when parsing the Jam files.
Then, the same optimization could be applied when creating property-set
instances. This would significantly increase the speed of Jam. Of
course, this would probably require a decent amount of work to the
entire codebase to support "property" instances rather than just
tokens.

Since I've made so many changes, I decided to keep my changes on a
branch for now in case anyone has any objections. But, I would like to
get the changes merged into the develop branch soon so that our local
fork can point to Boost.Build proper.

Looking forward
===============
The optimizations were my main focus for the last week. Looking
forward, I plan on attempting to make a "configure" feature. The Waf
build system has this feature and it would help alleviate some of the
pain our developers have when performing incremental builds. I'm
imagining a command line similar to:
     b2 my-target my=configuration details=here --configure MY_CONFIG
This would then perform the metatarget and virtualtarget generation
steps, collect all actualized targets, all target variables, all
scanners, all toolsets, basically all initialization information and
dump it into a file named MY_CONFIG.db in the bin/ directory. Then,
subsequent invocations of b2 would use something like the following
command:
     b2 --use-configuration MY_CONFIG
This would skip the metatarget/virtualtarget phase and use the
pre-calculated configuration. A bonus would be to somehow quickly
determine if any of the Jam files have been touched and warn the user
that the configuration they're running is out of date. I have a
feeling that with this feature, a great many people would get bitten
by the problem of updating a Jam file and wondering why their build
is not reflecting the change.
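
One way the out-of-date check could work is sketched below. This is only
an illustration under assumptions: the function names, the JSON layout
of the .db file and the use of content digests are all hypothetical, and
nothing like this exists in b2 today.

```python
import hashlib
import json
import os


def _digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def save_configuration(db_path, jam_files, configuration):
    """Write the configuration plus a digest for every Jam file read."""
    snapshot = {
        "jam_files": {path: _digest(path) for path in jam_files},
        "configuration": configuration,
    }
    with open(db_path, "w") as handle:
        json.dump(snapshot, handle)


def stale_jam_files(db_path):
    """Return the recorded Jam files that are missing or have changed."""
    with open(db_path) as handle:
        snapshot = json.load(handle)
    return sorted(
        path
        for path, digest in snapshot["jam_files"].items()
        if not os.path.exists(path) or _digest(path) != digest
    )
```

A non-empty result from stale_jam_files() would be the cue to warn the
user. Comparing modification times instead of digests would be cheaper,
at the cost of false alarms when a file is touched but not changed.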

Most of our developers tend to work on one file at a time. They will
make changes to a single file then rebuild. Sometimes, it can take
several seconds before a build starts. Using the preconfigured approach
should significantly reduce that start up time.

Also: I've toyed with the idea of dumping all target information into
an IDE-digestible project file; probably some abstract XML file that can
then be converted/transformed into some IDE-specific XML file. In order
to support this, all of the actions will need to be rendered and
dumped. I haven't yet figured out how re-scanning the created
dependencies would work, though. The configure feature would be a step
towards
native IDE project support.
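
As a rough picture of what such an abstract XML dump might look like,
here is a small sketch. The element names (project, target, source,
command) and the dictionary layout of each target record are invented
for the example and do not reflect any existing b2 format.

```python
import xml.etree.ElementTree as ET


def targets_to_xml(targets):
    """Render target records as an IDE-neutral XML string.

    Each record is a hypothetical dict with "name", "sources" and
    "command" keys; the real port would supply its own target objects.
    """
    root = ET.Element("project")
    for target in targets:
        node = ET.SubElement(root, "target", name=target["name"])
        for source in target["sources"]:
            ET.SubElement(node, "source", path=source)
        # Store the fully rendered action so that a converter does not
        # need to understand Jam actions.
        ET.SubElement(node, "command").text = target["command"]
    return ET.tostring(root, encoding="unicode")
```

A per-IDE converter could then transform this neutral file into its own
project format. The re-scanning question raised above is not addressed
by this sketch.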

Any thoughts, concerns, questions?

Thanks,
Aaron

