Boost Testing:
From: David Abrahams (dave_at_[hidden])
Date: 2007-10-27 13:42:55
I've installed Bitten (http://bitten.edgewall.org/) on my Trac server
and would like to use it for testing Boost (http://boost.org).
However, Boost has testing needs that I think may not be easily served
by Bitten in its current form -- at least I'm not seeing easy
solutions. Christopher Lenz, the author of Bitten, suggested that I
post about them on bitten_at_[hidden], thus this message, which
is also cross-posted to the boost-build and boost-testing lists.
Quick Overview of Bitten (for the Boost people)
Bitten is a continuous integration system that's integrated with Trac.
Testing machines (called "slaves") run a Python script (part of the
Bitten distribution) that checks periodically with the server to see
if anything needs to be tested. The server manages a set of testing
"recipes," each of which is associated with a directory in the
subversion repository and a set of slave criteria (such as "are you
running Linux?"). If a recipe has criteria matching the inquiring
slave and there have been changes in the associated part of the
repository, the slave executes the recipe.
A recipe is divided into a series of "steps," each of which is
composed of commands. Slaves
report back to the server as they execute each step, so the server can
tell you about the progress of each slave. If any step fails, the
build fails and the slave stops. Recipes are expressed in XML and
stored in the server's Trac database. Trac users with the right
privileges can create, edit, and delete recipes.
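For the Boost readers, a recipe skeleton looks roughly like this; the namespace URI and step contents are reproduced from memory of Bitten's documentation and may not be exact:

```xml
<!-- Illustrative Bitten recipe skeleton (details may not be exact). -->
<build xmlns:sh="http://bitten.cmlenz.net/tools/sh">
  <step id="test" description="Run the test suite">
    <sh:exec executable="bjam" args="--toolset=gcc test" />
  </step>
</build>
```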
Quick Overview of Boost Testing (for the Bitten people)
Boost is a collection of C++ libraries. Each library is maintained by
at most a handful of individuals. Some libraries depend on other
Boost libraries.
Boost uses its own build system, Boost.Build, which allows developers
to express build and test targets without special knowledge of the
platform (OS and specific compiler toolsets) that might be used to
create those targets. That's important, since most developers don't
have direct access to all the platforms they might want to support.
Each library author writes a suite of tests using Boost.Build. Anyone
who wants to test his library locally can run Boost.Build in his
library's test/ directory and see if any test targets fail. That's
important too -- we don't want people to have to install a bitten
slave in order to do local testing.
Some Boost.Build test targets only pass if others fail to build.
Boost uses that paradigm to make sure that constructs prohibited by a
library at compile-time actually fail to compile. This kind of
expected failure is encoded in the build specification just like the
creation of an executable or library.
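For illustration, a library's test/Jamfile.v2 might look roughly like this; the file and test names are invented, while `run` and `compile-fail` are rules from Boost.Build's testing module:

```jam
# Hypothetical test/Jamfile.v2 for a library.
import testing ;

run basic_test.cpp ;               # passes iff it compiles, links, and exits 0
compile-fail misuse_is_error.cpp ; # passes only if compilation fails
```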
A library may be unsupported or only partially supported on a given
platform, usually because of bugs or missing features in that platform, so
our test results feedback system post-processes the Boost.Build
results to decide how to represent failures that may be known. This
kind of expected failure is represented in
http://boost.org/status/explicit-failures-markup.xml, which allows us
to say whether any given library or test is expected to work on any
given platform, and if not, the possible reasons for the failure.
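For concreteness, a fragment of that markup might look something like the following; the library, test, and toolset names are made up, and the exact attributes may differ from the real schema:

```xml
<!-- Illustrative fragment modeled on explicit-failures-markup.xml. -->
<explicit-failures-markup>
  <library name="regex">
    <mark-expected-failures>
      <test name="wide_posix_api_test"/>
      <toolset name="msvc-6.5"/>
    </mark-expected-failures>
  </library>
</explicit-failures-markup>
```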
Our results feedback system shows reports for each library, describing
its status two ways: one view for developers and one for users. Of
course there's a detailed results view for each library as well.
Boost's web UI for results feedback leaves a lot to be desired, but it
also has many unique features of real value to Boost. See
http://lists.boost.org/Archives/boost/2007/08/125724.php for a
discussion.
Issues with Using Bitten for Boost Testing
* Where do recipes come from? Will library writers author them? If
so, won't that be redundant with information that's already captured
in Boost.Build declarations? If they're automatically generated
(say, by Boost.Build), how will they get into the Trac database?
* What is the granularity of recipes and steps? Probably the ideal is
that there's a recipe for each library. Testers with a particular
interest in some library could configure their slaves to only match
those recipes. That would also keep testing from stopping
altogether just because one library failed.
We'd probably also like to have one step for each of a library's
tests, so that we can get detailed feedback from Bitten about what
has failed and where.
As I understand it, however, a slave stops executing a recipe as
soon as one of its steps fails, but we probably don't want Bitten to
stop testing a whole library just because one of its tests fails (we
might want to stop if ten tests fail, though -- right now we just
keep on going).
Also, using a step for each test means invoking the Boost.Build
engine (bjam) separately for each test, which would be inefficient;
you'd pay the startup cost for BB at each step. So unless Bitten is
changed, we probably have to test a whole library in one step.
* What about parallelization? It seems to me that if we break up the
tests with one recipe and one step per library, we can fairly easily
parallelize testing of a single library but testing multiple
libraries in parallel could lead to race conditions if any two
libraries depend on a third one. Bjam would get launched once for
each dependent library, and both processes would try to build the
dependency at once.
One obvious approach to parallelization involves doing something we
should do anyway: testing against the libraries as built and
installed. If installation were a separate recipe then we could run
tests in parallel to our hearts' content. The problem is that AFAIK
there's no way to express a dependency graph among build recipes in
Bitten.
* A library should only be considered "broken" on a given platform if
a test that fails is not marked as expected, e.g. due to platform
bugs (or perhaps a test was passing in the previous release,
indicating a regression -- but in that case the test probably should
not be marked as an expected failure). That seems to imply the
ability to postprocess the testing results against
http://boost.org/status/explicit-failures-markup.xml before deciding
that a test has failed. Today we do that processing on the server,
but I think it would be reasonable to rewrite it (in Python) to
happen on the slave. That would probably make the system more
responsive overall. It would also be reasonable to break
http://boost.org/status/explicit-failures-markup.xml into separate
files for each library so one library's test postprocessing doesn't
have to see the markup for all the other libraries.
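As a sketch of what that slave-side postprocessing could look like, here's a minimal Python example; the markup fragment and function names are stand-ins, not the real explicit-failures-markup.xml schema:

```python
# Hypothetical sketch of slave-side postprocessing: filter raw test
# failures against per-library expected-failure markup before reporting.
# The markup below is a simplified stand-in, not the real Boost format.
import xml.etree.ElementTree as ET

MARKUP = """
<library name="regex">
  <mark-expected-failures>
    <test name="wide_posix_api_test"/>
    <toolset name="msvc-6.5"/>
  </mark-expected-failures>
</library>
"""

def expected_failures(markup_xml):
    """Return the set of (test, toolset) pairs marked as expected to fail."""
    root = ET.fromstring(markup_xml)
    pairs = set()
    for mark in root.iter("mark-expected-failures"):
        tests = [t.get("name") for t in mark.findall("test")]
        toolsets = [t.get("name") for t in mark.findall("toolset")]
        pairs.update((test, ts) for test in tests for ts in toolsets)
    return pairs

def unexpected(failures, markup_xml):
    """Keep only the failures that are not covered by the markup."""
    expected = expected_failures(markup_xml)
    return [f for f in failures if f not in expected]

failures = [("wide_posix_api_test", "msvc-6.5"),  # known platform bug
            ("basic_regex_test", "msvc-6.5")]     # genuine regression
print(unexpected(failures, MARKUP))  # only the genuine regression remains
```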
I'd be especially interested in hearing from Bitten folks if there are
ways of using Bitten that I've overlooked, or if they see particular
low-hanging features that would make testing Boost with Bitten more
practical.
Thanks for reading,
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com