

From: Rene Rivera (grafik.list_at_[hidden])
Date: 2005-03-08 21:16:53


All,

I've been reading all the testing-related posts and haven't had time to
respond to them individually. So I decided to talk about what might be
done to improve things in one big posting.

For a long time now one of my objectives has been to use BuildBot
(http://buildbot.sf.net/) to improve the management of running the
regression tests. For those who don't feel like reading about that
software, here's a quick summary:

It is a client-server configuration in which the test server controls
the test clients, telling them what to test and how. In this arrangement
the clients do not directly control what they are testing, nor how. All
the clients provide is an execution environment for the testing. The
server is configured to control what the various clients test, including
how to get the source, and when they test. The testing is change-driven
instead of time-driven as we currently have. This means that each time a
change occurs in the source (CVS in our case) the server decides when to
tell the clients to run tests. Because of this direct communication with
the clients, the server is able to give immediate feedback on what the
testing is doing. This includes a live display of test progress, down to
a dynamic view of the test log.
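To make that a bit more concrete, here's a rough sketch of what the
relevant pieces of a BuildBot master configuration could look like. The
module and class names are my best guess at the BuildBot API, and the
CVS location is a placeholder, so treat this purely as an illustration
of the change-driven arrangement rather than a working configuration:

    # master.cfg sketch -- change-driven testing (illustrative only)
    from buildbot.process.factory import BuildFactory
    from buildbot.steps.source import CVS
    from buildbot.steps.shell import ShellCommand
    from buildbot.schedulers.basic import SingleBranchScheduler
    from buildbot.changes.filter import ChangeFilter
    from buildbot.config import BuilderConfig

    # Each build checks out the exact tree that triggered it, then runs the tests.
    factory = BuildFactory()
    factory.addStep(CVS(cvsroot=":pserver:anonymous@cvs.example.org:/boost",
                        cvsmodule="boost", mode="update"))
    factory.addStep(ShellCommand(command=["bjam", "test"],
                                 workdir="build/status",
                                 description="running regression tests"))

    c = BuildmasterConfig = {}
    c['builders'] = [BuilderConfig(name="full-regression",
                                   slavenames=["tester-1", "tester-2"],
                                   factory=factory)]

    # Change-driven: a CVS commit, after a quiet period, triggers the builds;
    # there is no fixed timer involved.
    c['schedulers'] = [SingleBranchScheduler(name="on-commit",
                                             change_filter=ChangeFilter(branch=None),
                                             treeStableTimer=5 * 60,
                                             builderNames=["full-regression"])]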

Here are some of the current issues, how this step might help with them,
and other changes I've thought about that might also help.

*Reproducibility*

BuildBot controls what the checkouts are and what triggered the build.
You can see this by selecting one of the yellow "Build #" links on the
BuildBot display. If there's ever a question about what code produced an
error one can get the specific version of the tree to attempt
reproduction. The history of what tests have run is kept, which means
that we can finally answer the dreaded "When did that start breaking?"
question.

*Scalability*

One big change I'd like to see is breaking up the test runs to make it
possible for testers to run tests on subsets of libraries, or even
individual libraries. For example, there might be some testers who have
a special interest in a particular library; Boost.Python comes to mind.
It would be ideal to make it possible for them to run only those tests,
and to go through the extra steps of doing the Python setup. Also, for
some popular platforms it becomes possible to get much faster response
times from testing if, for example, we partition the library testing
space across those platforms so that they test the libraries in
parallel.

For this to happen, some significant organizational changes need to be
made to the tests. As it currently stands such a division is not
possible because of the way tests are defined and organized. We have a
single test point, status/Jamfile, which a) points to the rest of the
test points, libs/*/test/Jamfile, and b) defines its own tests. The
problem is that, for some libraries, there is a conflict between the
tests defined in status/Jamfile and the tests defined in
libs/*/test/Jamfile. For example, Boost.Config has a reduced set of
tests in status/Jamfile. This situation is likely a reaction to the need
to keep test times manageable. And I understand that library authors
need a place to run their own comprehensive set of tests.

My proposal is to create a set of canonical tests that comprise the
regression testing suite, independent of the library authors' tests.
This set of tests would be structured so that it's possible to run each
library's tests independently of the others. It would add this type of
structure:

boost-root/tests/<library>/Jamfile
boost-root/tests/<library>/<sub-library>/Jamfile
boost-root/tests/<library>/<some-other-grouping>/Jamfile

So for example a tester could say she only wants to test python/*, or
numeric/ublas/*, etc.
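On the server side this layout would map naturally onto one builder per
library (or per grouping), generated straight from the directory names.
Continuing the sketch from above, and again with made-up names and
paths, it could look something like this:

    # Sketch: one builder per canonical test directory (hypothetical layout).
    libraries = ["config", "python", "numeric/ublas"]

    def make_builder(library, slaves):
        f = BuildFactory()
        f.addStep(CVS(cvsroot=":pserver:anonymous@cvs.example.org:/boost",
                      cvsmodule="boost", mode="update"))
        # Run only this library's canonical tests, e.g. tests/python/Jamfile.
        f.addStep(ShellCommand(command=["bjam", "test"],
                               workdir="build/tests/" + library,
                               description="testing " + library))
        return BuilderConfig(name="test-" + library.replace("/", "-"),
                             slavenames=slaves, factory=f)

    c['builders'] = [make_builder(lib, ["tester-1", "tester-2"])
                     for lib in libraries]

A tester interested only in Boost.Python would then attach her machine
to the test-python builder and nothing else.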

*Fragility*

Simply put, because BuildBot exposes the complete testing procedure, it
becomes much easier to see where problems occur in the testing. Also,
because there is a central server managing the procedures, it's more
likely that problems can be solved at that one location instead of
opaquely trying to fix the clients. And with, hopefully, additional
distributed testers, it becomes harder for testing to break down
completely.

Another aspect we need to address is the fragility of the server itself.
Ideally we would have multiple BuildBot servers to add redundancy. To
make such a multi-server, multi-client setup resource-efficient we would
need to manage the distribution of testing between the servers.

*Resources*

The only predictable way to address resource usage is to distribute the
testing so we can create more capacity. Breaking up the tests is the
only way I can see to move in that direction. It was already suggested
that slicing the testing down to single toolsets would help, but that
doesn't really address the problem. For example, it would still be
prohibitive for me to run tests on my G3 Mac for CW-8.3 because it's a
slow machine; it would take days to run just one cycle of tests, making
the results useless. But it would be possible for me to run a minimal
set of tests, for example Boost.Config and other basic tests.
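With the per-library builders sketched above, that becomes simply a
matter of which builders a given machine is attached to; something along
these lines (the machine names are, of course, made up):

    # Sketch: the slow G3/CW-8.3 machine gets only the minimal builders,
    # while the faster machines split the remaining libraries between them.
    c['builders'] = [
        make_builder("config",        ["g3-cw83", "linux-gcc", "win-vc71"]),
        make_builder("python",        ["linux-gcc"]),
        make_builder("numeric/ublas", ["win-vc71"]),
    ]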

Restructuring also brings up the possibility of moderating the set of
tests that make it into the regression suite. Right now we are in the
position of asking library authors to define what the tests for their
library are. We can't moderate what gets into testing, or what needs to
go out. We need some procedure for reviewing tests, and some form of
approval to get tests into the regression system. I'm not saying that
authors would not be able to do additional testing, just that we need to
define what the standard set of tests is so that we can concentrate our
testing efforts. It would still be possible to set up additional
resources to run "experimental" tests.

*Response*

The gain from segmenting and distributing the testing is hopefully
obvious ;-) But another advantage of using BuildBot is that we are not
tied to waiting for the XSLT processing to see results. Sure, the
results are not going to be as incredibly well organized as the
Meta-Comm results, but they are immediately available. So if there is a
significant delay in that processing, because of load or breakage, we
can still continue working.

*Releases*

Managing the testing for a release was brought up many times, and it's
clear that requiring testers to make manual changes is just not working.
With BuildBot, having control on the server of what is tested means that
at any point one person, or some small number of people, can make the
switch to devote testing resources to release testing. One possibility
that the finer-grained testing allows is limiting the testing for a
release to the required set of toolsets and platforms.
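With the scheduler sketched earlier, for example, the switch could
amount to a change of roughly this size on the server, pointing the
change filter (and the checkouts) at the release branch. The branch name
here is only a placeholder:

    # Sketch: devote the testers to a release branch instead of the trunk.
    from buildbot.schedulers.basic import SingleBranchScheduler
    from buildbot.changes.filter import ChangeFilter

    release_branch = "RC_1_33_0"   # hypothetical release branch tag

    # The CVS checkout step would be given branch=release_branch as well.
    c['schedulers'] = [SingleBranchScheduler(name="release-testing",
                                             change_filter=ChangeFilter(branch=release_branch),
                                             treeStableTimer=5 * 60,
                                             builderNames=["test-config", "test-python"])]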

OK, that's enough rambling... I know I haven't mentioned many other
items raised, particularly about bjam and Boost.Build. I'll leave that
for another post. I just need to get back to setting up the BuildBot
server now ;-)

-- 
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq
