
Boost Testing:

From: Aleksey Gurtovoy (agurtovoy_at_[hidden])
Date: 2005-03-09 07:51:18


Rene Rivera writes:
> For a long time now one of my objectives has been to use BuildBot
> (http://buildbot.sf.net/) to improve the management of running the
> regression tests. For those who don't feel like reading about that
> software, here's a quick summary:
>
> It is a client-server configuration in which the test server controls
> the test clients, telling them what to do for testing. In this
> arrangement the clients do not directly control what they are testing,
> nor how. All the clients provide is an execution environment for the
> testing. The server is configured to control what the various clients
> test, including how to get the source and when they test. The testing
> is change-driven instead of time-driven as we currently have. This
> means that each time a change occurs in the source (CVS in our case)
> the server decides to tell the clients to run tests. Because of this
> direct communication with the clients the server is able to give
> direct feedback as to what the testing is doing.

The feedback part sounds useful...

> This includes a live display of test progress, down to a dynamic
> view of the test log.

... although this part would be pretty much useless in our case and
would put quite a burden on the clients' network bandwidth.
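
For readers who skipped the BuildBot page, the change-driven
arrangement Rene describes boils down to roughly the following. This
is only a conceptual Python sketch -- it is not BuildBot's actual
configuration API, and every name in it is made up:

    class TestClient:
        """Stands in for one connected test machine."""
        def __init__(self, name, toolset):
            self.name = name
            self.toolset = toolset

        def run(self, revision, libraries):
            # A real client would check out `revision` and invoke bjam here.
            print(f"{self.name} [{self.toolset}]: testing {libraries} at {revision}")

    class TestServer:
        """Holds the configuration: who tests what, and when."""
        def __init__(self, clients, assignments):
            self.clients = clients
            self.assignments = assignments  # client name -> list of test directories

        def on_change(self, revision):
            # Called whenever the change source (CVS in our case) reports a commit.
            for client in self.clients:
                client.run(revision, self.assignments[client.name])

    # Example wiring: two clients, each told which subset to test.
    server = TestServer(
        clients=[TestClient("linux-gcc", "gcc-3.4"), TestClient("win-msvc", "vc-7.1")],
        assignments={"linux-gcc": ["libs/config/test", "libs/python/test"],
                     "win-msvc": ["libs/config/test"]})
    server.on_change("2005-03-09 07:00")  # the change source would call this

The point being illustrated: the server, not the clients, decides who
tests what and when.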

>
> Here are some of the issues, how this step might help, and some other
> changes I've thought about that might help...
>
> *Reproducibility*
>
> BuildBot controls what the checkouts are and what triggered the
> build. You can see this by selecting one of the yellow "Build #" links
> on the BuildBot display. If there's ever a question about what code
> produced an error, one can get the specific version of the tree to
> attempt reproduction. The history of what tests have run is kept,
> which means that we can finally answer the dreaded "When did that
> start breaking?" question.

Change-driven testing is not the only way to get an answer to this
question. But the huge point missing here is how you define, determine,
and display "breakage". No offense, but the BuildBot logs at the few
sites listed on the above page are prehistoric compared to what we
have.
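
That said, the mechanics of answering "when did that start breaking?"
are simple once per-change results are kept somewhere, whatever
produces them. A toy Python sketch with purely hypothetical data and
test names:

    # Per-change results recorded by whatever runs the tests; data is made up.
    history = [
        ("r1001", {"config_test": True,  "python_embedding": True}),
        ("r1002", {"config_test": True,  "python_embedding": True}),
        ("r1003", {"config_test": True,  "python_embedding": False}),
        ("r1004", {"config_test": False, "python_embedding": False}),
    ]

    def first_failure(history, test):
        """Earliest recorded revision at which `test` stopped passing."""
        for revision, results in history:
            if not results.get(test, True):
                return revision
        return None

    print(first_failure(history, "python_embedding"))  # -> r1003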

> *Scalability*
>
> One big change I'd like to see is the breakup of running tests to make
> it possible for testers to run tests on subsets of libraries, or even
> individual libraries. For example, there might be some testers who
> have a special interest in a particular library; Boost.Python comes to
> mind. It would be ideal to make it possible for them to run only those
> tests, and to go through the extra steps of doing the Python
> setup. Also, for some popular platforms it becomes possible to get
> much faster response rates from testing if, for example, we partition
> the library testing space across those platforms and they test the
> libraries in parallel.

Totally agree.
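
To make the parallelism point concrete: partitioning the library
testing space across the clients of one platform could be as simple as
the following sketch (the library and client names are invented):

    def partition(libraries, clients):
        """Round-robin the library test space over the available clients."""
        plan = {client: [] for client in clients}
        for i, library in enumerate(libraries):
            plan[clients[i % len(clients)]].append(library)
        return plan

    # Invented example: three clients sharing one platform test in parallel.
    libraries = ["config", "python", "numeric/ublas", "mpl", "regex", "filesystem"]
    clients = ["linux-gcc-1", "linux-gcc-2", "linux-gcc-3"]
    for client, libs in partition(libraries, clients).items():
        print(client, "->", libs)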

>
> For this to happen, some significant organizational changes need to
> happen to the tests. As it currently stands such a division is not
> possible because of the way tests are defined and organized. We have a
> single test point, status/Jamfile, which a) points to the rest of the
> test points, libs/*/test/Jamfile, and b) defines its own tests. The
> problem is that there is a conflict, for some libraries, between the
> tests defined in status/Jamfile and the tests defined in
> libs/*/test/Jamfile. For example, Boost.Config has a reduced set of
> tests in status/Jamfile.

This is just legacy. Many of these were fixed (tests moved to
standalone Jamfiles in the corresponding libraries' directories)
during the 1.32 preparations.

> This situation is likely a reaction to reduce test times to
> something manageable.

I'm not 100% positive about Boost.Config (John?), but for other
libraries it's just an atavism.

> And I understand that library authors would need to have a place to
> run their own set of comprehensive tests.

I'm not sure a comprehensive vs. basic tests scheme is worth our time
at the moment. We have much bigger issues to worry about.

>
> My proposal is to create a set of canonical tests that comprise the
> regression testing suite, independent of the library authors'
> tests. This set of tests would be structured so that it's possible to
> run each library's tests independently of the others.
>
> It would be adding this type of structure:
>
> boost-root/tests/<library>/Jamfile
> boost-root/tests/<library>/<sub-library>/Jamfile
> boost-root/tests/<library>/<some-other-grouping>/Jamfile
>
> So, for example, a tester could say she only wants to test python/*,
> or numeric/ublas/*, etc.

I agree on the structuring and user specification parts. I'm not sure
about the Jamfiles' location, though -- it's a bit of a pain to go to
an unrelated directory to modify a Jamfile when you add a new test
case to your library's test suite.
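
As for the user specification part, matching a tester-supplied pattern
such as python/* or numeric/ublas/* against per-library test points is
straightforward. A small Python sketch -- the test points and the
matching rule are only an illustration, not a proposal for the actual
tool:

    from fnmatch import fnmatch

    # Hypothetical canonical test points, one per library or sub-library,
    # mirroring the proposed boost-root/tests/<library>/... layout.
    test_points = ["config", "python", "numeric/ublas", "numeric/interval"]

    def select(test_points, pattern):
        """Test points matching a tester-supplied pattern such as 'python/*'."""
        return [tp for tp in test_points
                if fnmatch(tp, pattern) or fnmatch(tp + "/", pattern)]

    print(select(test_points, "python/*"))         # -> ['python']
    print(select(test_points, "numeric/ublas/*"))  # -> ['numeric/ublas']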

>
> *Fragility*
>
> Simply put, because BuildBot exposes the complete testing procedure,
> it becomes much easier to see where problems occur in the testing.

Without specifics this statement is worth very little. I, for one,
don't see how it's going to solve any of the most painful problems
enumerated in Martin's post and the follow-ups.

> Also, because there is a central server managing the procedures it's
> more likely that problems can be solved at that one location instead
> of opaquely trying to fix the clients.

Again, that just doesn't ring a bell with me. For one, a central
server was in fact mentioned as one of the main sources of fragility.
The pressing issues we have with the current state of affairs are very
specific, and the above doesn't seem relevant to them.

> With the, hopefully, additional distributed testers it becomes
> harder for testing to break down completely.
>
> Another aspect we need to address is fragility of the server.

Yes.

> Ideally
> we would have multiple BuildBot servers to add redundancy. To make
> such a multi-server, multi-client setup resource-efficient, we would
> need to manage the distribution of testing between them.
>
> *Resources*
>
> The only predictable way to address the resource usage is to
> distribute the testing so we can create more capacity.

Agree 100%.

> Breaking up the
> tests is the only way I can see to move there. It was already
> suggested that slicing the testing into single toolsets alone would
> help. But that doesn't really address the problems. For example, it
> would still be prohibitive for me to run tests on my G3 Mac for CW-8.3
> because it's a slow machine and it would take days to run just one
> cycle of tests, making the results useless. But it would be possible
> for me to run a minimal set of tests, for example Boost.Config and
> other basic tests.
>
> Restructuring also brings up the possibility of moderating the set of
> tests that make it into the regression suite. Right now we are in a
> position of asking library authors to define what the tests for their
> library are. We can't moderate what gets into testing, or what needs
> to go out. We need some procedure for reviewing tests, and some form
> of approval to get tests into the regression system. I'm not saying
> that authors would not be able to do additional testing, just that we
> need to define what the standard set of tests is so that we can
> concentrate our testing efforts. It would still be possible to set up
> additional resources to run "experimental" tests.
>
> *Response*
>
> The gain from segmentation and distribution of testing is hopefully
> obvious ;-) But another advantage of using BuildBot is that we are not
> tied to waiting for the XSLT processing to see results.

First of all, "waiting" for the XSLT is not a problem anymore. Secondly,
you are throwing the baby out with the bathwater. What is the BuildBot
page going to display? The number of failing tests? The bjam logs? I'd
like to see somebody get a release out based on something like this.

> Sure, the
> results are not going to be as incredibly well organized as the
> Meta-Comm results, but they are immediately available.

And, in the context of Boost, pretty much useless.

> *Releases*
>
> Managing the testing for a release was brought up many times. And
> it's clear that requiring testers to make manual changes is just not
> working. With BuildBot, having control on the server of what is tested
> means that at any point one person, or some small number of people,
> can make the switch to have testing resources devoted to release
> testing. One possibility that the finer-grained testing allows for is
> to limit the testing for a release to the required set of toolsets
> and platforms.

I agree that centralized control over what is getting tested by
individual clients is desirable.
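
One way to picture that control: the server publishes a single "test
plan" that the release manager edits, and every client picks it up
before its next run. A rough Python sketch, with an invented plan
format and invented branch, toolset, and library names:

    # The plan the server publishes; flipping it switches every client over.
    test_plan = {
        "purpose": "release",              # or "trunk"
        "branch": "RC_1_32_0",
        "toolsets": ["gcc-3.4", "vc-7.1"], # the required set for the release
        "libraries": ["config", "python", "numeric/ublas"],
    }

    def next_run(plan):
        """What a client would do after fetching the current plan."""
        print("checking out", plan["branch"],
              "and testing", plan["libraries"], "with", plan["toolsets"])

    next_run(test_plan)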

Finally, thanks for posting more material to keep the discussion going.

-- 
Aleksey Gurtovoy
MetaCommunications Engineering
