From: Stefan Seefeld (seefeld_at_[hidden])
Date: 2007-05-03 15:04:48
There are a great many things that could (and should) be discussed with
respect to the boost infrastructure, as well as the development process.
This is about testing, though, so I'd like to restrict my arguments to
that as much as possible.
I hear (and share) various complaints about the existing testing procedure:
* Test runs require a lot of resources.
* Test runs take a lot of time.
* There is no clear (visual) association between test results and code
revisions.
* There are (sometimes) multiple test runs for the same platform, so the
absolute number of failures has no meaning (worse, two test runs for
the same platform may result in differing outcomes, because some environment
variables differ and are not accounted for).
Now let me contrast that to some utopic boost testing harness with the
following characteristics:
* The boost repository stores code, as well as a description of the platforms
and configurations the code should be tested on.
* The overall space of tests is chunked by some local harness into
small-scale test suites, accessible for volunteers to run.
* Contributors subscribe by providing some well-controlled environment
in which such test suites can run. The whole thing works somewhat like
seti_at_home (say), i.e. users merely install some 'slave' that then
contacts the master to retrieve individual tasks, sending back results
as they are ready (a minimal sketch of such a slave follows this list).
* The master harness then collects results, generates reports, and otherwise
postprocesses the incoming data. For example, individual slaves may be
associated with some confidence ('trust') about the validity of the results
(after all, there is always that last bit of uncontrolled environment
potentially affecting test runs...)
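To make the slave side a bit more concrete, here is a minimal sketch of what
such a slave loop could look like. It is only an illustration: the master URL,
the /task and /result endpoints, and the task format are made-up assumptions,
not an existing protocol.

    # Hypothetical slave loop: fetch a task from the master, run it, report back.
    # The master URL, endpoints and task format are illustrative only.
    import json
    import subprocess
    import urllib.request

    MASTER = "http://testmaster.example.org"   # hypothetical master harness

    def fetch_task(platform_info):
        # Tell the master what we are, so it can pick a matching chunk of the mosaic.
        req = urllib.request.Request(
            MASTER + "/task",
            data=json.dumps(platform_info).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as response:
            return json.load(response)         # e.g. {"id": 42, "command": [...]}

    def run_task(task):
        # Run the test command and capture its output for the report.
        proc = subprocess.run(task["command"], capture_output=True, text=True)
        return {"id": task["id"],
                "returncode": proc.returncode,
                "output": proc.stdout + proc.stderr}

    def report(result):
        req = urllib.request.Request(
            MASTER + "/result",
            data=json.dumps(result).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).close()

    if __name__ == "__main__":
        task = fetch_task({"os": "linux", "toolset": "gcc-4.1"})
        report(run_task(task))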
What does it take to get there?
I think there are different paths to pursue, more or less independently.
1) The test run procedure should be made more and more autonomous, requiring
less hand-holding by the user. The fewer parameters there are for users to
set, the less error-prone (or at least, subject to interpretation) the
results become. This also implies a much enhanced facility to report the
characteristics of the user's platform as part of the test run results.
(In fact, these should be reported upfront, as they determine what part of
the mosaic the slave will actually execute; a sketch of such a report
follows below point 2.)
2) The smaller tasks, as well as the more convenient handling, should increase
parallelism, leading to a shorter turn-around.
That, together with better annotation, should allow the report generator to
associate test results with code revisions more reliably, helping developers
understand which changeset a regression relates to.
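To illustrate the kind of upfront platform report meant in 1), here is a small
sketch that collects a few machine and toolchain characteristics into a
structure a slave could send to the master before requesting work. The
particular fields and the 'g++ --version' probe are assumptions for
illustration, not a proposed format.

    # Sketch: collect platform characteristics to report upfront with test runs.
    # The set of fields and the compiler probe are illustrative assumptions.
    import json
    import os
    import platform
    import subprocess

    def compiler_version(command=("g++", "--version")):
        try:
            out = subprocess.run(command, capture_output=True, text=True)
            return out.stdout.splitlines()[0] if out.stdout else None
        except OSError:
            return None

    def platform_report():
        return {
            "os": platform.system(),
            "os_release": platform.release(),
            "machine": platform.machine(),
            "compiler": compiler_version(),
            # Report the full environment so unaccounted-for variables become visible.
            "environment": dict(os.environ),
        }

    if __name__ == "__main__":
        print(json.dumps(platform_report(), indent=2))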
I think that a good tool to use for 1) is buildbot (http://buildbot.net/trac).
It allows the build process to be formalized. The only remaining unknown is
the environment seen by the buildslaves when they are started. However, a) all
environment variables are reported, and b) we can further encapsulate the
slave startup to control the environment variables seen by the build process.
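For concreteness, here is a rough outline of a buildbot master configuration
that runs one library's tests with one toolchain on one slave. The worker name,
password, repository URL and the bjam command are placeholders, and the exact
configuration API differs between buildbot releases, so treat this as a sketch
rather than a ready-to-use master.cfg.

    # Rough outline of a buildbot master.cfg (API details vary across releases).
    # Worker name/password, repository URL and the test command are placeholders.
    from buildbot.plugins import schedulers, steps, util, worker

    c = BuildmasterConfig = {}

    # One volunteer-provided slave ('worker' in current buildbot terminology).
    c['workers'] = [worker.Worker("linux-gcc", "secret")]
    c['protocols'] = {'pb': {'port': 9989}}

    # Formalized build procedure: check out, then run one library's tests.
    f = util.BuildFactory()
    f.addStep(steps.SVN(repourl="https://svn.boost.org/svn/boost/trunk",
                        mode="incremental"))
    f.addStep(steps.ShellCommand(command=["bjam", "libs/python/test", "toolset=gcc"],
                                 env={"LC_ALL": "C"}))  # pin part of the environment

    c['builders'] = [util.BuilderConfig(name="boost.python-gcc",
                                        workernames=["linux-gcc"],
                                        factory=f)]

    # Trigger a build once the tree has been stable for five minutes.
    c['schedulers'] = [schedulers.SingleBranchScheduler(
        name="on-commit",
        change_filter=util.ChangeFilter(),      # accept all incoming changes
        treeStableTimer=300,
        builderNames=["boost.python-gcc"])]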
As far as the size of tasks (test suites) is concerned, the question is
related to the ongoing discussion about modularity.
An individual test run should at most cover a single toolchain on a single
library, and may be even smaller in scope (a single build variant, say).
Keeping modularity at that level also allows test sub-suites to be
parametrized. For example, the boost.python test suite may need to be tested
against different Python versions, while boost.mpi needs to be tested against
different MPI backends and versions. Etc.
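As an illustration of how such parametrized sub-suites could be described and
chunked into individual tasks, here is a small sketch; the library names,
toolsets and parameter values are made up for the example.

    # Sketch: describe per-library test parameters and expand them into small tasks.
    # Library names, toolsets and parameter values are illustrative only.
    from itertools import product

    TEST_MATRIX = {
        # library:      extra parameter axes beyond toolset/variant
        "boost.python": {"python": ["2.4", "2.5"]},
        "boost.mpi":    {"mpi": ["openmpi", "mpich2"]},
        "boost.regex":  {},
    }
    TOOLSETS = ["gcc-4.1", "msvc-8.0"]
    VARIANTS = ["debug", "release"]

    def expand_tasks():
        # Yield one small task per (library, toolset, variant, extra-parameter) combination.
        for library, extra in TEST_MATRIX.items():
            axes = [TOOLSETS, VARIANTS] + list(extra.values())
            names = ["toolset", "variant"] + list(extra.keys())
            for combo in product(*axes):
                yield {"library": library, **dict(zip(names, combo))}

    if __name__ == "__main__":
        for task in expand_tasks():
            print(task)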
Regards,
Stefan
-- ...ich hab' noch einen Koffer in Berlin...