From: Stefan Seefeld (seefeld_at_[hidden])
Date: 2007-05-03 15:04:48


There are a great many things that could (and should) be discussed with
respect to the Boost infrastructure, as well as the development process.

This is about testing, though, so I'd like to restrict my arguments to
that as much as possible.

I hear (and share) various complaints about the existing testing procedure:

* Test runs require a lot of resources.

* Test runs take a lot of time.

* There is no clear (visual) association between test results and code
  revisions.

* There are sometimes multiple test runs for the same platform, so the
  absolute number of failures is not meaningful (worse, two test runs for
  the same platform may produce different outcomes because some environment
  variables differ and are not accounted for).

Now let me contrast that with a utopian Boost testing harness that has the
following characteristics:

* The Boost repository stores code, as well as a description of the platforms
  and configurations that the code should be tested on.

* The overall space of tests is chunked by some local harness into
  small-scale test suites that volunteers can pick up and run.

* Contributors subscribe by providing some well-controlled environment
  in which such test suites can run. The whole thing works much like
  SETI@home: users merely install some 'slave' that contacts the master,
  retrieves individual tasks, and sends back results as they are ready
  (a rough sketch of that slave loop follows below).

* The master harness then collects the results, generates reports, and
  otherwise post-processes the incoming data. For example, individual slaves
  may be assigned some confidence ('trust') in the validity of their results
  (after all, there is always that last bit of uncontrolled environment
  potentially affecting test runs...).

What does it take to get there?

I think there are different paths to pursue, more or less independently.

1) The test run procedure should be made more and more autonomous, requiring
   less hand-holding by the user. The fewer parameters there are for users to
   set, the less error-prone (or at least, subject to interpretation) the
   results become. This also implies a much enhanced facility for reporting
   platform characteristics from the user's machine as part of the test run
   results. (In fact, these data should be reported up front, as they determine
   what part of the mosaic the slave will actually execute; see the sketch
   below.)
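
To illustrate, a minimal sketch of such an upfront platform report, using only
Python's standard library (the field names are made up):

  import json
  import platform

  def platform_report():
      # Collected up front and sent to the master, which uses it to decide
      # which part of the test mosaic this slave should execute.
      u = platform.uname()
      return {
          "system": u.system,     # e.g. "Linux"
          "release": u.release,
          "machine": u.machine,   # e.g. "x86_64"
          "python": platform.python_version(),
      }

  print(json.dumps(platform_report(), indent=2))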

2) The smaller tasks, as well as the more convenient handling, should increase
   parallelism, leading to a shorter turn-around.
   That, together with better annotation, should allow the report generator to
   associate test results with code revisions more accurately, helping
   developers understand which changeset a regression relates to (sketched
   below).
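
For instance, once every incoming result carries the revision it was produced
from, the report generator could tie failures to changesets along these lines
(a sketch only; the record layout is hypothetical):

  from collections import defaultdict

  def group_by_revision(results):
      # `results` is assumed to be an iterable of records such as
      # {"revision": 38211, "platform": "linux-gcc",
      #  "test": "regex/grep", "passed": False}.
      by_rev = defaultdict(list)
      for r in results:
          by_rev[r["revision"]].append(r)
      return by_rev

  def first_failing_revision(results, test):
      # The earliest revision at which `test` is reported as failing,
      # i.e. the changeset a developer would want to look at first.
      failing = sorted(r["revision"] for r in results
                       if r["test"] == test and not r["passed"])
      return failing[0] if failing else None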

I think a good tool for 1) is buildbot (http://buildbot.net/trac).
It allows the build process to be formalized. The only remaining unknown is the
environment seen by the build slaves when they are started. However, a) all
environment variables are reported, and b) we can encapsulate the slave
startup further to control the environment variables seen by the build
process.
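
For illustration, a minimal master.cfg sketch of how this could look in
buildbot; it uses the current buildbot.plugins API rather than the one
available at the time of writing, and the worker/builder names, password,
and commands are invented:

  from buildbot.plugins import schedulers, steps, util, worker

  c = BuildmasterConfig = {}

  # Each contributor's machine registers as a worker (a "build slave" in
  # older buildbot releases).
  c['workers'] = [worker.Worker("volunteer-linux-gcc", "secret")]

  # One factory per test chunk; the environment is pinned explicitly so
  # results do not depend on whatever the worker happened to inherit.
  f = util.BuildFactory()
  f.addStep(steps.ShellCommand(
      command=["bjam", "toolset=gcc", "libs/python/test"],
      env={"PATH": "/usr/bin:/bin", "LANG": "C"}))

  c['builders'] = [
      util.BuilderConfig(name="python-gcc-linux",
                         workernames=["volunteer-linux-gcc"],
                         factory=f),
  ]
  c['schedulers'] = [
      schedulers.ForceScheduler(name="force",
                                builderNames=["python-gcc-linux"]),
  ]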

As far as the size of tasks (test suites) is concerned, the question is
related to the ongoing discussion about modularity.
An individual test run should cover at most a single toolchain on a single
library, and possibly even less (a single build variant, say).

Keeping modularity at that level also makes it possible to parametrize test
sub-suites. For example, the Boost.Python test suite may need to be run against
different Python versions, while Boost.MPI needs to be tested against different
MPI backends and versions, etc.
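
A sketch (not an existing harness) of how that task space could be enumerated,
as the cross product of library, toolchain, and build variant, extended by
per-library parameters; all concrete values below are merely illustrative:

  import itertools

  libraries  = ["python", "mpi", "regex"]      # illustrative subset
  toolchains = ["gcc-4.1", "msvc-8.0"]
  variants   = ["debug", "release"]

  # Per-library extra dimensions (hypothetical values).
  extra_axes = {
      "python": [("python-version", v) for v in ("2.4", "2.5")],
      "mpi":    [("mpi-backend", b) for b in ("openmpi", "mpich")],
  }

  def tasks():
      # Yield one small, independently runnable test task per combination;
      # each is a chunk a volunteer slave could pick up on its own.
      for lib, tc, var in itertools.product(libraries, toolchains, variants):
          for extra in extra_axes.get(lib, [None]):
              yield {"library": lib, "toolchain": tc,
                     "variant": var, "extra": extra}

  print(sum(1 for _ in tasks()))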

Regards,
                Stefan

-- 
      ...ich hab' noch einen Koffer in Berlin...
