From: Martin Wille (mw8329_at_[hidden])
Date: 2005-03-07 17:20:05

David Abrahams wrote:
> "Victor A. Wagner Jr." <vawjr_at_[hidden]> writes:
>>At Sunday 2005-03-06 18:52, you wrote:
>>>Let's start revving up to release Boost 1.33.0. Personally, I'd like
>>>to get it out the door by mid-April at the latest, and I'm offering
>>>to manage this release.
>>thank you for your offer, but if you don't get the damned regression
> Please keep your language civil.
>>testing working FIRST (it's been non-responsive
> Can you please be more specific about what has been non-responsive? I
> doubt anyone can fix anything without more information.

Whatever tone might be appropriate or not ...

Several testers have raised issues and pleaded for better communication
several (probably many) times. Most of the time, we seem to get ignored,
unfortunately. I don't want to accuse anyone of deliberately neglecting
our concerns. However, I think we suffer from a "testing is not too well
understood" problem at several levels.

The tool chain employed for testing is very complex (due to the
diversity of compilers and operating systems involved) and too fragile.
Complexity leads to lack of understanding (among the testers and among
the library developers) and to false assumptions and to lack of
communication. It additionally causes long delays between changing code
and running the tests and between running the tests and the result being
rendered. This in turn makes isolating bugs in the libraries more
difficult. Fragility leads to the testing procedure breaking often and
to breaking without getting noticed for some time and to breaking
without anyone being able to recognize immediately exactly what part
broke. This is a very unpleasant situation for anyone involved and it
causes a significant level of frustration, at least among those who run
the tests (e.g. seeing one's own test results not being rendered for
several days, or seeing the test system being abused as a change
announcement system, isn't exactly motivating).

Please understand that a lot of resources (human and computer) are
wasted due to these problems. This waste is most apparent to those who
run the tests. However, most of the time, issues raised by the testers
seemed to get ignored. Maybe that was just because we didn't yell loud
enough or we didn't know whom to address or how to fix the problems.

Personally, I don't have any problem with the words Victor chose. Other
people might have. If you're one of them, then please understand that
we feel there's something going very wrong with the testing
procedure and we're afraid it will go on that way and we'll lose a lot
of the quality (and the reputation) Boost has.

The people involved in creating the test procedure have put a great deal
of effort into it and the resulting system does its job nicely when it
happens to work correctly. However, apparently, the overall complexity
of the testing procedure has grown beyond our ability to manage it.
This is one reason why release preparations take so long.

Maybe we should take a step back and collect all the issues we have and
all the knowledge about what is causing these issues.

I'll make a start; I hope others will contribute to the list.
Issues and causes, unordered (please excuse any duplicates):

- testing takes a huge amount of resources (HD, CPU, RAM, people
operating the test systems, people operating the result rendering
systems, people coding the test post processing tools, people finding
the bugs in the testing system)
- the testing procedure is complex
- the testing procedure is fragile
- the code-change to result-rendering process takes too long
- bugs in the testing procedure take too long to get fixed
- changes to code that will affect the testing procedure aren't
communicated well
- incremental testing doesn't work flawlessly
- deleting tests requires manual purging of old results in an
incremental testing environment.
- the number of target systems for testing is rather low; this results
in questionable portability.
- lousy performance of Sourceforge
- resource limitations at Sourceforge (e.g. the number of files there)
- between releases the testing system isn't as well maintained as during
the release preparations.
- test results aren't easily reproducible. They depend much on the
components on the respective testing systems (e.g. glibc version, system
compiler version, python version, kernel version and even on the
processor used on Linux)
- library maintainers don't have access to the testing systems; this
results in longer test-fix cycles.
- changes which will cause heavy load at the testing sites never get
announced in advance. This is a problem when testing resources have to
be shared with the normal workload (like in my case).
- changes that require old test results to get purged usually don't get
- becoming a new contributor for testing resources is too difficult.
- we're supporting compilers that compile languages significantly
different from C++.
- there's no common concept of which compilers to support and which not.
- post-release displaying of test results apparently takes too much
effort. Otherwise, it would have been done.
- tests are run for compilers for which they are known to fail. 100%
waste of resources here.
- known-to-fail tests are rerun although the dependencies didn't change.
- some tests are insanely big.
- some library maintainers feel the need to run their own tests
regularly. Ideally, this shouldn't be necessary.
- test post processing has to work on output from different compilers.
Naturally, that output is formatted differently.
- test post processing makes use of very recent XSLT features.
- several times the post processing broke due to problems with the XSLT
- XSLT processing takes a long time (merging all the components that are input
to the result rendering takes ~1 hour just for the tests I run)
- the number of tests is growing
- there's no way of testing experimental changes to core libraries
without causing reruns of most tests (imagine someone wanting to test
an experimental version of some part of MPL).
- switching between CVS branches during release preparations takes
additional resources and requires manual intervention.

I'm sure testers and library developers are able to add a lot more to
the list.

