From: Aleksey Gurtovoy (agurtovoy_at_[hidden])
Date: 2005-03-08 03:30:03
Martin Wille writes:
> Several testers have raised issues and pleaded for better communication
> several (probably many) times. Most of the time, we seem to get
> ignored, unfortunately. I don't want to accuse anyone of voluntarily
> neglecting our concerns. However, I think we apparently suffer from a
> "testing is not too well understood" problem at several levels.
> The tool chain employed for testing is very complex (due to the
> diversity of compilers and operating systems involved) and too
> Complexity leads to lack of understanding (among the testers and among
> the library developers) and to false assumptions and to lack of
> communication. It additionally causes long delays between changing
> code and running the tests and between running the tests and the
> result being rendered. This in turn makes isolating bugs in the
> libraries more difficult. Fragility leads to the testing procedure
> breaking often and to breaking without getting noticed for some time
> and to breaking without anyone being able to recognize immediately
> exactly what part broke. This is a very unpleasant situation for
> anyone involved and it causes a significant level of frustration at
> least among those who run the tests (e.g. to see one's own test results
> not being rendered for several days or to see the test system being
> abused as a change announcement system isn't exactly motivating).
> Please, understand that a lot of resources (human and computers) are
> wasted due to these problems. This waste is most apparent to those who
> run the tests. However, most of the time, issues raised by the testers
> seemed to get ignored. Maybe, that was just because we didn't yell
> loud enough or we didn't know whom to address or how to fix the
I agree with everything that is said above...
> Personally, I don't have any problem with the words Victor
> chose. Other people might have. If you're one of them, then please
> understand that we're feeling there's something going very wrong with
> the testing procedure
... but not with this one. I, for one, don't feel this way. There is work to
be done and issues to be resolved, true, but people are working on it
and things do improve substantially over time. Compared to the pre-1.32
testing procedures and practices we are now on a totally different
level of usability, coverage, simplicity of installation, and overall
usefulness of the regression tools.
> and we're afraid it will go on that way and we'll lose a lot of the
> quality (and the reputation) Boost has.
I think you are overstating things. If anything, things got
significantly better on this front. The quality was poorly enforced
until very recently -- we simply had no tools to get a more or less
accurate, comprehensive picture of it. We do now, and people are
working on moving things further forward. Not fast enough? Somebody
who feels that way should give them a hand, then.
> The people involved in creating the test procedure have put very
> much effort in it and the resulting system does its job nicely when
> it happens to work correctly. However, apparently, the overall
> complexity of the testing procedure has grown above our management
Honestly, I don't see what you base that conclusion on, much less how it's
apparent. Having said that...
> Maybe, we should take a step back and collect all the issues we have
> and all knowledge about what is causing these issues.
... this is a good idea. Making the issues visible definitely helps in
keeping track of where we are and what still needs to be done, and
quite possibly in soliciting resources to resolve them.
> I'll make a start, I hope others will contribute to the list.
> Issues and causes unordered (please, excuse any duplicates):
I'll comment on the ones I have something to say about.
> - testing takes a huge amount of resources (HD, CPU, RAM, people
> operating the test systems, people operating the result rendering
> systems, people coding the test post processing tools, people
> finding the bugs in the testing system)
True. It's also a very general observation. I don't see how having it
here helps us.
> - the testing procedure is complex
Internally, yes. The main complexity and _the_ source of fragility
lies in the "bjam results to XML" stage of processing. I'd say it's one of
the top 10 issues by solving which we can substantially simplify
> - the testing procedure is fragile
See the above.
> - the code-change to result-rendering process takes too long
Not anymore. In any case, there is nothing in the technology used
(XSLT) that would make this an inherent bottleneck. It became one
because the original implementation of the reporting tools just
wasn't written for the volume of data the tools are
asked to handle nowadays.
> - bugs in the testing procedure take too long to get fixed
I think all I can say on this one is said here --
> - incremental testing doesn't work flawlessly
That's IMO another "top 10" issue that hurts a lot.
> - deleting tests requires manual purging of old results in an
> incremental testing environment.
Just an example of the above, IMO.
> - the number of target systems for testing is rather low; this
> results in questionable portability.
Yes, we need more volunteers. Another "top 10" item.
> - lousy performance of Sourceforge
> - resource limitations at Sourceforge (e.g. the number of files there)
This doesn't hurt us anymore, does it?
> - test results aren't easily reproducible. They depend much on the
> components on the respective testing systems (e.g. glibc version,
> system compiler version, python version, kernel version and even on
> the processor used on Linux)
True. There isn't much we can do about it, though, is there?
> - becoming a new contributor for testing resources is too difficult.
I don't think it's true anymore. How much simpler can it become --
> - we're supporting compilers that compile languages significantly
> different from C++.
Meaning significantly non-conforming compilers or something else?
> - there's no common concept of which compilers to support and which
I think the criteria have been formulated several times.
> - post-release displaying of test results apparently takes too much
> effort. Otherwise, it would have been done.
Huh? They were on the website (and still are) the day the release was
> - tests are run for compilers for which they are known to fail. 100%
> waste of resources here.
Agreed 100%. Also a "top 10" item.
> - known-to-fail tests are rerun although the dependencies didn't change.
> - some library maintainers feel the need to run their own tests
> regularly. Ideally, this shouldn't be necessary.
Agreed ("regularly" is a key word here). IMO the best we can do here
is to ask them to list the reasons for doing so.
> - test post processing has to work on output from different
> compilers. Naturally, that output is formatted differently.
What's the problem here?
> - several times the post processing broke due to problems with the
> XSLT processor.
And twice as often it broke due to somebody's erroneous checkin. The
latter is IMO much more important to account for and to handle
gracefully. Most of the XSLT-related problems of the past were caused by
inadequate usage, such as transformation algorithms not prepared for the
huge volume of data we are now processing.
> - XSLT processing takes long (merging all the components that are
> input to the result rendering takes ~1 hour just for the tests I
> - the number of tests is growing
And more distributed testing is the only answer to this.
> - there's no way of testing experimental changes to core libraries
> without causing reruns of most tests (imagine someone would want to
> test an experimental version of some part of MPL).
Do you mean running library tests only off the branch?
> - switching between CVS branches during release preparations takes
> additional resources and requires manual intervention.
What do you think of this one --
Finally, thanks for putting this together!
-- Aleksey Gurtovoy MetaCommunications Engineering