From: Martin Wille (mw8329_at_[hidden])
Date: 2005-03-08 10:52:07
Aleksey Gurtovoy wrote:
> Martin Wille writes:
>>The people involved in creating the test procedure have put very
>>much effort in it and the resulting system does its job nicely when
>>it happens to work correctly. However, apparently, the overall
>>complexity of the testing procedure has grown above our management
> Honestly, I don't see what you conclude that from, much less how it's
> apparent. Having said that...
- many reports of "something doesn't work right", often related to
post-processing, and less-than-optimal responses to them. We all
understand that you and Misha are under time constraints and therefore
aren't able to answer immediately. Having only two people who are able
to fix these things is one small part of our problems. The fact that
people do not know who is responsible for finding out which part of the
testing procedure is going wrong seems to indicate a management problem.
- bugs suddenly go away, and the people involved in tracking them down do
not understand what was causing them. This kind of problem is probably
related to the build system. I actually consider this one fairly dangerous.
- We're not really able to tell when a bug first started to get reported.
>>Maybe, we should take a step back and collect all the issues we have
>>and all knowledge about what is causing these issues.
> ... this is a good idea. Making the issues visible definitely helps in
> keeping track of where we are and what still needs to be done, and
> quite possibly in soliciting resources to resolve them.
>>I'll make a start, I hope others will contribute to the list.
>>Issues and causes unordered (please, excuse any duplicates):
> I'll comment on the ones I have something to say about.
>>- testing takes a huge amount of resources (HD, CPU, RAM, people
>> operating the test systems, people operating the result rendering
>> systems, people coding the test post processing tools, people
>> finding the bugs in the testing system)
> True. It's also a very general observation. Don't see how having it
> here helps us.
I'm under the impression that some people did not know how many resources
testing actually consumes. I've seen reactions of surprise when I mentioned
the CPU time, HD space or RAM consumed by the tests. Pleas for splitting
large test cases (e.g. random_test) were ignored.
>>- the testing procedure is complex
> Internally, yes. The main complexity and _the_ source of fragility
> lies in "bjam results to XML" stage of processing. I'd say it's one of
> the top 10 issues by solving which we can substantially simplify
> everybody's life.
I agree. This processing step has to deal with the build system (which
is complex itself) and with differing compiler output. Other complexity
probably stems from having to collect and display test results that
reflect different CVS checkout times.
>>- the code-change to result-rendering process takes too long
> Not anymore. In any case, there is nothing in the used technology
> (XSLT) that would make this an inherent bottleneck. It became one
> because the original implementation of the reporting tools just
> wasn't written for the volume of the processed data the tools are
> asked to handle nowadays.
*This* step might be a lot faster now (congratulations, this is a *big*
improvement). However, there are still other factors that make the
code-change to result-rendering process take too long.
>>- bugs in the testing procedure take too long to get fixed
> I think all I can say on this one is said here --
I'm not trying to imply that Misha or you aren't doing enough. However,
the fact that only two people have the knowledge of, and the access to,
the result collection stage of the testing process is a problem in itself.
>>- incremental testing doesn't work flawlessly
> That's IMO another "top 10" issue that hurts a lot.
>>- deleting tests requires manual purging of old results in an
>> incremental testing environment.
> Just an example of the above, IMO.
Right. However, it's one of the more difficult problems to solve. The
build system would have to be extended to delete the results of tests
that no longer exist.
>>- lousy performance of Sourceforge
>>- resource limitations at Sourceforge (e.g. the number of files there)
> This doesn't hurt us anymore, does it?
It hurts every time the result-collecting stage doesn't work correctly.
We're not able to generate our own XML results and upload them ourselves,
due to the SF resource limits.
>>- test results aren't easily reproducible. They depend much on the
>> components on the respective testing systems (e.g. glibc version,
>> system compiler version, python version, kernel version and even on
>> the processor used on Linux)
> True. There isn't much we can do about it, though, is there?
You're probably right. However, I wanted to mention this point because
someone might have an idea of how to address it. I guess it boils down to
needing more testers in order to see more flavours of similar environments.
>>- becoming a new contributor for testing resources is too difficult.
> I don't think it's true anymore. How much simpler can it become --
Hmm, recent traffic on the testing reflector seemed to indicate that it
isn't that simple. This might be caused by problems with the build system.
>>- we're supporting compilers that compile languages significantly
>> different from C++.
> Meaning significantly non-conforming compilers or something else?
Yes, significantly non-conforming compilers.
>>- post-release displaying of test results apparently takes too much
>> effort. Otherwise, it would have been done.
> Huh? They were on the website (and still are) the day the release was
> announced. See
Well, I take that back, then. However, this URL doesn't seem to be well
known. Not a problem, then.
>>- some library maintainers feel the need to run their own tests
>> regularly. Ideally, this shouldn't be necessary.
> Agreed ("regularly" is a key word here). IMO the best we can do here
> is to ask them to list the reasons for doing so.
One reason surely is that the available test environments or test cycles
are somehow unsatisfactory. I would understand either. More testers
would help here, too.
>>- test post processing has to work on output from different
>> compilers. Naturally, that output is formatted differently.
> What's the problem here?
Isn't it a problem? We do have to parse the output of the different
compilers, after all.
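To make the concern concrete, here is a rough sketch of what the post-processing has to do, assuming a per-toolset table of diagnostic patterns (the patterns below are simplified illustrations; real diagnostics vary even between versions of the same compiler, which is exactly the fragility in question):

```python
import re

# Hypothetical, simplified error formats for two toolsets.
ERROR_PATTERNS = {
    "gcc":  re.compile(r"^(?P<file>[^:]+):(?P<line>\d+): error: (?P<msg>.*)$"),
    "msvc": re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\) : error C\d+: (?P<msg>.*)$"),
}

def extract_errors(toolset, output):
    """Return (file, line, message) tuples found in one toolset's output."""
    pattern = ERROR_PATTERNS[toolset]
    errors = []
    for line in output.splitlines():
        m = pattern.match(line)
        if m:
            errors.append((m.group("file"), int(m.group("line")), m.group("msg")))
    return errors
```

Every new toolset, and every format change in an existing one, means another pattern to write and maintain, so this stage breaks silently whenever a compiler's output drifts.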
>>- several times the post processing broke due to problems with the
>> XSLT processor.
> And twice as often it broke due to somebody's erroneous checkin. The
> latter is IMO much more important to account for and to handle
> gracefully. Most of XSLT-related problems of the past were caused by
> inadequate usage, such as transformation algorithms not prepared for a
> huge volume of data we are now processing.
Do you expect the recent updates to be able to handle a significantly
higher volume? That would be a big improvement. I'm asking because I had
the impression that some parts of the XSLT processing used O(n^2)
algorithms (or worse). My local tests with changing the length of
pathnames seemed to indicate that (replacing "/home/boost" with "/boost"
resulted in a significant speedup of the XSLT processor).
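The kind of accidental O(n^2) I mean is the classic accumulation pattern. Illustrated in Python rather than in the actual XSLT stylesheets (which I haven't profiled in detail, so this is an assumption about the cause): both functions below produce identical output, but the first copies the whole report on every append, while the second does a single join at the end.

```python
def render_quadratic(rows):
    # Each '+=' copies the entire report built so far,
    # so total work grows as O(n^2) in the number of rows.
    report = ""
    for row in rows:
        report += row + "\n"
    return report

def render_linear(rows):
    # Collect the pieces and join once at the end: O(n) total work.
    return "\n".join(rows) + "\n" if rows else ""
```

If the stylesheets contain an equivalent pattern (rebuilding strings or node lists per result row), input size — including the length of the pathnames embedded in each row — would affect runtime superlinearly, which would match what I observed.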
>>- there's no way of testing experimental changes to core libraries
>> without causing reruns of most tests (imagine someone would want to
>> test an experimental version of some part of MPL).
> Do you mean running library tests only off the branch?
Yes, and running only a reduced set of tests for that if possible.
I think this would help the library maintainers.
>>- switching between CVS branches during release preparations takes
>> additional resources and requires manual intervention.
> What do you think of this one --
I'm with Victor on this point; for the testers (and hopefully there'll
be more of them one day) it's significantly easier not to have to change
anything during the release preparations. This could be achieved by
using the CVS trunk as the release branch until the actual release gets
tagged. Development would have to continue in a branch and be merged
back into the trunk after the release.
Ideally, the testers would be able to run the tests without having to
attend the runs. This is currently not possible.
(Just as an example: while writing this, I notice that, apparently, I'm
unable to upload test results right now because of an error raised by
one of the Python scripts: "ImportError: No module named ftp".)
> Finally, thanks for putting this together!
I hoped other people would contribute to the list; I'm sure there's a
lot more to say about testing. E.g., it would be nice to have some sort
of history of recent regression results. It would also be nice to be
able to split the runs vertically (running tests for a smaller set of
toolsets) and horizontally (running tests for a smaller set of
libraries) easily; I realize, though, that presenting the results would
become more difficult.
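A history of results would also answer the "when did this bug start to get reported" question above. A minimal sketch, assuming we kept dated snapshots of per-test outcomes (the snapshot layout is invented for illustration; nothing like it exists in the current tools):

```python
def first_failure(history, test):
    """Return the date of the earliest snapshot in which 'test' failed.

    history: list of (date, {test_name: "pass" | "fail"}) snapshots,
             assumed sorted oldest-first. Returns None if it never failed.
    """
    for date, results in history:
        if results.get(test) == "fail":
            return date
    return None
```

Even this trivial lookup is impossible today because each regression run overwrites the previous results instead of archiving them.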
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk