
Boost Testing:

From: Rene Rivera (grafik.list_at_[hidden])
Date: 2005-03-09 10:51:46


Aleksey Gurtovoy wrote:
> Rene Rivera writes:
>
>> For a long time now one of my objectives has been to use BuildBot
>> (http://buildbot.sf.net/) to improve the management of running the
>> regression tests. For those who don't feel like reading about that
>> software here's a quick summary:
>>
>> It is a client-server configuration where the test server controls
>> the test clients, telling them what to test. In this arrangement the
>> clients do not directly control what they are testing, nor how. All
>> the clients provide is an execution environment for the testing. The
>> server is configured to control what the various clients test,
>> including how to get the source and when they test. The testing is
>> change-driven instead of time-driven as we currently have. This
>> means that each time a change occurs on the source (CVS in our case)
>> the server decides to tell the clients to run tests. Because of this
>> direct communication with the clients the server is able to give
>> direct feedback as to what the testing is doing.
>
> The feedback part sounds useful...
>
>> This includes a live display of test progress, down to a dynamic
>> view of the test log.
>
> .. although this part would be pretty much useless in our case and
> put quite a burden on the client's network bandwidth.

I don't see how you can say that more feedback is useful and then say
it's not. As for bandwidth, it is minimized in the BuildBot setup. The
logs are sent by the client to the server, which stores them for history
and for display on the web page. Yes, bandwidth requirements will be
higher than they are now, but given the output-rate/build-time ratio the
increase is not going to be large. Of course, actual usage will tell.
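
To make that arrangement concrete, here is roughly what the server-side
configuration that drives all of this looks like. This is only a
minimal sketch, not a proposal for our actual setup; the exact module
and class names depend on the BuildBot version, and the worker name,
password, repository location, and build command below are made up for
illustration.

# master.cfg -- minimal sketch of a change-driven BuildBot master.
from buildbot.plugins import changes, schedulers, steps, util, worker

c = BuildmasterConfig = {}
c['protocols'] = {'pb': {'port': 9989}}

# Test clients only provide an execution environment; the server
# decides what they build and when.
c['workers'] = [worker.Worker("linux-gcc", "secret")]

# Change-driven rather than time-driven: a repository hook (or a small
# script watching CVS commit mail) notifies the master of each change.
c['change_source'] = [changes.PBChangeSource()]

# When a change arrives, wait for the tree to settle, then schedule
# a test run.
c['schedulers'] = [schedulers.SingleBranchScheduler(
    name="on-commit",
    change_filter=util.ChangeFilter(branch=None),  # trunk only
    treeStableTimer=120,
    builderNames=["regression"])]

# The server defines the build steps; each step's log is uploaded to
# the master, kept for history, and shown live on the web status page.
f = util.BuildFactory([
    steps.CVS(cvsroot=":pserver:anonymous@cvs.example.invalid:/boost",
              cvsmodule="boost", mode="incremental"),
    steps.ShellCommand(name="run tests",
                       command=["bjam", "libs/config/test"]),
])

c['builders'] = [util.BuilderConfig(
    name="regression", workernames=["linux-gcc"], factory=f)]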

>> Here are some of the issues, how this step might help, and some
>> other changes I've thought about that might also help:
>>
>> *Reproducibility*
>>
>> BuildBot controls what the checkouts are and what triggered the
>> build. You can see this by selecting one of the yellow "Build #"
>> links on the BuildBot display. If there's ever a question about what
>> code produced an error, one can get the specific version of the tree
>> to attempt reproduction. The history of what tests have run is kept,
>> which means that we can finally answer the dreaded "When did that
>> start breaking?" question.
>
> Change-driven testing is not the only way to get an answer to this
> question. But the huge point that is missing here is how you define,
> determine, and display "breakage". No offense, but the BuildBot logs
> at the few sites listed on the above page are in the prehistoric age
> compared to what we have.

I don't claim they are better logs than the processed results. I only
claim one can have immediate access to them as compared to delayed
access to the processed results.

As for how it can tell us when things break: since BuildBot keeps a
history of the logs and includes version control information with each
test run, one can look back through the logs to see at what CVS point a
test starts failing. It's not the change-driven testing but the
change-marked testing that gives us that information.

>> *Scalability*
>
>> For this to happen some significant organizational changes need to
>> happen to the tests. As it currently stands such a division is not
>> possible because of the way tests are defined and organized. We have
>> a single test point, status/Jamfile, which a) points to the rest of
>> the test points, libs/*/test/Jamfile, and b) defines its own tests.
>> The problem is that there is a conflict, for some libraries, between
>> the tests defined in status/Jamfile and the tests defined in
>> libs/*/test/Jamfile. For example, Boost.Config has a reduced set of
>> tests in status/Jamfile.
>
>
> This is just a legacy. Many of these were fixed (tests moved to
> standalone jamfiles into the corresponding libraries' directories)
> during 1.32 preparations.

Yes, most, but not all, and in some cases for good reasons, or at least
reasons important enough to the library authors.

>> And I understand that library authors would need to have a place to
>> run their own set of comprehensive tests.
>
> I'm not sure a comprehensive vs. basic tests scheme is worth our time
> at the moment. We have much bigger issues to worry about.

I was discussing what I was currently doing to *solve* some of the
problems. Yes, I know full well that there are other problems I'm not
currently working on. To put it simply, BuildBot is what I decided to
spend my time on this week, for various reasons: 1) I think it will
help considerably in the management of testing resources, and 2) I have
a personal need and hence can easily justify the unpaid time
expenditure.

>> My proposal is to create a set of canonical tests that comprise the
>> regression testing suite, independent of the library authors' own
>> tests. This set of tests would be structured so that it's possible
>> to run each library's tests independently of the others.
>>
>> It would be adding this type of structure:
>>
>> boost-root/tests/<library>/Jamfile
>> boost-root/tests/<library>/<sub-library>/Jamfile
>> boost-root/tests/<library>/<some-other-grouping>/Jamfile
>>
>> So for example a tester could say she only wants to test python/*,
>> or numeric/ublas/*, etc.
>
> I agree on the structuring and user specification parts. Not sure
> about the Jamfile locations -- it's a bit of a pain to go to an
> unrelated directory to modify a Jamfile when you add a new test case
> to your library's test suite.

I said "something like" :-) This would also work:

boost-root/libs/<library>/test/regression/Jamfile
boost-root/libs/<library>/test/regression/<sub-library>/Jamfile
boost-root/libs/<library>/test/regression/<some-other-grouping>/Jamfile

The point is to have _some_ organization and breakup of tests so that
they can be better distributed to the testing resources.
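
As a rough illustration of how the server side could take advantage of
that breakup, the master could generate one builder per library
regression directory and hand different libraries to different test
clients. Continuing the master.cfg sketch from above (again, the
library list, worker name, and CVS location are made up):

# One builder per library regression suite, so the server can
# distribute libraries across the available test clients.
from buildbot.plugins import steps, util

libraries = ["config", "python", "numeric/ublas"]  # hypothetical subset

def regression_factory(lib):
    # Check out the tree and run just this library's regression tests.
    f = util.BuildFactory()
    f.addStep(steps.CVS(
        cvsroot=":pserver:anonymous@cvs.example.invalid:/boost",
        cvsmodule="boost", mode="incremental"))
    f.addStep(steps.ShellCommand(
        name="test " + lib,
        command=["bjam", "libs/%s/test/regression" % lib]))
    return f

c['builders'] = [
    util.BuilderConfig(name=lib.replace("/", "-") + "-tests",
                       workernames=["linux-gcc"],
                       factory=regression_factory(lib))
    for lib in libraries
]

(The scheduler's builderNames would then list these instead of the
single "regression" builder.) A tester's machine only runs whichever of
those builders the server assigns to it, so someone who cares only
about, say, numeric/ublas never has to build anything else.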

>> *Fragility*
>>
>> Simply put, because BuildBot exposes the complete testing procedure,
>> it becomes much easier to see where problems occur in the testing.
>
> Without specifics this statement is worth very little.

It's hard to make every statement jam-packed with useful information :-)

> I for one don't see how it's going to solve any of the most painful
> problems enumerated in Martin's post and the follow-ups.

OK, here are some that it could help alleviate:

Martin Wille wrote:
> Several testers have raised issues and pleaded for better
> communication several (probably many) times. Most of the time, we
> seem to get ignored, unfortunately. I don't want to accuse anyone of
> deliberately neglecting our concerns. However, I think we apparently
> suffer from a "testing is not too well understood" problem at several
> levels.
[...]
> Fragility leads to the testing procedure breaking often, to breaking
> without getting noticed for some time, and to breaking without anyone
> being able to recognize immediately exactly what part broke. This is
> a very unpleasant situation for anyone involved and it causes a
> significant level of frustration, at least among those who run the
> tests (e.g. seeing one's own test results not being rendered for
> several days, or seeing the test system being abused as a change
> announcement system, isn't exactly motivating).
[...]
> - bugs in the testing procedure take too long to get fixed

>> Also, because there is a central server managing the procedures
>> it's more likely that problems can be solved at that one location
>> instead of opaquely trying to fix the clients.
>
> Again, that just doesn't ring any bells for me. For one, a central
> server was in fact mentioned as one of the main sources of fragility.

A central results processing server was mentioned as a problem. It is
possible to have multiple test management servers.

> The pressing issues we have with the current state of affairs are
> very specific, and the above doesn't seem relevant.

For BuildBot it means that some of the management is done at the
servers, hence some test procedure problems can be solved without
affecting the clients. I know this might not be a problem now, but as
the number of testers grows, which is one of the major issues we want
to solve, it becomes more important to have some form of central
control. Just think of the famous distributed processing program,
SETI@home: the clients had a minimal install, and the servers made all
the management decisions for them. This should be our goal if we really
want to scale the testing.

--OK, I'm rambling again--sorry.

>> With the, hopefully, additional distributed testers it becomes
>> harder for testing to break down completely.
>>
>> Another aspect we need to address is fragility of the server.
>
> Yes.
>
>> Ideally we would have multiple BuildBot servers to add redundancy.
>> To make such a multi-server, multi-client setup resource efficient
>> we would need to manage the distribution of testing between them.
>>
>> *Resources*
>>
>> *Response*
>>
>> The gain from segmentation and distribution of testing is hopefully
>> obvious ;-) But another advantage of using BuildBot is that we are
>> not tied to waiting for the XSLT processing to see results.
>
> First of all, "waiting" for the XSLT is not a problem anymore.

It's not a problem right now. But as the number of tests keeps growing
it's likely going to be a problem again. And even now there is still a
delay, even if it's shorter. BuildBot will reduce that wait to almost
nothing, if you are willing to use the logs themselves as feedback. As
soon as changes go into CVS you can look at the BuildBot display and
see the clients getting your changes and testing them.

> Secondly, you are throwing the baby out with the bathwater. What is
> the BuildBot page going to display? The number of failing tests? The
> bjam logs? I'd like to see somebody get a release out based on
> something like this.

I wasn't throwing anything out. I never said I wanted to replace the
results processing; in fact I praise the results processing at one
point. IMPORTANT: This is not a replacement for the indispensable
results processing you do. It is an addition to help in one area we
currently have problems with: management of testing resources. No, I
would not base a release on this; raw testing is not a substitute for
release management. It just helps.

>> Sure, the results are not going to be as incredibly well organized
>> as the Meta-Comm results, but they are immediately available.
>
> And, in the context of Boost, pretty much useless.

I disagree. If, as a library author, I can get an almost immediate
response to the question of whether my changes affected the
functionality of the library, it is extremely useful.

>> *Releases*
>>
>> Managing the testing for a release was brought up many times. And
>> it's clear that requiring testers to make manual changes is just not
>> working. For BuildBot, having control on the server of what is
>> tested means that at any point one person, or some small number of
>> people, can make the switch to have testing resources devoted to
>> release testing. One possibility that the finer-grained testing
>> allows for is to limit the testing for a release to the required set
>> of toolsets and platforms.
>
>
> I agree that centralized control over what is getting tested by
> individual clients is desirable.
>
> Finally, thanks for posting more material to keep the discussion
> going.

You're welcome. Hopefully this reply is a bit clearer than the first. I
always have trouble with those long posts :-\

-- 
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq
