
Boost Testing :

From: Martin Wille (mw8329_at_[hidden])
Date: 2005-03-09 11:17:20


Aleksey Gurtovoy wrote:
> [Please follow up on Boost.Testing list]
>
> Martin Wille writes:
>
>>Aleksey Gurtovoy wrote:
>>
>>>Martin Wille writes:
>>
>>>>The people involved in creating the test procedure have put very
>>>>much effort in it and the resulting system does its job nicely when
>>>>it happens to work correctly. However, apparently, the overall
>>>>complexity of the testing procedure has grown above our management
>>>>capabilities.
>>>
>>>Honestly, I don't see from what you conclude that, less how it's
>>>apparent. Having said that...
>>
>>- many reports of "something doesn't work right", often related to
>> post-processing.
>
>
> And in almost all cases "something doesn't work right" usually ended
> up being a temporary breakage caused either by newly implemented
> functionality in the regression tools' chain / internal environment
> changes on our side, or malfunctioning directly related to incremental
> runs/jam log parsing. The only thing the former cases indicate is that
> the tools are being worked on and only _possibly_ that the people doing
> the work are taking somewhat more risks of breaking things than, say,
> during the release. In any case, this by no means indicates loss of
> control -- quite the opposite. The latter cases, as we all agree,
> _are_ tips of the seriously hurting issues that need to be resolved
> ASAP. Yet it's nothing new.
>
>
>> Less than optimal responses on those.
>
>
> Well, I disagree with this particular angle of looking at the
> situation. Given the history of the recent issues which _I_ would
> classify as suboptimally resolved/responded to, for me the above
> statement is equivalent to saying: "Something didn't work right
> recently and it seemed like the problem might well be on the
> reporting side -- I'd expect the corresponding maintainers to look at
> it proactively and sort things out". Needless to say, I don't consider
> this a fair or productive way of looking at things.

It's not equivalent. If the only two people who would be able to solve
the problem aren't available for some time, then this results in
suboptimal responses. Having more than one post-processing site, as you
suggested, would help a lot (assuming we're able to find people who will
be able to help in case of post-processing problems).

>> We all do
>> understand that you and Misha are under time constraints and
>> therefore aren't able to answer immediately. Having only two people
>> who are able to fix these things is one small part of our
>> problems. The fact that people do not know who would be responsible
>> for finding out what part of the testing procedure is going wrong
>> seems to indicate a management problem.
>
>
> IMO the problem is not that people don't know who is responsible (in
> fact, assigning a single person to be responsible is going to bring us
> back to square one) but rather that nobody steps up and says
> "I'll research this and report back" -- in a timely manner, that is.
> Is it a management problem? Rather a lack of resources, I think.
>
>
>>- bugs suddenly go away and people involved in tracking them down do
>> not understand what was causing them. This kind of problem is
>> probably related to the build system. I consider this one fairly
>> dangerous, actually.
>
>
> Same here. Yet again, we've been having these problems from day
> one. If your point is that it's time to solve them, I agree 100%.

Frankly, I don't care whether we had these problems from day one. I just
tried to summarize our current problems, regardless of history and
without any intention of blaming anyone for them. I do understand
that our current testing procedure is an improvement over what we had
before.

I like Doug's suggestion of checking out the CVS state as of a fixed time
of day. It's easy to implement and it offers at least some improvement.
Its drawback is that the code-change/test-run cycle would take longer.
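
Something along these lines could do the checkout; the repository string
and the cut-off hour below are just placeholders, of course:

import datetime
import subprocess

CVSROOT = ":pserver:anonymous@cvs.example.org:/cvsroot/boost"  # placeholder
SNAPSHOT_HOUR = 4  # agreed-upon daily cut-off, 04:00 UTC here

def snapshot_timestamp(now=None):
    """Return the most recent daily cut-off as a CVS -D date string."""
    now = now or datetime.datetime.utcnow()
    cutoff = now.replace(hour=SNAPSHOT_HOUR, minute=0, second=0, microsecond=0)
    if now < cutoff:
        cutoff -= datetime.timedelta(days=1)
    return cutoff.strftime("%Y-%m-%d %H:%M UTC")

def checkout(module="boost", workdir="."):
    # 'cvs checkout -D <date>' retrieves the tree as it was at that date/time,
    # so every runner using the same cut-off builds the same snapshot.
    subprocess.check_call(
        ["cvs", "-d", CVSROOT, "checkout", "-D", snapshot_timestamp(), module],
        cwd=workdir)

if __name__ == "__main__":
    checkout()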

>>- We're not really able to tell when a bug started to get reported.
>
>
> I'm not sure I understand this one. Could you please provide an
> example?

We don't have a history of older results. So the situation is: some
developer finds time to look at potential regressions. He sees some red
cells. He's unable to tell whether the regressions are directly related
to code he checked in a week ago, because he didn't have time to look
at the test results in the meantime.

I think simply archiving the summary page regularly (perhaps once or twice
per day) would be an improvement. It's easy to implement and it
doesn't cost much storage space (I'm not talking about archiving the
whole set of results). In case some history information is needed, one
could do a binary search in the archive.
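
For illustration, a small cron-driven script along these lines would be
enough; the summary URL and the archive directory are placeholders:

import bisect
import datetime
import os
import urllib.request

SUMMARY_URL = "http://example.org/regression-logs/developer_summary.html"  # placeholder
ARCHIVE_DIR = "/var/archive/boost-summaries"                               # placeholder

def archive_summary():
    # Store the current summary under a timestamped, lexicographically
    # sortable name (run this once or twice a day from cron).
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    stamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H%M")
    data = urllib.request.urlopen(SUMMARY_URL).read()
    with open(os.path.join(ARCHIVE_DIR, "summary-%s.html" % stamp), "wb") as f:
        f.write(data)

def summary_before(when):
    """Binary-search the archive for the newest summary taken before 'when'."""
    names = sorted(os.listdir(ARCHIVE_DIR))
    key = "summary-%s.html" % when.strftime("%Y-%m-%dT%H%M")
    i = bisect.bisect_right(names, key)
    return os.path.join(ARCHIVE_DIR, names[i - 1]) if i else None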

Another idea which would probably also be easy to implement would be to
archive the summary in XML format. This way, the data would be more
suitably formatted for automated processing.
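
For instance, pulling per-library failure counts out of an archived XML
summary could then be a few lines. The element and attribute names below
are made up purely for illustration; the real schema would be whatever the
reporting tools emit:

import xml.etree.ElementTree as ET

def failing_tests(xml_path):
    # Count failures per library in an archived XML summary.
    failures = {}
    for test in ET.parse(xml_path).iter("test"):      # hypothetical element
        if test.get("result") == "fail":              # hypothetical attribute
            lib = test.get("library", "unknown")      # hypothetical attribute
            failures[lib] = failures.get(lib, 0) + 1
    return failures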

[...]

>>>>- the code-change to result-rendering process takes too long
>>>
>>>Not anymore. In any case, there is nothing in the used technology
>>>(XSLT) that would make this an inherent bottleneck. It became one
>>>because the original implementation of the reporting tools just
>>>wasn't written for the volume of the processed data the tools are
>>>asked to handle nowadays.
>>
>>*This* step might be a lot faster now (congrats, this is a *big*
>> improvement). However, there still are other factors which make the
>> code-change to result-rendering process take too long.
>
>
> I think the answer to this is further splitting of work among
> distributed machines.

If the tests didn't depend on machine-specific things then a
SETI@home-like approach would be great. However, I don't see how we
could get there. Unless we find a bunch of virtually identical machines,
we'll hit a limit of achievable granularity. (We could consider running
the tests on emulated machines. This would probably have
some performance drawbacks, and we would have to maintain the virtual
machines. I doubt the gain would justify the effort put into that.)

>>>>- lousy performance of Sourceforge
>>>>- resource limitations at Sourceforge (e.g. the number of files there)
>>>
>>>This doesn't hurt us anymore, does it?
>>
>>It hurts every time the result-collecting stage doesn't work correctly.
>>We're not able to generate our own XML results and to upload them due
>>to the SF resource limits.
>
>
> I'd say we just need a backup results-processing site.

That would help a lot, indeed.

>>>>- becoming a new contributor for testing resources is too difficult.
>>>
>>>I don't think it's true anymore. How much simpler can it become --
>>>http://www.meta-comm.com/engineering/regression_setup/instructions.html?
>>
>>Hmm, recent traffic on the testing reflector seemed to indicate it
>>isn't too simple. This might be caused by problems with the build
>>system.
>
>
> If you are talking about CodeWarrior on OS X saga, then it is more
> build system-related than anything else.

The build system is a part of the testing procedure.

[...]
>>Do you expect the recent updates to be able to handle a significantly
>>higher volume?
>
>
> Yes, and we have implemented only the most obvious optimizations. If
> there is further need to speed up things, we'll speed them up.

Cool!

>>>> additional resources and requires manual intervention.
>>>
>>>What do you think of this one --
>>>http://article.gmane.org/gmane.comp.lib.boost.devel/119337?
>>
>>I'm with Victor on this point; for the testers (and hopefully there'll
>>be more of them one day) it's significantly easier not to have to
>>change anything during the release preparations. This could be
>>achieved by using the CVS trunk as release branch until the actual
>>release gets tagged.
>
>
> What about the tarballs, though?

I think a system with temporary preferences would be fragile.

I think, for testing the tarballs, we'd ideally have more volunteers
than we have between releases. If the tarballs contained a simple script
that was enough to run the tests, then we'd find more people for
testing and we'd simplify the tarball testing for the regular testers,
too. (They'd just have to prepare sufficient disk space and run the
script.) We're probably not able to produce such a script for all
possible target systems. However, it should be doable for the major
platforms.
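
Roughly, I imagine something like the following shipping in the tarball;
the actual build and test commands are placeholders that would have to
match the release layout:

import shutil
import subprocess
import sys

MIN_FREE_GB = 5  # rough guess at the space a full run needs

def check_disk_space(path="."):
    free_gb = shutil.disk_usage(path).free / 2**30
    if free_gb < MIN_FREE_GB:
        sys.exit("Need at least %d GB free, only %.1f GB available"
                 % (MIN_FREE_GB, free_gb))

def run_tests(toolset):
    check_disk_space()
    # Placeholder commands: a real script would first build the build tool
    # shipped with the release, then drive the test suite for the toolset.
    subprocess.check_call(["./build_build_tool.sh"])         # placeholder
    subprocess.check_call(["./run_regression.sh", toolset])  # placeholder

if __name__ == "__main__":
    run_tests(sys.argv[1] if len(sys.argv) > 1 else "gcc")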

>>It would be nice to be able to split the runs vertically (running
>>tests for a smaller set of toolsets)
>
>
> Isn't this possible now?

Yes; however, I think I'd currently have to use different runner-ids in
order to get things displayed correctly.

This is probably easy to fix (and I think you already told me it is easy).
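
For concreteness, the kind of split run I mean would look roughly like
this; the driver's option names here are just my assumption, not the
documented interface:

import subprocess

# Runner id -> toolset subset; the ids and toolset names are only examples.
SUBSETS = {
    "mw-linux-gcc":   ["gcc-3.3", "gcc-3.4"],
    "mw-linux-intel": ["intel-8.1"],
}

for runner_id, toolsets in SUBSETS.items():
    subprocess.check_call([
        "python", "regression.py",
        "--runner=" + runner_id,             # assumed option name
        "--toolsets=" + ",".join(toolsets),  # assumed option name
    ])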

Regards,
m

