
From: Aleksey Gurtovoy (agurtovoy_at_[hidden])
Date: 2005-03-09 04:40:08


[Please follow up on Boost.Testing list]

Martin Wille writes:
> Aleksey Gurtovoy wrote:
>> Martin Wille writes:
>
>>>The people involved in creating the test procedure have put very
>>>much effort in it and the resulting system does its job nicely when
>>>it happens to work correctly. However, apparently, the overall
>>>complexity of the testing procedure has grown above our management
>>>capabilities.
>> Honestly, I don't see from what you conclude that, less how it's
>> apparent. Having said that...
>
> - many reports of "something doesn't work right", often related to
> post-processing.

And in almost all cases "something doesn't work right" ended up
being a temporary breakage caused either by newly implemented
functionality in the regression tools chain / internal environment
changes on our side, or by malfunctioning directly related to
incremental runs / jam log parsing. The only thing the former cases
indicate is that the tools are being worked on, and only _possibly_
that the people doing the work are taking somewhat more risk of
breaking things than, say, during a release. In any case, this by no
means indicates loss of control -- quite the opposite. The latter
cases, as we all agree, _are_ the tips of seriously hurting issues
that need to be resolved ASAP. Yet that's nothing new.

> Less than optimal responses on those.

Well, I disagree with this particular angle of looking at the
situation. Given the history of the recent issues which _I_ would
classify as suboptimally resolved/responded to, to me the above
statement is equivalent to saying: "Something didn't work right
recently and it seemed like the problem might well be on the
reporting side -- I'd expect the corresponding maintainers to look
at it proactively and sort things out". Needless to say, I don't
consider this either a fair or a productive way of looking at
things.

> We all do
> understand that you and Misha are under time constraints and
> therefore aren't able to answer immediately. Having only two people
> who are able to fix these things is one small part of our
> problems. The fact that people do not know who would be responsible
> for finding out what part of the testing procedure is going wrong
> seems to indicate a management problem.

IMO the problem is not that people don't know who is responsible (in
fact, assigning a single person to be responsible is going to bring
us back to square one), but rather that nobody steps up and says
"I'll research this and report back" -- in a timely manner, that is.
Is it a management problem? Rather a lack of resources, I think.

>
> - bugs suddenly go away and people involved in tracking them down do
> not understand what was causing them. This kind of problem is
> probably related to the build system. I consider this one fairly
> dangerous, actually.

Same here. Yet again, we've been having these problems from day one.
If your point is that it's time to solve them, I agree 100%.

>
> - We're not really able to tell when a bug started to get reported.

I'm not sure I understand this one. Could you please provide an
example?

>>>I'll make a start, I hope others will contribute to the list.
>>>Issues and causes unordered (please, excuse any duplicates):
>> I'll comment on the ones I have something to say about.
>>
>>>- testing takes a huge amount of resources (HD, CPU, RAM, people
>>> operating the test systems, people operating the result rendering
>>> systems, people coding the test post processing tools, people
>>> finding the bugs in the testing system)
>> True. It's also a very general observation. Don't see how having it
>> here helps us.
>
> I'm under the impression some people did not know how many resources
> testing actually costs. I've seen reactions of surprise when I
> mentioned the CPU time, HD space or RAM consumed by the tests. Pleas
> for splitting test cases were ignored (e.g. random_test).

OK.

>
>>>- the testing procedure is complex
>> Internally, yes. The main complexity and _the_ source of fragility
>> lies in "bjam results to XML" stage of processing. I'd say it's one of
>> the top 10 issues by solving which we can substantially simplify
>> everybody's life.
>
> I agree. This processing step has to deal with the build system (which
> is complex itself) and with different compiler output. Other
> complexity probably stems from having to collect and to display test
> results that reflect different cvs checkout times.

Is it really a problem nowadays? I think we have timestamps in every
possible place and they make things pretty obvious.
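
To illustrate what makes the "bjam results to XML" stage so fragile:
it has to reconstruct per-test records out of a single flat log by
pattern-matching on lines. A deliberately simplified Python sketch
(the log markers and the XML layout below are invented for the
example; the real log and the real tools are considerably messier):

    import re
    import sys
    import xml.etree.ElementTree as ET

    # Invented markers -- the real bjam log uses different ones; the
    # point is only the line-oriented state machine the parser has to be.
    START = re.compile(r'^\*\*\* run (?P<name>\S+) with (?P<toolset>\S+)$')
    END = re.compile(r'^\*\*\* result (?P<status>pass|fail)$')

    def jam_log_to_xml(lines):
        root = ET.Element('test-run')
        current, output = None, []
        for line in lines:
            m = START.match(line)
            if m:
                current, output = ET.SubElement(root, 'test', m.groupdict()), []
                continue
            m = END.match(line)
            if m and current is not None:
                current.set('status', m.group('status'))
                ET.SubElement(current, 'output').text = '\n'.join(output)
                current = None
                continue
            if current is not None:
                # Everything between the markers is compiler/linker noise;
                # guessing which of it matters is where the fragility lives.
                output.append(line)
        return ET.tostring(root)

    if __name__ == '__main__':
        sys.stdout.write(jam_log_to_xml(open(sys.argv[1]).read().splitlines()).decode())

Anything that changes what bjam prints, or an incremental run that
emits only part of the expected sequence, silently throws a parser
like this off.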

>
>>>- the code-change to result-rendering process takes too long
>> Not anymore. In any case, there is nothing in the used technology
>> (XSLT) that would make this an inherent bottleneck. It became one
>> because the original implementation of the reporting tools just
>> wasn't written for the volume of the processed data the tools are
>> asked to handle nowadays.
>
> *This* step might be a lot faster now (congrats, this is a *big*
> improvement). However, there still are other factors which make the
> code-change to result rendering process take too long.

I think the answer to this is further splitting of the work among
distributed machines.

>
>>>- bugs in the testing procedure take too long to get fixed
>> I think all I can say on this one is said here --
>> http://article.gmane.org/gmane.comp.lib.boost.devel/119341.
>
> I'm not trying to imply Misha or you wouldn't do enough. However,
> the fact that only two people have the knowledge and the access to
> the result collection stage of the testing process is a problem in
> itself.

It is. Anybody who feels interested enough to be filled in on this is
more than welcome to join.

[...]

>>>- lousy performance of Sourceforge
>>>- resource limitations at Sourceforge (e.g. the number of files there)
>> This doesn't hurt us anymore, does it?
>
> It hurts every time the result collecting stage doesn't work correctly.
> We're not able to generate our own XML results and to upload them due
> to the SF resource limits.

I'd say we just need a backup results-processing site.

>>>- becoming a new contributor for testing resources is too difficult.
>> I don't think it's true anymore. How much simpler can it become --
>> http://www.meta-comm.com/engineering/regression_setup/instructions.html?
>
> Hmm, recent traffic on the testing reflector seemed to indicate it
> isn't too simple. This might be caused by problems with the build
> system.

If you are talking about the CodeWarrior on OS X saga, then it is
more build-system-related than anything else.

[...]

>>>- test post processing has to work on output from different
>>> compilers. Naturally, that output is formatted differently.
>> What's the problem here?
>
> It isn't a problem? We don't parse the output from the compilers?

Oh, I thought you were referring to something else. Yes, as we've
agreed before, the need to post-process the output is probably the
biggest source of problems.
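
To make the "differently formatted output" point concrete, here is a
rough Python sketch of the kind of normalization involved (the
diagnostic patterns are illustrative, not the ones our tools use,
and real compilers need considerably more than one regex each):

    import re

    # Two of the many diagnostic styles that have to be reduced to a
    # single (file, line, message) form.
    PATTERNS = [
        # gcc style:   foo.cpp:12: error: expected ';'
        re.compile(r'^(?P<file>[^:]+):(?P<line>\d+):.*?error: (?P<msg>.+)$'),
        # msvc style:  foo.cpp(12) : error C2143: syntax error
        re.compile(r'^(?P<file>[^(]+)\((?P<line>\d+)\) : error \w+: (?P<msg>.+)$'),
    ]

    def extract_errors(output):
        """Yield (file, line, message) tuples from raw compiler output."""
        for raw in output.splitlines():
            for pattern in PATTERNS:
                m = pattern.match(raw.strip())
                if m:
                    yield m.group('file'), int(m.group('line')), m.group('msg')
                    break

    # e.g. list(extract_errors("widget.cpp:42: error: expected ';'"))
    #      -> [('widget.cpp', 42, "expected ';'")]

Every new toolset, and every toolset release that reshuffles its
messages, means another set of patterns to maintain.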

>
>>>- several times the post processing broke due to problems with the
>>> XSLT processor.
>> And twice as often it broke due to somebody's erroneous checkin. The
>> latter is IMO much more important to account for and handle
>> gracefully. Most of the XSLT-related problems of the past were caused
>> by inadequate usage, such as transformation algorithms not prepared
>> for the huge volume of data we are now processing.
>
> Do you expect the recent updates to be able to handle a significantly
> higher volume?

Yes, and so far we have implemented only the most obvious
optimizations. If there is a further need to speed things up, we
will.
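
To give a flavor of what I mean by "obvious" optimizations (this is
not our actual code, and the record layout is made up): the classic
problem with report generation over a result set this large is
re-scanning the whole document for every table cell, where a
one-time index would do. In plain Python terms:

    from collections import defaultdict

    def emit_cell(library, toolset, rows):
        # Stand-in for whatever actually writes an HTML table cell.
        print(library, toolset, len(rows))

    def render_slow(results, libraries, toolsets):
        # Scans the full result list once per cell: fine for a small run,
        # painful once the volume grows by a couple of orders of magnitude.
        for lib in libraries:
            for ts in toolsets:
                emit_cell(lib, ts, [r for r in results
                                    if r['library'] == lib and r['toolset'] == ts])

    def render_fast(results, libraries, toolsets):
        # One pass to build an index, cheap lookups afterwards. The XSLT
        # counterpart is declaring an xsl:key and using key() lookups
        # instead of scanning the whole document from inside nested loops.
        index = defaultdict(list)
        for r in results:
            index[(r['library'], r['toolset'])].append(r)
        for lib in libraries:
            for ts in toolsets:
                emit_cell(lib, ts, index[(lib, ts)])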

>>> additional resources and requires manual intervention.
>> What do you think of this one --
>> http://article.gmane.org/gmane.comp.lib.boost.devel/119337?
>
> I'm with Victor on this point; for the testers (and hopefully there'll
> be more of them one day) it's significantly easier not to have to
> change anything during the release preparations. This could be
> achieved by using the CVS trunk as release branch until the actual
> release gets tagged.

What about the tarballs, though?

>
> I hoped other people would contribute to the list; I'm sure there's a
> lot more to say about testing. E.g. it would be nice to have some sort
> of history of recent regression results.

It's on our TODO list --
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Boost.Testing.

> It would be nice to be able to split the runs vertically (running
> tests for a smaller set of toolsets)

Isn't this possible now?

> and horizontally (running tests for a smaller set of
> libraries) easily;

Agreed.

> I realize, though, that presenting the results would become more
> difficult.

Nothing we can't figure out.

-- 
Aleksey Gurtovoy
MetaCommunications Engineering
