From: Aleksey Gurtovoy (agurtovoy_at_[hidden])
Date: 2003-06-22 09:46:04


Peter Dimov wrote:
> Aleksey Gurtovoy wrote:
> > Peter Dimov wrote:
> >
> >> The summaries are nice, but the red "broken" thing on the user page
> >> may be too intimidating,
> >
> > Once it shows the actual status, it shouldn't be (it doesn't yet,
> > since some cooperation from library authors is needed - please see
> > below). Or rather, if it still is, then it's the status that is too
> > intimidating, not the report :).
>
> I'm not sure I agree here. The problem is that the user summary says:
>
> Red status basically means that you won't be able to use
> the library.
>
> This is often simply not true.
>
> You do provide me with the tools to eliminate a false red status, but
> this is a "guilty unless proven otherwise" approach; pretty much
> everyone can easily run the regression tests on various broken
> configurations and it is up to me to hunt down every "non-critical"
> failure _and_ extract the relevant toolset name. In the meantime users
> have already rejected the library on the basis of that little red
> rectangle.

You have a point here. Please note two things, though:

1) Everyone is free to run the regression tests on whatever
configurations they like - as long as they cannot upload the results, it
doesn't matter. For those who can and do upload them, we definitely
need to agree on "approved" configurations and toolset names - and the
runner is responsible for making sure that things are not misconfigured.

2) _No matter_ in what format the regression results are reported, it's
simply wrong to send a user to a page reflecting the current CVS state.
Besides the fact that the overwhelming majority of users use only the
released distributions, and want to see the status of those only, showing
them the main trunk state is also misleading and harmful. While it
shouldn't happen too often, in practice the main trunk is badly broken at
least once a week, and looking at a pile of whatever-color-they-are
"fail" cells against a library's tests carries the exact danger you are
worried about.

Having said that, I completely agree that we need to be careful not to
intimidate the user.

>
> Note also that Beman's intel-win32 toolset passed shared_ptr_test but
> your intel-win32 toolset did not, and I can't distinguish the two in
> expected_results.xml.

We just need to agree on the configuration here. Currently, we run
Intel 7.1 in MSVC 6.0 compatibility mode, and Beman probably has his
configured for 7.0. I am not sure which configuration is more common
in the real world - assuming that this is the criterion we want to stick
to.
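
To illustrate the naming problem - this is only a sketch, not
necessarily the exact expected_results.xml layout - an expected-failure
entry is keyed by the test and the toolset name alone, so results from
two differently configured "intel-win32" runs both match the same entry:

    <!-- hypothetical layout, for illustration only -->
    <expected-failures>
      <library name="smart_ptr">
        <test name="shared_ptr_test">
          <!-- which "intel-win32" - Beman's run or ours? -->
          <toolset name="intel-win32"/>
        </test>
      </library>
    </expected-failures>

Until we agree on exactly one configuration per toolset name, such an
entry stays ambiguous.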

>
> In short, I think that this approach will result in more relaxed,
> "common denominator" tests, where any failure indeed "basically means
> that you won't be able to use the library". A failure in
> shared_ptr_test (but not in smart_ptr_test) _usually_ means that there
> are some obscure corner cases where this compiler/configuration is
> buggy. A failure in bind_test usually means that you may or may not be
> able to use bind depending on the complexity of the bind expression and
> the phase of the moon _but_ if it compiles it typically runs fine. ;-)
>
> I'm not sure how this can be avoided, but - a hypothetical example -
> if a library passes 29 out of its 30 tests, "BROKEN" simply doesn't
> seem appropriate. I'd also like to see some kind of per-test tagging
> ("brokenness weight") and not per-toolset tagging as the toolset name
> is unreliable.
>
> A way for the test executable to signal "only non-critical
> tests failed" may help, too.
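
Just to make that idea concrete - this is a minimal sketch, not anything
the current test framework provides - the test program itself could
classify its checks and report "only non-critical tests failed" through
a distinct exit code that the report generator understands:

    #include <iostream>

    // Hypothetical exit codes agreed on between the tests and the
    // report tool.
    enum exit_code { all_passed = 0, critical_failure = 1,
                     non_critical_failure = 2 };

    int failures = 0;       // critical failures
    int soft_failures = 0;  // known corner cases on buggy compilers

    void check(bool ok, char const* what, bool critical = true)
    {
        if (ok) return;
        std::cerr << (critical ? "FAIL: " : "fail (non-critical): ")
                  << what << '\n';
        if (critical) ++failures; else ++soft_failures;
    }

    int main()
    {
        check(1 + 1 == 2, "basic sanity");           // critical check
        check(false, "obscure corner case", false);  // non-critical

        if (failures) return critical_failure;
        if (soft_failures) return non_critical_failure;
        return all_passed;
    }

The report generator could then render such a result differently from a
hard failure.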
>
> I realize I'm basically reiterating what you wrote in your earlier
> message... I guess the reason is that the system as it stands
> doesn't really accomplish its goals, or I may be misunderstanding how
> it is supposed to work.

Thanks for your thoughts. Here's our current understanding of the whole
thing:

1) Users should be presented with a report against the last official
distribution available for download.

2) The user report should be conservative in reporting whether something
works or not. Since there is no reliable way to say that for sure, the
most we can do is give users an easy way to figure that out for
themselves. A summary report is a must, of course.

3) It still makes sense to provide a daily user report against the
current CVS, to be able to see how things are going to look as we get
closer to the next release.

Turning these into something real, here is our new user report against
the 1.30.0 distribution -
http://www.meta-comm.com/engineering/resources/boost_1_30_0/user_summary_page.html

The one against the main trunk - with the expected failures automatically
extracted from the 1.30.0 results - is available here:
http://boost.sourceforge.net/regression-logs/user_summary_page.html

I would say we need to eliminate the reds in the latter before we can
release.

>
> The per-library developer summary is great, though. ;-)

Thank you!

Aleksey

