
From: Misha Bergal (mbergal_at_[hidden])
Date: 2003-06-19 18:08:38


My comments on this constructive mail are below:

Peter Dimov wrote:
> Aleksey Gurtovoy wrote:
>> Peter Dimov wrote:
>>
> You do provide me with the tools to eliminate a false red
> status, but this is a "guilty unless proven otherwise"
> approach;

Accurate description. This is what is going to happen when *everybody* runs
tests and uploads them to Boost.

> pretty much everyone can easily run the regression
> tests on various broken configurations
Yes, they can run the standard Boost regression tests and see what passes and
what fails.
But they will probably use the results for internal purposes and should *not*
be allowed to upload them as "official" Boost results.

> and it is up to me to
> hunt down every "non-critical" failure _and_ extract the
> relevant toolset name.
What he probably means is that he is afraid of having to "hunt down"
non-critical failures in a lot of setups he has no control over, no clue
about, and no interest in.

> In the meantime users have already
> rejected the library on the basis of that little red rectangle.
This is why it is important to have "user" results everybody can trust.
That's what we are fighting for.

> Note also that Beman's intel-win32 toolset passed
> shared_ptr_test but your intel-win32 toolset did not, and I
> can't distinguish the two in expected_results.xml.
A very good point in support of the one above. Status report providers will
have to work very closely with library authors to provide accurate reports.
By accurate I mean:

1. Consistent - a user can reproduce the result.
2. Meaningful - the library author has marked up the significant and
non-significant test cases (see the sketch below).
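
To make point 2 concrete, here is a purely hypothetical sketch of what
per-toolset, per-test markup could look like. The element and attribute names
(including the "runner" attribute) are assumptions for illustration, not the
actual expected_results.xml schema; a runner qualifier is one way to keep
Beman's intel-win32 results and a different intel-win32 setup apart:

    <!-- Hypothetical markup; names and attributes are illustrative only. -->
    <expected-failures>
      <!-- Qualify the toolset with the runner that produced the results,
           so two different intel-win32 setups do not collide. -->
      <toolset name="intel-win32" runner="beman">
        <test name="shared_ptr_test" expected="pass"/>
      </toolset>
      <toolset name="intel-win32" runner="meta-comm">
        <test name="shared_ptr_test" expected="fail"
              note="obscure corner cases; library still usable"/>
      </toolset>
    </expected-failures>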

> In short, I think that this approach will result in more
> relaxed, "common denominator" tests, where any failure indeed
> "basically means that you won't be able to use the library".
> A failure in shared_ptr_test (but not in
> smart_ptr_test) _usually_ means that there are some obscure
> corner cases where this compiler/configuration is buggy.
I don't see how it is going to come to that.

> A
> failure in bind_test usually means that you may or may not be
> able to use bind depending on the complexity of the bind
> expression and the phase of the moon _but_ if it compiles it
> typically runs fine. ;-)

>
> I'm not sure how this can be avoided, but - a hypothetical
> example - if a library passes 29 out of its 30 tests,
> "BROKEN" simply doesn't seem appropriate.
"Broken" means an unexpected failure. We provide the configurations that
authors are comfortable with.
They specify the expected results for these configurations. The status tables
are produced on two different sites, so if one site screws up its environment
or doesn't follow the product installation instructions correctly, it can be
found out rather quickly.

> I'd also like to
> see some kind of per-test tagging ("brokenness
> weight") and not per-toolset tagging as the toolset name is
> unreliable.
Two good points:

1. We track tests per failure, not per test case (see the sketch below).
2. Toolset names are unreliable - this is not a problem if we can standardize
on the toolsets used for producing the Boost status tables on different
platforms. Basically, the status table providers on each platform (a minimum
of two) standardize on the configurations and toolset naming.
If somebody else is using different toolsets, she will have to run the tests
herself.
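
As an illustration of both points, here is a hypothetical fragment showing
per-test tagging with a "brokenness weight", as Peter suggests, together with
a standardized toolset name. Again, these element names, attributes, and the
toolset spelling are assumptions, not an existing Boost format; the note
paraphrases Peter's description of bind_test above:

    <!-- Hypothetical per-test tagging; not an existing Boost schema. -->
    <test name="bind_test" toolset="intel-win32-7.0">
      <expected-failure weight="minor"
          note="may fail on complex bind expressions; if it compiles,
                it typically runs fine"/>
    </test>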

Misha Bergal
MetaCommunications Engineering

