From: Aleksey Gurtovoy (agurtovoy_at_[hidden])
Date: 2003-05-30 06:15:25
Beman Dawes wrote:
> One possible short-term fix might be to run the MPL tests separately,
> and post them as a separate table.
That's what we plan to do, although the format of the table will probably
be different - please see below.
> Long term, some kind of hierarchical approach might help with the
> reporting side of the equation. Perhaps with an intermediate web page
> that collapses all a library's tests down to one line. Rene's
> summary page shows that a relatively small organization effort can
> make reporting much more accessible.
> Ideas appreciated.
IMO it's worth stepping back and trying to answer a couple of "big picture"
questions:
1) What are the target audiences for the regression test results?
2) What kind of information are these audiences looking to find there?
Answering these is actually simple - first, as usual, there are two
primary audiences here - boost users and boost developers. Now, IMO,
when going to the regression results page, these two groups are looking
for answers to quite different questions.
I would argue that for a user _the_ dominant motive to study the
regression results is to find out whether a particular library works
with the particular compiler(s) she uses. What's tricky about it is that
often "works" doesn't necessarily equal "all tests are passing". It's
quite common for a library to fail a few corner-case tests - tests of
seldom-used functionality, or of advanced functionality that demands a
high degree of standard conformance - and yet in practice be perfectly
usable with many of those compilers. As a matter of fact, if you analyze
the current compiler status table for, let's say, MSVC 6.5, you will see
that _most_ of the boost libraries fall into this "works with caveats"
category. Well, except that if you are indeed a user, you don't know
that, because when looking at the particular test failures you have no
idea whether they are show-stoppers or not, and if not, what the
failures _mean_, in user-level terms.
And then of course, even if the tests did provide you with that
information, you wouldn't want to browse through several pages of them,
piecing everything together to draw a conclusion. IMO, ultimately, the
perfect user report would be a table filled with nice green and red
cells at library/compiler intersections which simply say "works",
"doesn't work", or "works with caveats" and link to a more detailed
report, still in user-level terms (something like "pass", "fail,
feature X is not available", or "fail, show-stopper" for every
standalone library test).
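To make the idea concrete, the roll-up from per-test results to a single
cell could be sketched roughly like this. This is only an illustration
of the classification described above, not an actual Boost tool; the
status names and the result encoding are my assumptions.

```python
# Hypothetical sketch: collapse per-test, user-level results for one
# library/compiler crossing into a single table-cell status.
# Result encoding (assumed): ("pass",), ("fail-feature", "name of the
# missing feature"), or ("fail-showstopper",).

def cell_status(test_results):
    """Return 'works', 'works with caveats', or "doesn't work"
    for one library/compiler crossing."""
    # Any show-stopper failure makes the library unusable on this compiler.
    if any(r[0] == "fail-showstopper" for r in test_results):
        return "doesn't work"
    # Non-fatal failures only degrade specific features.
    if any(r[0] == "fail-feature" for r in test_results):
        return "works with caveats"
    return "works"

print(cell_status([("pass",), ("fail-feature", "wide-char I/O")]))
# -> works with caveats
```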
Now, that was the user's position. If you are wearing the developer's
hat, though, you are not really interested in whether a particular
library works on a certain compiler; or, rather, you already know that,
because you are the library author. Instead, here, the primary reason to
check regression results is to make sure that none of the recent changes
in the CVS has led to, well, a regression in the library's functionality
(or better yet, to be automatically notified when this happens). The
failures that are already known are not nearly as interesting to you as
a change in the failure pattern. And just like the user, you don't want
to gather this information piecemeal. Basically, you want the same field
of green/red cells as the user, one row per library, only in this case
green would mean "nothing's broken" (compared to the expected-failures
picture), and red would mean "somebody broke something". Ideally, red on
the developers' report would be quite a rare event.
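The developer-side check described above amounts to comparing today's
failure set against a recorded expected-failures baseline and flagging
only *new* breakage. A minimal sketch, with made-up test names and under
the assumption that failures are tracked as sets of test names:

```python
# Hypothetical sketch: a cell on the developers' report is green when
# the current failures match the expected-failures baseline, and red
# when an unexpected (new) failure has appeared.

def developer_cell(current_failures, expected_failures):
    """Return 'green' ("nothing's broken") or 'red'
    ("somebody broke something") for one library/compiler crossing."""
    new_failures = set(current_failures) - set(expected_failures)
    return "red" if new_failures else "green"

expected = {"corner_case_test", "wide_char_test"}  # known, tolerated failures
print(developer_cell({"corner_case_test", "wide_char_test"}, expected))  # green
print(developer_cell({"corner_case_test", "apply_test"}, expected))      # red
```

Note that a test that starts *passing* again does not turn the cell red
here; only new failures do, which matches the "change in the failure
pattern" a developer actually cares about.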
Well, anyway, that was my take. I haven't thought through how exactly
something like this could be implemented so that it's both easy to set
up and maintain (per library, especially the user-oriented report) and
yet indeed satisfactorily informative. We plan to use MPL as a test bed
for these ideas to see how things work out.
Any comments are welcome :).
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk