From: David Abrahams (dave_at_[hidden])
Date: 2007-08-08 12:01:43


This part of my analysis focuses on the tools available for getting
feedback from the system about what's broken. Once again, because
there's been substantial effort invested in dart/cmake/ctest and
interest expressed by Kitware in supporting our use thereof, I'm
including that along with our current mechanisms. Although not
strictly a reporting system, I'll also discuss BuildBot a bit because
Rene has been doing some research on it and it has some feedback
features.

I've struggled to create a coherent organization to this post, but it
still rambles a little, for which I apologize in advance.

Feedback Systems
================

Boost's feedback system has evolved some unique and valuable features.

Unique Boost Features
---------------------

* Automatic distinction of regressions from new failures.

* A markup system that allows us to distinguish library bugs from compiler
  bugs and add useful, detailed descriptions of severity and
  consequences. This feature will continue to be important at *least*
  as long as widely-used compilers are substantially nonconforming.

* Automatic detection of tests that had been failing due to toolset
  limitations but then begin passing without a known explanation.

* A summary page that shows only unresolved issues.

* A separate view encoding failure information in a way most
  appropriate for users rather than library developers.

While I acknowledge that Boost's feedback system has substantial
weaknesses, no other feedback system I've seen accommodates most of
these features in any way.

Dart
----

It seems like Dart is a long, long way from being able to handle our
display needs -- it is really oriented towards providing binary "is
everything OK?" reports about the health of a project.  It would
actually be really useful for Boost to have such a binary view; it
would probably keep us much closer to the "no failures on the trunk
(or integration branch, if you prefer)" state that we hope to maintain
continuously.  However, I'm convinced our finer distinctions remain
extremely valuable as well.

Other problems with Dart's dashboards (see
http://public.kitware.com/dashboard.php?name=public):

* It is cryptic, rife with unexplained links and icons.  Even some of
  the Kitware guys didn't know what a few of them meant when asked.

* Just like most of Boost's regression pages, it doesn't deal well with
  large amounts of data.  One look at Kitware's main dashboard above
  will show you a large amount of information, much of which is
  useless for at-a-glance assessment, and the continuous and
  experimental build results are all at the bottom of the page.

Dart's major strength is that it maintains a database of past build
results, so anyone can review the entire testing history.

BuildBot
--------

BuildBot is not really a feedback system; it's more a centralized
system for driving testing.  I will deal with that aspect of our
system in a separate message.

BuildBot's display (see http://twistedmatrix.com/buildbot/ for an
example) is no better suited to Boost's specific needs than Dart's,
but it does provide one useful feature not seen in either of the other
two systems: one can see, at any moment, what any of the test machines
are doing.  I know that's something Dart users want, and I certainly
want it.  In fact, as Rene has pointed out to me privately, the more
responsive we can make the system, the more useful it will be to
developers.  His fantasy, and now mine, is that we can show developers
the results of individual tests in real time.

Another great feature BuildBot has is an IRC plugin that insults the
developer who breaks the build
(http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot).
Apparently the person who fixes the build gets to choose the next
insult ;-)

Most importantly, BuildBot has a plugin architecture that would allow
us to (easily?) customize feedback actions
(http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plugins).
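
To make that concrete, here's a rough sketch of the shape such a
plugin might take, going by the "Writing New Status Plugins" section
linked above.  Only the base class and the hook methods (builderAdded,
buildFinished) come from BuildBot; the class name and everything
inside buildFinished are made up for illustration:

  from twisted.python import log
  from buildbot.status import base
  from buildbot.status.builder import FAILURE

  class BoostFeedback(base.StatusReceiverMultiService):
      # Hypothetical plugin: hears about every finished build and
      # decides what kind of noise to make about it.

      def setServiceParent(self, parent):
          base.StatusReceiverMultiService.setServiceParent(self, parent)
          # Subscribe to the buildmaster's status object.
          self.status = parent.getStatus()
          self.status.subscribe(self)

      def builderAdded(self, name, builder):
          # Returning self subscribes us to that builder's build events.
          return self

      def buildFinished(self, builderName, build, results):
          if results == FAILURE:
              # Whatever feedback we decide on goes here: mail the
              # blamelist, update a web page, poke Trac, insult someone
              # on IRC...
              culprits = ", ".join(build.getResponsibleUsers())
              log.msg("%s broken by %s" % (builderName, culprits))

A plugin like that would be installed from the master's configuration
file next to the existing mail and IRC notifiers, so adding
Boost-specific feedback shouldn't require touching BuildBot itself.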

Boost's Systems
---------------

The major problems with our current feedback systems, AFAICT, are
fragility and poor user interface.

I probably don't need to make the case about fragility, but in case
there are any doubts, visit
http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.build-index.html
For the past several days, it has shown a Python backtrace:
  Traceback (most recent call last):
    File
    "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py",
    line 324, in ?
    ...
    File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
      raise BadZipfile, "Bad magic number for central directory"
  BadZipfile: Bad magic number for central directory

This is a typical problem, and the system breaks for one reason or
another <subjective>on a seemingly weekly basis</subjective>.

With respect to the UI, although substantial effort has been invested
(for which we are all very grateful), managing that amount of
information is really hard, and we need to do better.  Some of the
current problems were described in this thread
<http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are
some others:

* The front page is essentially empty, showing little or no useful
  information
  <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>

* Summary tables have a redundant list of libraries at left (it also
  appears in a frame immediately adjacent).

* Summaries and individual library charts present way too much
  information to be called "summaries", overwhelming any
  reasonably-sized browser pane.  We usually don't need a square for
  every test/platform combination.

* It's hard to answer simple questions like "what is the status of
  Boost.Python under gcc-3.4?" or "how well does MPL work on Windows
  with STLPort?"

* A few links are cryptic (Full view/Release view) and could be better
  explained.

The email system that notifies developers when their libraries are
broken seems to be fairly reliable.  Its major weakness is that it
reports all failures (even those that aren't regressions) as
regressions, but fixing that is a simple wording change.  Its second
weakness is that it has no way to harass the person who actually made
the code-breaking checkin; instead it harasses the maintainer of every
broken library just as aggressively, even if the breakage is due to
one of the library's dependencies.

Recommendations
---------------

Our web-based regression display system needs to be redesigned and
rewritten.  It evolved from a time when we had far fewer libraries,
platforms, and testers, and it is burdened with UI ideas that only
work in that smaller context.  I suggest we start with as minimal a
display as we think we can get away with: the front status reporting
page should be both useful and easily grasped.

IMO the logical approach is to do this rewrite as a Trac plugin,
because of the obvious opportunities to integrate test reports with
other Trac functions (e.g. linking error messages to the source
browser, changeset views, etc.), because the Trac database can be used
to maintain the kind of history of test results that Dart manages, and
because Trac contains a nice built-in mechanism for
generating/displaying reports of all kinds.  In my conversations with
the Kitware guys, when we've discussed how Dart could accommodate
Boost's needs, I've repeatedly pushed them in the direction of
rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.
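
To give a feel for how little plumbing that takes, here is a very
rough sketch using Trac's component architecture.  Only the
Component/IRequestHandler machinery is real Trac; the URL, the
boost_results table, and the query all assume a results schema we
haven't designed yet:

  from trac.core import Component, implements
  from trac.web import IRequestHandler
  from trac.web.api import RequestDone

  class RegressionSummary(Component):
      # Hypothetical plugin: serve a bare-bones "what's unresolved
      # right now" page at /regressions, reading from a results table
      # that we would still have to design and populate.
      implements(IRequestHandler)

      # IRequestHandler methods
      def match_request(self, req):
          return req.path_info == '/regressions'

      def process_request(self, req):
          cursor = self.env.get_db_cnx().cursor()
          cursor.execute("SELECT library, toolset, status "
                         "FROM boost_results WHERE status <> 'pass'")
          items = ["<li>%s / %s: %s</li>" % row for row in cursor]
          body = ("<html><body><h1>Unresolved failures</h1><ul>%s</ul>"
                  "</body></html>" % "\n".join(items))
          req.send_response(200)
          req.send_header('Content-Type', 'text/html;charset=utf-8')
          req.send_header('Content-Length', str(len(body)))
          req.end_headers()
          req.write(body)
          raise RequestDone

A real plugin would use Trac's own report and template machinery
instead of emitting raw HTML, and would link each failure to the
relevant changesets and source files; the skeleton is only meant to
show the shape of the thing.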

I have some experience writing Trac plugins and would be willing to
contribute expertise and labor in this area.  However, I know that
we also need some serious web-UI design, and many other people are
much more skilled in that area than I am.  I don't want to waste my
own time doing badly what others could do well and more quickly, so
I'll need help.

Yes, I realize this raises questions about how test results will
actually be collected from testers; I'll try to deal with those in a
separate posting.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
