From: David Abrahams (dave_at_[hidden])
Date: 2007-08-08 12:01:43
This part of my analysis focuses on the tools available for getting
feedback from the system about what's broken. Once again, because
there's been substantial effort invested in dart/cmake/ctest and
interest expressed by Kitware in supporting our use thereof, I'm
including that along with our current mechanisms. Although not
strictly a reporting system, I'll also discuss BuildBot a bit because
Rene has been doing some research on it and it has some feedback
capabilities of its own.

I've struggled to give this post a coherent organization, but it
still rambles a little, for which I apologize in advance.
Boost's feedback system has evolved some unique and valuable features.

Unique Boost Features
---------------------
* Automatic distinction of regressions from new failures.
* A markup system that allows us to distinguish library bugs from compiler
bugs and add useful, detailed descriptions of severity and
consequences. This feature will continue to be important at *least*
as long as widely-used compilers are substantially nonconforming.
* Automatic detection of tests that had been failing due to toolset
  limitations but have begun passing without a known explanation.
* A summary page that shows only unresolved issues.
* A separate view encoding failure information in a way most
appropriate for users rather than library developers.
While I acknowledge that Boost's feedback system has substantial
weaknesses, no other feedback system I've seen accommodates most of
these features in any way.
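To make the first and third of those features concrete, here is a
minimal sketch of how a test result could be classified using
expected-failure markup. This is purely illustrative: the category
names, data shapes, and test names are invented, and this is not
Boost's actual markup format or implementation.

```python
# Illustrative sketch: classify a test result against expected-failure
# markup, in the spirit of Boost's regression reports. All names and
# data shapes here are hypothetical, not Boost's real format.

def classify(test, passed, expected_failures, previously_passing):
    """Return one of: 'pass', 'expected-failure', 'unexpected-pass',
    'regression', 'new-failure'."""
    marked = test in expected_failures
    if passed:
        # A marked test that starts passing without a known
        # explanation deserves attention too.
        return "unexpected-pass" if marked else "pass"
    if marked:
        return "expected-failure"  # known compiler/toolset limitation
    # Unmarked failure: distinguish regressions from brand-new failures.
    return "regression" if test in previously_passing else "new-failure"


markup = {"config_test/gcc-2.95"}     # hypothetical known toolset limitation
history = {"smart_ptr_test/gcc-4.1"}  # hypothetical: passed in the last run

print(classify("smart_ptr_test/gcc-4.1", False, markup, history))  # regression
print(classify("config_test/gcc-2.95", False, markup, history))    # expected-failure
print(classify("new_lib_test/msvc-8.0", False, markup, history))   # new-failure
```

A "summary page that shows only unresolved issues" then reduces to
filtering out everything classified as 'pass' or 'expected-failure'.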
Dart
----

It seems like Dart is a long, long way from being able to handle our
display needs -- it is really oriented towards providing binary "is
everything OK?" reports about the health of a project. It would
actually be really useful for Boost to have such a binary view; it
would probably keep us much closer to the "no failures on the trunk
(or integration branch, if you prefer)" state that we hope to
maintain continuously. However, I'm convinced our finer distinctions
remain extremely valuable as well.

Other problems with Dart's dashboards (see
http://public.kitware.com/dashboard.php?name=public):

* It is cryptic, rife with unexplained links and icons. Even some of
  the Kitware guys didn't know what a few of them meant when asked.

* Just like most of Boost's regression pages, it doesn't deal well
  with large amounts of data. One look at Kitware's main dashboard
  above will show you a large amount of information, much of which is
  useless for at-a-glance assessment, and the continuous and
  experimental build results are all at the bottom of the page.

Dart's major strength is that it maintains a database of past build
results, so anyone can review the entire testing history.

BuildBot
--------

BuildBot is not really a feedback system; it's more a centralized
system for driving testing. I will deal with that aspect of our
system in a separate message. BuildBot's display (see
http://twistedmatrix.com/buildbot/ for example) is no better suited
to Boost's specific needs than Dart's, but it does provide one useful
feature not seen in either of the other two systems: one can see, at
any moment, what any of the test machines are doing. I know that's
something Dart users want, and I certainly want it. In fact, as Rene
has pointed out to me privately, the more responsive we can make the
system, the more useful it will be to developers. His fantasy, and
now mine, is that we can show developers the results of individual
tests in real time.
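The real-time, per-test feedback described above amounts to an
observer pattern: the test runner notifies listeners as each test
completes rather than batching results at the end of a run. A small
illustrative sketch follows; every class and method name here is
invented for illustration and is not BuildBot's plugin API.

```python
# Illustrative sketch of per-test, real-time status reporting via an
# observer pattern, in the spirit of BuildBot's status plugins. All
# names are hypothetical; this is not BuildBot's actual API.

class StatusListener:
    """Base class: subclass to push results to IRC, a live web page, etc."""
    def test_finished(self, name, passed):
        pass

class ConsoleListener(StatusListener):
    def __init__(self):
        self.log = []
    def test_finished(self, name, passed):
        # A real listener might update a live web view or IRC channel.
        self.log.append(f"{name}: {'PASS' if passed else 'FAIL'}")

class TestRunner:
    def __init__(self):
        self.listeners = []
    def add_listener(self, listener):
        self.listeners.append(listener)
    def run(self, tests):
        for name, test in tests:
            ok = test()
            # Notify every listener as soon as each test completes,
            # instead of reporting only after the whole run finishes.
            for listener in self.listeners:
                listener.test_finished(name, ok)

runner = TestRunner()
console = ConsoleListener()
runner.add_listener(console)
runner.run([("shared_ptr_test", lambda: True),
            ("lexical_cast_test", lambda: False)])
print(console.log)  # ['shared_ptr_test: PASS', 'lexical_cast_test: FAIL']
```

The same listener interface could feed an IRC bot, a web dashboard,
and an email notifier simultaneously, which is the appeal of a
plugin architecture for feedback actions.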
Another great feature BuildBot has is an IRC plugin that insults the
developer who breaks the build
(http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot).
Apparently the person who fixes the build gets to choose the next
insult ;-) Most importantly, BuildBot has a plugin architecture that
would allow us to (easily?) customize feedback actions
(http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plugins).

Boost's Systems
---------------

The major problems with our current feedback systems, AFAICT, are
fragility and poor user interface. I probably don't need to make the
case about fragility, but in case there are any doubts, visit

  http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.build-index.html

For the past several days, it has shown a Python backtrace:

  Traceback (most recent call last):
    File "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py", line 324, in ?
    ...
    File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
      raise BadZipfile, "Bad magic number for central directory"
  BadZipfile: Bad magic number for central directory

This is a typical problem, and the system breaks for one reason or
another <subjective>on a seemingly weekly basis</subjective>. With
respect to the UI, although substantial effort has been invested (for
which we are all very grateful), managing that amount of information
is really hard, and we need to do better.
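Incidentally, the BadZipfile failure shown above is exactly the kind
of error that a results server can catch and report gracefully
instead of leaking a backtrace onto the status page. A minimal
defensive sketch, using Python's standard zipfile module (the
function and file names are my own, invented for illustration):

```python
# Illustrative sketch: guard against corrupt/truncated result archives
# like the BadZipfile case above, rather than crashing the status page.
# The function name and archive contents are hypothetical.
import io
import zipfile

def read_results(data):
    """Return the member names of a results archive, or None if the
    upload is not a valid zip file (truncated or corrupt)."""
    f = io.BytesIO(data)
    if not zipfile.is_zipfile(f):
        return None            # corrupt upload: flag it, don't crash
    f.seek(0)                  # is_zipfile consumed the stream; rewind
    with zipfile.ZipFile(f) as z:
        return z.namelist()

# A valid in-memory archive is read normally...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("results.xml", "<results/>")
print(read_results(buf.getvalue()))  # ['results.xml']

# ...while garbage bytes are rejected without a traceback.
print(read_results(b"not a zip"))    # None
```

A fragile system fails loudly on the reporting page; a robust one
turns a bad upload into a reportable condition of its own.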
Some of the current problems were described in this thread
<http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are
some others:

* The front page is essentially empty, showing little or no useful
  information
  <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>

* Summary tables have a redundant list of libraries at left (it also
  appears in a frame immediately adjacent).

* Summaries and individual library charts present way too much
  information to be called "summaries", overwhelming any
  reasonably-sized browser pane. We usually don't need a square for
  every test/platform combination.

* It's hard to answer simple questions like "what is the status of
  Boost.Python under gcc-3.4?" or "how well does MPL work on windows
  with STLPort?", or what is the list of

* A few links are cryptic (Full view/Release view) and could be
  better explained.

The email system that notifies developers when their libraries are
broken seems to be fairly reliable. Its major weakness is that it
reports all failures (even those that aren't regressions) as
regressions, but that's a simple wording change. Its second weakness
is that it has no way to harass the person who actually made the
code-breaking checkin; it harasses the maintainer of every broken
library just as aggressively, even if the breakage is due to one of
the library's dependencies.

Recommendations
---------------

Our web-based regression display system needs to be redesigned and
rewritten. It evolved from a state where we had far fewer libraries,
platforms, and testers, and is burdened with UI ideas that only work
in that smaller context. I suggest we start with as minimal a display
as we think we can get away with: the front status reporting page
should be both useful and easily grasped.

IMO the logical approach is to do this rewrite as a Trac plugin,
because of the obvious opportunities to integrate test reports with
other Trac functions (e.g.
linking error messages to the source browser, changeset views, etc.),
because the Trac database can be used to maintain the kind of history
of test results that Dart manages, and because Trac contains a nice
builtin mechanism for generating/displaying reports of all kinds. In
my conversations with the Kitware guys, when we've discussed how Dart
could accommodate Boost's needs, I've repeatedly pushed them in the
direction of rebuilding Dart as a Trac plugin, but I don't think they
"get it" yet.

I have some experience writing Trac plugins and would be willing to
contribute expertise and labor in this area. However, I know that we
also need some serious web-UI design, and many other people are much
more skilled in that area than I am. I don't want to waste my own
time doing badly what others could do well and more quickly, so I'll
need help.

Yes, I realize this raises questions about how test results will
actually be collected from testers; I'll try to deal with those in a
separate posting.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk