From: Andy Stevenson (andystevenson_at_[hidden])
Date: 2007-08-08 13:45:54


On 8 Aug 2007, at 17:01, David Abrahams wrote:

>
> This part of my analysis focuses on the tools available for getting
> feedback from the system about what's broken. Once again, because
> there's been substantial effort invested in dart/cmake/ctest and
> interest expressed by Kitware in supporting our use thereof, I'm
> including that along with our current mechanisms. Although not
> strictly a reporting system, I'll also discuss BuildBot a bit because
> Rene has been doing some research on it and it has some feedback
> features.
>
> I've struggled to create a coherent organization to this post, but it
> still rambles a little, for which I apologize in advance.
>
> Feedback Systems
> ================
>
> Boost's feedback system has evolved some unique and valuable features.
>
> Unique Boost Features
> ---------------------
>
> * Automatic distinction of regressions from new failures.
>
> * A markup system that allows us to distinguish library bugs from
> compiler bugs and add useful, detailed descriptions of severity and
> consequences. This feature will continue to be important at *least*
> as long as widely-used compilers are substantially nonconforming.
>
> * Automatic flagging of tests that had been failing due to toolset
> limitations and that begin passing without a known explanation.
>
> * A summary page that shows only unresolved issues.
>
> * A separate view encoding failure information in a way most
> appropriate for users rather than library developers.
>
> While I acknowledge that Boost's feedback system has substantial
> weaknesses, no other feedback system I've seen accommodates most of
> these features in any way.
>
I agree. I've had numerous experiences with large projects that
have not done this as well as Boost. Personally I find the status
information held by meta-comm to be useful and informative. The
opening page isn't very useful, but digging in always leads to the
most useful information.
> Dart
> ----
>
> It seems like Dart is a long, long way from being able to handle our
> display needs -- it is really oriented towards providing binary "is
> everything OK?" reports about the health of a project. It would
> actually be really useful for Boost to have such a binary view; it
> would probably keep us much closer to the "no failures on the trunk
> (or integration branch, if you prefer)" state that we hope to maintain
> continuously. However, I'm convinced our finer distinctions remain
> extremely valuable as well.
>
> Other problems with Dart's dashboards (see
> http://public.kitware.com/dashboard.php?name=public):
>
> * It is cryptic, rife with unexplained links and icons. Even some of
> the Kitware guys didn't know what a few of them meant when asked.
>
> * Just like most of Boost's regression pages, it doesn't deal well
> with large amounts of data. One look at Kitware's main dashboard
> above will show you a large amount of information, much of which is
> useless for at-a-glance assessment, and the continuous and
> experimental build results are all at the bottom of the page.
>
> Dart's major strength is that it maintains a database of past build
> results, so anyone can review the entire testing history.
>
> BuildBot
> --------
>
> Buildbot is not really a feedback system; it's more a centralized
> system for driving testing. I will deal with that aspect of our
> system in a separate message.
>
> BuildBot's display (see http://twistedmatrix.com/buildbot/ for an
> example) is no better suited to Boost's specific needs than Dart's,
> but it does provide one useful feature not seen in either of the other
> two systems: one can see, at any moment, what any of the test machines
> are doing. I know that's something Dart users want, and I certainly
> want it. In fact, as Rene has pointed out to me privately, the more
> responsive we can make the system, the more useful it will be to
> developers. His fantasy, and now mine, is that we can show developers
> the results of individual tests in real time.
>
> Another great feature BuildBot has is an IRC plugin that insults the
> developer who breaks the build
> (http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot).
> Apparently the person who fixes the build gets to choose the next
> insult ;-)
>
> Most importantly, BuildBot has a plugin architecture that would allow
> us to (easily?) customize feedback actions
> (http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plugins).
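To make that concrete, a status plugin along the following lines is
roughly what I'd imagine. The base class and callbacks
(StatusReceiverMultiService, builderAdded, buildFinished, the SUCCESS
constant) follow my reading of the 0.7.x status-plugin docs linked
above; everything Boost-specific in it is purely hypothetical.

    # Hypothetical sketch: notify a Boost-specific reporting service when
    # a build finishes.  Base class and callback names are from the
    # BuildBot 0.7.x status-plugin interface as I understand it; the
    # Boost-related logic is made up.
    from buildbot.status import base
    from buildbot.status.builder import SUCCESS

    class BoostStatusNotifier(base.StatusReceiverMultiService):

        def setServiceParent(self, parent):
            base.StatusReceiverMultiService.setServiceParent(self, parent)
            # Subscribe to build events from the master's status object.
            self.status = parent.getStatus()
            self.status.subscribe(self)

        def builderAdded(self, name, builder):
            # Returning self subscribes us to this builder's build events.
            return self

        def buildFinished(self, builderName, build, results):
            if results != SUCCESS:
                # Post the failure to a Boost results page, mail the
                # responsible developers, feed an IRC bot, etc.
                blamelist = build.getResponsibleUsers()
                print "%s broke %s" % (", ".join(blamelist), builderName)

As far as I can tell, such a plugin would simply be appended to the
c['status'] list in master.cfg, so it would be the natural hook for
whatever notification policy we settle on.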
>
> Boost's Systems
> ---------------
>
> The major problems with our current feedback systems, AFAICT, are
> fragility and poor user interface.
>
> I probably don't need to make the case about fragility, but in case
> there are any doubts, visit
> http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.build-index.html
> For the past several days, it has shown a Python backtrace
>
> Traceback (most recent call last):
>   File "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py", line 324, in ?
>     ...
>   File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
>     raise BadZipfile, "Bad magic number for central directory"
> BadZipfile: Bad magic number for central directory
>
> This is a typical problem, and the system breaks for one reason or
> another <subjective>on a seemingly weekly basis</subjective>.
>
> With respect to the UI, although substantial effort has been invested
> (for which we are all very grateful), managing that amount of
> information is really hard, and we need to do better. Some of the
> current problems were described in this thread
> <http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are
> some others:
>
> * The front page is essentially empty, showing little or no useful
> information:
> <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>
>
> * Summary tables have a redundant list of libraries at left (it also
> appears in a frame immediately adjacent)
>
> * Summaries and individual library charts present way too much
> information to be called "summaries", overwhelming any
> reasonably-sized browser pane. We usually don't need a square for
> every test/platform combination.
>
> * It's hard to answer simple questions like "what is the status of
> Boost.Python under gcc-3.4?" or "how well does MPL work on Windows
> with STLPort?", or what is the list of
>
> * A few links are cryptic (Full view/Release view) and could be better
> explained.
>
> The email system that notifies developers when their libraries are
> broken seems to be fairly reliable. Its major weakness is that it
> reports all failures (even those that aren't regressions) as
> regressions, but that's a simple wording change. Its second weakness
> is that it has no way to harass the person who actually made the
> code-breaking checkin, and harasses the maintainer of every broken
> library just as aggressively, even if the breakage is due to one of
> the library's dependencies.
>
> Recommendations
> ---------------
>
> Our web-based regression display system needs to be redesigned and
> rewritten. It evolved from a state where we had far fewer
> libraries, platforms, and testers, and is burdened with UI ideas that
> only work in that smaller context. I suggest we start with as minimal
> a display as we think we can get away with: the front status reporting
> page should be both useful and easily grasped.
>
> IMO the logical approach is to do this rewrite as a Trac plugin,
> because of the obvious opportunities to integrate test reports with
> other Trac functions (e.g. linking error messages to the source
> browser, changeset views, etc.), because the Trac database can be used
> to maintain the kind of history of test results that Dart manages, and
> because Trac contains a nice built-in mechanism for
> generating/displaying reports of all kinds. In my conversations with
> the Kitware guys, when we've discussed how Dart could accommodate
> Boost's needs, I've repeatedly pushed them in the direction of
> rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.
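For anyone who hasn't seen one, a minimal Trac plugin for this kind of
report might look roughly like the sketch below. The
Component/implements/IRequestHandler machinery is Trac's standard
extension-point API; the URL, database table, and template names are
invented for illustration, and the exact return convention of
process_request differs between Trac versions.

    # Illustrative sketch of a Trac plugin exposing a /regressions report.
    # The extension-point machinery is standard Trac; the table, URL and
    # template names are assumptions made up for this example.
    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class BoostRegressionReport(Component):
        implements(IRequestHandler)

        # IRequestHandler methods
        def match_request(self, req):
            # Claim the /regressions URL for this report.
            return req.path_info == '/regressions'

        def process_request(self, req):
            # A hypothetical table of per-test results kept in Trac's database.
            db = self.env.get_db_cnx()
            cursor = db.cursor()
            cursor.execute("SELECT library, toolset, status FROM boost_test_results")
            data = {'results': cursor.fetchall()}
            # Render a template shipped with the plugin (0.11-style return
            # value; 0.10 uses a slightly different convention).
            return 'boost_regressions.html', data, None

The point is less the details than that the results would then live in
the same database and UI as the ticket, source-browser, and changeset
views mentioned above.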
>
> I have some experience writing Trac plugins and would be willing to
> contribute expertise and labor in this area. However, I know that
> we also need some serious web-UI design, and many other people are
> much more skilled in that area than I am. I don't want to waste my
> own time doing badly what others could do well and more quickly, so
> I'll need help.
>
> Yes, I realize this raises questions about how test results will
> actually be collected from testers; I'll try to deal with those in a
> separate posting.
>
Generally I agree with all the recommendations. However, I am a big
fan of incremental delivery, and I would advocate that Boost approach
this systematically. You don't want to get into the tool business
(i.e. avoid the anecdotal "why fix things in 5 minutes when I can
take a year writing a tool to automate it!" :-{).

For what it is worth, my advice would be to do the following:

1. Choose two or three representative tool-chains/platforms as Boost
'reference models' (msvc-M.N, Win XP X.Y...), (gcc-N.M, Debian...),
(gcc-N.M, Mac OS X...).
- The choices are based on what's right 'for the masses' and what is
the de facto platform for mainstream development on those platforms
(before anyone screams, I am seriously NOT advocating dropping the
builds on the other platforms - read on).
- Whatever the choices end up being, I believe Boost needs to make a
clear policy decision.

2. These 'reference models' are the basis of summary reports at the
top level against the 'stable' released libraries. That can go on a
single page, and it should take only a small amount of time to
generate incrementally from the existing system.

3. As for tracking individual test results, I don't personally see
what's wrong with putting these under Subversion. Given the
likelihood of high commonality between the output text of successive
runs, I think it is a much better 'implementation choice' than
strictly a database. Certainly XML output from the test framework
would aid other post-processing - but that can be a secondary step/
enhancement to Boost.Test? Also, there is a strong correlation between
the versioning of test results and the changes since the last run
that changed the results. Some relatively trivial automation of the
source dependency tree changes between successive runs of individual
tests could be a significant aid for the authors/maintainers. I'm not
an expert on bjam, but I presume for an individual target it would not
be difficult to run a diff between the sources in successive
invocations of each test. (A rough sketch of the Subversion idea
follows this list.)

4. Given the reference models above, it would then be sensible to show
the status of successive tiers of the Boost project, i.e. stable,
development, sandbox, ... Again, an indirection at the top level will
make this accessible.

5. Beyond this I would split out the summaries into platform variants
on individual pages: 'Boost on Windows', 'Boost on Linux', etc. In this
way no information is lost and the community of developers is taken
care of.
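
Here is the rough sketch promised in point 3. The directory layout,
file naming, and the idea of one results file per test are assumptions
made purely for illustration; only the svn commands themselves are
standard.

    # Hypothetical sketch: keep each test's output under Subversion and
    # show what changed since the previous run.  Paths and naming are
    # invented; only the svn commands are standard.
    import os
    import subprocess

    RESULTS_WC = "/var/boost-regression/results"   # a Subversion working copy

    def record_result(test_name, output_text):
        path = os.path.join(RESULTS_WC, test_name + ".txt")
        is_new = not os.path.exists(path)
        f = open(path, "w")
        f.write(output_text)
        f.close()
        if is_new:
            subprocess.call(["svn", "add", path])
        # Show how this test's output differs from the last recorded run.
        subprocess.call(["svn", "diff", path])
        subprocess.call(["svn", "commit", "-m",
                         "results for " + test_name, path])

The same trick applied to the list of sources bjam used for a target
would give the "what changed since the last run that changed the
results" view I mentioned.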

Hope this helps. As things scale there is a stronger need for
'standardization'; it's unavoidable. Tool-chains are rarely the silver
bullet. What Boost has shouldn't be neglected ... it is already good
for reporting status, and its failings can be worked on incrementally.

Andy

> --
> Dave Abrahams
> Boost Consulting
> http://www.boost-consulting.com
>
> The Astoria Seminar ==> http://www.astoriaseminar.com
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

