Subject: Re: [boost] Boost Library Testing - a modest proposal - was boost.test regression or behavior change (was Re: Boost.lockfree)
From: Raffi Enficiaud (raffi.enficiaud_at_[hidden])
Date: 2015-10-09 13:54:15

On 09/10/15 18:37, Robert Ramey wrote:
> I believe this whole thread started from the changes in Boost.Test such
> that it can no longer support testing of C++03 compatible libraries.
> This is totally unrelated to the testing of Boost libraries.

The thread started because boost.test broke something used by other
libraries, in a development branch, which raised some misunderstanding
on the purpose of this branch and the overall workflow.

As a side note, I reverted the changes so that C++11 is not required for
the set of features that do not explicitly state this requirement in
the documentation of 1.59 (datasets mainly, but also some forms of test
declaration and test assertions).

> Here is what I would like to see:
>
> a) local testing by library developers.
>
> Of course library developers need this in order to develop and maintain
> libraries.
>
> Currently we have this and it has worked quite well for many years. Making
> Boost.Test require C++11+ throws a monkey wrench into things for the
> libraries which use it. But that's only temporary. Libraries whose
> developers feel they need to maintain compatibility with C++98 can move
> to lightweight test with relatively little effort.

I do not think that local testing has ever been an issue. The value of
the dashboard lies in how testing scales across platform/compiler
combinations, especially for configurations that are hard to find today
(e.g. MSVC7) and/or hard to set up (e.g. Android).

I would also like to emphasize the difference between the unit testing
tool (boost.test or lightweight test) and the test driver (bjam):

- The "API" for running the test bed is bjam. This is used by developers
and by the regression testing workflow.
- The API for writing tests can be whatever the developer likes; boost.test
is just one choice, which is not directly seen by the regression dashboard.

> Developers who are concerned that the develop branch is a "soup" can
> easily isolate themselves from this by testing against the master branch
> of all the other libraries. The Boost modularization system with git has
> made this very simple and practical (thank you Beman!).
>
> So - not a problem.

Right: this is trivial locally, yet this is not the current workflow of
the regression dashboard. The complaints started because of failures in
develop, and because of concerns about the workflow and safe increments.
As a developer, I would like to test my library on many runners (and as
fast as possible).

>
> b) Testing on other platforms.
>
> We have a system which has worked pretty well for many years. Still it
> has some features that I'm not crazy about.
>
> i) it doesn't scale well - as boost gets bigger the testing load gets
> bigger.

I suggested a test procedure based on "stages of quality" in my previous post:
- fast feedback from continuous runners, giving a quick status on some
mainstream compilers. Runners may have overlapping configurations/setups,
so that the load is balanced somehow.
- scheduling of less available runners on candidates selected from the
previous stage. The interface can be advancing a git branch, with the
runners picking up that branch only.
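
To make the second stage a bit more concrete, here is a minimal sketch of
how candidates could be selected from the fast stage. Everything in it is
an assumption made for illustration: the revision names, the configuration
names, and the rule "a revision is a candidate only if it passed on every
required fast configuration". It is not an existing Boost tool.

```python
def select_candidates(stage1_results, required_configs):
    """Return revisions that passed on every required fast configuration.

    stage1_results: dict mapping revision -> dict mapping config -> bool.
    A configuration with no reported result counts as not passed.
    """
    candidates = []
    for revision, by_config in stage1_results.items():
        if all(by_config.get(cfg, False) for cfg in required_configs):
            candidates.append(revision)
    return candidates


# Hypothetical stage-1 results reported by the continuous runners.
stage1_results = {
    "rev1": {"gcc": True, "clang": True, "msvc": True},
    "rev2": {"gcc": True, "clang": False, "msvc": True},
    "rev3": {"gcc": True, "clang": True},  # msvc result missing
}
required_configs = ["gcc", "clang", "msvc"]

candidates = select_candidates(stage1_results, required_configs)
print(candidates)
```

The branch followed by the less available runners would then be advanced
only to a revision in this list.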

> ii) it tests the develop branch of each library against the develop
> branch of all the other libraries - hence we have a testing "soup" where
> a test might show failure but this failure might not be related to the
> library under test but some other library. It diminishes the utility of
> the test results in tracking down problems.

Exactly, but not being able to track the history of the versions on the
current dashboard does not help either. As a developer, I would like to
see a summary of e.g. the number of failing tests vs. the total number
of tests, *per revision*.
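
As a rough sketch of the kind of summary I have in mind (the record
layout, the revision identifiers and the test names below are all
hypothetical, not the format of the current dashboard):

```python
from collections import defaultdict


def summarize_by_revision(results):
    """Count failing and total tests per revision.

    results: iterable of (revision, test_name, passed) tuples.
    Returns a dict mapping revision -> (failed, total).
    """
    totals = defaultdict(lambda: [0, 0])  # revision -> [failed, total]
    for revision, _test_name, passed in results:
        totals[revision][1] += 1
        if not passed:
            totals[revision][0] += 1
    return {rev: tuple(counts) for rev, counts in totals.items()}


# Hypothetical results uploaded by the runners.
results = [
    ("a1b2c3", "test_queue", True),
    ("a1b2c3", "test_stack", False),
    ("d4e5f6", "test_queue", True),
    ("d4e5f6", "test_stack", True),
]

summary = summarize_by_revision(results)
for rev, (failed, total) in summary.items():
    print(f"{rev}: {failed} failing / {total} tests")
```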

> iii) it relies on volunteer testers to select compilers/platforms to
> test under. So it's not exhaustive and the selection might not reflect
> that which people are actually using.

I would say that it would be good if each runner published its setup
(not the runtime, but how it has been deployed), and maybe a script to
reproduce this runner. I am thinking of docker (and how easy it makes
describing a system fully); there are tools for the other platforms,
though they are more complicated.

The idea behind that is to be able to reproduce the runners, so that
they are shown not by name (e.g. teeks99-08) but by property (e.g.
win2012R2-64on64, msvc-12). I am not saying that the current setup
should not be followed, I am suggesting a way to address the scalability
issue. For that we can have equivalent runners and balance the load.
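
A minimal sketch of what "equivalent runners" could mean in practice,
assuming each runner publishes a (name, platform, compiler) triple. The
runner names, the job names and the round-robin policy are assumptions
for illustration only:

```python
from collections import defaultdict
from itertools import cycle


def group_by_property(runners):
    """Group runner names by their (platform, compiler) configuration."""
    groups = defaultdict(list)
    for name, platform, compiler in runners:
        groups[(platform, compiler)].append(name)
    return dict(groups)


def assign_jobs(groups, jobs):
    """Run every job once per configuration, spreading the jobs
    round-robin over the equivalent runners of that configuration."""
    assignment = defaultdict(list)
    for config, names in groups.items():
        next_runner = cycle(sorted(names))
        for job in jobs:
            assignment[next(next_runner)].append((config, job))
    return dict(assignment)


# Hypothetical runners: two are equivalent, one is alone in its group.
runners = [
    ("runner-a", "win2012R2-64on64", "msvc-12"),
    ("runner-b", "win2012R2-64on64", "msvc-12"),
    ("runner-c", "linux-64", "gcc-5"),
]
jobs = ["lockfree", "test", "serialization", "config"]

groups = group_by_property(runners)
assignment = assign_jobs(groups, jobs)
for name, work in sorted(assignment.items()):
    print(name, len(work))
```

The dashboard would then show one column per configuration rather than
one per runner name.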

> I would like to see us encourage our users to test the libraries that
> they use. This system would work in the following way.

If by users you mean the post-release /end users/, are you expecting
post-release feedback? I am not sure I understand.

BTW, do we have figures on how many people download a release
candidate?

>
> a) A user downloads/builds boost.
>
> b) he decides he's going to use library X, and Y
>
> c) he runs a tool which tells him which libraries he has to test. This
> would be the result of a dependency analysis. We have tools which do
> similar dependency analysis but they would have to be slightly enhanced
> to distinguish between testing, deployment, etc. I don't think this
> would be a huge undertaking given the work that has already been done.
>
> d) he runs the local testing setup on those libraries and their dependents.
>
> e) he uploads the test results to a dashboard similar if not identical
> to the current one.

So we should expect HTML pages with 10000 columns. Again, I think the
information needs to be digested.

>
> f) we would discourage users from just using the boost libraries without
> running their own tests. We would do this by exhortation and by
> refusing to support users who have been unwilling to run and post local
> tests.

Mmmm... sounds bad to me.

>
> This would give us the following:
>
> a) a scalable testing setup which could handle a Boost containing any
> number of libraries.

And what about just a randomized test? Say we have an ever growing
number of tests N (big), but the willingness to run all N tests
decreases as N grows. Say we limit each run to M << N tests (say 100),
drawn uniformly at random: the feedback would be much faster, the
acceptance much higher. On our side, we need some machinery to digest
this information based on the environment setup.
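
A minimal sketch of the selection step, using only the Python standard
library. The test names, N = 10000 and the seed are made up for the
example; the function name pick_tests is hypothetical:

```python
import random


def pick_tests(all_tests, m, seed=None):
    """Draw m distinct tests uniformly at random (all of them if m >= N).

    Passing a seed makes the draw reproducible, so that a failing run
    can be replayed with the same subset.
    """
    rng = random.Random(seed)
    m = min(m, len(all_tests))
    return rng.sample(all_tests, m)


# Hypothetical test bed with N = 10000 tests, of which M = 100 are run.
all_tests = [f"test_{i:05d}" for i in range(10000)]
subset = pick_tests(all_tests, 100, seed=42)
print(len(subset), "tests selected")
```

With a uniform draw, each test is selected with probability M/N in a
given run, so the chance that a given test has never been run after k
independent runs is (1 - M/N)^k.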

> b) All combinations of libraries/platforms/compilers actually being used
> would be those being tested and vice versa. We would have complete and
> efficient test coverage.
>
> c) We would have statistics on libraries being used. Something we are
> sorely lacking now.

I am wondering why this would be relevant.

>
> d) We would be encouraging better software development practices.
> Sometime ago someone posted that he had a problem but couldn't run the
> tests because "management" wouldn't allocate the time - and this was a
> critical human life safety app. He escaped before I could wheedle out of
> him which company he worked for.
>
> And best of all - We're almost there !!!! we'd only need to:
>
> a) enhance slightly the dependency tools we've crafted but aren't
> actually using.

The dependencies are indirectly tested I would say, so testing the
dependencies is a /nice to have/, but if I am using X that depends on Y,
testing X should in most cases be enough. If some breakage goes
unnoticed through the tests of X, having tested Y might have helped,
but this is not trivial: the coverage of X should be improved.

> b) develop a tool to post the local results to a common dashboard
> c) enhance the current dashboard to accept these results.

Several tools exist already, e.g. CDash together with cmake. Why spend
that much effort on developing our own tools? Our expectations are not
that different from those of many other open or closed source projects:
we want quick and/or wide feedback on the development state of boost.

Raffi
