Subject: Re: [boost] boost.test regression or behavior change (was Re: Boost.lockfree)
From: Raffi Enficiaud (raffi.enficiaud_at_[hidden])
Date: 2015-10-09 05:34:52


On 08/10/15 19:46, Bjørn Roald wrote:
>> On 04 Oct 2015, at 14:49, Raffi Enficiaud <raffi.enficiaud_at_[hidden]> wrote:
>>
>> On 04/10/15 13:38, John Maddock wrote:
>>>
>>>
>>> On 04/10/2015 12:09, Bjorn Reese wrote:
>>>
>>> As many others have said, Boost.Test is "special" in that the majority
>>> of Boost's tests depend on it. Even breakages in develop are extremely
>>> painful in that they effectively halt progress for any Boost library
>>> which uses Test for testing.
>
> This sort of problem has been discussed before on this list without
> any real progress. I think a solution to this is needed to give boost
> tools maintainers (boost.test is also a tool) services similar to
> those library maintainers enjoy. A solution may also provide better
> test services for all boost developers and possibly other projects.
> An idea of a possible way forward, providing a test_request service
> at boost.org/test_request, is outlined below.

I think the problems are simple:
- the "develop" branch is currently a soup.
- the regression dashboard should be improved.

I will detail these two points below.

> I would like thoughts on how useful or feasible such a service would
> be. These are some questions I would like to have answered:
>
> - Will library maintainers use a boost.org/test_request service?
> - How valuable would it be, as compared to merging to develop and
> waiting for current test reports?
> - How much of a challenge would it be to get test runners (new and
> old) onboard?

As far as I can see, some libraries already have testing alternatives;
some are building on Travis. Yesterday, I created a build plan on my
local Atlassian Bamboo instance, running the tests on all branches of
boost.test against develop, on several platforms. Obviously, "several"
platforms/compilers (5) is not on the same scale as the current
regression dashboard, but it is a good start.
What I need now is a way to publish this information in a public place,
because my Bamboo CI is on an internal network.

> - How feasible is it to set up a service as outlined below, based on
> modification of the current system for regression testing in boost?

I think that reusing or building upon the current system would be hard
and limiting.

>
> - What alternatives exist providing the same kind of, or better,
> value to the community, hopefully with less effort? E.g.: can Jenkins
> or other such test dashboards / frameworks easily be configured to
> provide the flexibility and features needed here?

I think that what you propose is already well covered by existing tools
in the industry.

For instance, having a look at Atlassian Bamboo might be a good start:
- it's **free for open source projects**
- it compiles/tests **one** specific version across many runners, so we
have a clear status for that version. The current dashboard shows many
different versions at once.
- builds can be triggered manually or on events: e.g. a change on core
libraries, a change on one specific library, or a schedule (nightly)
- it's trivial to set up, and we can have many different targets
(continuous, stable, release candidate, etc.). It has a flexible way of
expressing a build as small jobs (each can be just a script).
- it understands git and submodules: one version is checked out on the
central server and dispatched to all runners. Runners can fully cache
the git repository locally to lower the traffic and update time.
- it provides metrics on the tests/compilations: release managers could
then use these to make appropriate decisions on what the next stable
version to build/test against should be.
- it understands branches, and can automatically fork the build on new
branches: it is then easy to test topic branches on several runners.
- it maintains a (configurable) history of the build/test sessions that
allows us to readily go back in time and check what happened.
- it has a very nice interface
- it can dispatch builds/tests based on requirements on the runners:
instead of running on all available runners, you express the build as
having requirements such as Windows+VS2008 or Clang6+OSX10.9 (see the
sketch after this list). The load is also balanced across runners.
- it's Java based, so it is available on any platform with a Java VM.
- etc.
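
To make the requirement-based dispatch concrete, here is a minimal
Python sketch of the idea. The Runner/Build classes and the capability
strings are invented for illustration; this is not Bamboo's actual API:

from dataclasses import dataclass

@dataclass
class Runner:
    name: str
    capabilities: set          # e.g. {"windows", "vs2008"}
    available: bool = True

@dataclass
class Build:
    name: str
    requirements: set          # e.g. {"osx10.9", "clang6"}

def dispatch(build, runners):
    """Return the first available runner whose capabilities cover the
    build's requirements; None means the build stays queued."""
    for runner in runners:
        if runner.available and build.requirements <= runner.capabilities:
            return runner
    return None

runners = [
    Runner("win-agent-1", {"windows", "vs2008"}),
    Runner("mac-agent-1", {"osx10.9", "clang6"}),
]
build = Build("boost.test[develop]", {"windows", "vs2008"})
print(dispatch(build, runners).name)   # -> win-agent-1

The point is that a build names what it needs, not which machine runs
it, so any runner with overlapping capabilities can pick it up.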

The only thing I do not think it addresses today is the asynchrony of
the current runner setup: currently, the runners may or may not be
available and provide complementary information (some of them run once
a month or so), without being strongly synchronized on the version of
the superproject. In the Bamboo setup, the version is the same on all
runners, so unavailable runners block the completion of the build. This
is easy to address by having lots of runners with overlapping
requirements, though.

The way I see it is:
1-/ some "continuous", frequent compilation and test runs, using a
synchronized version on several runners.
2-/ based on the results (e.g. increased stability, a bad-commit
disaster, an unplanned breaking change), a branch on the superproject,
e.g. develop-stable, is moved forward to point to a new,
tested/confirmed revision from the previous stage (see the sketch after
this list).
3-/ the current runners test against "develop-stable", and provide
information on the existing dashboard.
4-/ metrics are deployed on the dashboard to show what is happening
with boost during development (number of compilation or test failures,
etc.).
5-/ a general policy/convention is used for master and develop: master
is a public candidate, stable and tested. Develop isolates every
module/component, building against master or develop-stable (or both).
For instance, boost.test[develop] builds against master (the last known
public version), except for boost.test itself, which is on develop (the
next version).
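
As a sketch of how step 2-/ could be automated, assuming a results
dictionary coming out of CI and a simple zero-failure bar (the metric
name and the threshold are placeholders, not an agreed policy):

import subprocess

def promote_if_stable(revision, results, max_failures=0):
    """Move develop-stable to `revision` when the CI results for that
    revision show no more than `max_failures` test failures."""
    if results.get("test_failures", 1) > max_failures:
        return False
    # Move the superproject branch pointer and publish it.
    subprocess.run(["git", "branch", "-f", "develop-stable", revision],
                   check=True)
    subprocess.run(["git", "push", "origin", "develop-stable"],
                   check=True)
    return True

# e.g.: promote_if_stable("a1b2c3d", {"test_failures": 0})

Whether this runs automatically or a release manager pulls the trigger
is exactly the policy question raised in the shortcomings below.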

The advantages would be the following:
- develop-stable moves in increments, less frequently and more reliably
than the current develop.
- develop-stable is already tested on several mainstream
configurations, so it is already a viable test candidate for the
runners. It avoids wasting resources (mostly checkout/compilation/test
time, but also human time: interpreting the results, now with fewer
results to parse).
- with "develop-stable", we have real increments of functionality:
every step of develop-stable is an improvement of boost overall,
according to universally accepted metrics (yet to be defined).
- combining this scheme with bullet 5-/ on
master/develop/develop-stable allows testing the changes against what
was provided to the end user (building against master) and against the
future release of boost (building against develop-stable). It also
decouples the potentially unstable states of the different components.
- if a candidate on develop-stable or master is missing some important
runners, we can synchronize (humanly) with the runner maintainers to
make them available for that specific version. Again, less resource
waste and better responsiveness.

The shortcomings are:
- having a develop-stable does not prevent the runners from running on
different versions.
- someone/something has to have the power to decide when develop-stable
moves to a new version.
- it triggers more builds (this can be tempered, though: a build of
e.g. boost.test would happen only if boost.test[develop] changes; see
the sketch after this list).
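
A minimal sketch of that tempering, assuming the superproject tracks
boost.test as a submodule under libs/test (the path and the helper
names here are illustrative): the build is triggered only when the
submodule's gitlink moved between two superproject revisions.

import subprocess

def submodule_changed(old_rev, new_rev, path="libs/test"):
    """True when `path` (a submodule gitlink in the superproject)
    differs between the two superproject revisions."""
    out = subprocess.run(
        ["git", "diff", "--name-only", old_rev, new_rev, "--", path],
        capture_output=True, text=True, check=True)
    return bool(out.stdout.strip())

# Hypothetical usage: trigger a boost.test build only when its
# gitlink moved between the last two superproject states.
# if submodule_changed("develop~1", "develop"):
#     run_build("boost.test")   # run_build is a placeholder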

What is lacking now:
- a clear, stable development branch at the superproject level. The
superproject is an **integration** project of many components, and
should be used to test the integration of versions of its components
(whether they play well together). As I said, the current develop
branch is a soup, where all the coupling we want to avoid is happening.
- a way to get quick feedback on each of the components, against a
stable state. Quick also means fewer runners, available 95% of the
time.
- a dashboard that summarizes the information much better, keeps a
history based on versions, and provides good metrics for evaluating the
quality of the integration.

As a side note, I created a Bamboo build plan for boost.test, testing
all the branches of boost.test against boost[develop]. This is quite
easy to do. An example log is here:
http://pastebin.com/raw.php?i=4aGPnD1a

Building and testing boost.test took 12 minutes on a Windows runner,
including checkout, building b2, and the b2 headers step.

Raffi

