Subject: Re: [boost] Boost Library Testing - a modest proposal - was boost.test regression or behavior change (was Re: Boost.lockfree)
From: Robert Ramey (ramey_at_[hidden])
Date: 2015-10-09 15:47:09


On 10/9/15 10:54 AM, Raffi Enficiaud wrote:

It's hard to tell, but it seems to me that so far we're in agreement.

>> b) Testing on other platforms.
>>
>> We have a system which has worked pretty well for many years. Still it
>> has some features that I'm not crazy about.
>>
>> i) it doesn't scale well - as boost gets bigger the testing load gets
>> bigger.
>
> I suggested a test procedure on "stages of quality" in my previous post:
> - fast feedback by continuous runners, giving a quick status on some
> mainstream compilers. Runners may have overlapping configuration/setup,
> so that the load is balanced somehow.
> - scheduling of less available runners on candidates selected from
> previous stage. The interface can be by increasing a git branch, the
> runners picking that branch only.

This is a pretty elaborate setup, and also fairly ambiguous to me. It
seems like implementing such a thing would be quite an effort, and I
don't know by whom.

>
>> ii) it tests the develop branch of each library against the develop
>> branch of all the other libraries
...
>
> Exactly,

OK - so we're in agreement about this.

> but also not being able to track down the history of the
> versions on the current dashboard is far from helping. As a developer, I
> would like to see a summary of eg. the number of failing tests vs.
> number of test, and *per revision*.

I don't think such information would be useful to me. But maybe that's
just me.

>
>> iii) it relies on volunteer testers to select compilers/platforms to
>> test under. So it's not exhaustive and the selection might not reflect
>> that which people are actually using.
>
> I would say that it would be good if each runner publishes the setup
> (not the runtime, but how it has been deployed), and maybe a script for
> being able to reproduce this runner. I think about docker (and how easy
> it is to describe fully a system), there are tools for the other
> platforms, more complicated though.

> The idea behind that is to be able to reproduce the runners, so that
> they are not shown by name (eg. teeks99-08) but by property (eg.
> win2012R2-64on64, msvc-12). I am not saying that the current setup
> should not be followed, I am suggesting a way to address the scalability
> issue. For that we can have equivalent runners and balance the load.

Sounds very ambitious and complex.

>> I would like to see us encourage our users to test the libraries that
>> they use. This system would work in the following way.
>
> If by users you mean the post-release /end users/, are you expecting a
> post-release feedback? I am not sure I understand.

This suggestion doesn't address pre-release issues. Frankly, except for
a few issues (develop vs master) cited above I don't think they are a
big problem and I think the current testing setup is adequate.

But this system can really only test the combinations that the testers
select. The problem comes up after release when one gets bug reports
from users of the released library. I would like to get these sooner
rather than later and on the platforms that people are actually using.
I often get issues reported which are related to the current configuration,
but the user hasn't run the latest tests on his current setup so all
I get is a complaint. If the user ran the tests on the libraries which
he's using (which he should be doing in any case!) I'd have a lot more
to work with and bugs would get discovered and addressed sooner with
less effort.

Of course if users want to switch to develop branch on those libraries
they use and run the tests pre-release - that would be great. But I'm
not really expecting many people to do that.

> BTW, do we have numbers on the number of ppl downloading a release
> candidate?

I'm guessing we do.
>
>>
>> a) A user downloads/builds boost.
...
> So we expect having html pages of 10000 columns. I think again the
> information needs to be digested.

LOL - that would be great !!! Of course if such a proposal were to be
so wildly successful as to create such a problem, we'd have to
upgrade our archiving and inquiry of test results. I'm not losing any
sleep regarding this issue right now.

>> f) we would discourage users from just using the boost libraries without
>> running their own tests. We would do this by exhortation and by
>> refusing to support users who have been unwilling to run and post local
>> tests.
>
> Mmmm... sounds bad to me.

LOL - we can't agree on everything.

>
>>
>> This would give us the following:
>>
>> a) a scalable testing setup which could handle a Boost containing any
>> number of libraries.
>
> And what about just a randomized test?

I don't see how that would be better.

>> c) We would have statistics on libraries being used. Something we are
>> sorely lacking now.
>
> I am wondering why this would be relevant.

OK - it's not really relevant as far as testing is concerned. This
information would become available as a side effect.

But it would be extremely useful to know that library X has N users.
This would help indicate which libraries might be considered for
elimination from the standard boost distribution. If something like
"boost/shared_ptr" is used by only 10 people - it would be interesting
to know. If the serialization library is only used by 10 people, it
would be very interesting to know. Etc.

>> And best of all - We're almost there !!!! we'd only need to:
>>
>> a) enhance slightly the dependency tools we've crafted but aren't
>> actually using.
>
> The dependencies are indirectly tested I would say, so testing the
> dependencies is a /nice to have/, but if I am using X that depends on Y,
> testing X should in most cases be enough.

Let's suppose I'm going to use some boost library X and Y (through
dependency) as part of the aircraft control system of the next
400-passenger plane. Wouldn't you feel safer if all the code used in
the system were tested? Would you say it's good enough to test only some
of it? And if you can run the tests almost for free, is there any reason
you would skip it?

Basically if I'm going to deploy X in my product and it depends on Y and
Z, all those should be tested in my environment. And there's absolutely
no reason not to do this.

OK - I didn't explain this well.

>> b) develop a tool to post the local results to a common dashboard
>> c) enhance the current dashboard to accept these results.
>
> Several tools exist already, eg. CDash together with cmake. Why spend
> that much effort in developing our tools? Our expectations are not that
> different than many other open or closed source software projects: we want quick
> and/or wide feedback on the development state of boost.

I totally agree.

But it's not that simple when you get down to details. I have personal
experience with CDash. I've used it as part of the Safe Numerics library
to be found at www.blincubator.com . I've recommended its usage and
described how to use it at that same web site. So I'm more familiar
with it than most. It's pretty tightly coupled to CMake and CTest and I
don't see an obvious way to use it with our bjam test setup. How about
replacing bjam with CMake - interesting but not simple either as they
don't really match in capability. And the test reporting isn't quite up
to our needs.

Having a bit of experience in all this in the context of Boost, I still
believe the path I've proposed is the best one.

Robert Ramey

