Subject: Re: [boost] boost.test regression or behavior change (was Re: Boost.lockfree)
From: Bjørn Roald (bjorn_at_[hidden])
Date: 2015-10-08 13:46:24
> On 04 Oct 2015, at 14:49, Raffi Enficiaud <raffi.enficiaud_at_[hidden]> wrote:
>
> Le 04/10/15 13:38, John Maddock a écrit :
>>
>>
>> On 04/10/2015 12:09, Bjorn Reese wrote:
>>
>> As many others have said, Boost.Test is "special" in that the majority
>> of Boost's tests depend on it. Even breakages in develop are extremely
>> painful in that they effectively halt progress for any Boost library
>> which uses Test for testing.
This sort of problem has been discussed before on this list without any real progress. I think a solution is needed that gives Boost tools maintainers (Boost.Test is also a tool) services similar to those library maintainers enjoy. A solution may also provide better test services for all Boost developers and possibly other projects. One possible way forward, a test_request service at boost.org/test_request, is outlined below.
I would like thoughts on how useful or feasible such a service would be. These are some questions I would like to have answered:
- Will library maintainers use a boost.org/test_request service?
- How valuable would it be, as compared to merging to develop and waiting
for current test reports?
- How much of a challenge would it be to get test runners (new and old) on board?
- How feasible is it to set up a service as outlined below, based on modifications
  of the current system for regression testing in Boost?
- What alternatives exist that provide the same kind of, or better, value to the
  community, hopefully with less effort? E.g.: can Jenkins or other such test
  dashboards / frameworks easily be configured to provide the flexibility and
  features needed here?
First a bit of motivation. When changes are made in source code that is intended to work with multiple tool chains and target platforms, the testing challenge is vastly more complicated than just testing with the compiler and operating system (host platform) you use for development. Conceptually it does not need to be that much harder if a single host platform action, even before a local commit of the changes, caused compilation and testing to be staged and executed on any number of remote build hosts and target platforms, and timely results were made available in a suitable form on the host that initiated it all. The test_request service outlined below is an attempt to achieve this.
A test request service is a mechanism that would allow library maintainers to post a test request indicating the version of the sources to build and test. The intention is to give library maintainers a way of testing their changes on specified targets against a known-stable baseline of the other libraries, defined as part of the request. A method is needed for selecting test runners, or for indicating the properties of the test runners the test should be performed on. It should also be possible to specify which libraries to only compile and which to test. The resulting test output needs to be managed in the context of the test request, not in the overall Boost develop or master test results.
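To make this concrete, the request could be a small structured document along the lines of this sketch (every field name here is my assumption, not an existing format):

    # Hypothetical shape of a test request; all field names are assumptions.
    test_request = {
        "baseline": {"superproject": "boostorg/boost",
                     "commit": "<known-stable master sha>"},
        "changes": [{"library": "lockfree",
                     "commit": "<last public sha>",
                     "patch": "<output of git diff, optional>"}],
        "compile_only": ["system", "thread"],   # build these, skip their tests
        "test": ["lockfree"],                   # run the tests of these
        "runners": {"toolset": ["gcc-5.1", "msvc-12.0"],
                    "os": ["linux", "windows"]},
        "expires_after": 86400,                 # seconds; see item 4 below
    }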
Test runners should probably be able to control the extent to which they are willing to pick up and process test requests, as opposed to only running the regular Boost regression tests. Some sort of scheduling may be desirable, or even needed, to automate this well while preserving the precedence of the main Boost regression tests and not exhausting test runner resources. This may be achieved by deliberately starving test requests that are resource hungry and often requested, so that leaner, quicker, or less frequently requested tests are processed first. Such smart scheduling is probably not trivial, so the best thing would be to ignore it if it is not needed, but I have a feeling it will be needed to throttle the load on test runner hardware and to ensure that the tests most critical to the overall community's success are serviced.
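As a rough illustration of the starvation idea, the picking rule could start as simple as this sketch (the cost and submission-count fields are assumptions; a real policy would need tuning):

    # Minimal sketch of a picking rule.
    def pick_job(regression_due, requests):
        # The main regression tests keep their precedence.
        if regression_due:
            return "regression"
        if not requests:
            return None
        # Starve expensive, frequently submitted requests by preferring
        # cheap ones from owners who have not submitted much lately.
        return min(requests, key=lambda r: (r["estimated_cost"],
                                            r["recent_submissions"]))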
Beyond the needs of Boost.Test in this topic, I think test requests will allow all maintainers to test on all relevant targets, given that test runners are available, and to perform these tests before merging to develop. That allows more issues to be resolved before the merge, a more stable develop branch, and fewer disruptions in the main develop test reports. For many libraries, test requests will require a very small amount of test runner hardware resources compared to full Boost regression tests, which more or less blindly run all tests. This opens the prospect of quick responses to test requests and thus a more interactive work-flow. Such a quick test could possibly run specific test cases in a specific library on specified testers. It seems possible that such test requests could be serviced in seconds, piping failures back into the requester's development environment or even into an IDE issues list. But those are details that can be dealt with later; a simple response back to the submitter with a URL to use for fetching progress information and results is a good start. The OGC WPS protocol uses this approach, and that sort of protocol may be a good fit for test requests. If the test request gets a web URL, a web version of the results could be available there for a given number of days before it is purged or archived. As there will no longer be only a couple of specific places to find Boost test results, RSS, AtomPub, or similar protocols may be useful to allow users to subscribe to test results for a given library, or even for a specific test request.
One likely desirable feature, and a challenge, would be to allow testing of changes before they go into a commit that is pushed to a public git repository. That could be achieved by specifying a public commit and using git on the client to create a patch that becomes part of the test request. That way the test runners servicing the request can use git to apply the patch onto the specified commit before performing the tests.
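A sketch of how the two git steps could look, driving plain git from a script (the choice of origin/develop as the public baseline is an assumption):

    import subprocess

    def make_patch(repo_dir):
        # Client side: find the last commit also present on the public
        # remote, and diff the working tree (including uncommitted
        # changes) against it.
        base = subprocess.check_output(
            ["git", "merge-base", "HEAD", "origin/develop"],
            cwd=repo_dir, text=True).strip()
        patch = subprocess.check_output(
            ["git", "diff", base], cwd=repo_dir, text=True)
        return base, patch

    def apply_patch(repo_dir, base, patch):
        # Runner side: check out the specified public commit, then apply
        # the patch from the test request before building and testing.
        subprocess.check_call(["git", "checkout", base], cwd=repo_dir)
        subprocess.run(["git", "apply"], cwd=repo_dir,
                       input=patch, text=True, check=True)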
If there is no way of doing this with existing available tools and new tools are needed, the following is what I envision as one proposal for a solution.
1.
A client command line tool to make the test request is needed. Tighter API-based integration into IDEs and other GUI environments may be possible, but is not essential as the command line tool can be used. A successful local build of Boost is a logical prerequisite for posting a test request to the service, hence the client tool itself can depend on Boost.Build and possibly other parts of Boost, such as Asio for networking. It can also be assumed that you have the Boost sources checked out locally, with git available to check status and logs and to extract patches against the last public commit on GitHub. The tool may allow the user to invoke it in the same fashion as b2 to specify what to test, or it may require a user-defined profile configuration for the test request specification; a combination of the two invocation methods could be supported as well. A user may define more than one profile in a local configuration file; one is specified as the default, or the first listed becomes the default. Based on the specified or default test request profile, the tool creates and posts a test request with the respective git commit IDs and patches from the current local Boost working directories whenever source code is changed. The client tool should allow special parameters for cancelling further processing of the last posted or a specific request, or similarly for superseding it with a new request. Think of it as stopping the compiler locally, changing some code, and compiling again; in that case we do not want the old test request to remain in effect at the service.
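The core action of such a tool could look like this sketch (the endpoint, field names, and response shape are all assumptions):

    import json, urllib.request

    SERVICE = "https://www.boost.org/test_request"   # assumed endpoint

    def post_test_request(profile_name, commit, patch):
        # Build the request from the chosen profile and local git state,
        # then POST it; the response should carry the status/owner URLs
        # described in item 2 below.
        body = json.dumps({"profile": profile_name,
                           "commit": commit,
                           "patch": patch}).encode()
        req = urllib.request.Request(
            SERVICE, data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)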
2.
A service at a well known address, e.g. www.boost.org/test_request, receives the request and gives it priority according to current administrator policies and possibly some scheduling scheme. Policies may, if needed, be changed in different phases of the Boost release cycle. Either the test request is rejected, or a test ID is selected and the specification and status are made available to testers and other clients. The client is provided a response accordingly, with a URL to the status data, or with the reason for rejection. Possibly there is a second URL with ownership privileges to the request, e.g. the ability to cancel the test request, renew it, or supersede it with a new one. The service maintains a table of outstanding test requests that is fetched on demand by testers.
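The two kinds of responses could look roughly like this (hypothetical field names, for illustration only):

    # Accepted: a public status URL plus a private owner URL.
    accepted = {
        "status": "accepted",
        "id": "tr-20151008-0042",
        "status_url": "https://www.boost.org/test_request/tr-20151008-0042",
        "owner_url": "https://www.boost.org/test_request/tr-20151008-0042"
                     "?token=<secret>",
    }

    # Rejected: no ID, only the reason.
    rejected = {
        "status": "rejected",
        "reason": "scope too large for current release-phase policy",
    }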
3.
Modify the existing test runner scripts such that, when a test runner starts, or when it has more time available for Boost testing, the table of currently outstanding test requests is fetched from boost.org and a suitable job is picked based on some simple rule using data in the table, the tester's properties, and the remaining time available for test requests. The test request details are fetched from boost.org and a message is posted to the service signalling the start of processing of the request at the test runner. At regular intervals the tester script should post progress to the service and check whether the request has been cancelled or superseded, in which case further processing can be stopped. Finally, when processing is completed, the tester script needs to provide the results to the service.
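The added runner-side loop could look like this sketch (the endpoint layout, the JSON fields, and the pick_job and run_steps stubs are assumptions, not the existing runner scripts):

    import json, urllib.request

    BASE = "https://www.boost.org/test_request"      # assumed service root

    def get(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    def post(url, body):
        req = urllib.request.Request(
            url, data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).close()

    def service_one_request(pick_job, run_steps, runner_props):
        # Fetch the table of outstanding requests and pick a suitable job.
        job = pick_job(get(BASE + "/active"), runner_props)
        if job is None:
            return
        url = BASE + "/" + job["id"]
        details = get(url)                           # full request spec
        post(url + "/status", {"state": "started",
                               "runner": runner_props["name"]})
        for progress in run_steps(details):          # build/test stages
            post(url + "/status", progress)
            if get(url)["state"] in ("cancelled", "superseded"):
                return                               # stop early
        post(url + "/results", {"state": "done"})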
4.
The boost.org/test_request service will maintain a table of active requests. After a time duration specified in the request, the request is deactivated and removed from the table by the service, to prevent test runners from continuing to pick up the test request. A sensible default and a maximum duration are defined by the service. The table may be made viewable at a well known location, e.g. boost.org/test_request/active, as HTML and in simpler machine-readable forms as used by the test runner scripts.
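The deactivation rule itself is simple; a sketch with assumed default and maximum durations:

    import time

    DEFAULT_TTL = 24 * 3600        # assumed default duration, seconds
    MAX_TTL = 7 * 24 * 3600        # assumed maximum duration

    def prune_active(table, now=None):
        # Drop requests whose requested duration (capped by MAX_TTL)
        # has elapsed, so runners stop picking them up.
        now = now if now is not None else time.time()
        def expired(req):
            ttl = min(req.get("expires_after", DEFAULT_TTL), MAX_TTL)
            return now - req["submitted_at"] > ttl
        return [req for req in table if not expired(req)]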
5.
The boost.org/test_request service may have a table of recent requests as well, keeping URLs available for test requests that still have request and result data on the service. After a configurable number of days, a test request's data should be purged to clean up resource usage at the service host. Before that, it should be possible for clients to download and archive the request data, both the request and its results. The owner of a test request may be allowed to renew an active request to prevent it from being deactivated, or even to re-activate deactivated requests. This way it should be possible to wait out a low priority on your request without always failing with no results; the scheduling should allow any accepted request to eventually get priority regardless of the active policy. Otherwise, it would be better to reject the request up front, with a reason stating that a higher priority is required or that a smaller scope must be selected for the test.
Bjørn