Subject: Re: [boost] RE process (prospective from a retired FreeBSD committer)...
From: Sean Chittenden (sean_at_[hidden])
Date: 2011-01-28 16:30:53
> This is all pretty standard procedure for many projects, IIUC.
> However, my vision for Boost is a bit different:
> * each library has its own Git repo
> * each library maintainer tests and produces versioned releases of
> that library on his/her own timetable
> * the latest release of a library is tagged "STABLE"
> * When assembling a new Boost release, the release manager marks the
> "STABLE" revision from each library and begins boost-wide
> integration/interoperability testing on those revisions
> * When a library release fails integration testing, the release
> manager has several options:
> * Give the library maintainer a time window in which to produce a
> new STABLE release that passes integration testing
> * Use an earlier STABLE release of the library
> * Withhold the library from the next Boost release (drastic)
> * If the failures appear to be due to breakage in another library,
> the release manager can apply these procedures to the culprit library
> One idea behind this process is that it allows individual libraries to
> "officially release" fixes and updates without forcing Boost's release
> managers to coordinate a Boost-wide point release.
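To make sure I'm reading the fallback options correctly, here is a minimal sketch in Python. The function name, the release list, and the `passes_integration` callback are all hypothetical; it only models "use an earlier STABLE release" and "withhold", not the time window given to a maintainer.

```python
def pick_release(stable_releases, passes_integration):
    """Pick the newest STABLE release that passes integration testing.

    stable_releases: a library's STABLE releases, newest first (assumed ordering).
    passes_integration: hypothetical callback standing in for the
        boost-wide integration/interoperability test run.
    Returns the chosen release, or None if the library must be withheld.
    """
    for release in stable_releases:
        if passes_integration(release):
            return release
    # No STABLE release passed: the drastic option, withhold the library.
    return None
```

If the newest release fails, the loop simply falls back to the next older STABLE release.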
This sounds reasonable to me, though what happens to software that works and is now in maintenance mode, with an author who has left the boost community?
As for modularizing things out of the base, this is what FreeBSD has done with many pieces of software via the ports tree. In fact, many other traditional bits of software that many feel are default parts of an OS have been moved to the ports tree (e.g. perl is not in the base OS install on FreeBSD, and from what I hear gcc will be moved out as well in favor of llvm).
> No offense intended to anyone, but SVK is at best a poor-man's Git :-)
None taken as I've never used it myself. :~]
>> A schedule and handoff of hats/responsibilities for each of the
>> branches is pretty well documented and is now time driven (it used
>> to be feature driven, which didn't work as well as they'd hoped):
> Not sure how that page illustrates what you're saying above. Care to
There are people who maintain roles and responsibilities for various branches as each branch goes through its life cycle. Each branch roughly moves through the following life cycle:
HEAD = Any developer w/ commit access can commit
pre-release branch (e.g. BOOST_1_46_prerelease) = Any re@ can approve commits to the branch but developers are required to get re@ permission (re@ = release engineer hat)
released branch (e.g. BOOST_1_46_0) = re@ hands the branch off to security-officer@ for security related fixes. Sometimes re@ will push out a micro version release for critical bugs, but security-officer@ takes ownership of the 1_46_0 branch until the branch is EoL'ed (I don't think boost has a concept of an EoL'ed branch at present).
So in that scenario, there are three hats that could be worn (often by the same people, but looking at a problem from different perspectives).
The above link is just the time table and indicator for developers for who has the ability to bless a commit to a particular branch. Once a branch is handed off to a particular hat (e.g. re@ or sec@), then you have to go find the right person.
One update that I would like to see for boost would be for the hats for each branch to be solidified. Maintainers of the 1.46 branch might be different than 1.45 vs 1.44 and there's a certain amount of institutional knowledge that comes with shepherding a release that results in slightly better judgement calls (i.e. "oh! that release had a funky interoperability bit with XYZ so we did ABC to fix this"). Actually, a real world example of this would probably be what happened with boost::serialize a release or two back. When 1.50 comes around and someone else is doing the re@ process, but there's a fix for 1.40, the 1.40 re@ team will almost certainly remember the oddity, but the 1.50 re@ team may not.
>> A 'Reviewed by' commit header.
> Git has a mechanism for signing off on changes with cryptographic
> security, so that you can have reasonable assurance that the change
> was actually approved by the person claimed.
The crypto assurance that the handoff was valid is important in a DVCS, though my point was about pushing MFC-like (Merge From CURRENT) approvals into the commit messages so that people can identify when or why a change was approved, or spark a discussion on a mailing list.
>> This gives greater latitude in terms of what can be included in the
> Could you be more specific?
An MFC's commit hook requires only that filenames match. When you use the 'Reviewed by' commit message header, you can include different files that weren't in the original commit to -CURRENT. I don't know that this level of control needs to be in place, but there are specific process definitions for each of the commit headers.
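As a rough illustration of the filename check, here is a minimal sketch in Python. It assumes "match" means that every file touched by the merge also appears in the original -CURRENT commit; the function name and the file lists are hypothetical, and this is not the actual FreeBSD hook.

```python
def unexpected_files(original_files, merge_files):
    """Return the merge paths that were not part of the original commit.

    An empty result means the filenames match (under the assumption
    stated above) and the MFC-style check would pass.
    """
    return sorted(set(merge_files) - set(original_files))
```

A non-empty result is the case where, as I understand it, you would reach for 'Reviewed by' instead.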
>> And the 'Approved by' commit header. Once a release is frozen, all
>> commits need to have this tag otherwise the commit will fail.
I think this statement stands on its own. To be clear, the commit hook actually examines the contents of the commit message, looks for various elements, and will fail a commit if the commit message is not in accordance with the branch's ACLs.
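For the sake of discussion, a hook like that could look something like the Python sketch below. Everything in it is an assumption for illustration: the ACL table, the branch names, the "Approved by:" line format, and the function name. It is not the real FreeBSD hook.

```python
import re

# Hypothetical ACL table: frozen branch -> hats allowed to approve commits.
BRANCH_APPROVERS = {
    "BOOST_1_46": {"re@"},
    "BOOST_1_46_0": {"security-officer@", "re@"},
}

# Matches a header line such as "Approved by: re@" (assumed format).
APPROVED_BY = re.compile(r"^Approved by:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def check_commit(branch, message):
    """Return (accepted, reason) for a commit message on the given branch."""
    allowed = BRANCH_APPROVERS.get(branch)
    if allowed is None:
        # Branches without an ACL entry are treated as open to any committer.
        return True, "branch is not frozen"
    match = APPROVED_BY.search(message)
    if match is None:
        return False, "missing 'Approved by:' header"
    approvers = {name.strip() for name in match.group(1).split(",")}
    if approvers & allowed:
        return True, "approved"
    return False, "approver not permitted on this branch"
```

The point is only that the check is mechanical: the hook reads the message, compares it with the branch's ACL, and rejects the commit otherwise.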
>> If someone commits with that line and didn't have permission to do so,
>> as a policy, the commit is always reverted 100% of the time as a
>> matter of principle. Frequently it's re-committed, but it's a big
>> slap to the back of the hands to have to go through the process
>> again. Needless to say, the commit mailing list is the most active
>> and widely read list as a result.
> Sorry, I don't understand. How does that procedure result in many
> people reading the commit list?
People read the commit mailing lists because that's the easiest way to quickly digest the changes that are made to the tree. If something catches your eye, you can quickly go from the mailing list to a diff to review the change. Because enough people do this and the attention level has reached critical mass, the commit list is the focal point for most of the technical/mechanical discussions.
>> And tons of code gets reviewed with many eyes viewing it as a
>> result. Unlike boost, FreeBSD uses an abridged commit message that
>> doesn't include the actual diff itself
> Do you mean that the message sent to the commit mailing list is
>> (if you're a committer, then you see the diffs to areas of the tree
>> that you subscribe to).
> Nifty; that keeps down the noise.
Indeed. At several hundred commits per day, managing the signal-to-noise ratio is an important part of the equation.
>> I can't stress the organizational difference and peer pressure a
>> widely used commit mailing list brings about.
If many eyes are watching a commit list and a discussion starts, relevant and interested parties are quick to respond to the technical merits/problems with a particular commit. Because of this self-policing and pressure to perform in front of your peers, the quality of commit messages is very high (as is the quality of the commits themselves, since reverts also show up on the commit mailing list).
My gripe with git and all DVCSs is this frequent lack of a central review process and core community of reviewers. I forgot to mention, but in the case of the BSDs, if you have commit access, you are involuntarily subscribed to the commit mailing list.
>> Post-1.46 release fixes could go in to 1.46.0 and (heaven forbid),
>> maybe even a micro version with specific fixes for the .46 minor
> Sorry, I don't understand the above at all. How could post-1.46 stuff
> go into 1.46.0? 1.46 == 1.46.0! And I don't even grok the "micro
> version..." part enough to know what questions to ask you about it. Care
> to try again?
No worries, I was in a bit of a hurry after I double tapped send.
Here's the significance of the tag/branch naming scheme in a boost hypothetical situation.
BOOST_1_CURRENT = trunk for Boost 1.X development (plans/criteria for a 2.0?)
BOOST_1_STABLE = the branch that will have the next release (e.g. 1.47.0). There isn't much in the way of API stability that needs to be maintained atm, but if there were, a BOOST_2_CURRENT with incompatible APIs from BOOST_1_CURRENT would probably need a head vs production branch.
BOOST_1_46 = re@ has branched BOOST_1_CURRENT to create BOOST_1_46 to get the tree in shape for a release. The only commits allowed to go in at this time are commits that pertain to stabilizing the release and getting tests to pass.
BOOST_1_46_0 = re@ has shipped the 1.46.0 release. This branch is now managed by security-officer@ and to a lesser degree the re@ team.
BOOST_1_46_1 = A branch/tag that security-officer@ creates once there is a reason to release an updated release of 1.46.0. 1.46.0 to 1.46.1 is required to be API compatible. In the case of the BSDs, all minor and micro versions are required to be ABI compatible (which is why gcc in the base system only gets updated in the BSDs along major release numbers).
Is that a better explanation?
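If it helps, the naming scheme above can be read mechanically. Here is a minimal Python sketch that classifies a branch name into its stage and the hat that owns it; the stage labels, the function name, and the regular expressions are mine, and the owner of a STABLE branch is left open because I didn't specify one above.

```python
import re

# (pattern, stage label, owning hat); labels are illustrative only.
PATTERNS = [
    (re.compile(r"^BOOST_(\d+)_CURRENT$"), "trunk", "any committer"),
    (re.compile(r"^BOOST_(\d+)_STABLE$"), "next-release", "unspecified"),
    (re.compile(r"^BOOST_(\d+)_(\d+)$"), "pre-release", "re@"),
    (re.compile(r"^BOOST_(\d+)_(\d+)_(\d+)$"), "released", "security-officer@"),
]


def classify_branch(name):
    """Return (stage, owner, version tuple), or None for an unknown name."""
    for pattern, stage, owner in PATTERNS:
        match = pattern.match(name)
        if match:
            version = tuple(int(part) for part in match.groups())
            return stage, owner, version
    return None
```

For example, `classify_branch("BOOST_1_46_1")` yields the "released" stage owned by security-officer@ with version `(1, 46, 1)`.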
>> PostgreSQL ships with a contrib/ directory of "soon to be" or
>> "possible candidates for being a core component" which serves the
>> same purpose as FreeBSD's ports structure. This gets yet-to-be
>> finalized modules out in the wild and helps garner interest.
> having a contrib/ directory is an interesting idea for Boost. That'd
> be very different from the sandbox. But who's responsible for
> maintaining that code?
Whoever's the review manager and the review manager's mentee who originally wrote the module.
>> PostgreSQL's autovacuum went from being a fringe project to a
>> contrib/ module and a core feature in 2 minor releases because of
>> the huge interest in its use/adoption. Boost.Log or Atomic or any
>> of the other "we all really want this but it's not quite finalized"
>> modules seem like ideal candidates for inclusion in such a directory
>> because it could generate additional interest/eyes. A contrib/ or
>> proposed/ would go a long way towards keeping boost lean-ish, too.
> How so? It sounds like it means adding more to each release.
My train of thought was starting to drift here and that last sentence was largely a reaction to not being able to check out the entire boost svn repository because of the sandbox. Let me clean that thought up real quick and rephrase.
PostgreSQL's autovacuum went from being a fringe project to a
contrib/ module and a core feature in 2 minor releases because of
the huge interest in its use/adoption. Boost.Log or Atomic or any
of the other "we all really want this but it's not quite finalized"
modules seem like ideal candidates for inclusion in such a directory
because it could generate additional interest/eyes.
Like autovacuum, Boost.Atomic has the potential to be a fantastic piece of boost's infrastructure where many boost modules could make use of it.
A contrib/ or proposed/ would go a long way towards keeping boost lean-ish, too. Here's how:
Being able to modularize boost modules into little packages that can be enabled/disabled independently of the main source tree/core will reduce clutter and make interdependencies explicit. Right now it takes ~10min and 1GB of RAM on my 2.8GHz Xeon to get bjam to complete its dependency checks before the first line of anything is compiled. Having smaller modules that can be enabled/disabled quickly because things are well contained in isolation should reduce the overhead for source installs. As time passes and the number of headers or modules grows inside of the main boost source tree, how long will that process take to complete in the future? Reducing the number of things that bjam has to keep track of in the source tree seems like the only way to mitigate this (for me personally, this has been a growing source of frustration and a sore spot with boost since day #1, and it is why I was initially using cmake instead of bjam).
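To show what I mean by explicit interdependencies, here is a minimal Python sketch. It assumes each module declares what it depends on in a simple map; the function name and the module names in the usage note are placeholders, not real boost dependencies. Given the modules a user enables, only those and their transitive dependencies would need to be tracked by the build.

```python
def enabled_closure(requested, depends_on):
    """Return the requested modules plus everything they transitively need.

    requested: iterable of module names the user enabled.
    depends_on: hypothetical map of module name -> list of direct dependencies.
    """
    needed = set()
    stack = list(requested)
    while stack:
        module = stack.pop()
        if module in needed:
            continue  # already visited; also guards against dependency cycles
        needed.add(module)
        stack.extend(depends_on.get(module, ()))
    return needed
```

With a map like `{"app_lib": ["core"], "extra": ["app_lib"]}`, enabling only `app_lib` pulls in `core` and leaves `extra` out of the build entirely.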
>> At present it seems like boost-*.tar.bz2 is on track to including
>> boost/kitchen/sink.hpp and boost/bath/water.hpp and that's something
>> that is a bit concerning to me on the long-term scale.
> That seems like a semi-random fear that I don't see being addressed by
> a contrib/ directory.
See above re: bjam's install time. Patching in various boost libraries to be compiled and installed via bjam is laborious, especially when things are still under development and the compile bombs out.
>> The structure on the server is largely there, but the svn tree
>> looks pretty disorganized with lots of legacy clutter so it doesn't
>> look like it's being used well.
>> And lastly re: VCSs, lots of people fork FreeBSD to do experimental
>> work out of the tree via git and hg, but the monolithic and
>> serialized commit/review process seems to be working quite well from
>> my perspective. A little bureaucratic but very stable and
>> democratic without reliance on any one person to push the release
>> forward. Anyway, food for thought. Hopefully there's something
>> there that you can pick out of value.
> One of the differences between FreeBSD and Boost is (I think) that the
> vast majority of the actual code in FreeBSD is in the kernel, and thus
> fairly highly coupled. IMO in Boost, maintainers quite properly have
> much less interest in what is happening in other libraries. That may
> make a difference in choosing a suitable procedure.
I see boost as much closer to the FreeBSD port system, actually. I think there is a boost "core" but things that are written using boost aren't explicitly core until something depends on them.
Anyway, hopefully this has been useful (and a break from the string discussion, yikes!). -sc
-- Sean Chittenden sean_at_[hidden]
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk