Subject: Re: [boost] Respecting a projects toolchain decisions
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2010-12-28 07:00:06


On Tue, Dec 28, 2010 at 5:33 PM, John Maddock <boost.regex_at_[hidden]> wrote:
>>
>> 1. The signal/noise ratio can be hard to keep down especially if you
>> have a lot of ground to cover. Consider how you (or any other
>> maintainer for example) would want to manage a part of the 1000
>> tickets that are all in the same pile. Sure you can query it many
>> different ways, but wouldn't it be way easier to just look at 100
>> issues just for Boost.Regex than it is to spend some time looking at
>> 1000 issues that might be relevant to Boost.Regex?
>
> I only ever look at those issues that are relevant to me, currently not quite
> down to single figures, but close ;-)  and that covers all of config, regex,
> math, type_traits and tr1...
>
> It's also not uncommon for issues to either affect multiple libraries, or to
> need to be reassigned from one library to another, the current system makes
> that trivial - albeit I do wish that Trac had an easier way to get from
> folks' real names to their SVN login name (we should probably have insisted
> folks use their real name for this).
>

Notwithstanding the issue with real names -- that gets solved
partially by a GPG web of trust system where you associate keys with
people with real names and/or unique email addresses -- re-assigning
tickets is a bad practice IMO. The alternative is simple: train the
users/developers to file the issue in the correct issue tracker; if
they file it in the wrong one, it gets closed with an explanation of
how to file it correctly.

For someone wanting to contribute, hacking up a Trac query to get
issues that only pertain to Boost.Regex is too much to ask. That kind
of barrier is the kind that I want to be able to remove -- if you want
to see the issues related to Boost.Regex that need fixing, there
should just be one place where you find everything there is to find
about Boost.Regex. No fumbling about with Trac queries just to filter
out the noise from other projects that a potential contributor would
rather not deal with.
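
To make the difference concrete, here is a rough Python sketch. The
Ticket type, the component names, and the sample data are all made up
for illustration; this is not Trac's API or GitHub's, just a model of
"one shared pile you have to filter" versus "one tracker per library".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical ticket record; not Trac's or GitHub's data model."""
    number: int
    component: str
    title: str


# One shared pile, as in a single project-wide tracker (made-up data).
SHARED_PILE = [
    Ticket(1, "regex", "Crash on empty pattern"),
    Ticket(2, "proto", "Compile error with nested grammar"),
    Ticket(3, "regex", "Docs typo in syntax page"),
    Ticket(4, "math", "Wrong result for large argument"),
]


def query_shared_pile(pile, component):
    """Centralized model: every reader must know to filter by component."""
    return [t for t in pile if t.component == component]


def split_by_component(pile):
    """Per-library model: tickets are segregated once, as they come in."""
    trackers = defaultdict(list)
    for ticket in pile:
        trackers[ticket.component].append(ticket)
    return dict(trackers)


if __name__ == "__main__":
    trackers = split_by_component(SHARED_PILE)
    # A Boost.Regex contributor just opens the one tracker they care about.
    for ticket in trackers["regex"]:
        print(ticket.number, ticket.title)
```

Both functions hand back the same tickets; the difference is who has
to do the filtering, and how often.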

>> 2. It's harder to divide and conquer if you start from the top-down.
>> Let me qualify that a little: if you start with one big-ass pile of
>> dung, the stink is much harder to overcome than if you processed the
>> input as it comes in and segregate bottom-up (no pun intended). If you
>> had one place where issues for Boost.Regex get tracked, where
>> discussion around Boost.Regex gets documented, where design decisions
>> are hashed out, and where documentation is ultimately developed, then
>> your progress with dealing with Boost.Regex shouldn't hamper the
>> progress and development of other libraries not dependent on
>> Boost.Regex. This means issues for Boost.Proto don't get piled into
>> the same pile of issues where Boost.Regex issues will be piled on.
>> Processing the issues as they come in would be way easier to manage
>> than if you started with one pile containing both issues.
>
> I'm not sure I follow, I do process issues as they come in, and there is one
> place for regex discussions - right here with [regex] in the title - or on
> any Trac ticket assigned to me.
>

Right.

Is it just me who thinks that the Trac UI is needlessly complicated
when filing issues? I use GitHub's issue tracker and it gives me two
fields: the title of the issue, and the comment that comes with the
title (which serves as a longer description of the issue). If you want
to send me a patch on GitHub, you ought to fork the repository, make
changes to your own fork, then ask me to pull. I can review the
changes right there and then with a few quick steps merge your changes
into my repository.

No attaching files, tickets, etc.

If you wanted to show some code to show how to reproduce an issue, you
either put the code in-line or link in a Gist. It's really that simple
over there.

As opposed to Trac which has 15 (?) fields to fill out just to file an
issue or start a conversation around a feature request, etc. ;)

>> 3. I'm not sure how the "single point of failure" comes into play, but
>> centralized anything means that one thing goes down, then everything
>> fails. I don't think I need to stress that point any more than I have
>> to. ;)
>
> OK you win on that one ;-)
>

;-)

>>
>> Being able to trust people and empowering people to actually be able
>> to just muck around with things and then asking for changes -- that
>> still need to be reviewed anyway -- to get baked in and shepherded by
>> trusted people (note, not just "that one guy") lowers the barrier to
>> entry for contributors. It's actually a better problem to have if you
>> have 10x more contributors than it is to have 10x more issues. Because
>> sending patches around is brittle and a nightmare to manage, using
>> tools that make it easier should be a welcome development I imagine.
>
> Who's doing the reviewing?  IMO it's back to that "one guy" again - OK there
> may actually be more than one person who can do that (as there may be now) -
> but there's still that bottleneck there IMO.
>

No, the process of reviewing can be done by anybody really. I can
choose to merge in changes you make on your fork in case you have a
nifty implementation that I would like to either build upon or
shepherd into getting someone else to pull it from me. You can work
around whether that "one guy" is actually there or whether he bothers
to look at the pull requests that make it into his queue.

A trusted group can then pick up the slack of "that one guy" not being there.

In the end, whom the release managers choose to pull changes from
is no longer a matter of who the maintainer of a library is, but
rather of whom they (or the community) think they should pull from. This
will largely revolve around whom they trust *and* who does what well
enough to have their changes pulled in.

Pure meritocracy +1, lower barrier to entry +1.

>> That said, consider the case where you have 5 trusted people who you
>> know already know the Boost.Regex internals either as much as you do
>> or better, then have 10 people who implement new features and make
>> changes to the implementation -- would you rather be the one to deal
>> with the changes of these 10 people or would you welcome the word of
>> any of the 5 trusted people to apply changes that any of the 10 people
>> make on your behalf? Essentially in the current parlance, these 5
>> trusted people would normally be called "co-maintainers", and the 10
>> people would be called "potential contributors"; in case any of the 10
>> potential contributors have their changes pulled in, then they become
>> "contributors".
>
> IMO there's nothing stopping folks now from getting involved like that, and
> there are a few very welcome folks around here who have their fingers in
> multiple pies and can help out with any of them.  But IMO the central issue
> is getting the volunteers in the first place.
>

What's really stopping folks now is the high barrier to entry for
potential contributors. Just getting sandbox access -- asking
permission -- is hard enough as opposed to clicking a button that says
"fork". That "asking permission" part is what's stopping many people
from even trying to contribute. That additional mental step of having
to ask for permission to make changes is really a non-starter for most
of the potential contributors.

Then there's also the issue of being called a "maintainer". Labels
matter, and putting that label on someone conveys some sort of
authority, which some people really wouldn't like. Instead of being an
encouraging factor, it becomes a discouraging factor.

Also, just like in real life, earning someone else's trust is hard
enough; making it harder doesn't encourage more people to try and earn
others' trust. In the current scheme of things, to gain the other
maintainers' and release managers' trust, you're going to have to make
it into the club by submitting a full-blown library that gets reviewed
and accepted -- there's no second tier or level of contributors who
just want to help out by submitting patches and earn trust that way.

Maybe the Guild is a potential way of getting more interested
contributors into the fold, but it's still a top-down approach to
solving the issue IMO.

>>
>> Sure, that's a thought, but that's thinking with a band-aid short-term
>> solution. The bug sprints are a good idea, but I don't get why a bug
>> sprint can't last 1 whole year and be an on-going effort. Having bug
>> sprints and hit squads (ninja clans, strike teams, etc.) are
>> short-term non-sustainable solutions to the issue of open source
>> maintenance.
>
> The reason it doesn't last all year, is simply not getting enough volunteers
> to run things IMO.
>

Actually, the fact that there has to be a sprint to address the issues
in a focused manner is a little disturbing to me.

I like participating in things like the bug sprint, and maybe the
occasional hackathon. But unfortunately the issues being addressed are
symptoms of a larger problem, which is that:

1. There are already a lot of issues raised and the current
maintainers of the libraries either don't have time to address them or
aren't interested in addressing them. Either way, they're MIA and
getting someone else to replace that role is not the solution either
-- because that person can later on be MIA and the
development/maintenance halts again as a result.

2. Because of the high barrier to entry for potential contributors
coupled with the high potential for maintainers to be MIA for various
reasons, the issues that get ignored or remain unaddressed
contribute to the larger hurdle of improving or maintaining Boost
library quality. More issues means more work needs to be done, and
having a high barrier doesn't help with allowing others to do that
work immediately.

3. The bug sprint is a short-term solution to stop the bleeding; it
has to be augmented with a larger, more sustainable effort to cut
down the issues that are being raised or that have already been
raised. Maybe the Guild is a source of potential contributors, but if
we don't address the high barrier to entry, we're probably not going
to see much uptake on being a member of the Guild.

>> I for one, as a potential contributor, would like to be encouraged to
>> dive into the code, get some changes submitted, and see that there are
>> people who actually care. With the current process and system in
>> place, I don't feel like it's a conducive environment for potential
>> contributors only just because of the barriers to entry.
>
> I still don't see how changing toolsets helps this - right now you can SVN
> copy a single library into the sandbox *or any other SVN repo of your
> choice* - it took me about 2 minutes flat to do this for a section of
> Boost.Math that I wanted to work on this month - and then away you go, edit
> away to your heart's content, and submit the final result when you're ready.

Can you svn copy from one SVN repository to another?

How do you experiment on SVN? You make tons of branches that you may
well forget about later on. And how about sending changes over
email, or signing changes to certify that you were the one who
really made them and not someone who just managed to forge patches in
your name?

> I accept that GIT may be a lot easier *for folks that already use it*, just
> as SVN is easier for those of us old timers that have been using that for a
> while.  Of course I didn't see the need to change from CVS to SVN either, so
> you can see where I'm coming from.... :-0
>

Well, the workflow is what's fundamentally different.

With Git, everybody has a repository of the code. This means you can
muck around with your local repository, make as many changes as you
want, pull in changes from other repositories willy-nilly, stabilize
the implementation locally, then have others pull from your repository
as well. Because with Subversion you have to maintain a single
consistent view of the repository at any given time, the cost of
making changes is a lot higher than it is if you had a local
repository that you're working on.
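
Here's a toy Python model of that workflow. It is deliberately not how
Git stores anything (no hashes, no commit graph, no merges); the Repo
class and everything in it is hypothetical, and a repository is
reduced to an ordered list of change descriptions. The only point is
that each person commits locally and changes move by pulling between
peers, with no central server in the picture.

```python
class Repo:
    """Toy stand-in for a repository: an owner plus an ordered list of changes.

    Hypothetical model for illustration only; real Git tracks a commit graph.
    """

    def __init__(self, owner, changes=()):
        self.owner = owner
        self.changes = list(changes)  # copy, so forks are independent

    def fork(self, new_owner):
        """Copy the full history into a new, independent repository."""
        return Repo(new_owner, self.changes)

    def commit(self, description):
        """Record a change locally; nobody else sees it until they pull."""
        self.changes.append(f"{self.owner}: {description}")

    def pull(self, other):
        """Bring in changes from another repository that we don't have yet."""
        for change in other.changes:
            if change not in self.changes:
                self.changes.append(change)


if __name__ == "__main__":
    upstream = Repo("maintainer", ["maintainer: initial import"])

    # Two contributors fork without asking anyone for permission.
    alice = upstream.fork("alice")
    bob = upstream.fork("bob")

    alice.commit("fix off-by-one in matcher")
    bob.commit("add test for empty pattern")

    # Peers exchange work directly, then upstream pulls once it's stable.
    alice.pull(bob)
    upstream.pull(alice)
    print(upstream.changes)
```

Note that bob's repository never sees alice's change unless he chooses
to pull it, which is the "local view" that a single shared SVN
repository doesn't give you.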

It's really hard to describe what the distributed model looks like if
you've only ever seen the centralized model. Once you've gone
"distributed" though, I'm positive you won't go back -- much like when
you go black, you won't...

Part of changing the toolset is changing the process as well, which
you can only do if your tools allow you to make these changes. If you
want to have a more scalable and decentralized system (actually, I
haven't seen a scalable centralized system either; "scalable
centralized" might be a misnomer, if such a solution ever existed) then
you're going to have to change both the tools and the process.

HTH

> John.

-- 
Dean Michael Berris
about.me/deanberris
