Subject: Re: [boost] [git] Mercurial?
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2012-03-21 11:56:08


(NB: it took me some time to write this post and in the meantime
some of the issues I'm addressing have been covered. Hopefully what I
wrote is still useful by making the central things very explicit.)

Rene Rivera wrote:

> On 3/20/2012 10:15 PM, David Bergman wrote:
>
>> We evidently have different styles of formal solving; mine is a
>> balance between an internal - or semi-internal - process and an
>> "accountable" collaborative effort. I do not see the value of
>> everybody seeing every single key stroke I make, as long as they see
>> certain sync points; actually, quite analogously to the operational
>> semantics of C++ - that certain points at the execution have to
>> follow some rules...
>
> Hm.. I must be not understanding something.. Are you arguing that not all commits/check-ins you do to a local/private repository are important enough to merit the benefits of collaboration? I ask because my contention is that if it's important enough for you to put something into a VCS history, it's important enough for your collaborators to inspect it.. in perpetuity. And that the sooner that inspection happens the better it is for everyone. Hence that deleting such history is counter to collaboration.

This already received several comments, but I think there is
something very deep about this that deserves more attention. It
revolves around the following fundamental question:

What is the meaning of a commit?

One possible interpretation is that a commit is a snapshot of your
project. A snapshot is something that you store for future reference.
Because in a sense it's a form of documentation, one will take care
to submit well-crafted commits that include enough useful changes to
license a new snapshot. In principle, every commit is assumed to
introduce some form of progress compared to the previous one. Making
changes to such a history of snapshots is almost necessarily a form
of fraud.

This is the kind of mental model of a commit that is stimulated by
svn. You can see it from the terminology: making a commit causes the
repository to move to the next revision number.

Another possible interpretation is that a commit represents a unit of
work. This tends to favour many small commits over few big commits.
Since anything you do before you're sure that it's the right thing is
also work, shabby commits are part of the deal. The consequence is
that it must be very cheap to isolate any messy state in temporary
side tracks. Now the VCS is not only a collection of snapshots, but
also a tool to manage your recent pieces of work before you finally
commit* to some of them.

This kind of mental model is stimulated by git. It explains why git
users make a fuss about amending, rebasing and efficient branching
and merging.

There is no point in arguing that one mental model is superior to the
other until you fully grasp both of them. I urge anyone who feels
tempted to make agitated remarks to let this sink in for at least a
few hours.

That said, I'm confident enough to think that I can give two solid
arguments why the units-of-work model is ultimately more productive.

The first argument is provided by historical evidence, and nicely
illustrated by Christof Donat's most recent post in this thread. The
units-of-work model was first: local VCSs of the early generation
invited developers to commit often. In the centralised VCSs of the
next generation, committing became too expensive for such a workflow
and developers adapted to the snapshot model instead. From that
perspective the snapshot model was a workaround rather than a
preferred solution. Distributed VCSs of the current generation
specifically intend to address that problem by making commits cheap
again. Developers are now using the opportunity to switch back to the
units-of-work model.

The second argument is more technical, and perhaps more convincing.
It works even without branches or collaborators. All we need is a
single developer who makes some changes to their working copy of a
project.

1. If the developer applies the snapshot model, they'll implement all
changes in one go and spend some time verifying that they seem to
make sense. After that they'll probably make a commit.
2. If the developer applies the units-of-work model, they'll commit
each change directly after implementing it. Let's say five commits
are made in total.

A little later, our developer finds that they want to undo one of the
changes.

1. In the snapshot scenario, they look up the pieces of code that were
affected by the faulty change and edit them again.
2. In the units-of-work scenario, they cut the faulty commit out of
history and they're done.

Result: the units-of-work developer spends less time getting the
same thing done, with less opportunity for errors.
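
To make the comparison above concrete, here is a toy sketch in Python.
It is not how git stores or rewrites history; it only models a commit
as a set of file changes and history as a list of commits. The file
names, contents and the replay() helper are made up for illustration,
and the five changes are assumed to be independent of each other. In
real git, the closest equivalent of "cutting a commit out" would be
something like an interactive rebase that drops the commit, which may
need conflict resolution when later commits depend on it.

```python
# Toy model (hypothetical names throughout): a project is a dict that maps
# file names to contents, and a commit records the files it changes.

def replay(base, commits):
    """Apply commits in order on top of base and return the resulting state."""
    state = dict(base)
    for commit in commits:
        state.update(commit["changes"])
    return state

base = {"a.txt": "a0", "b.txt": "b0", "c.txt": "c0", "d.txt": "d0", "e.txt": "e0"}

# Units-of-work model: five small commits, one per change.
history = [
    {"id": 1, "changes": {"a.txt": "a1"}},
    {"id": 2, "changes": {"b.txt": "b1"}},
    {"id": 3, "changes": {"c.txt": "c1"}},  # this one turns out to be faulty
    {"id": 4, "changes": {"d.txt": "d1"}},
    {"id": 5, "changes": {"e.txt": "e1"}},
]

# Undoing the faulty change is a single operation on history:
# leave commit 3 out and replay the rest.
fixed = replay(base, [c for c in history if c["id"] != 3])

# Snapshot model: the same five changes end up in one big commit.
snapshot = {"id": 1, "changes": {}}
for commit in history:
    snapshot["changes"].update(commit["changes"])

# History cannot express "everything except the faulty change", so the
# developer has to find the affected file and edit it back by hand.
manual = replay(base, [snapshot])
manual["c.txt"] = base["c.txt"]

print(fixed == manual)
```

Both routes end in the same state, but only the first one gets there
without the developer having to remember which pieces of code the
faulty change touched.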

Note that the pieces of history that tend to get altered in a
units-of-work model generally don't make it into version control in a
snapshot model at all.

-Julian

----
*) Commit as in, make a commitment. Pun not entirely unintended.
