Subject: Re: [boost] [git] Mercurial?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-03-22 20:05:57


on Thu Mar 22 2012, Martin Geisler <mg-AT-aragost.com> wrote:

> Dave Abrahams <dave_at_[hidden]> writes:
>
>> on Thu Mar 22 2012, Martin Geisler <mg-AT-aragost.com> wrote:
>>
>>>> This kind of mental model is stimulated by git. It explains why git
>>>> users make a fuss about amending, rebasing and efficient branching
>>>> and merging.
>>>
>>> I'm afraid I don't agree with this. The version control systems that
>>> came after CVS switched to storing repository-wide snapshots. CVS was
>>> a collection of RCS files and so completely file-centric. SVN,
>>> Mercurial, Git, ... are all repository-centric. Conceptually they
>>> work by storing a series (linear or not) of working copy snapshots.
>>>
>>> Darcs is a possible exception to this: I think it might fit more
>>> closely to your unit-of-work model since it models the repository
>>> state as a result of a number of patches (units-of-work).
>>
>> People get hung up on this all the time, but whether the storage
>> format is fundamentally snapshots or diffs is not really important.
>> They're isomorphic to one another.
>
> Yes, that's mostly true. Both Git and Mercurial conceptually store
> snapshots of your working copy. These snapshots are of course heavily
> delta compressed -- otherwise we couldn't do DVCS in the first place.
>
> But I say that they conceptually store snapshots since that's what
> commands like 'hg diff' operate on. When you 'hg diff -r 1:2', then
> Mercurial has to go out and re-compute the patch that brings you from
> revision 1 to 2.
>
> People often think that changeset 2 "contains" this diff,

In the sense that the commit contains its ancestry and a snapshot, yes,
it does contain the diff.

> but it's more complicated than this: the deltas we store on disk don't
> correspond directly to this diff. This is especially true for a merge
> changeset where the deltas on disk are made on a per-file basis
> against the parent that produces the smallest delta.

But "deltas stored on-disk" are completely irrelevant to the user unless
she's fiddling about with the plumbing (low-level guts of the DVCS).
Even for most expert users, it is *always, always, always* an
implementation detail. My point is that we shouldn't be talking about
this stuff here; it will just confuse the less-experienced people and
add /nothing/. One of the big problems with the way Git is often
explained is that the explainers get into this stuff. Can't speak for
Mercurial.

There's absolutely no difference conceptually between storing the latest
state plus a chain of diffs and storing a bunch of snapshots, except for
performance issues like how long it takes you to reach a given snapshot
or how much space things take up on disk, and every good VCS does all
kinds of implementation-detail-y tricks to smooth out the deficiencies of
its storage format.
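
To make the equivalence concrete, here is a minimal sketch in Python.
It is only an illustration, not how Git or Mercurial actually store
anything: the function names (delta, patch, to_chain, to_snapshots) are
made up for this example, and the diff format is just whatever Python's
difflib produces. It converts a list of snapshots into "latest state
plus a chain of backward diffs" and back again without losing anything.

```python
from difflib import SequenceMatcher

def delta(old, new):
    """Hunks (start, end, new_lines) that turn the list `old` into `new`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def patch(old, hunks):
    """Apply hunks produced by delta() to `old`."""
    out, pos = [], 0
    for start, end, lines in hunks:
        out += old[pos:start] + lines   # copy untouched lines, then the change
        pos = end                       # skip the lines the hunk replaced
    return out + old[pos:]

def to_chain(snapshots):
    """Snapshots (oldest first) -> (latest state, backward diffs)."""
    newest_first = snapshots[::-1]
    chain = [delta(n, o) for n, o in zip(newest_first, newest_first[1:])]
    return newest_first[0], chain

def to_snapshots(latest, chain):
    """Inverse of to_chain(): rebuild every snapshot, oldest first."""
    states = [latest]
    for hunks in chain:
        states.append(patch(states[-1], hunks))
    return states[::-1]

history = [["a"], ["a", "b"], ["a", "c", "b"]]
latest, chain = to_chain(history)
assert to_snapshots(latest, chain) == history   # round trip is lossless
```

The only thing that differs between the two representations is cost:
reaching an old snapshot from the chain means applying every diff
between it and the latest state.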

>> Git, like any other modern tool, provides commands that support both
>> views of the history. Rebase, for example, treats your commits as
>> units-of-work and "replays" that work on a new base commit. Many other
>> elements of the interface treat commits like snapshots.
>
> It actually doesn't replay anything: it does a series of three-way merges,
> and three-way merges are an inherently snapshot based thing.

I wouldn't say that. If A is the ancestor of B and C, a 3-way merge can
be done by taking the diffs B-A and C-A, finding the overlapping regions
and marking those as conflicts, and applying the remaining diffs one by
one. The line numbering of those diffs needs to be adjusted as you go
along. At least, that's how I coded it 15 years ago in MPW (I was using
these tools:
http://www.mactech.com/articles/mactech/Vol.05/05.09/SADEDebugging/index.html).
And it worked perfectly.
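
A rough sketch of that diff-based procedure in Python follows. This is
not the MPW code; it is a reconstruction under assumptions. It uses
Python's difflib to compute the diffs B-A and C-A, the names (diff,
apply, merge3) and the conflict-marker format are invented for the
example, and two edits are treated as overlapping when their line
ranges in A intersect or start at the same line.

```python
from difflib import SequenceMatcher

def diff(base, other):
    """Hunks (start, end, new_lines) over `base` that turn it into `other`."""
    ops = SequenceMatcher(a=base, b=other, autojunk=False).get_opcodes()
    return [(i1, i2, other[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def apply(base, hunks, start, end):
    """Apply sorted, non-overlapping hunks to the region base[start:end]."""
    out, pos = [], start
    for s, e, new in hunks:
        out += base[pos:s] + new   # `pos` is the line-number adjustment
        pos = e
    return out + base[pos:end]

def merge3(base, b, c):
    """Merge b and c, both derived from base. Returns (lines, conflict count)."""
    tagged = sorted([(h, "B") for h in diff(base, b)] +
                    [(h, "C") for h in diff(base, c)],
                    key=lambda t: (t[0][0], t[0][1]))
    merged, pos, conflicts, i = [], 0, 0, 0
    while i < len(tagged):
        start, end = tagged[i][0][0], tagged[i][0][1]
        group = [tagged[i]]
        i += 1
        # Pull in every later hunk that overlaps the region collected so far.
        while i < len(tagged) and (tagged[i][0][0] < end or tagged[i][0][0] == start):
            group.append(tagged[i])
            end = max(end, tagged[i][0][1])
            i += 1
        merged += base[pos:start]  # unchanged lines before this region
        b_side = apply(base, [h for h, who in group if who == "B"], start, end)
        c_side = apply(base, [h for h, who in group if who == "C"], start, end)
        sides = {who for _, who in group}
        if len(sides) == 1 or b_side == c_side:
            # Only one side touched the region, or both made the same change.
            merged += b_side if "B" in sides else c_side
        else:
            conflicts += 1
            merged += ["<<<<<<< B"] + b_side + ["======="] + c_side + [">>>>>>> C"]
        pos = end
    return merged + base[pos:], conflicts

a = ["a", "b", "c", "d", "e"]
print(merge3(a, ["a", "B", "c", "d", "e"], ["a", "b", "c", "d", "E", "f"]))
```

Nothing here needs a snapshot of B or C beyond what it takes to compute
the two diffs against A; the merge itself works entirely on hunks.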

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
