Subject: Re: [boost] Git Modularization Ready for Review
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2013-05-10 14:05:40


Niall Douglas <ndouglas_at_[hidden]> writes:

>> Niall Douglas <ndouglas_at_[hidden]> writes:
>>
>> > Me personally I'd just chuck away any unmappable historical information.
>> > Chalk it up to a cost of transition. If they really need the history,
>> > they can go fetch it from the read-only SVN repo.
>>
>> I see you've not been keeping up with the list lately ;) Daniel et al.
>> suggested doing just that a few months ago and were met with such a chorus
>> of criticism, they didn't really have a choice but to fix it.

snip

>> Personally, I agree with the chorus. After all, the point of a VCS is to
>> have a history of the code's evolution to a point. The VCS, be it SVN, Git,
>> whatever, is just a means to get that history. Jettisoning the ends for the
>> means seems misguided.
>
> No, it's a cost of doing an upgrade. Those of you who have ever done a large
> CVS to SVN migration know what I mean: stuff gets lost, and actually it
> isn't important enough to preserve that it requires a quadrupling of
> transition effort when a read-only copy of the old technology repo is good
> enough. Distributed SCM is much more dimorphic again, and you *have* to
> accept some data loss.

It's a question of degree. Daniel et al.'s recent work shows that you
can keep that loss pretty low. "Where there's a will there's a way?"

snip
>> What happens in a few years' time when Git is replaced with the next big
>> thing? Do we lose the history again? And then again when that gets replaced
>> too?
>
> That's exactly what happens. Bitrot is always inevitable in the long run.
> Here call it "non-fatal bitrot" :)
>
> My only red line is corruption of past and present source code releases. It
> must *always* be possible to check out tag X and build release X.

So why have a VCS at all? Named tarballs meet your red line.

> Other than
> that, I'm flexible, including loss of branch integrity, because in the end
> if that branch is really important its owner will manually fix up the
> damage.
>
>> > I fear that if modularization is taken to its logical extreme, you
>> > could see submodules get out of step with other submodules. You may of
>> > course have already anticipated this and have implemented something I
>> > haven't realized.
>> > As I said, I am confused.
>>
>> Can you explain a bit more about what you mean by out-of-step? The whole
>> point of modularising the code is to *help* modules to get out-of-step and
>> therefore be easier to develop and test independent of what other Boost
>> libraries are doing. But perhaps you mean something else?
>
> Well, the Git way of helping stuff get out of step is that everyone gets
> their own full copy of the whole repo, and their copy is just as important
> as anyone else's copy. You clone, you develop and test, and optionally push
> changes elsewhere which could be your friend, your team, your employer, or
> of course some central authoritative copy.

That works for a monolithic project. Boost is not monolithic but its
single-repo model was forcing it to act like one. As well as making
life a little easier for library developers, the modularisation effort
should help end users too. For a while now people have been wanting to
take only what they need from Boost and ignore the rest. It's
especially important in commercial environments where managers (rightly
or wrongly) are reluctant to depend on huge quantities of unreviewable
code with unknown interactions. Modularisation won't solve the problem
overnight but it helps.

> So I'm afraid I just don't get the present design for a library as
> small and as tightly integrated as Boost.

You must be talking about some other Boost.

> Something huge and cleanly separated like KDE sure, but for Boost I
> suspect it's overkill. Unless Boost plans to grow 10x in the next three
> years that is.
>
>> > [1]: By findable I mean that when Boost library users do #include
>> > <boost/whatever> they get the main Boost repo version, not the
>> > submodule version. I absolutely would expect an automated tool to pull
>> > headers from submodules, check them for ABI breakage and push them
>> > into the main repo. My point is that some sanity check ought to be
>> > there.
>>
>> I'm not following why you would want to do this. Perhaps you can explain
>> what problem you are anticipating and how this would solve it?
>
> Most of Boost is implemented in headers, very much unlike KDE or most other
> C++ libraries. Moreover, those headers are quite brittle, unlike KDE or most
> other C++ libraries. If broken into submodules, I can see an apparently
> innocent change in submodule X appearing to compile and be okay in developer
> X's set of submodule clones, but silently be a breaking change with a
> simultaneous change in submodule Y by developer Y.

I think the idea is that it allows you to mix your own
boost release by choosing the set of submodules to combine. The main
boostorg set is the official mix but not the only one. If the libraries
were just developed as clones on the boostorg repo, you would be forced
to choose the exact set of library versions that one particular library
was developed against.
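
To make the "mix" idea a bit more concrete, here is a rough Python sketch.
It is purely illustrative: the library names, the hashes and the make_mix
helper are all made up, and this is not how the boostorg tooling is
actually implemented. It only models a mix as "library -> pinned commit".

```python
# Hypothetical model: a "mix" maps each library to the commit it is pinned at.
# All names and hashes below are invented for illustration.
OFFICIAL_MIX = {
    "filesystem": "a1b2c3d",
    "system": "d4e5f6a",
    "thread": "0f9e8d7",
}


def make_mix(base, overrides):
    """Return a new mix: the base pins with some libraries re-pinned."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"not in base mix: {sorted(unknown)}")
    mix = dict(base)  # copy, so the official mix is left untouched
    mix.update(overrides)
    return mix


# Start from the official set, but take a different version of one library.
my_mix = make_mix(OFFICIAL_MIX, {"thread": "1234abc"})
```

The point is just that the official set is one particular choice of pins,
and anyone can derive their own without disturbing it.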

> Why this matters is because when in git you go to push, git will force you
> to merge before push and at that point you "see" the breakage by a conflict
> appearing (hopefully) and if not then your immediate next compile will fail.
> With the submodule approach that doesn't happen, so you *don't* see the
> breakage till much later when the regression tests suddenly start
> failing.
>
> I'm a great believer in refusing to let programmers commit code which breaks
> other code rather than nagging them later to fix an earlier commit. The
> point of failure notification ought to be as close to cause as
> possible.

Surely the developer sees the breakage the moment they update the
versions of the other libraries they are working against. And then they
can fix it at their leisure while early adopters get to use their code
with the previous version of the conflicting library.

I can't find the details written down anywhere but I vaguely remember
there is a way the developers are supposed to signal that a particular
version is suitable for use with the master libraries or something like
that. It would be good to see this in black and white.

>> I also don't get what 'findable' means. What would a non-findable header
>> be?
>
> Any internal header only findable by internal implementation. What I'm
> basically suggesting is an approach where the master repo keeps a gold
> candidate set of headers automatically extracted regularly from the
> submodules.

That's pretty much what happens. The boostorg repo will have the 'gold
candidate' of submodule hashes which are known to work together. I
don't get what 'extracting' the headers would add to that.
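
As a sketch of what I mean by a 'gold candidate' set of hashes, assuming
(and this is only my assumption, I haven't seen it written down) that a new
hash is only recorded once the combined set has passed some check. The
promote function, the works_together callback and the hashes are all
hypothetical.

```python
def promote(gold, library, new_hash, works_together):
    """Return an updated gold set if the candidate combination passes the
    check; otherwise return the existing gold set unchanged."""
    candidate = dict(gold)
    candidate[library] = new_hash
    return candidate if works_together(candidate) else gold


# Invented example: pretend hash "bad0000" of library y breaks the build.
def fake_check(pins):
    return pins.get("y") != "bad0000"


gold = {"x": "1111111", "y": "2222222"}
gold_after_good = promote(gold, "x", "3333333", fake_check)
gold_after_bad = promote(gold, "y", "bad0000", fake_check)
```

Here the change to x is accepted and the breaking change to y is not, so
the recorded set always names versions that were checked together.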

> Then on push a hook can do appropriate black magic to force the
> pusher to merge headers before the push. Then stuff which is broken appears
> broken as soon as possible, rather than suddenly emerging many days
> later.

When you are talking about breakages, are you talking about merge
conflicts or compile/test-failures? Up till now I've assumed you mean
the latter. But now I'm thinking you might be asking what happens if
the developer of library X makes a change to header H and developer Y
also makes a change to _the same header_?

The way the modularisation works, I'm not even sure this is possible.
The libraries are meant to be self-contained so you only ever change
your own headers. It's a good question and one I hadn't thought about.
Perhaps the people working on this can weigh in here.

Alex

-- 
Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk