
Subject: Re: [boost] [modularization] Modularizing Boost (modularization)
From: Bjørn Roald (bjorn_at_[hidden])
Date: 2013-11-03 14:01:24


On 11/01/2013 10:37 AM, Stephen Kelly wrote:
> On 11/01/2013 10:10 AM, Bjørn Roald wrote:
>> On 11/01/2013 09:21 AM, Stephen Kelly wrote:
>>> On 10/18/2013 12:24 AM, Stephen Kelly wrote:
>>>> If the dependencies between repositories are analysed, the result is
>>
>> snip...
>>
>>> I'm still at a loss to reason with pushing forward with git migration
>>> instead of making these changes which have demonstrated edge removal,
>>> are valuable and are easy, cheap and quick.
>>
>> Please give some short references or explanation about what you expect
>> to become harder to do after the migration.
>
> Working with git submodules adds layers of 'hardness'.
>
> 'git grep' does not pierce submodules for one thing. A lightweight
> ad-hoc use of it becomes a heavier script full of necessary tricks such
> as adding '|| true' and quoting problems.

Yes, it is messy to deal with anything beyond the simplest submodule
foreach commands; in particular it gets very messy when some dynamic
input data is needed.

Since submodule foreach is so tricky, the lack of submodule awareness in
some git commands is very annoying. The natural way to support grep
across submodules would be something like

    git grep --recursive <pattern>

But that is not supported. There is some talk about it here:
http://lists-archives.com/git/729379-submodule-aware-grep.html
but that was three years ago, and as far as I can tell nothing like it
has been added to git. So I think we need to find ways to get by
without it, and without similar support in some other git commands.

I tried the git-grep-submodule wrapper script from one of the patches
suggested in the discussion cited above, and combined it with a simple
alias in my ~/.gitconfig file:

[alias]
     rgrep = "!f() { git grep \"$@\"; git grep-submodule \"$@\"; }; f"

The git-grep-submodule script needs to be somewhere in your PATH and
have its executable bit set. Then I can type:

    git grep -n <pattern>
    git grep-submodule -n <pattern>

or simply

    git rgrep -n <pattern>

to grep both the superproject and its submodules.

This works for me, so it may be worth trying when you need to work
with git submodules.
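For comparison, the wrapper above can also be approximated by a single shell function that needs no extra script. The sketch below (the name `sgrep` is hypothetical) passes the pattern to the foreach command through an exported environment variable instead of splicing it into the command string, and prefixes submodule hits with `$sm_path`, which git sets for each submodule. It assumes a git recent enough to provide `$sm_path` in foreach; newer git releases (2.12 onward, if I recall correctly) also accept `git grep --recurse-submodules` directly.

```shell
# Hypothetical helper, not part of git: grep the superproject and then each
# submodule, prefixing submodule hits with the submodule path.
sgrep() {
  if [ -z "$1" ]; then
    echo "usage: sgrep <pattern>" >&2
    return 1
  fi
  # Export the pattern so the single-quoted foreach command can read it
  # without re-quoting; spaces and quotes in the pattern survive intact.
  SGREP_PATTERN=$1
  export SGREP_PATTERN
  # Superproject; a non-zero status here only means "no match".
  git grep -n -e "$SGREP_PATTERN"
  # Submodules: the pipeline's status is that of sed (0), so a submodule
  # without matches does not make foreach stop early.
  git submodule foreach --quiet \
    'git grep -n -e "$SGREP_PATTERN" | sed "s|^|$sm_path/|"'
}
```

Usage would be e.g. `sgrep 'BOOST_STATIC_ASSERT'`, with submodule matches printed as `path/to/file:line:text` relative to the superproject.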

One problem is that these commands are not standard, so referring to
them on mailing lists etc. will cause confusion. Maybe Boost should
share git-related utilities like this in some tools repository.

> For example, to acquire the numbers I pasted here:
>
> http://thread.gmane.org/gmane.comp.lib.boost.devel/245078/focus=245123
>
> I first tried to get the answer from the modularized repos.
>
> While I can do this:
>
> git log --oneline --author=steveire | read; echo $?
>
> I could find no way of wrapping that in a git submodule foreach.

I see. I tried to make sense of why this works so poorly in git
submodule foreach, struggled with it for a while, and gave up; my
limited shell voodoo does not grok it. I suspect the /read/ builtin
is playing games with me. Maybe it is not the bash version of /read/
that gets invoked, or something else is the problem.
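One likely explanation, consistent with that suspicion: git submodule foreach runs its command with `sh`, and `read` without a variable name is a bash extension. A strictly POSIX shell such as dash, which is `/bin/sh` on Debian and Ubuntu, rejects it, so the test never sees the input. A minimal sketch, assuming that is the cause, names the variable and passes the author pattern through the environment (`has_commits_by` is a hypothetical name):

```shell
# Hypothetical helper: for each submodule print "<name> 0" if the author
# pattern matches at least one commit, "<name> 1" otherwise.
has_commits_by() {
  HCB_AUTHOR=$1
  export HCB_AUTHOR
  # 'read -r line' names its variable, as POSIX sh requires; $? is then the
  # status of read, i.e. whether git log produced any line at all.
  # -1 stops git log after the first matching commit.
  git submodule foreach --quiet \
    'git log -1 --oneline --author="$HCB_AUTHOR" | read -r line; echo "$name $?"'
}
```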

I learned enough from trying to wrap your git log command to try an
alternative approach that works for me. To get the number of commits per
author per module, I defined a bash function in a file, submodule_tools.sh:

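commit_count_by_author(){
   # Requires an author regex, e.g. bdawes@
   if [ -z "$1" ] ; then
     echo "Missing <author> argument, email matching regex" >&2;
     return 1;
   fi
   # Exported so the single-quoted foreach command can see it (note 1)
   export foo="git log --oneline --no-merges --author=$1";
   # Count for the top-level repository, named after its directory
   echo `basename "$(pwd)"` `$foo | wc -l`;
   # Count for each submodule; $name is set by git submodule foreach
   git submodule foreach --quiet 'echo $name `$foo | wc -l`';
};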

Notes on commit_count_by_author utility:

1. The export of /foo/ is needed to make it visible inside the
    single-quoted command given to submodule foreach. Submodule
    foreach does not like having the command double quoted (the
    outer shell would then expand $name and the backticks before
    foreach ever runs). In general the environment in the foreach
    command is very limited :-( and passing data and commands in
    there is tricky. I was not able to pass the pipe as part of the
    $foo variable without messing up the interpretation of the
    command tokens (after variable expansion a | is just another
    argument, not a pipe). Nor was I able to get $@ to work to keep
    the command tokens separated, which would have turned this
    simple hack into a more general utility :-(

2. The /author/ parameter is a regex pattern matched against the
    author recorded with each commit, i.e. the "Name <email>"
    string. Use the full address, or at least append the @
    character, to prevent accidentally matching multiple authors.
    I was not able to successfully use ^ and $ to force an exact
    match. I suspect that is because the pattern is matched against
    "Name <email>" rather than the bare address, so ^ anchors at the
    start of the name and $ after the closing >.
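A variant that sidesteps the problems in note 1 is sketched below, under two assumptions: only the author pattern (data, not a command) goes through the environment, with the `git log | wc -l` pipeline written literally inside the single-quoted foreach command; and, as I understand git's behaviour, `--author` is matched against the `Name <email>` part of the commit header, so a leading `<` anchors the pattern to the start of the address (e.g. `'<bdawes@'`). The name `commit_count_by_author2` is hypothetical.

```shell
# Hypothetical variant of commit_count_by_author: export only the data
# (the author pattern) and keep the command text inside the foreach string.
commit_count_by_author2() {
  if [ -z "$1" ]; then
    echo "Missing <author> argument: regex matched against 'Name <email>'" >&2
    return 1
  fi
  CCBA_AUTHOR=$1
  export CCBA_AUTHOR
  # Top-level repository, named after its directory. tr strips the padding
  # that some wc implementations add.
  echo "$(basename "$(pwd)") $(git log --oneline --no-merges --author="$CCBA_AUTHOR" | wc -l | tr -d ' ')"
  # Each submodule; $name is set by git submodule foreach.
  git submodule foreach --quiet \
    'echo "$name $(git log --oneline --no-merges --author="$CCBA_AUTHOR" | wc -l | tr -d " ")"'
}
```

Called as `commit_count_by_author2 '<bdawes@'`, it should print the same `module count` lines as the original function.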

To list all modules with the number of commits by Beman Dawes (merge
commits are not counted), source the file and call the utility:

. tools/git/submodule_tools.sh
commit_count_by_author bdawes@
modular-boost 1696
accumulators 0
algorithm 7
any 13
array 24
asio 5
assign 3
atomic 0
bimap 3
bind 5
...

To drop the modules with no commits and strip off the counts:

commit_count_by_author bdawes@ | grep -v " 0$" | cut -d ' ' -f 1

which is similar to your use case I think.
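The filter and cut steps can also be folded into one awk program that compares the count numerically, assuming the `name count` line format shown above; `modules_with_commits` is a hypothetical name:

```shell
# Hypothetical filter: read "module count" lines on stdin and print only the
# module names whose count is greater than zero.
modules_with_commits() {
  awk '$2 + 0 > 0 { print $1 }'
}
```

It would be used as `commit_count_by_author bdawes@ | modules_with_commits`.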

The notes above may explain some of the problems with the naive
foreach wrapping I tried first. Hopefully the notes and example will
help in other cases too. But the best thing would be to find a simpler,
more robust and more general way to wrap recursive commands for submodules.

> Git submodules make horizontal work (ie, the kind of work that I've been
> doing) harder.

I agree, we need to find ways to make it less so.

The basic workflow becomes complicated when working horizontally across
submodules with dependencies between the changes. If somebody knows how
other projects deal with this, it would be good to hear.

Here is how I see it, assuming that a hot-fix branch with the same name
in each module is the way to go:

* You need to take extra care to check out a local hot-fix
   branch in each repository you start working on.

* When done with changes, commit and push the hot-fix branches to
   some sensible public place. This is not as straightforward as
   we would want it to be, although it is similar to other library
   contributions. It needs to be done for each module you touch.

* Create pull requests as needed.

* Someone needs to pull the hot-fix branches into each of the
   official GitHub repositories; a slow response here will block progress.

* Merging to master too early in one module may block other merges
   in that module, so that may not make sense.

* The release managers need to be made aware of any
   dependencies between develop branch commits across modules.

* How is it tested?

* I guess at some point it all needs to be controlled by the release
   managers for all affected libraries, i.e. the release managers
   merge to the master branch in some synchronized fashion.
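Some of the per-module bookkeeping in the steps above can be scripted with git submodule foreach. A minimal sketch, assuming one branch name shared by all modules a change touches; the function names are hypothetical, and pushing and pull requests are left out because they depend on where each contributor's fork lives:

```shell
# Hypothetical helpers for a cross-module change on one shared branch name.

# Create the branch in every submodule with uncommitted changes to tracked
# files (untracked files are not detected); other submodules are untouched.
start_hotfix_branch() {
  if [ -z "$1" ]; then
    echo "usage: start_hotfix_branch <branch>" >&2
    return 1
  fi
  HOTFIX_BRANCH=$1
  export HOTFIX_BRANCH
  git submodule foreach --quiet '
    if ! git diff --quiet || ! git diff --cached --quiet; then
      git checkout -q -b "$HOTFIX_BRANCH" && echo "$name: created $HOTFIX_BRANCH"
    fi'
}

# List the submodules that have a local branch with the given name, e.g. to
# tell reviewers and release managers which modules a change spans.
list_hotfix_modules() {
  HOTFIX_BRANCH=$1
  export HOTFIX_BRANCH
  git submodule foreach --quiet '
    if git rev-parse --verify --quiet "refs/heads/$HOTFIX_BRANCH" >/dev/null; then
      echo "$name"
    fi'
}
```

An `if` whose condition fails exits with status 0, so modules that are skipped do not make foreach abort.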

People with special administrative roles need to get involved when such
commits must be synchronized across multiple modules. I assume we
cannot expect the individual authors to respond to pull requests in a
timely way in these cases. Maybe some other branching and pulling
scheme could help. Any thoughts?

>> I guess if migration plans are to be reconsidered there may need to be
>> a clear understanding why the added work or complexity you are
>> implying is worth waiting for.
>
> This mail is just an explanation to your query, not asking for
> reconsideration.

Thanks for your reply and clarifications.

--
Bjørn



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk