Subject: Re: [boost] [git] conversion stalled
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2013-10-09 18:23:46


On 9 Oct 2013 at 22:49, Daniel Pfeifer wrote:

> > I see that .gitattributes now contains explicit eol and mime type
> > mappings for every file in the repo. I advised you against that - I
> > think it will make conversion performance slow from all the lookups,
> > never mind poor old git trying to parse that map on every file
> > checkout.
>
> The commit that added the file is labeled "Add Niall Douglas'
> .gitattributes file".
> Care to repeat what you did advise exactly?

My original .gitattributes file was a set of file extension wildcards
exactly matching the Boost wiki page describing SVN auto-props. This
ensured that when checking out text formats, you got native EOLs or
Unix EOLs as is correct for that type of text.
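
As a rough illustration of the extension-wildcard approach described above, the Python sketch below generates .gitattributes lines from a small table of extensions. The table entries and the function name are hypothetical; the actual Boost auto-props list is not reproduced here. The emitted attributes (text, eol=lf, eol=crlf, binary) are standard gitattributes settings.

```python
# Hypothetical extension table: "native" means check out with the platform's
# EOL, "lf"/"crlf" force a style, "binary" disables any text conversion.
# These entries are illustrative only, not the Boost wiki's real list.
EOL_BY_EXTENSION = {
    "cpp": "native",
    "hpp": "native",
    "txt": "native",
    "sh": "lf",
    "bat": "crlf",
    "png": "binary",
}


def gitattributes_lines(table):
    """Return one .gitattributes line per extension, sorted by extension."""
    lines = []
    for ext, style in sorted(table.items()):
        pattern = f"*.{ext}"
        if style == "native":
            lines.append(f"{pattern} text")
        elif style in ("lf", "crlf"):
            lines.append(f"{pattern} text eol={style}")
        elif style == "binary":
            lines.append(f"{pattern} binary")
        else:
            raise ValueError(f"unknown EOL style for {ext!r}: {style!r}")
    return lines


if __name__ == "__main__":
    print("\n".join(gitattributes_lines(EOL_BY_EXTENSION)))
```

A table like this stays small no matter how many files the repo holds, which is the point of matching on wildcards rather than listing every file.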

The problem which then emerged is that many files were committed to
Boost years ago with all sorts of weird and unconventional EOLs and
UTF variations. For example, some files in sandbox have intentionally
malformed EOLs as part of their unit tests, which would break if they
were committed to git as non-binary. There is also quite a bit of
UTF-16 text, especially in ancient revisions from before UTF-8 became
popular. Git has no understanding of UTF-16 text, so a file is either
UTF-8 text or it is binary.
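
To make the UTF-8-or-binary distinction concrete, here is a minimal Python sketch of how a conversion tool might classify a blob and transcode UTF-16 to UTF-8. It is not the Boost2Git code; the function names are hypothetical, and it assumes UTF-16 files carry a byte order mark, which will not hold for every file.

```python
import codecs

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def classify(data: bytes) -> str:
    """Return 'utf-16', 'utf-8' or 'binary' for a blob of bytes.

    Assumption: UTF-16 is recognised only by its BOM. BOM-less UTF-16 and
    Latin1 text with high bytes both end up classified as 'binary'.
    """
    if data.startswith(UTF16_BOMS):
        return "utf-16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "utf-8"


def to_utf8(data: bytes) -> bytes:
    """Transcode BOM-marked UTF-16 to UTF-8; return anything else unchanged."""
    if classify(data) == "utf-16":
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16").encode("utf-8")
    return data
```

Note that transcoding changes the stored bytes, so a checkout of an old revision no longer matches SVN byte for byte.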

My original advice was that unintentionally weird text formats needed
fixing up for git correctness, i.e. UTF-8 throughout with EOL 10. I
hence sent a patch to Boost2Git which scanned text format files for
bad EOLs and repaired them in flight. This seems to have worked, but
it inadvertently also repaired intentionally weird text formats.
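
The scan-and-repair idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual Boost2Git patch: the function names are hypothetical, and the input is assumed to be single-byte or UTF-8 text.

```python
def eol_styles(data: bytes) -> set:
    """Return the EOL styles found in data: any of 'crlf', 'lf', 'cr'."""
    styles = set()
    if b"\r\n" in data:
        styles.add("crlf")
    # Remove CRLF pairs first so that only lone LF and lone CR remain.
    rest = data.replace(b"\r\n", b"")
    if b"\n" in rest:
        styles.add("lf")
    if b"\r" in rest:
        styles.add("cr")
    return styles


def has_mixed_eols(data: bytes) -> bool:
    """True if more than one EOL style appears in the same file."""
    return len(eol_styles(data)) > 1


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite every CRLF and lone CR as LF (byte 10)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

A repair like normalize_to_lf cannot tell a mistake from a test fixture, so files with deliberately malformed EOLs have to be excluded by some other means before it runs.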

My original advice was that intentionally weird text formats need a
better file extension than .txt e.g. .bin. That said, I accepted
Dave's argument that this causes breakage in ancient revision
checkouts, but I still would argue that if people really need ancient
revisions working properly, go use a legacy SVN repo.

Dave, I think, decided that every text file needed listing in
.gitattributes with its EOL style, and that the .gitattributes file
needs regenerating on every commit, because as commits pass some text
files will change from UTF-16 EOL 13,10 through Latin1 EOL 13,10 to
UTF-8 EOL 10 etc., so a static .gitattributes would still introduce
corruption. This
is what is needed if you want any possible past revision checkout to
be a perfect representation of the SVN repo. I'd imagine Boost2Git
will also need a map of which text files are intentionally malformed
and must be treated as binary over which range of SVN revision
numbers. Building that map, I would imagine, will involve a lot of
human hours.
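
A map of that kind might look roughly like the Python sketch below. The path, the revision numbers and all names are invented for illustration; nothing here is taken from Boost2Git.

```python
from typing import NamedTuple, Sequence


class BinaryRange(NamedTuple):
    """A file that must be treated as binary between two SVN revisions."""
    path: str
    first_rev: int  # inclusive
    last_rev: int   # inclusive


# Hypothetical entry; a real map would have to be compiled by hand.
BINARY_RANGES = [
    BinaryRange("sandbox/example/test/bad_eol.txt", 1000, 2500),
]


def treat_as_binary(path: str, revision: int,
                    ranges: Sequence[BinaryRange] = BINARY_RANGES) -> bool:
    """True if path is listed as intentionally malformed at this revision."""
    return any(
        r.path == path and r.first_rev <= revision <= r.last_rev
        for r in ranges
    )
```

The lookup is trivial; the cost lies in someone working out, file by file, which revision ranges belong in the table.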

As this list knows, I think all this work being done for free is
excessive. If people really, really want all possible ancient
revisions to work, they should pay the contracting hourly rate to
people to implement it. Expecting all this for free is in my opinion
daft. A git conversion of Boost ought to work reasonably well for the
past three years of checkouts; beyond that it should be for
illustration of history only.

Niall

-- 
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/