Subject: Re: [boost] [conversion] Isolating the phantom file changes problem
From: Beman Dawes (bdawes_at_[hidden])
Date: 2013-11-29 16:26:45


On Fri, Nov 29, 2013 at 12:26 PM, Niall Douglas
<s_sourceforge_at_[hidden]> wrote:

> On 29 Nov 2013 at 8:14, Beman Dawes wrote:
>
> > > When we talk about renormalisation, we're talking about the procedure
> > > described in the gitattributes man page.
>
> Unfortunately that procedure would corrupt many files within Boost.
>

I've tested on Windows and Linux without apparent problems. The largest
number of modified files (152) is on master. Here is the list:

>
> > +1
> >
> > There is also a nice discussion at
> >
> https://help.github.com/articles/dealing-with-line-endings#re-normalizing-a-repository
> >
> > I've forked the boost super repo and am testing the procedure now.
>
> Off the top of my head, you'll need to watch for the following (this
> list is incomplete):
>
> * Files with text file extensions not in ASCII or UTF-8. If you use
> simple EOL renormalisation with UTF-16 text for example, you'll
> corrupt that text.
>

None of the modified files fit in that category.
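
For anyone who wants to repeat this kind of check, here is a minimal
sketch of one way to flag UTF-16 files before renormalizing. It is not the
check used for the results above. The function name is hypothetical, and
the sketch only looks for a byte order mark, so UTF-16 text saved without
a BOM would not be caught.

```python
import codecs


def looks_like_utf16(data: bytes) -> bool:
    """Return True if data starts with a UTF-16 byte order mark.

    Hypothetical helper for illustration only. The UTF-32 little-endian
    BOM begins with the same two bytes, so such files are flagged too,
    which is acceptable here because they are equally unsafe to
    EOL-normalize byte by byte.
    """
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

Typical use would be to read each candidate file in binary mode and pass
its first few bytes to the function.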

> * Text files with intentionally mixed EOLs. You'll need to change
> their extension to not .txt (best), or add special exceptions to
> .gitattributes (brittle, I wouldn't recommend this option).
>

None of the modified files fit in that category.

I checked tools/inspect/wrong_line_ends_test.cpp to see why it wasn't
normalized, and the reason was simple. It had already been normalized! That
isn't worrisome - inspect is a tool that will need tuning for git anyhow.
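
A file with mixed EOLs can be spotted by counting each line-ending style
in its raw bytes. The following is a rough sketch under that assumption;
the function name is made up for illustration, and it is neither part of
inspect nor of git.

```python
def eol_styles(data: bytes) -> set:
    """Return the line-ending styles present in data.

    Hypothetical helper for illustration only. The result is a subset
    of {"crlf", "lf", "cr"}; more than one entry means mixed EOLs.
    """
    crlf = data.count(b"\r\n")
    # Bare LF and bare CR are whatever remains once CRLF pairs are removed.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf

    styles = set()
    if crlf:
        styles.add("crlf")
    if lf:
        styles.add("lf")
    if cr:
        styles.add("cr")
    return styles
```

A file whose result has more than one entry would be a candidate for a
closer look before renormalizing.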

>
> * Scan the first 8Kb of every file with an extension not marked as
> text nor binary in .gitattributes for zeros. If you don't find a
> zero, git will assume it is text and EOL normalise it. Unfortunately
> some binary file types such as PDF don't have zeros in their first
> 8Kb, so that would be very bad.
>

.pdf is in .gitattributes, so no problem. I checked a couple of .pdf files
to be sure, and Adobe Reader opens them without problems.

All of the modified files have extensions that are in .gitattributes, by
the way.
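
For files whose extensions are not covered by .gitattributes, the scan
suggested in the quoted text could be sketched roughly as below. The
8000-byte window is an assumption standing in for the "8Kb" mentioned
above, the function name is hypothetical, and git's own text detection
may apply further rules that this sketch does not model.

```python
def has_nul_in_head(data: bytes, window: int = 8000) -> bool:
    """Return True if a zero byte occurs within the first `window` bytes.

    Hypothetical helper for illustration only. A file for which this
    returns False could be mistaken for text by a zero-byte heuristic,
    even if it is really binary.
    """
    return b"\0" in data[:window]
```

A binary file for which this returns False is the risky case, since
nothing in its leading bytes would mark it as binary.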

> We never dealt with these issues during conversion, and there are
> probably more we don't know about yet. This is why I said Boost is
> not ready to do the transition - plus too few want to do the manual
> labour involved in achieving a "perfect" conversion and just want
> "someone else" to do the tedious work for them.
>

I'm sure there are problems we don't know about. But we have done enough
testing to know that vast numbers of files were converted correctly, that
tests that passed on both trunk and branches/release still pass, and that
the small number of minor problems we have found are not even close to
being showstoppers.

To delay further just because of FUD will be harmful.

Thanks for your list of possible problem areas. It gave me some
additional things to look for.

--Beman




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk