Subject: Re: [boost] Change to guidelines for characters in C++ source files
From: Beman Dawes (bdawes_at_[hidden])
Date: 2015-06-26 10:26:55


On Fri, Jun 26, 2015 at 5:24 AM, Sebastian Redl <
sebastian.redl_at_[hidden]> wrote:

> On 26.06.2015 00:15, Paul Mensonides wrote:
>
>> On 6/25/2015 7:12 AM, Beman Dawes wrote:
>>
>>> It looks to me like it is high time to change the Boost guideline for C++
>>> source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to
>>> change the inspect program accordingly.
>>>
>>> Comments?
>>>
>>
>> BOM is evil.
>>
> The Microsoft compiler will treat files without a BOM as encoded in its
> local codepage, with no way to override. If you want MSVC to read the
> source as UTF-8, you need a BOM.
>
> They have no plans to change this either, see
> https://connect.microsoft.com/VisualStudio/Feedback/Details/888437 . The
> bug is closed as wontfix.
>
> "Unfortunately, we currently have no plans to implement the support of
> UTF-8 files without byte order marks."
>
> Thus, we need a BOM in our source files if they contain UTF-8. That's just
> a sad fact.
>
>
It isn't just Microsoft. My first draft of N3463, Portable Program Source
Files http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3463.html
specified UTF-8 without BOM, but when a draft was circulated several
non-Microsoft compiler writers from the committee's core working group
explained that to them the BOM was essential.

The scenario they were concerned with was environments in Asia where the
default encoding is commonly not UTF-8 and most files that go into a
translation unit are encoded in that default encoding, but one file is
encoded in UTF-8 without a BOM. The compiler needs to be able to identify
that file as UTF-8 without requiring all files to be UTF-8 encoded, and the
compiler writers believe that is not possible 100% of the time without a BOM.
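
To make the role of the signature concrete, here is a minimal C++17 sketch of the kind of check a tool could apply under the proposed guideline (7-bit ASCII without BOM, or UTF-8 with BOM). The names SourceEncoding, has_utf8_bom, is_7bit_ascii and classify are hypothetical and are not taken from the inspect program or from any compiler. The sketch only looks at the three-byte signature EF BB BF and the 7-bit range; it does not verify that the bytes after the BOM are well-formed UTF-8.

```cpp
#include <string_view>

// Hypothetical categories for illustration; not the inspect program's own.
enum class SourceEncoding { Ascii7NoBom, Utf8WithBom, Other };

// True if the buffer starts with the UTF-8 signature (BOM): EF BB BF.
constexpr bool has_utf8_bom(std::string_view bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// True if every byte is in the 7-bit range 0x00..0x7F.
constexpr bool is_7bit_ascii(std::string_view bytes) {
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) > 0x7F) return false;
    }
    return true;
}

// Classify a file's raw bytes against the proposed guideline.
// Anything with high-bit bytes and no BOM is reported as Other,
// because its encoding cannot be identified from the bytes alone.
constexpr SourceEncoding classify(std::string_view bytes) {
    if (has_utf8_bom(bytes)) return SourceEncoding::Utf8WithBom;
    if (is_7bit_ascii(bytes)) return SourceEncoding::Ascii7NoBom;
    return SourceEncoding::Other;
}
```

With this check, a file containing high-bit bytes but no BOM lands in Other: it might be UTF-8, or it might be in the local default encoding, which is exactly the ambiguity described above.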

Some compilers or IDEs, including Visual Studio, do have an opt-in option
"Auto-detect UTF-8 encoding without signature", but Boost can't count on
such an option being turned on.
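
How any particular product implements such auto-detection is not something I can state, so the following is only a sketch of one plausible heuristic: treat a file as UTF-8 if its bytes are well-formed UTF-8. The function name looks_like_utf8 is hypothetical. The byte ranges follow the UTF-8 definition (no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <string_view>

// Heuristic sketch: true if the bytes are well-formed UTF-8.
// A true result does not prove the author intended UTF-8.
constexpr bool looks_like_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;        // continuation bytes expected
        unsigned lo = 0x80;           // allowed range of the first
        unsigned hi = 0xBF;           // continuation byte
        if (b0 <= 0x7F) {
            extra = 0;                // plain ASCII
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            extra = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            extra = 2;
            if (b0 == 0xE0) lo = 0xA0;    // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;    // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            extra = 3;
            if (b0 == 0xF0) lo = 0x90;    // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;    // reject values above U+10FFFF
        } else {
            return false;             // 0x80..0xC1 and 0xF5..0xFF cannot lead
        }
        if (s.size() - i <= extra) return false;   // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned b = static_cast<unsigned char>(s[i + k]);
            const unsigned min = (k == 1) ? lo : 0x80u;
            const unsigned max = (k == 1) ? hi : 0xBFu;
            if (b < min || b > max) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

The heuristic rejects much legacy-encoded text (for example the Shift-JIS bytes 82 A0), but it cannot be exact: the bytes C3 A9 are well-formed UTF-8 for "é" and are also valid Windows-1252 text. That residual ambiguity is why detection without a signature cannot be right 100% of the time.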

I wasn't present in core when N3463 was discussed, but the unofficial
feedback I got was that CWG saw no need to explicitly state as a
requirement an environmental feature that users would force all
compilers to support anyhow.

--Beman

