Subject: [boost] Change to guidelines for characters in C++ source files
From: Beman Dawes (bdawes_at_[hidden])
Date: 2015-06-25 10:12:31
Since the very early days of Boost, the guideline for acceptable characters
in C++ source files has been the 96 characters of the C++ standard's basic
source character set encoded in 7-bit ASCII. The inspect program also
allowed several additional 7-bit ASCII characters that sometimes appear in
The rationale was to ensure that Boost code was portable to all compilers
available at that time. We had gotten complaints that even a character as
innocuous as a copyright sign (U+00A9) was causing compiles to fail on some
compiler releases targeting Asian languages. UTF-8 support was far from
Times have changed:
* Source files encoded in UTF-8 with a leading byte order mark (BOM) of the
byte sequence 0xEF,0xBB,0xBF are supported by all C++ compilers that we are
aware of, and this has been true for many years now.
* As of C++11, the C++ language now includes types and literals directly
supporting UTF-8, UTF-16, and UTF-32, and creating code points above 7-bit
ASCII in such literals is much easier if UTF-8 source encoding is used.
Even editors as dumb as Windows Notepad have supported UTF-8 with BOM for
some time now.
* As Boost libraries start to incorporate C++11 Unicode-related features,
it becomes difficult to write test programs if limited to 7-bit ASCII. For
example, incorporating the Filesystem TS into Boost.Filesystem requires
test cases with UTF-8, UTF-16, and UTF-32, and that's painful under the
current 7-bit ASCII guidelines.
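
To make the point about C++11 literals concrete, here is a minimal sketch
of the three literal prefixes. The helper code_units and the constant names
are made up for illustration and do not come from any Boost library. The
code points are spelled as universal-character-names so that the snippet
itself stays within 7-bit ASCII; in a UTF-8 source file the characters
could be typed directly between the quotes instead.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00A9 COPYRIGHT SIGN in each of the C++11 Unicode literal forms.
constexpr std::size_t copyright_utf8  = code_units(u8"\u00A9");  // UTF-8
constexpr std::size_t copyright_utf16 = code_units(u"\u00A9");   // UTF-16
constexpr std::size_t copyright_utf32 = code_units(U"\u00A9");   // UTF-32

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair and UTF-8 needs four bytes.
constexpr std::size_t emoji_utf8  = code_units(u8"\U0001F600");
constexpr std::size_t emoji_utf16 = code_units(u"\U0001F600");
constexpr std::size_t emoji_utf32 = code_units(U"\U0001F600");

static_assert(copyright_utf8 == 2, "U+00A9 is two UTF-8 code units");
static_assert(emoji_utf16 == 2, "U+1F600 is a UTF-16 surrogate pair");
```

Test cases written this way remain legal under the current guideline, but
every non-ASCII character has to be looked up and escaped by hand, which
is the pain described above.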
It looks to me like it is high time to change the Boost guideline for C++
source file encoding to either 7-bit ASCII without BOM or UTF-8 with BOM,
and to change the inspect program accordingly.
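
A rough sketch of what such a check might look like follows. This is not
the inspect program's actual code; the names source_encoding and
classify_source are hypothetical. It only distinguishes the two proposed
encodings. It does not validate the UTF-8 that follows a BOM, and it does
not enforce the narrower set of permitted ASCII characters.

```cpp
#include <string>

// Hypothetical classification, not taken from the real inspect program.
enum class source_encoding { ascii_no_bom, utf8_with_bom, rejected };

inline source_encoding classify_source(const std::string& bytes)
{
    // UTF-8 byte order mark: 0xEF, 0xBB, 0xBF at the very start of the file.
    const bool has_bom = bytes.size() >= 3
        && static_cast<unsigned char>(bytes[0]) == 0xEF
        && static_cast<unsigned char>(bytes[1]) == 0xBB
        && static_cast<unsigned char>(bytes[2]) == 0xBF;

    if (has_bom)
        return source_encoding::utf8_with_bom;  // remaining bytes are not validated here

    // Without a BOM, every byte must fit in 7 bits.
    for (unsigned char c : bytes)
        if (c > 0x7F)
            return source_encoding::rejected;

    return source_encoding::ascii_no_bom;
}
```

Under this sketch, a file containing a raw copyright sign encoded as UTF-8
but lacking a BOM is rejected, while the same file with a BOM is accepted.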