Subject: [boost] [locale] Review of Boost.Locale library
From: Edward Diener (eldiener_at_[hidden])
Date: 2011-04-17 13:03:25


This is my review of the Boost Locale library. It is divided into two
parts, a review of the documentation and a review of the library itself.
I will refer to Boost Locale as just Locale, with a capital L, for the
remainder of this review. I will use the term C++ locale to refer to the
C++ standard library locale implementation.

1. Documentation

The layout of the main page is decent, but I would have expected a
discussion there, or as a first topic, of what Locale brings that the
C++ standard locale does not have. I was disappointed not to find such a
discussion.

The documentation topics are specified as Tutorials. I do not think they
are Tutorials, which is fine with me since I much prefer topics rather
than examples when trying to understand a library.

a. Introduction to C++ Standard Library localization support

The common critical problems of the C++ locale portion of the standard
library seem spurious to me. The problems mentioned are really about
implementations or programmer usage, not the C++ locale library itself.
The only valid problem I find mentioned there is that the C++ standard
library did not attempt to standardize any locale names. This makes
using C++ locales based on locale names non-portable.

Unfortunately the issues there make a very weak argument for the Locale
library itself.

b. Locale generation

I would have liked it if the doc here specified where one finds valid
lists of language, country, encoding, and variant which make up a locale
name. Without this information, the one valid problem mentioned
regarding C++ locale is also a problem with Locale.

The note about wide strings and 8-bit encoding makes no sense to me at
all. If I am using a wide string encoding, why would I not be using wide
string iostreams?

c. Collation

There is no explanation of what 'collation' is. This is very
disappointing, as it makes the rest of the discussion difficult to follow.

The examples were worthless to me since the classes involved have not
been mentioned or discussed. Also the examples are woefully incomplete
even in what they represent.

This is one reason why I dislike documentation which attempts to teach
by example. It always seems to assume that throwing examples at the
reader, before anything about the classes/templates in the examples has
been mentioned, is somehow an effective way of learning a
library. Instead it just creates confusion and serves unfortunately as a
way by which a library implementer does not have to explain how the
classes in his library actually work or relate to each other.

d. Conversions

"You may notice that there are existing functions to_upper and to_lower
in the Boost.StringAlgo library. The difference is that these function
operate over an entire string instead of performing incorrect
character-by-character conversions."

In the second sentence, "these function" grammatically refers to the
functions in Boost.StringAlgo, but I doubt that is what is meant.

I do not understand how these conversion functions use a locale. The
example gives: boost::locale::to_upper(gruben) used in a stream. Is this
function using the locale imbued in the iostream?

Again this is what happens when one creates examples without first
explaining topics in a rational and orderly way.

e. Numbers, Time and Currency formatting and parsing

A bunch of ICU flags are mentioned but with no indication about how
these are supposed to be used by iostreams. These flags look like they
are supposed to be used by C-style printf format strings, but since
Locale uses iostreams I can not understand their purpose with Locale.

f. Messages Formatting (Translation)

GNU gettext should be explained where it interfaces with Locale. Just
telling someone to learn GNU gettext is not adequate. Other than that
the explanation is pretty thorough.

g. Character Set Conversions

An explanation of what character sets are, and what character set
conversions entail, should be the beginning of this documentation.

h. Localized Text Formatting

"Each format specifier is enclosed within {} brackets.."

These are not "brackets" but "braces". Brackets are '[]'.

i. In general

It is confusing to me how generated locales affect the functionality of
the different sections presented under 'Using Boost.Locale'. In a number
of situations I am looking at classes or functions and I have no idea
how these pick up a locale. I do understand that when used with
iostreams the locale is determined by the locale imbued in the iostream.
But outside of iostreams I do not understand from the documentation what
locale is being used. If it is the C++ global locale, the documentation
should say so. This entire issue about how locales are actually being
used in various parts of the library should be explained as part of an
overall explanation of the library. I find the absence of a good overall
explanation of the library the major flaw in the documentation.

The documentation itself is well-ordered and the explanations generally
decent. But I find it next to impossible to understand a library when
the documentation does not take the time to explain concepts/topics and
instead substitutes examples as a means of understanding a library.

2. Library itself

I did not look at the code itself and have no interest in critiquing the
source. Others do this much better than I ever can.

The library offers a great deal of functionality and a great positive of
the library is that it works with multiple backends and brings those
backends into the C++ world.

In general I think that using the global locale is a bad programming
practice when one specifically intends to work with locales.
Unfortunately it was hard for me to understand how individual locales
are used with each of the parts of the library from the documentation.
But I will assume for the time being, because it seems the only correct
design, that each part of the library which is documented can work with
some non-global locale which is created and passed around as necessary.

The only design flaw which I could discover in the library was in
message translation. The fact that translation always begins from English
(or perhaps some other narrow character language) to something else is
horrendous. I can understand that the Locale implementer wanted to use
something popular and that already exists, but an idea so tremendously
flawed in its conception either needs to be changed, if possible, or
discarded for something better. I do understand that translation is just
one part of this large library, but I hope that the implementer
understands how ridiculous it is to assume that non-English programmers
are going to be willing to translate from English to language X rather
than from their own language to language X.

I am assuming that all other parts of the library support both narrow
character encodings and wide character encodings fully, and that at
least UTF-16 is always supported using wide characters. It was really
hard for me to make out from the docs whether or not this was the case.

I believe a great deal of work was put into the library, and that this
work is invaluable in bringing locale usage into C++ in a better way
than it is currently supported in C++ locale.

But I would have to vote "No" that the library should be accepted into
Boost at the current time, with some provisos which, if met, would most
likely change my vote to "Yes" in the future.

1) The documentation should explain the differences and improvements of
Locale over C++ locale in a good general way.

2) The documentation should explain the use of locales for each of the
topics.

3) A number of topics should discuss what they are about in general, and
more time should be given to discuss how the classes/templates relate to
each other.

4) Message translation should be reconsidered. I don't mean to say that
the way it is done is enough to have the library rejected, but I can not
believe that it is not a flawed system no matter what the popularity of
GNU gettext might be.

My major trouble with the library, which has led to my "No" vote, is
that I can not really understand how to use the library from the
documentation or what the library really offers over and above C++
locales. I realize the library programmer may not himself be a native
English speaker, but if I can not really understand a library from its
documentation in a way that makes sense to me I can not vote for its
inclusion into Boost. I strongly suspect that if I were to understand
the functionality of the library through a more rigorous explanation of
its topics, and each topic's relationship to a locale class and various
encodings, I might well vote for its inclusion into Boost. But for now,
and in the state which the documentation resides for me, I can not do
so. So I hope this review will be understood at least partially as a
request to improve the docs as much as anything else.

