From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2004-04-11 21:38:36
It seems that Unicode support in Boost (which could lead to Unicode
support in the C++ language and standard library) would be quite
desirable.
The IBM International Components for Unicode (ICU) library
(http://oss.software.ibm.com/icu/) is an existing C++ library with what
appears to be a Boost-compatible license, which provides all or most of
the Unicode support that would be desired in Boost or the C++ standard
library, in addition to Unicode-equivalents of libraries already either
in the standard library or in Boost, including number/currency
formatting, date formatting, message formatting, and a regular
expression library. Unfortunately, it signals exceptional conditions
through an error-code return mechanism rather than C++ exceptions, it
does not follow Boost naming conventions, and, although it has some
C++-specific facilities, most of its C++ API mirrors the C API, which
results in a less-than-optimal C++ interface.
Nonetheless, I think Boostifying the ICU library would be quite
feasible, whereas attempting to reimplement all of the desired
functionality that the ICU library provides would be extremely
time-consuming, since the collating and other services in the ICU
library already support a large number of locales, and the
character-set conversion facilities support a large number of character
sets.
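
As a rough illustration of what "Boostifying" an error-code interface
might involve, here is a minimal sketch of a throwing wrapper around a
C-style function. Every name in it (status_code, c_style_parse_digit,
library_error, parse_digit) is hypothetical and chosen only for the
example; none of them is an ICU or Boost name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a C-style error-code API (NOT ICU names).
enum status_code { status_ok = 0, status_invalid_argument = 1 };

// C-style function: reports failure through an out-parameter.
inline int c_style_parse_digit(char c, status_code* status) {
    if (c < '0' || c > '9') {
        *status = status_invalid_argument;
        return 0;
    }
    *status = status_ok;
    return c - '0';
}

// Exception type that preserves the original status code.
class library_error : public std::runtime_error {
public:
    explicit library_error(status_code code)
        : std::runtime_error("library call failed with status " +
                             std::to_string(static_cast<int>(code))),
          code_(code) {}
    status_code code() const { return code_; }
private:
    status_code code_;
};

// Wrapper: same operation, but failure is signalled by throwing.
inline int parse_digit(char c) {
    status_code status = status_ok;
    int value = c_style_parse_digit(c, &status);
    if (status != status_ok) throw library_error(status);
    return value;
}

// Usage sketch: success returns a value, failure throws.
inline void demo_parse_digit() {
    assert(parse_digit('7') == 7);
    try {
        parse_digit('x');
        assert(false);  // not reached: the call above throws
    } catch (const library_error& e) {
        assert(e.code() == status_invalid_argument);
    }
}
```

A wrapper of this kind keeps the underlying implementation untouched,
which is the main attraction of adapting an existing library rather
than rewriting it.
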
The representation of locales does present an issue that needs to be
considered. The existing C++ standard locale facets are not very
suitable for a variety of reasons:
- The standard facets (and the locale class itself, in that it is a
  functor for comparing basic_strings) are tied to facilities such as
  std::basic_string and std::ios_base which are not suitable for
  Unicode support.
- The interface of std::collate<Ch> is not at all suitable for
  providing all of the functionality desired for Unicode string
  collation. A suitable Unicode collation facility should at least
  allow for user-selection of the strength level used (refer to
  http://www.unicode.org/unicode/reports/tr10/), and would ideally
  also support customizations as extensive as the ICU library does
  (refer to
  http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html
  and
  http://oss.software.ibm.com/icu/userguide/Collate_Customization.html).
- Facilities such as Unicode string collation are heavily data-driven,
  and it would be inefficient to load the data for facilities that are
  not used. This could be avoided by using some sort of lazy loading
  mechanism.
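
To make the last two points more concrete, here is a small sketch of a
collator with a user-selectable strength level whose data is loaded
only on first use. It is a toy, ASCII-only model and not real Unicode
collation: "primary" ignores case, "tertiary" uses case as a
tie-breaker, and the secondary (accent) level is omitted. The names
collation_strength, collation_data, and collator are hypothetical, and
the lazy load is not thread-safe.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names; toy ASCII-only model of two collation levels.
enum class collation_strength { primary, tertiary };

// Stand-in for locale-specific tables that would be expensive to load.
struct collation_data {
    std::string locale_name;
};

class collator {
public:
    explicit collator(std::string locale_name,
                      collation_strength s = collation_strength::tertiary)
        : locale_name_(std::move(locale_name)), strength_(s) {}

    void set_strength(collation_strength s) { strength_ = s; }
    bool loaded() const { return data_ != nullptr; }

    // Returns a negative, zero, or positive value, like strcmp.
    int compare(const std::string& a, const std::string& b) {
        data();  // first call triggers the (simulated) data load
        int case_diff = 0;
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i) {
            unsigned char ca = a[i], cb = b[i];
            int la = std::tolower(ca), lb = std::tolower(cb);
            if (la != lb) return la < lb ? -1 : 1;  // primary difference
            // Remember the first case difference; lowercase sorts first
            // (an arbitrary choice for this toy).
            if (case_diff == 0 && ca != cb)
                case_diff = std::islower(ca) ? -1 : 1;
        }
        if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
        return strength_ == collation_strength::primary ? 0 : case_diff;
    }

private:
    // Lazy loading: the data object is created only when first needed.
    const collation_data& data() {
        if (!data_) data_.reset(new collation_data{locale_name_});
        return *data_;
    }

    std::string locale_name_;
    collation_strength strength_;
    std::unique_ptr<collation_data> data_;
};

// Usage sketch: nothing is loaded until the first comparison.
inline void demo_collator() {
    collator c("en_US", collation_strength::primary);
    assert(!c.loaded());
    assert(c.compare("Resume", "resume") == 0);
    assert(c.loaded());
}
```

An interface like std::collate<Ch> has no obvious place for a setting
such as set_strength, which is part of why a separate collation
facility seems necessary.
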
It would still be possible to use the standard locale object as a
container of an entirely new set of facets, which could be loaded from
the data sources based on the name of the locale, and ``injected'' into
an existing locale object by calling some function. It is not clear,
however, what advantage this would offer over simply using a thin
wrapper around a locale name to represent a ``locale,'' as is done in
the ICU library.
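
The two alternatives could look roughly like the following sketch. It
relies only on the standard std::locale facet machinery (a facet class
with a static std::locale::id, the std::locale(other, facet)
constructor, std::has_facet, and std::use_facet). The names
unicode_collation_facet, inject_unicode_facets, and locale_id are
hypothetical, and the facet holds only a name where real collation
data would live.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>
#include <utility>

// Hypothetical facet: a placeholder for data-driven Unicode services.
class unicode_collation_facet : public std::locale::facet {
public:
    static std::locale::id id;  // every facet type needs its own id

    explicit unicode_collation_facet(std::string name, std::size_t refs = 0)
        : std::locale::facet(refs), name_(std::move(name)) {}

    const std::string& locale_name() const { return name_; }

private:
    std::string name_;
};

std::locale::id unicode_collation_facet::id;

// Option 1: "inject" the new facet into a copy of an existing locale.
// With refs == 0 the resulting locale owns the facet and deletes it.
inline std::locale inject_unicode_facets(const std::locale& base,
                                         const std::string& name) {
    return std::locale(base, new unicode_collation_facet(name));
}

// Option 2: a thin wrapper around a locale name; services would look
// up their own data from the name when they are constructed.
class locale_id {
public:
    explicit locale_id(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Usage sketch for both options.
inline void demo_locales() {
    std::locale loc = inject_unicode_facets(std::locale::classic(), "fr_FR");
    assert(std::has_facet<unicode_collation_facet>(loc));
    assert(std::use_facet<unicode_collation_facet>(loc).locale_name() == "fr_FR");

    locale_id id("fr_FR");
    assert(id.name() == "fr_FR");
}
```

In this sketch the first option carries the same information as the
second, plus the baggage of the standard locale object, which is the
question raised above.
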
-- Jeremy Maitin-Shepard