Subject: Re: [boost] [locale] review
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-18 17:12:54


> From: "Stewart, Robert" <Robert.Stewart_at_[hidden]>
> >
> > > Use of strings rather than symbols:
> > > there are a few places where the library
> > > uses strings such as "UTF-8" or "de_DE.UTF8"
> > > rather than symbols like utf_8 or de_DE_UTF8.
> > > This use of strings can make things run-time errors
> > > that could otherwise be compile-time errors.
> > > Mis-typing e.g. "UTF_8" or "de-DE" is too easy.
> > > Perhaps in some cases the domain of the parameter
> > > is not known at compile time, but even in those cases,
> > > it should be possible to provide symbols for the most
> > > common choices.
> >
> > Which most common choices? There are a few dozen different
> > character encodings and even more locales, so what would be
> > considered common?
> >
> > Also, not all encodings are supported by all backends. For example,
> > iconv would not handle some Windows codepages and MultiByteTo..
> > would not handle some other encodings.
> >
> > As for locales: is de_DE.UTF-8 common? Is he_IL.UTF-8 common?
> > Is zh_CN.UTF-8 common?
> >
> > Also, the level of support by different backends may depend
> > on the actual OS configuration: if some locale is not configured
> > on Windows or Linux, then the non-ICU backends would fall back to
> > the C/POSIX locale.
> >
> > So should there be hard-coded constants for locales and encodings?
>
> If there is any runtime cost associated with the string representation,
> could you use a type to represent the encoding? The idea being that one
> could instantiate the encoding object from a string and the constructor
> could throw an exception to indicate an unsupported encoding.
> Then, one can reuse the encoding object thereafter with no further runtime
> cost.
> Thus, APIs would expect an encoding object, not a string, but if the encoding
> constructor is not explicit, the effect would be the same.

I'm not sure I fully understand you but...

Actually, the encoding is constant for each locale object, and
the library usually knows how to handle it efficiently.

For example, in the ICU backend there is a special class that
handles conversions from the locale's encoding to UTF-16,
ICU's internal encoding.
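
To make the encoding-object idea from the quoted message concrete, here is
a rough sketch in plain C++. It is not Boost.Locale code: the class name
`encoding`, the function `describe`, and the short table of known names are
all made up for illustration, and a real backend would ask iconv or ICU what
it supports instead of using a fixed table. The name is checked once in the
constructor, which throws for an unknown encoding, and the constructor is
deliberately not explicit so that a string can still be passed where an
encoding object is expected.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical type (not part of Boost.Locale): validates an encoding
// name once, at construction, and can be reused afterwards.
class encoding {
public:
    // Non-explicit on purpose, so callers may still pass a plain string.
    encoding(const char* name) : name_(normalize(name)) { check(); }
    encoding(const std::string& name) : name_(normalize(name)) { check(); }

    const std::string& name() const { return name_; }

private:
    // Lower-case the name and drop punctuation, so that "UTF-8",
    // "utf8" and "UTF_8" all map to the same key.
    static std::string normalize(const std::string& s) {
        std::string out;
        for (unsigned char c : s) {
            if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
        }
        return out;
    }

    // Assumption: a fixed placeholder table stands in for a real
    // backend query.
    void check() const {
        static const char* const known[] = {"utf8", "iso88591", "windows1255"};
        for (const char* k : known) {
            if (name_ == k) return;
        }
        throw std::invalid_argument("unsupported encoding: " + name_);
    }

    std::string name_;
};

// Hypothetical API that expects the validated object, not a raw string.
std::string describe(const encoding& enc) {
    return "using " + enc.name();
}
```

With this, a call such as describe("UTF-8") still compiles thanks to the
implicit conversion, while encoding("no-such-charset") throws
std::invalid_argument. The check still happens at run time; the object
only avoids repeating it on every call.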

In any case, the best practice is to use one encoding (UTF-8)
throughout your code base, and the library
is optimized especially for it.

Artyom

