Subject: Re: [boost] [locale] review
From: Stewart, Robert (Robert.Stewart_at_[hidden])
Date: 2011-04-19 07:29:26


Artyomj wrote:
> > From: "Stewart, Robert" <Robert.Stewart_at_[hidden]>
>
> > > > Use of strings rather than symbols:
> > > > there are a few places where the library
> > > > uses strings such as "UTF-8" or "de_DE.UTF8"
> > > > rather than symbols like utf_8 or de_DE_UTF8.
> > > > This use of strings can make things run-time errors
> > > > that could otherwise be compile-time errors.
> > > > Mis-typing e.g. "UTF_8" or "de-DE" is too easy.
> > > > Perhaps in some cases the domain of the parameter
> > > > is not known at compile time, but even in those cases,
> > > > it should be possible to provide symbols for the most
> > > > common choices.
> > >
> > > What are the most common choices? There are a few dozen different
> > > character encodings, there are even more locales, and what is
> > > considered common?
> > >
> > > Also not all encodings are supported by all backends. For example
> > > iconv would not handle some windows codepages and MultiByteTo..
> > > would not handle some other encodings.
> > >
> > > Locales, Is de_DE.UTF-8 common? Is he_IL.UTF-8 common?
> > > Is zh_CN.UTF-8 common?
> > >
> > > Also the level of support by different backends may depend
> > > on the actual OS configuration - if some locale is not configured
> > > on Windows or Linux then non-ICU backends would fall back to
> > > the C/POSIX locale.
> > >
> > > So should there be hard coded constants for locales and
> > > encodings?
> >
> > If there is any runtime cost associated with the string
> > representation, could you use a type to represent the
> > encoding? The idea being that one could instantiate the
> > encoding object from a string and the constructor could
> > throw an exception to indicate an unsupported encoding.
> > Then, one can reuse the encoding object thereafter with no
> > further runtime cost.
> >
> > Thus, APIs would expect an encoding object, not a string,
> > but if the encoding constructor is not explicit, the effect
> > would be the same.
>
> I'm not sure I fully understand you but...

The discussion referred both to encoding constants and to strings that describe encodings. The OP wanted more constants to represent the "common" encodings, and you noted the difficulty of deciding which encodings are common enough to warrant constants.

My suggestion was, if it fits the library, to create a class representing the encoding. The constructor would accept a string and could validate that the string names a supported encoding. Your APIs that need an encoding could then expect an encoding object rather than a string. Rereading the section at the top, above, I see that the OP wanted compile-time errors. I presume that is just a request that your APIs accept encoding objects rather than strings, but that doesn't address validation of strings when that flexibility is needed, since such validation must be done at runtime.

With my suggestion, encoding validation is done in just one place: the encoding class's constructor. Consequently, the rest of your API need do no validation (presuming it does so now, given that strings are accepted in at least some places). An encoding class also means that library users can create their own encoding constants by simply creating const instances of the encoding class to use throughout an application, relieving you of the burden of choosing the "right" set of "common" encoding constants.

The last point I was making is that, if the encoding class's constructor is not explicit, users can still call your APIs with strings and the compiler will implicitly construct an encoding object to pass to the called function.
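
To make the idea concrete, here is a minimal C++11 sketch. Everything in it is hypothetical: the class name encoding, the function describe, the constant app_utf8, and the small hard-coded table of supported names are illustrations only, not anything taken from Boost.Locale. A real implementation would presumably ask the active backend which encodings it supports.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical encoding class: validation happens once, in the constructor.
class encoding
{
public:
    // Deliberately NOT explicit, so a std::string converts implicitly.
    encoding(std::string const & name)
        : name_(normalize(name))
    {
        if (supported().count(name_) == 0)
            throw std::invalid_argument("unsupported encoding: " + name);
    }

    // Also needed for string literals: literal -> std::string -> encoding
    // would be two user-defined conversions, which the compiler won't chain.
    encoding(char const * name)
        : encoding(std::string(name))
    {
    }

    std::string const & name() const { return name_; }

private:
    // Upper-case the name so "utf-8" and "UTF-8" compare equal.
    static std::string normalize(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
            [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return s;
    }

    // Assumed, hard-coded table standing in for a backend query.
    static std::set<std::string> const & supported()
    {
        static std::set<std::string> const names{"UTF-8", "UTF-16", "ISO-8859-1"};
        return names;
    }

    std::string name_;
};

// Hypothetical API function: it takes an encoding, so it does no validation.
std::string describe(encoding const & enc)
{
    assert(!enc.name().empty()); // invariant established by the constructor
    return "converting via " + enc.name();
}

// A user-defined "constant": validated once, reused with no further cost.
encoding const app_utf8("utf-8");
```

With this in place, describe(app_utf8) and describe("utf-8") both work, whereas describe("UTF_8") throws std::invalid_argument from the constructor before describe ever runs. Mistyped names are still runtime errors, but they are reported at the single point where the encoding object is created.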

> Actually the encoding is constant for each locale object and
> it usually knows how to handle it efficiently.

I'm not sure how your locale objects factor in this discussion about encoding constants, but I'll leave that for you to determine. (I have never done i18n or l10n, so I haven't even read the documentation to understand how these may play together. I just thought I might be able to suggest something to address the OP's concern. However, my ignorance may be causing me to speak out of turn. If so, I'm sorry for the noise.)

> For example in ICU backend there is a special class that
> handles conversions from locale's encoding to UTF-16 -
> internal ICU's encoding.
>
> In any case the best practice is to use one encoding
> over all your code base (UTF-8) and the library
> is optimized especially for it.

Perhaps this is a terminology thing. Here you revert to "encoding" after your earlier mention of "locale." Perhaps one simply installs a global locale and that establishes a global encoding and no other encoding is referenced in the normal case. However, that seems at odds with the existence of any encoding constants, so I'm left confused. (Don't worry about easing my confusion; I just mean that it still seems like encoding constants are warranted and that my suggestion might be useful as a consequence. If not, just ignore me and get on with more important things!)

_____
Rob Stewart robert.stewart_at_[hidden]
Software Engineer using std::disclaimer;
Dev Tools & Components
Susquehanna International Group, LLP http://www.sig.com


