Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-15 14:50:47
On 15/12/2010 18:50, Artyom wrote:
> Few notes or questions, you say that your library is locale agnostic,
> I see a contradiction between what you say and what you need to implement
My personal belief is that the locale matters for few things, and it's a
big burden to set up and manage.
So if I can avoid having to choose one, I'd rather do that, and only
specify one when I really need it.
> 1. AFAIK boundary analysis is locale dependent.
Tailoring of break properties is not supported: the default values are used.
The specification in question (UAX #29) barely mentions tailoring anyway.
A possibility to achieve a locale-dependent behaviour here would be to
swap the database with a tailored one.
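
To illustrate the "swap the database" idea, here is a minimal sketch in Python. It is not the library's code and not a real UAX #29 implementation: it is a toy cluster segmenter with a single rule (never break before an "Extend" character). The names default_property, tailored_property and segment are hypothetical, and the tailoring shown is made up purely for demonstration.

```python
import unicodedata
from typing import Callable, List


def default_property(ch: str) -> str:
    # Rough stand-in for a default break property lookup (assumption):
    # nonspacing and enclosing marks count as "Extend", everything else "Other".
    return "Extend" if unicodedata.category(ch) in ("Mn", "Me") else "Other"


def tailored_property(ch: str) -> str:
    # Hypothetical tailoring: treat U+0301 COMBINING ACUTE ACCENT as an
    # ordinary character, and defer to the default data for everything else.
    if ch == "\u0301":
        return "Other"
    return default_property(ch)


def segment(text: str, prop: Callable[[str], str] = default_property) -> List[str]:
    # The algorithm itself never mentions a locale; it only consults `prop`.
    clusters: List[str] = []
    for ch in text:
        if clusters and prop(ch) == "Extend":
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters
```

With the default data, segment("e\u0301a") keeps the accent attached to the "e"; passing tailored_property as the second argument splits it off, without any change to the algorithm.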
> 2. case conversion - is locale dependent - for example if the locale is Turkish
> then upper("i")=="İ" while upper("i")="I" for other languages.
Simple case conversions are the easy 1:1 language- and context-agnostic ones.
I can't do the more complex conversions because they depend on specific
languages and contexts.
Thankfully, case folding is neither language- nor context-dependent, and is
probably what most people want rather than case conversion.
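
The difference between case conversion and case folding can be sketched with Python's built-in string methods, which are locale-independent. This only illustrates the concepts and says nothing about the library's interface. The helper names caseless_equal and is_one_to_one_upper are hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    # Case folding is meant for comparison, not display: both sides are
    # folded and the results are compared, with no locale involved.
    return a.casefold() == b.casefold()


def is_one_to_one_upper(ch: str) -> bool:
    # True when uppercasing a single character yields a single character,
    # i.e. a simple 1:1 mapping is enough for it.
    return len(ch.upper()) == 1
```

For example, caseless_equal("Straße", "STRASSE") is true, while is_one_to_one_upper("ß") is false because the full uppercase mapping of "ß" is the two-character "SS". The Turkish case from the quoted message is different again: no locale-independent function can return "İ" for "i", which is why it needs language information.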
> 3. collation - **is** locale dependent as text sorting in different languages
> is very different - even if they use same script (Latin for example)
Yes, it definitely is; but you could still have a "general" collation
that would work well enough for most languages.
I had listed it as a 'maybe', but I had forgotten how complicated the
official algorithm is. So I won't be adding collation support for a while.
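
To give a rough idea of what a "general", locale-free ordering could look like, here is a toy sketch in Python. It is emphatically not the official collation algorithm and not what the library does: it compares strings with accents stripped and case folded first, and falls back to the original string only to break ties. The names general_key and general_sort are hypothetical.

```python
import unicodedata
from typing import List, Tuple


def general_key(s: str) -> Tuple[str, str]:
    # Primary key: decompose, drop combining marks, then case-fold.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Secondary key: the original string, so the ordering is deterministic.
    return (base.casefold(), s)


def general_sort(words: List[str]) -> List[str]:
    return sorted(words, key=general_key)
```

A plain code-point sort puts "Äpfel" after "zebra"; general_sort places it next to the other words starting with "a". Language-specific rules (Swedish sorting "ä" after "z", for instance) are exactly what such a general ordering cannot express.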