Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-15 14:50:47


On 15/12/2010 18:50, Artyom wrote:

> Few notes or questions, you say that your library is locale agnostic,
> I see a contradiction between what you say and what you need to implement

My personal belief is that the locale matters for few things, and it's a
big burden to set up and manage.

So if I can avoid having to choose one, I'd rather do that, and only
specify one when I really need it.

> 1. AFAIK boundary analysis is locale dependent.

Tailoring of break properties is not supported: the default values are used.
The specification in question (UAX #29) barely mentions tailoring anyway.

A possibility to achieve a locale-dependent behaviour here would be to
swap the database with a tailored one.
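
To make the idea concrete, here is a rough sketch of a property lookup that takes the database as a parameter, so a tailored table can be substituted without touching the segmentation code. This is not the Boost.Unicode interface: the names DEFAULT_DB, tailor and break_property are hypothetical, and the tables are a tiny illustrative excerpt rather than real Unicode data files.

```python
from collections import ChainMap

# Tiny illustrative excerpt of default Word_Break-style properties.
# A real database covers every code point; this one is hypothetical.
DEFAULT_DB = {
    "a": "ALetter",
    "1": "Numeric",
    ".": "MidNumLet",
}

def tailor(overrides, base=DEFAULT_DB):
    """Return a view in which overrides shadow the base table (hypothetical helper)."""
    return ChainMap(overrides, base)

def break_property(ch, db=DEFAULT_DB):
    """Look up the break property of one character in the given database."""
    return db.get(ch, "Other")

# A made-up tailoring that stops treating "." as a mid-word character.
tailored_db = tailor({".": "Other"})

default_value = break_property(".")
tailored_value = break_property(".", tailored_db)
```

The algorithm that consumes these properties stays the same; only the table handed to it changes.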

> 2. case conversion - is locale dependent - for example if the locale is Turkish
> then upper("i")=="İ" while upper("i")="I" for other languages.

Simple case conversions are the easy 1:1 language- and context-agnostic
mappings.

I can't do the more complex conversions because they depend on specific
languages and contexts.

Thankfully, case folding is neither language- nor context-dependent, and
it is probably what most people want rather than case conversion.
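
As an illustration of the difference, here is a small sketch using Python's built-in string methods as a stand-in (they are not the Boost.Unicode API, and caseless_equal is a name made up for this example). It shows an untailored uppercase conversion, a mapping that is not 1:1, and a caseless comparison done with case folding.

```python
# Python's str methods serve only as a stand-in for illustration.

# Untailored conversion: "i" always maps to "I", never to the Turkish
# dotted capital I (U+0130), because no locale is consulted.
plain_upper_i = "i".upper()

# A full (not 1:1) mapping: one code point expands to two.
upper_sharp_s = "\u00df".upper()  # U+00DF LATIN SMALL LETTER SHARP S

def caseless_equal(a, b):
    """Compare two strings for caseless equality using case folding."""
    return a.casefold() == b.casefold()

# Lowercasing alone does not make these two spellings match,
# but case folding does.
lower_matches = "Stra\u00dfe".lower() == "STRASSE".lower()
fold_matches = caseless_equal("Stra\u00dfe", "STRASSE")
```

For matching and lookup, comparing folded strings avoids having to choose a language at all, which is usually the point.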

> 3. collation - **is** locale dependent as text sorting in different languages
> is very different - even if they use same script (Latin for example)

Yes, it definitely is; but you could still have a "general" collation
that would work well enough for most languages.
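
To show what "general" could mean in the crudest form, here is a sketch of a language-neutral sort key: decompose, drop combining marks, case-fold, and use the original string as a tie-breaker. This is a deliberately simplified substitute and not the official Unicode collation algorithm; general_key is a hypothetical name and the word list is made up.

```python
import unicodedata

def general_key(s):
    """Crude language-neutral sort key (not the official collation algorithm)."""
    decomposed = unicodedata.normalize("NFD", s)
    # Drop combining marks so accented letters sort next to their base letter.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Fold case for the primary comparison; the original breaks ties.
    return (base.casefold(), s)

words = ["zebra", "\u00c4pfel", "apple", "Zoo"]  # U+00C4 is A with diaeresis

by_code_point = sorted(words)
by_general_key = sorted(words, key=general_key)
```

Plain code point order puts "Zoo" first and the accented word last, while the key above interleaves them with the other words. It still ignores every language-specific rule, which is exactly the part that needs a locale.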

I listed it as a 'maybe', but I had forgotten how complicated the official
algorithm is. So I won't be adding collation support for a while.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk