From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2004-04-14 06:52:58


Jeremy Maitin-Shepard <jbms_at_[hidden]> writes:

> What I am saying is that operations such as "convert to uppercase" on
> Unicode strings are locale-independent, and thus such operations need not
> and should not be part of the locale interface.

In which case you are wrong. The SpecialCasing.txt file from the Unicode data
file set identifies locale-specific case conversions, such as:

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

# Remove spurious dot above small i's when lowercasing, if there are no more
# accents above:

0307; ; 0307; 0307; tr AFTER_i NOT_MORE_ABOVE; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az AFTER_i NOT_MORE_ABOVE; # COMBINING DOT ABOVE

# Fix case pairs

0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

0049; 0131; 0049; 0049; az; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

In fact, as the sample shows, case conversion is not only locale-dependent
but context-dependent too: the conditions AFTER_i and NOT_MORE_ABOVE make the
conversion of a character depend on the characters around it.
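
To make the locale dependence concrete, here is a minimal sketch of the four
"Fix case pairs" rules quoted above. The names is_turkic, to_lower_cp and
to_upper_cp are hypothetical helpers invented for this illustration; they are
not part of Boost or the standard library. The sketch handles only ASCII
letters plus the quoted tr/az mappings, and it deliberately ignores the
context-dependent U+0307 rule.

```cpp
#include <string>

// Hypothetical helpers for illustration only; not a Boost or std API.
// Coverage is limited to ASCII letters and the tr/az case pairs quoted
// from SpecialCasing.txt. The context-dependent U+0307 rule is omitted.

constexpr char32_t kCapitalIWithDot = 0x0130;  // LATIN CAPITAL LETTER I WITH DOT ABOVE
constexpr char32_t kSmallDotlessI   = 0x0131;  // LATIN SMALL LETTER DOTLESS I

// True for the two language tags that the quoted rules apply to.
inline bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

// Lowercase one code point; 'I' maps to dotless i in Turkish and Azeri.
inline char32_t to_lower_cp(char32_t c, const std::string& lang) {
    if (c == U'I' && is_turkic(lang)) return kSmallDotlessI;
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + (U'a' - U'A'));
    return c;
}

// Uppercase one code point; 'i' maps to I-with-dot in Turkish and Azeri.
inline char32_t to_upper_cp(char32_t c, const std::string& lang) {
    if (c == U'i' && is_turkic(lang)) return kCapitalIWithDot;
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - (U'a' - U'A'));
    return c;
}
```

With these helpers, to_upper_cp(U'i', "en") yields U+0049 while
to_upper_cp(U'i', "tr") yields U+0130, so the same code point uppercases
differently depending on the language passed in.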

Anthony

-- 
Anthony Williams
Senior Software Engineer, Beran Instruments Ltd.
