From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2004-04-14 06:52:58
Jeremy Maitin-Shepard <jbms_at_[hidden]> writes:
> What I am saying is that operations such as "convert to uppercase" on
> Unicode strings are locale-independent, and thus such operations need not
> and should not be part of the locale interface.
In which case you are wrong. The SpecialCasing.txt file from the Unicode data
file set identifies locale-specific case conversions, such as:
# Turkish and Azeri
# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.
# Remove spurious dot above small i's when lowercasing, if there are no more
# accents above:
0307; ; 0307; 0307; tr AFTER_i NOT_MORE_ABOVE # COMBINING DOT ABOVE
0307; ; 0307; 0307; az AFTER_i NOT_MORE_ABOVE # COMBINING DOT ABOVE
# Fix case pairs
0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0049; 0131; 0049; 0049; az; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
In fact, as the sample shows, not only is case conversion locale-dependent,
but it is context-dependent too: the AFTER_i and NOT_MORE_ABOVE conditions make
the conversion of a character depend on the characters around it.
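To make the locale dependence concrete, here is a minimal sketch of a
per-character mapping that takes a language tag. The names is_turkic, to_upper
and to_lower are hypothetical, not part of Boost or the standard library, and
only ASCII letters plus the two "Fix case pairs" rows above are handled. The
context-sensitive 0307 rows are deliberately left out, since they need the
neighbouring characters as well.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for this sketch): true for the two
// language tags that appear in the excerpt.
bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

const char32_t kDottedCapitalI = 0x0130; // LATIN CAPITAL LETTER I WITH DOT ABOVE
const char32_t kDotlessSmallI  = 0x0131; // LATIN SMALL LETTER DOTLESS I

// Uppercase one code point. Only ASCII a-z and the 0069 row are covered.
char32_t to_upper(char32_t c, const std::string& lang) {
    if (c == U'i' && is_turkic(lang)) {
        return kDottedCapitalI;              // 0069 -> 0130 for tr/az
    }
    if (c >= U'a' && c <= U'z') {
        return static_cast<char32_t>(c - U'a' + U'A');
    }
    return c;                                // everything else unchanged
}

// Lowercase one code point. Only ASCII A-Z and the 0049 row are covered.
char32_t to_lower(char32_t c, const std::string& lang) {
    if (c == U'I' && is_turkic(lang)) {
        return kDotlessSmallI;               // 0049 -> 0131 for tr/az
    }
    if (c >= U'A' && c <= U'Z') {
        return static_cast<char32_t>(c - U'A' + U'a');
    }
    return c;                                // everything else unchanged
}
```

With this sketch, to_upper(U'i', "en") gives U+0049 while to_upper(U'i', "tr")
gives U+0130, so the same input maps differently depending on the language
passed in.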
Anthony
-- Anthony Williams Senior Software Engineer, Beran Instruments Ltd.