From: John Maddock (john_at_[hidden])
Date: 2005-06-24 07:24:01


>The most pressing point for level 1 support is section 1.5 Caseless
>Matching: "Supported, note that at this level, case transformations are
>1:1, many to many case folding operations are not supported (for example
>"ß" to "SS")."

I forgot to mention: this is part of a larger digraph problem. In some
languages more than one character may collate as a single unit. In some
cases Unicode provides predefined ligatures for these, but it doesn't do
so for every case combination of every ligature.

Boost.Regex supports things like [[.ae.]-[.ll.]] (match anything that
collates in the range "ae" to "ll"), and currently this should work
reasonably well in case-insensitive mode as well (it fails where a
many-to-one case transformation is required). Also, since there is no way
to tell which digraphs (if any) are supported by the current locale,
expressions such as [a-z] will only ever match one character, and never
match, say, "ae", even if the current locale does regard "ae" as a single
unit. I believe this is the only sensible option, particularly as in many
cases whether the next two characters are regarded as a digraph depends
upon the meaning of the word (which is to say you need a dictionary to work
it out, as Martin Bonner pointed out).
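
To make the [a-z] point concrete, here is a minimal sketch. It uses the
standard library's std::regex as a stand-in so that it is self-contained;
that substitution is an assumption made for illustration, and the helper
name matches_single_range is hypothetical. It only shows that a plain range
consumes exactly one character, so a two-character input such as "ae" is
not matched as a unit.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE input is matched by the
// single bracket expression [a-z]. A bracket expression consumes exactly
// one character, so two-character inputs can never match.
bool matches_single_range(const std::string& text) {
    static const std::regex range("[a-z]");
    return std::regex_match(text, range);
}
```

With this helper, matches_single_range("a") is true, while
matches_single_range("ae") is false regardless of whether some locale
would treat "ae" as one collating unit.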

Re ICU: this appears to use case folding (converting everything to a
case-insensitive form) for caseless comparisons. I would assume their regex
component does the same, but I haven't had a chance to try it out.
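
The difference between 1:1 case transformations and case folding can be
sketched as follows. This is not ICU's or Boost.Regex's implementation: the
function names are hypothetical, the 1:1 mapping covers ASCII letters only,
and the folding table contains a single entry (U+00DF "ß" folds to "ss",
as in Unicode's full case folding data). It is meant only to show why a
one-to-many mapping cannot be handled by a per-character comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 1:1 transformation: every code point maps to exactly one code point.
// Assumption for illustration: only ASCII letters are handled.
char32_t simple_fold(char32_t c) {
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + 32);
    return c;
}

// Folding with one-to-many mappings: the output may be longer than the
// input. The table here is a single hard-coded entry, U+00DF -> "ss".
std::u32string full_fold(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == 0x00DF) out += U"ss";
        else out += simple_fold(c);
    }
    return out;
}

// Caseless comparison, character by character, using the 1:1 mapping.
bool equal_simple(const std::u32string& a, const std::u32string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (simple_fold(a[i]) != simple_fold(b[i])) return false;
    }
    return true;
}

// Caseless comparison on the folded forms of both strings.
bool equal_full(const std::u32string& a, const std::u32string& b) {
    return full_fold(a) == full_fold(b);
}
```

Under these assumptions, equal_simple(U"Stra\u00DFe", U"STRASSE") is false
because the strings differ in length, whereas equal_full on the same pair
is true because both fold to "strasse".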

John.

