Boost Users :

From: Eric Niebler (eric_at_[hidden])
Date: 2006-04-15 21:24:58


Sebastian Redl wrote:
> Eric Niebler wrote:
>
>> Since I wrote the above, I have fixed the performance problem with BMH
>> and case-insensitive matches by extending the regex traits class with a
>> function that returns all the case-folded equivalents of a character.
>> This resulted in a significant performance improvement for
>> case-insensitive matches.
>>
>>
> How does that work with multiple character case mappings, like the
> German ß -> SS (the sharp s does not exist in upper case)?

It doesn't. :-P Xpressive aims for "Basic Unicode Support," as defined
by Unicode TR18 (http://www.unicode.org/reports/tr18/):

     Some caseless matches may match one character against two:
     for example, U+00DF "ß" matches the two characters "SS".
     And case matching may vary by locale. However, because many
     implementations are not set up to handle this, at Level 1
     only simple case matches are necessary.

So correct handling of German ß -> SS is only necessary for "Extended
Unicode Support," which would be nice but is a more distant goal. Sadly,
AFAICT, TR1.Regex doesn't even accommodate Basic Unicode Support, since
it provides no syntax for character set subtraction and intersection.
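
To make the Level 1 limitation concrete, here is a minimal sketch of simple (one-to-one) caseless matching. It is not xpressive's actual traits interface: the names simple_case_equivalents and simple_caseless_equal are hypothetical, and the hand-written table covers only ASCII letters plus the Kelvin sign (U+212A, which folds to "k") as an assumption for illustration. Because each character is compared against exactly one character, "ß" can never match the two characters "SS".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the real xpressive traits function): returns every
// code point treated as a simple case variant of c, including c itself.
// Assumption: only ASCII letters and the Kelvin sign are in the table.
std::vector<char32_t> simple_case_equivalents(char32_t c)
{
    // 'k', 'K' and U+212A KELVIN SIGN all fold to 'k', which is why the
    // function returns a set of equivalents rather than a single partner.
    if (c == U'k' || c == U'K' || c == U'\u212A')
        return {U'k', U'K', U'\u212A'};
    if (c >= U'a' && c <= U'z') return {c, static_cast<char32_t>(c - 32)};
    if (c >= U'A' && c <= U'Z') return {c, static_cast<char32_t>(c + 32)};
    // Everything else, including U+00DF (sharp s), is outside this small
    // table and is therefore equivalent only to itself.
    return {c};
}

// Compares two strings character by character using the table above.
// One character is only ever matched against one character, so strings
// of different length can never compare equal.
bool simple_caseless_equal(const std::u32string& a, const std::u32string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        bool found = false;
        for (char32_t e : simple_case_equivalents(a[i])) {
            if (e == b[i]) { found = true; break; }
        }
        if (!found) return false;
    }
    return true;
}
```

With this sketch, simple_caseless_equal(U"Stra", U"STRA") is true, while comparing U"stra\u00DFe" against U"STRASSE" is false because the lengths differ. Handling that case needs the one-to-many mappings that TR18 defers to its higher levels.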

In short, it's a problem, but there are bigger fish to fry. If you need
a regex engine that can handle this today, try ICU.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
