From: John Maddock (john_at_[hidden])
Date: 2006-01-04 05:28:22
>> 1) Do what ICU does and enumerate every character in the range [x-y]
>> and convert it to its case-folded equivalent. The trouble is it's
>> pathologically slow for very large ranges. Of course if you use a
>> very large range you get exactly what you deserve!
>
> Please, no.
:-)
> 4) Punt the decision to the traits type :-) For xpressive, I added an
> in_range_nocase(Char a, Char b) member to the traits concept. By
> default the traits provided by xpressive do *not* do proper case
> folding. They just use toupper and tolower, and are documented as
> such. An ambitious person can write their own trait to do proper
> Unicode case folding and get the right behavior.
Right, but the question is: is it actually *possible* to do proper Unicode
case folding with this interface?
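To make the question concrete, here is a minimal sketch of what a toupper/tolower-based trait might look like. The name simple_char_traits and the three-argument (lo, hi, ch) signature are assumptions made for illustration; this is not xpressive's actual traits class.

```cpp
#include <cctype>

// Hypothetical traits type for illustration only; not xpressive's real class.
struct simple_char_traits {
    // Case-sensitive test: is ch inside the closed range [lo, hi]?
    static bool in_range(char lo, char hi, char ch) {
        return lo <= ch && ch <= hi;
    }

    // Case-insensitive test built purely on tolower/toupper.
    // Each character maps to at most one other character, so this is
    // simple case mapping, not full Unicode case folding.
    static bool in_range_nocase(char lo, char hi, char ch) {
        const unsigned char u = static_cast<unsigned char>(ch);
        const char lower = static_cast<char>(std::tolower(u));
        const char upper = static_cast<char>(std::toupper(u));
        return in_range(lo, hi, ch)
            || in_range(lo, hi, lower)
            || in_range(lo, hi, upper);
    }
};
```

With this sketch, in_range_nocase('a', 'z', 'Q') is true in the default "C" locale, and no enumeration of the range is needed. What it cannot express is a fold that maps one character to several (U+00DF folds to "ss" under full Unicode case folding), since the interface only ever sees one character at a time.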
> Inconsistent. Yes. Weird? No. The simple rule is: when you're about to
> start repeating a group, that capture and all captures within are
> first set to undefined. (ECMA-262 15.10.2.5.) Now that I look more
> closely, I see that ECMA is stricter about setting captures to
> undefined in these situations than Perl is, and xpressive is
> non-compliant in this area, too. <sigh>
Oh shucks, I see it now. Incidentally, the behaviour described is consistent:
doing what they do will work for alternatives just as well as for repeats
(since an alternative must be within a repeat if its captures are ever going
to need clearing).
It's going to be a pain to implement though :-/
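For reference, the rule itself is small even if wiring it into a matcher is not. The sketch below hand-replays the example from the notes to ECMA-262 15.10.2.5, /(z)((a+)?(b+)?(c))*/ against "zaacbbbcac", clearing the captures inside the repeated group before each iteration. The capture struct and the helper names are hypothetical, and there is no real matching or backtracking here, only the bookkeeping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical capture slot: matched == false stands in for "undefined".
struct capture {
    bool matched = false;
    std::string text;
};

// Reset captures first..last (inclusive) to undefined. Per the rule quoted
// above, this runs each time a repeat is about to start a new iteration.
inline void reset_captures(std::vector<capture>& caps,
                           std::size_t first, std::size_t last) {
    for (std::size_t i = first; i <= last && i < caps.size(); ++i) {
        caps[i] = capture();
    }
}

inline void set_capture(std::vector<capture>& caps, std::size_t i,
                        const std::string& s) {
    caps[i].matched = true;
    caps[i].text = s;
}

// Replays /(z)((a+)?(b+)?(c))*/ on "zaacbbbcac" by hand.
// Groups 2..5 live inside the repeat, so they are cleared per iteration.
inline std::vector<capture> replay_ecma_example() {
    std::vector<capture> caps(6);  // slot 0 is the whole match
    set_capture(caps, 1, "z");

    // Iteration 1 consumes "aac": (b+)? does not participate.
    reset_captures(caps, 2, 5);
    set_capture(caps, 3, "aa");
    set_capture(caps, 5, "c");
    set_capture(caps, 2, "aac");

    // Iteration 2 consumes "bbbc": (a+)? does not participate.
    reset_captures(caps, 2, 5);
    set_capture(caps, 4, "bbb");
    set_capture(caps, 5, "c");
    set_capture(caps, 2, "bbbc");

    // Iteration 3 consumes "ac": without the reset, group 4 would
    // still hold the stale "bbb" from iteration 2.
    reset_captures(caps, 2, 5);
    set_capture(caps, 3, "a");
    set_capture(caps, 5, "c");
    set_capture(caps, 2, "ac");

    set_capture(caps, 0, "zaacbbbcac");
    return caps;
}
```

The final state has group 2 = "ac", group 3 = "a", group 4 undefined and group 5 = "c", which is the result the ECMA note gives. A real implementation also has to restore the previous values when it backtracks out of an iteration, which is presumably where the pain comes in.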
Thanks for the pointer,
John.