From: Eric Niebler (eric_at_[hidden])
Date: 2006-01-04 12:38:18
John Maddock wrote:
>
>>4) Punt the decision to the traits type :-) For xpressive, I added a
>>in_range_nocase(Char a, Char b) member to the traits concept. By
>>default the traits provided by xpressive do *not* do proper case
>>folding. They just use toupper and tolower, and are documented as
>>such. An ambitious person can write their own trait to do proper
>>Unicode case folding and get the right behavior.
>
>
> Right, but the question is: is it actually *possible* to do proper Unicode
> case folding with this interface?
Trivially, yes. (The actual interface is in_range_nocase(Char from, Char
to, Char ch) -- there's a typo above.) The algorithm is:
1) Build a table such that for every Unicode character, you can get a
list of its case-folded equivalents. I wrote a script that does this,
using http://www.unicode.org/Public/UNIDATA/CaseFolding.txt as input.
2) In in_range_nocase(), look up the list of case-folded equivalent
chars for ch, the char to test.
3) For each char in the list, see if it's in the range specified.
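
The three steps above can be sketched roughly as follows. This is only an illustration, not xpressive's actual traits code: the free-function form, the use of char32_t, the names FoldTable, fold_table and in_range, and the tiny hand-written table are all assumptions. A real table would be generated from CaseFolding.txt by a script, as described in step 1.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical table type: maps a code point to every code point that
// case-folds to the same value (the code point itself included).
using FoldTable = std::unordered_map<char32_t, std::vector<char32_t>>;

// Step 1: the table. This is a hand-written excerpt for illustration only;
// a complete table would be generated from CaseFolding.txt.
inline const FoldTable& fold_table()
{
    static const FoldTable table = {
        {U'k',      {U'k', U'K', U'\u212A'}},
        {U'K',      {U'k', U'K', U'\u212A'}},
        {U'\u212A', {U'k', U'K', U'\u212A'}},  // KELVIN SIGN folds to 'k'
        {U's',      {U's', U'S', U'\u017F'}},
        {U'S',      {U's', U'S', U'\u017F'}},
        {U'\u017F', {U's', U'S', U'\u017F'}},  // LATIN SMALL LETTER LONG S folds to 's'
    };
    return table;
}

// Plain inclusive range test (hypothetical helper).
inline bool in_range(char32_t from, char32_t to, char32_t ch)
{
    return from <= ch && ch <= to;
}

// Same parameter order as the interface named in the post: (from, to, ch).
inline bool in_range_nocase(char32_t from, char32_t to, char32_t ch)
{
    // A character with no table entry can only match as itself.
    if (in_range(from, to, ch))
        return true;

    // Step 2: look up the case-folded equivalents of ch.
    const auto it = fold_table().find(ch);
    if (it == fold_table().end())
        return false;

    // Step 3: see whether any equivalent lies in [from, to].
    return std::any_of(it->second.begin(), it->second.end(),
                       [=](char32_t c) { return in_range(from, to, c); });
}
```

With this excerpt, in_range_nocase(U'a', U'z', U'\u212A') is true because the Kelvin sign's list contains 'k', while in_range_nocase(U'a', U'j', U'K') is false because none of 'K', 'k' or the Kelvin sign falls in a-j.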
--
Eric Niebler
Boost Consulting
www.boost-consulting.com