From: Eric Niebler (eric_at_[hidden])
Date: 2005-06-24 12:21:28


John Maddock wrote:
>>No. We're only talking about case folding -- specifically the mappings
>>found in http://www.unicode.org/Public/UNIDATA/CaseFolding.txt.
>
>
> Well maybe you are, but the regex traits classes were always intended to
> allow for other forms of equivalence as well.
>

OK

>
>>Here's my suggestion. We add to the traits class two functions:
>>
>>bool in_range(Char from, Char to, Char ch);
>
>
> What use is this one, or are you allowing equivalents other than case
> folding now <wink>? If so then I approve :-)

I dunno. I threw it in for completeness, but I don't think any
implementation besides:

   return from <= ch && ch <= to;

makes sense. You don't want to do any character translations or fancy
equivalence stuff here. Consider what happens if translate(from) >
translate(to).
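
To make the problem concrete, here is a minimal sketch assuming ASCII and
the default "C" locale. The free functions are hypothetical stand-ins for
the proposed traits members, and in_range_translated is a deliberately
naive variant written only to show the failure; it is not part of the
proposal.

```cpp
#include <cctype>

// Hypothetical stand-in for the traits member translate_nocase
// (assumption: ASCII, default "C" locale).
inline char translate_nocase(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// The plain comparison: no translation of any kind.
inline bool in_range(char from, char to, char ch) {
    return from <= ch && ch <= to;
}

// Naive variant that translates all three arguments before comparing.
inline bool in_range_translated(char from, char to, char ch) {
    const char f = translate_nocase(from);
    const char t = translate_nocase(to);
    const char c = translate_nocase(ch);
    return f <= c && c <= t;
}
```

For the range [Z-a], translate_nocase('Z') is 'z', which sorts after 'a',
so the translated bounds are inverted and in_range_translated rejects
every character, including '_', which the plain in_range accepts.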

>
>
>>bool in_range_nocase(Char from, Char to, Char ch);
>
>
> OK, but see below.
>
>
>>We define the behavior of the regex engine in terms of these functions,
>>but we don't require their use. In particular, for narrow character
>>sets, implementers would be free to use a std::bitset<256>, enumerate
>>the char range [from, to], call translate_nocase on each char, and set
>>the appropriate bit in the bitset. Matching happens by calling
>>translate_nocase on the input char and seeing if its bit is set in the
>>bitset. That gives the same behavior.
>
>
> I don't like traits class APIs that may or may not be called: what happens
> if a user-defined traits class is provided that alters the behavior of
> in_range, but not translate? The side effects produced by these APIs are
> clearly visible.

As I suggest above, I don't think in_range should depend on translate.
Your point still stands, but the optimization is too important to
ignore. We could standardize a specialization of regex_traits<char>
(like the specialization of char_traits<char>) for which the behavior is
known. Or, more generally, we could require that for every regex traits
class with sizeof(char_type) == 1, in_range_nocase gives the same
results as the algorithm described above.
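
For reference, the bitset algorithm described above might look roughly
like the following. This is only a sketch: translate_nocase here is a
hypothetical free-function stand-in for the traits member (implemented
with std::tolower), and make_nocase_set and matches_nocase are names
invented for this example.

```cpp
#include <bitset>
#include <cctype>

// Hypothetical stand-in for the traits member translate_nocase.
inline unsigned char translate_nocase(unsigned char ch) {
    return static_cast<unsigned char>(std::tolower(ch));
}

// Enumerate [from, to], fold each char, and set the folded char's bit.
inline std::bitset<256> make_nocase_set(unsigned char from, unsigned char to) {
    std::bitset<256> set;
    // The counter is an int so that to == 255 cannot wrap around.
    for (int c = from; c <= to; ++c) {
        set.set(translate_nocase(static_cast<unsigned char>(c)));
    }
    return set;
}

// Match by folding the input char and testing its bit.
inline bool matches_nocase(const std::bitset<256>& set, unsigned char ch) {
    return set.test(translate_nocase(ch));
}
```

With a set built from [a-f], both 'c' and 'C' match and 'g' does not. A
requirement on in_range_nocase for one-byte char types would have to
produce exactly these answers.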

> I agree Unicode support is clearly desirable; however, on a point of
> procedure, I believe it's too late to change this for TR1. Changes for
> C++0x are clearly still possible though. Whatever happens, we need to
> file this as a DR.

Agreed. How does one file a DR? On comp.std.c++? Do you want to do the
honors, or should I?

>
> The most pressing point for level 1 support is section 1.5 Caseless
> Matching: "Supported, note that at this level, case transformations are
> 1:1, many to many case folding operations are not supported (for example "ß"
> to "SS"). "

The way I read this, a 1:1 mapping is all that is needed for Level 1
support. So we don't have to worry about "ß" to "SS" unless we are
shooting for Level 2 or 3, which IMO we should. But that's a radical
change from TR1 regex. Let's fix what we got first.
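
As an illustration of what a 1:1 mapping means here, the sketch below
folds single code points through a tiny hand-picked table. The four
entries are an excerpt chosen for the example, not the full data from
CaseFolding.txt, and FoldEntry, simple_fold and equal_nocase are
hypothetical names.

```cpp
// One simple (1:1) case-folding entry: code point -> folded code point.
struct FoldEntry {
    char32_t from;
    char32_t to;
};

// Hand-picked excerpt for illustration only; a real table would be
// generated from the Unicode data file.
constexpr FoldEntry kSimpleFolds[] = {
    {0x0041, 0x0061},  // LATIN CAPITAL A -> LATIN SMALL A
    {0x00C4, 0x00E4},  // A WITH DIAERESIS, capital -> small
    {0x03A3, 0x03C3},  // GREEK CAPITAL SIGMA -> GREEK SMALL SIGMA
    {0x03C2, 0x03C3},  // GREEK SMALL FINAL SIGMA -> GREEK SMALL SIGMA
};

// Fold one code point to exactly one code point.
inline char32_t simple_fold(char32_t cp) {
    for (const FoldEntry& e : kSimpleFolds) {
        if (e.from == cp) return e.to;
    }
    return cp;  // no 1:1 mapping listed: the code point folds to itself
}

// Caseless comparison of two single code points under 1:1 folding.
inline bool equal_nocase(char32_t a, char32_t b) {
    return simple_fold(a) == simple_fold(b);
}
```

Under this scheme U+00DF ("ß") folds to itself, so it never compares
equal to "S" or to "SS"; handling that case needs a fold that can return
more than one code point, which is the part left for later.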

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
