Boost Users:

Meyer-Eltz_at_[hidden]

--
On Monday, 18 December 2006 at 18:31, you wrote:
> Detlef Meyer-Eltz wrote:
>> I have difficulty predicting which part of a regular expression
>> will match.
>>
>> Example:
>> I have a regular expression for a general HTML tag: <[^>]*>
>> combined with an expression for the body tag: <body([^>]*)>
>>
>> into: (<[^>]*>)|(<body([^>]*)>)
>>
>> This expression matches the text: <body bgcolor="white">
>>
>> As both alternatives can match the input with the same length, I
>> expected the fourth part of the "Leftmost Longest" rule, quoted
>> below, to determine which alternative is chosen:
>>
>> 4. Find the match which has matched the first sub-expression in the
>>   leftmost position, along with any ties.  If there is only on(e)
>>   such match possible then return it.
>>
>> // note the missing 'e'
>>
>> As the tag expression has no sub-expression at all, the body
>> expression should win: its sub-expression could match, but in fact
>> it doesn't. It seems to me that the order of the alternatives
>> determines the match.
>>
>> Now I guess that I misinterpreted rule 4: it is not a means of
>> predicting the matching alternative, only of identifying the one that
>> happened to match? My software constructs lexers from elementary
>> expressions automatically, so it's important for me to direct and
>> predict which alternative will match. Are there any other rules?
>> Does the order of the alternatives determine the match
>> unambiguously?
> Which Boost.Regex version are you using, and how are you compiling the 
> expression?
> Recent versions default to the Perl matching rules, *which do not use the 
> leftmost longest rule*.  They match based on a "first match found" rule, so 
> if the first alternative leads to a match then subsequent alternatives are 
> never examined.
> If you really want leftmost-longest semantics, then compile the expression 
> as a POSIX extended regex, but of course then you lose the ability to use 
> Perl-like regex extensions.
> HTH, John.
> PS: your analysis of the leftmost-longest rule does look correct, however.
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users