Boost logo

Boost Users :

Subject: Re: [Boost-users] boost::xpressive - Regex stack space exhausted
From: Eric Niebler (eric_at_[hidden])
Date: 2010-07-22 10:10:32


On 7/22/2010 10:01 AM, Pavol Supa wrote:
> On Thu, Jul 22, 2010 at 2:33 PM, Eric Niebler <eric_at_[hidden]> wrote:
>> On 7/22/2010 5:29 AM, Pavol Supa wrote:
>>> On Wed, Jul 21, 2010 at 10:52 PM, Eric Niebler <eric_at_[hidden]> wrote:
>>>> On 7/21/2010 2:39 PM, Pavol Supa wrote:
>>>>> Hi,
>>>>>
>>>>> I need to match a hex-written byte array, optionally separated with
>>>>> spaces. So i tried:
>>>>>
>>>>> boost::xpressive::sregex r = * ( * blank >> repeat<2,2> (xdigit));
>>
>> Wait a minute. What does the data you're trying to match look like? Do
>> you know that xdigit only matches a single hex character? Your data
>> should look something like:
>>
>> 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00 11 ....
>>
>> Is that right?
>
> i used data just without spaces, 000000000000...00000
>
> the data are most often without spaces, so i decided meanwhile to use
> the workaround (it doesn't throw):
> expr = * repeat<2,2> (xdigit)

Then why not just "expr = *xdigit" in that case?

> but if i move the inner expression to a function, it starts throwing
> (at similar length), i.e.:
> sregex byte() { return repeat<2,2> (xdigit) }
> sregex expr = * byte();

Yes, these are different beasts altogether. A nested regex match creates
a nested match_results object, with nested submatches. By assigning to a
regex object, you have erased the type information that xpressive uses
to optimize matching. In this case, xpressive no longer knows that it's
quantifying something of fixed-width so that backtracking is unnecessary.

>>>>> smatch match;
>>>>> regex_match (input, match, r);
>>>>>
>>>>> when i use input of approx. 150 hex pairs, i get an exception "Regex
>>>>> stack space exhausted" (i use default stack size by Visual studio
>>>>> 2008)
>>>>>
>>>>> This pattern looks quite simple, so I'd like to know, if there is some
>>>>> fundamental problem with this expressions.
>>>>
>>>> Yes. See
>>>> http://www.boost.org/doc/libs/1_43_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers
>>>>
>>>> Not only will this pattern tear through stack, it'll run very slowly.
>>>> Try this instead:
>>>>
>>>> * ( keep(*blank) >> repeat<2,2> (xdigit))
>>>
>>> So, i tried it. It throws exceptions at ~230 hexdigit pairs.
>>
>> That doesn't sound right. This could be a bug. I'll look at this today.
>
> i tried to debug the parsing, but it looked ok - when there is a
> non-trivial expression inside (as mentioned above), it goes into
> recursion, about 12 functions on callstack, and it eats whole
> (default) stack at those mentioned quantities

See above. I hope it makes sense now. But still,
*(keep(*blank)>>repeat<2,2>(xdigit)) should work just fine. I'll look.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net