Boost Users :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-08-13 13:07:28


John Maddock wrote:
> Raffaele Romito wrote:
>> Thanks for the reply John, I'll try to expose my "use case" with the
>> hope this feature will be available in the future...
>> I'm developing a system that detects TCP streams with a pattern
>> matching, here is the pseudo code (in a simplified form) of the
>> function I'd like to extend:
>>
>> protocol_stream decodeStream(tcp_stream)
>> {
>> foreach(registered protocol in the system)
>> {
>> required_size_to_match =
>> calc_min_regex_len(current_protocol.regex);
>>
>> if(tcp_stream.buffer_size >= required_size_to_match)
>> check if tcp_stream.buffer matches
>> else
>> try to read from tcp_stream if data is available until
>> <required_size_to_match> and check again if matches
>> }
>> }
>>
>> Hope u can help since you have "an almost pathological interest in
>> anything that can't be done" :)
>
> Oh dear, I should have realised that this would come back to haunt me ! :-0
>
> I've now realised that in the general case this can't in fact be implemented
> (think back-references), but can be for at least a subset of regexes. If
> you're still keen on the feature can you please file a feature request on
> the TRAC (http://svn.boost.org/trac) so this doesn't get lost?

But it can be done, back-references included, and GRETA does this as an
optimization. (It won't search for a match when it knows there isn't
room for one.) For example:

   (foo|barbaz)\1

The minimum match length is 6: the shorter alternative, foo, is 3
characters, and \1 has to repeat whatever the group matched, which adds
at least 3 more. Things get tricky when a backreference refers to an
enclosing group, as in (foo\1) (and yes, you can do that, but you really
shouldn't), in which case the conservative answer is to say the minimum
match length of \1 is 0 and then proceed with the rest of the
calculation.
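
To make the calculation concrete, here is a minimal sketch in Python. It
is not GRETA's code or any library's API: the node classes (Lit, Seq,
Alt, Group, Backref, Repeat) and the function min_match_len are
hypothetical names invented for this illustration, and the pattern is
assumed to be already parsed into a tree. It also assumes Perl-style
semantics, where a backreference to a group that did not take part in
the match fails rather than matching the empty string, so the result is
a lower bound and not necessarily the exact minimum.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


# Hypothetical AST for an already-parsed pattern (illustration only).
@dataclass(frozen=True)
class Lit:
    text: str                      # literal characters, e.g. "foo"


@dataclass(frozen=True)
class Seq:
    items: Tuple["Node", ...]      # sub-patterns matched one after another


@dataclass(frozen=True)
class Alt:
    options: Tuple["Node", ...]    # alternatives; assumed non-empty


@dataclass(frozen=True)
class Group:
    index: int                     # capture group number
    body: "Node"


@dataclass(frozen=True)
class Backref:
    index: int                     # refers to the group with this number


@dataclass(frozen=True)
class Repeat:
    body: "Node"
    min_count: int                 # lower bound of the quantifier


Node = Union[Lit, Seq, Alt, Group, Backref, Repeat]


def min_match_len(root: Node) -> int:
    """Return a conservative lower bound on the length of any match."""
    closed: Dict[int, int] = {}    # finished groups -> their minimum length
    open_groups: Set[int] = set()  # groups whose body is still being walked

    def walk(node: Node) -> int:
        if isinstance(node, Lit):
            return len(node.text)
        if isinstance(node, Seq):
            return sum(walk(item) for item in node.items)
        if isinstance(node, Alt):
            return min(walk(option) for option in node.options)
        if isinstance(node, Repeat):
            return node.min_count * walk(node.body)
        if isinstance(node, Group):
            open_groups.add(node.index)
            length = walk(node.body)
            open_groups.discard(node.index)
            closed[node.index] = length
            return length
        if isinstance(node, Backref):
            # A reference to an enclosing (still open) group, or to a
            # group not seen yet, conservatively counts as length 0.
            if node.index in open_groups:
                return 0
            return closed.get(node.index, 0)
        raise TypeError(f"unknown node type: {type(node).__name__}")

    return walk(root)


# (foo|barbaz)\1
simple = Seq((Group(1, Alt((Lit("foo"), Lit("barbaz")))), Backref(1)))

# (foo\1), where the backreference sits inside its own group
nested = Group(1, Seq((Lit("foo"), Backref(1))))
```

With these definitions, min_match_len(simple) gives 6 and
min_match_len(nested) gives 3, the latter because \1 is counted as 0
while group 1 is still open.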

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
