Boost logo

Boost Users :

Subject: Re: [Boost-users] [xpressive] Using it with input iterators or how far the backtracing will go back
From: Eric Niebler (eric_at_[hidden])
Date: 2010-11-07 16:32:28


On 11/6/2010 4:58 AM, Alex Dubov wrote:
> Eric Niebler <eric <at> boostpro.com> writes:
>> On 11/5/2010 10:21 AM, Dominique Devienne wrote:
>>> On Fri, Nov 5, 2010 at 3:31 AM, Alex Dubov <oakad <at> yahoo.com> wrote:
>>>> So the question is, how can one establish the size for this iterator adapter's
>>>> internal storage so as to avoid match failures due to incomplete storage and
>>>> avoid storing too much unnecessary data? I suppose, the storage can be reset
>>>> when the match is reported. This, however, still leaves an open problem for
>>>> situations where matches are separated by, possibly, gigabytes of non-
>>>> data.
>>>
>>> In the case of single-line matching, backtracking need only going back
>>> to the start of the current line, so I think the buffering only needs
>>> to be line based. For arbitrarily long multi-line matching, I don't
>>> know... --DD
>>
>> In that case, the iterator may be backed up all the way to the beginning
>> of the match, so if the iterator is caching the input, it would need to
>> cache it all.
>>
>
> That's exactly the approach I would like to follow. Unfortunately, xpressive
> currently has no way to communicate the necessary information out.
>
> What could be nice to have is a notification of some sort for three events:
> 1. First possibly matching character is encountered (at which point iterator
> adapter can start caching).

I thought about this and don't think it's practical. Xpressive's state
machine is built from matcher components. What you're asking for is a
way to signal that you are *exactly* one character into a match. That
would require changing *all* matcher components so that (a) they know if
they are the first non-zero-width matcher in the state machine, and (b)
they signal when they successfully match the first character. It could
be done, but I'm not going to maintain that. Sorry.

> 2. All possible matches abandoned (iterator cache can be truncated).
> 3. Match is found (well, it's already there:-).
>
> The not-so-exciting alternative to having this functionality in xpressive is
> going back to C-style matching engines for stream filtering.

Yep. :-( But like I said, if your stream is like a file stream, then you
can manipulate the underlying buffers to give you a sliding window,
paging data in and out as needed.

Good luck.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net