From: Christopher Kohlhoff (chris_at_[hidden])
Date: 2005-09-15 19:58:19


Hi Eric,

--- Eric Niebler <eric_at_[hidden]> wrote:
> Understood. Perhaps what is called for is a "pull" iterator that
> buffers chunks of data at a time, and when the buffer underflows, it
> fetches another chunk of data. That way, libraries like xpressive and
> Spirit can keep their iterator-based interface and not worry whether
> or not "++begin" goes to disk for the next 4Kb, or reads from a
> socket, or whatever.
>
> Would that address your problem?

I'm not sure, but your comments further down suggest probably not. If
the read from the socket is to use async I/O, then the regex code has
to give up control of the thread.

> That's not the case for a backtracking regex engine like Boost.Regex
> or xpressive. These libraries require bidirectional iterators because
> they may need to back out state transitions and decrement the
> iterator to try a different alternative. You'll need to buffer
> everything read so far, or else write it to a tmp file so you can get
> it back should you need it.

Ok, I didn't realise it required backtracking. Perhaps xpressive can be
wrapped with something that does the buffering from the correct
position in the input stream automatically in this case, but...

> And the problem of returning a partial match and persisting the
> current state of the state machine is a hard one. Some
> implementations maintain their state on the program stack, so
> returning effectively wipes out all that information.

Does this mean that xpressive stores its state on the program stack?

> These implementations would need to somehow serialize the state
> stored on the program stack, and then de-serialize it in order to
> begin executing where it left off. Tricky stuff.

This does confirm my feeling that there is a call for an async I/O
friendly "regular expression" library, one that:

- Only supports expressions that can be mapped to FSMs without
requiring backtracking.

- Does not store any state on the program stack.
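
To make those two constraints concrete, here is a minimal sketch of the
kind of matcher I have in mind. It is an illustration only, not code
from any existing library: the names delimiter_matcher, feed() and
no_match are made up, and the "expression" is hard-wired to the fixed
delimiter "\r\n\r\n" rather than compiled from a pattern. The point is
that the whole match state is a single integer held in the object, so
feed() can return when a chunk runs out and pick up again when the next
async read completes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sentinel meaning "no match found in this chunk".
constexpr std::size_t no_match = static_cast<std::size_t>(-1);

// Hypothetical incremental matcher for the fixed delimiter "\r\n\r\n".
// It is a plain FSM: every input character causes exactly one state
// transition, so the input is never re-read (no backtracking) and no
// state is kept on the program stack between calls.
class delimiter_matcher {
public:
    // Consume one chunk of input. Returns the number of characters
    // consumed up to and including the end of the delimiter, or
    // no_match if the chunk ended before the delimiter was complete.
    std::size_t feed(const char* data, std::size_t size) {
        assert(!matched());  // call reset() before reusing the matcher
        for (std::size_t i = 0; i < size; ++i) {
            state_ = next(state_, data[i]);
            if (state_ == accept_state) return i + 1;
        }
        return no_match;
    }

    bool matched() const { return state_ == accept_state; }
    void reset() { state_ = 0; }

private:
    enum { accept_state = 4 };

    // Transition function. State s means "the last s characters seen
    // are the first s characters of the delimiter".
    static int next(int s, char c) {
        static const char pattern[] = "\r\n\r\n";
        if (c == pattern[s]) return s + 1;
        return c == '\r' ? 1 : 0;  // a stray '\r' may start a new match
    }

    int state_ = 0;
};
```

Each completed read would call feed() with whatever bytes arrived. A
delimiter that straddles two reads is still found, because the state
survives between the calls.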

I don't believe it needs to be anything like as rich in functionality
as xpressive, say, and so I'm quite happy to drop support for the
"hard" stuff in order to make it async I/O friendly. If that level of
functionality is required a user can always do the processing in two
steps, where the second step passes a complete message through
something like xpressive.
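
A rough sketch of that two-step approach follows. Everything in it is
an assumption made for illustration: message_buffer and request_path
are made-up names, the "\r\n\r\n" delimiter and the HTTP-style request
line are just an example protocol, and std::regex stands in for
"something like xpressive" as the full-featured engine used in the
second step.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Step one (hypothetical): accumulate chunks as they arrive until the
// delimiter "\r\n\r\n" has been seen. No regex engine is involved here.
class message_buffer {
public:
    // Append one chunk. Returns true once a complete message is held.
    bool append(const std::string& chunk) {
        if (complete()) return true;
        // Re-scan only the tail that could hold a delimiter that
        // straddles the previous chunk and this one.
        std::size_t from = buffer_.size() < 3 ? 0 : buffer_.size() - 3;
        buffer_ += chunk;
        std::size_t pos = buffer_.find("\r\n\r\n", from);
        if (pos == std::string::npos) return false;
        length_ = pos + 4;  // include the delimiter itself
        return true;
    }

    bool complete() const { return length_ != 0; }

    // The complete message, delimiter included.
    std::string message() const {
        assert(complete());
        return buffer_.substr(0, length_);
    }

private:
    std::string buffer_;
    std::size_t length_ = 0;
};

// Step two (hypothetical): the whole message is now in memory, so a
// backtracking engine can be run over it without any async concerns.
// Returns the request path, or an empty string if nothing matches.
inline std::string request_path(const std::string& message) {
    static const std::regex request_line(
        "^([A-Z]+) (\\S+) HTTP/\\d\\.\\d\r\n");
    std::smatch m;
    if (std::regex_search(message, m, request_line)) return m[2].str();
    return std::string();
}
```

The first step only ever looks for the end of the message, so it can
be driven from async read completions. The second step runs once per
message on a complete, contiguous buffer.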

Cheers,
Chris

