Boost:
From: John Maddock (john_at_[hidden])
Date: 2004-10-27 05:31:58
> The stuff I offer is dedicated to two tasks:
> * building an ANFA (that's Augmented NFA) from an expression
> tree of a given regex;
> * running the result against a given input string.
> What such code desperately needs is the following:
> * syntactical front-end: a class that would parse the actual
> regex string and build its expression tree;
> * character back-end: a class that would allow checking whether
> a given character is contained in a given character set,
> respecting encodings, locales etc.
> Boost.regex employs quite a general approach to these components.
> Reusing them and connecting my code to them is what I have in
> mind.
>
> The only snag is, I'm not familiar with boost.regex internals. So,
> any help in that field would be appreciated.
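For readers unfamiliar with the approach, the "run the automaton against the input string" half can be illustrated with a plain NFA state-set simulation. This is Thompson-style, not the augmented ANFA from the post, whose details are not given here. It is a minimal sketch: the character back-end is assumed to be a simple predicate, and the NFA is built by hand in place of a parser front-end. `CharSet`, `Nfa`, `full_match` and `letters_then_digit` are hypothetical names, not part of Boost.Regex.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "character back-end": a predicate answering whether a code
// point belongs to a set. A real back-end would also handle encodings and
// locales; this one does not.
using CharSet = std::function<bool(char32_t)>;

struct State {
    std::vector<int> epsilon;                      // non-consuming transitions
    std::vector<std::pair<CharSet, int>> on_char;  // transitions that consume one character
};

struct Nfa {
    std::vector<State> states;
    int start = 0;
    int accept = 0;
};

// Mark s and every state reachable from it through epsilon transitions.
void add_closure(const Nfa& nfa, int s, std::vector<bool>& active) {
    if (active[s]) return;
    active[s] = true;
    for (int t : nfa.states[s].epsilon) add_closure(nfa, t, active);
}

// Whole-string match by state-set simulation: each state is marked at most
// once per input character, so the cost is O(input length * (states + transitions))
// and there is no backtracking.
bool full_match(const Nfa& nfa, const std::u32string& input) {
    const std::size_t n = nfa.states.size();
    std::vector<bool> current(n, false);
    add_closure(nfa, nfa.start, current);
    for (char32_t c : input) {
        std::vector<bool> next(n, false);
        for (std::size_t s = 0; s < n; ++s) {
            if (!current[s]) continue;
            for (const auto& [set, target] : nfa.states[s].on_char)
                if (set(c)) add_closure(nfa, target, next);
        }
        current.swap(next);
    }
    return current[nfa.accept];
}

// Hand-built NFA for [a-z]+[0-9], standing in for what a front-end would produce.
Nfa letters_then_digit() {
    Nfa nfa;
    nfa.states.resize(3);
    auto lower = [](char32_t c) { return c >= U'a' && c <= U'z'; };
    auto digit = [](char32_t c) { return c >= U'0' && c <= U'9'; };
    nfa.states[0].on_char.emplace_back(lower, 1);
    nfa.states[1].epsilon.push_back(0);  // loop back for "one or more"
    nfa.states[1].on_char.emplace_back(digit, 2);
    nfa.start = 0;
    nfa.accept = 2;
    return nfa;
}
```

With this NFA, `full_match(letters_then_digit(), U"abc1")` succeeds, while `U"abc"` and `U"ab1c"` do not.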
The regex internals are in the process of being completely rewritten (the code
is in CVS on the regex5 branch). I hope to merge this into the main trunk in
the next few weeks; mainly it's the docs that I still need to bring up to date.
Regex parsing and state machine construction should now be quite
straightforward to understand (within limits for a regex engine, obviously!),
so I would urge you to take a look (I can send you a zip if you don't have
CVS access).
I think the main problem is providing the same feature set as the existing
engine. My understanding is that no machine can achieve the complexity you
claim and still match backreferences, or even, I believe, wide characters
(because the character set is too large to realistically build a table-based
NFA). Is that correct?
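To put rough numbers on the wide-character point: a dense transition table needs one entry per character per state. That is 256 entries for 8-bit text but 1,114,112 for the full Unicode range (U+0000 to U+10FFFF). One common workaround, sketched here under the assumption that sets are stored as code point ranges, is to keep each character set as sorted, merged intervals and binary-search them, so storage scales with the number of ranges rather than with the alphabet. The class name `RangeSet` is hypothetical and does not describe Boost.Regex.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical range-based character set: sorted, non-overlapping, non-adjacent
// inclusive [lo, hi] code point ranges. Memory grows with the number of ranges,
// not with the size of the alphabet.
class RangeSet {
public:
    void add(char32_t lo, char32_t hi) {
        ranges_.emplace_back(lo, hi);
        normalize();  // re-sorting on every add is fine for a sketch
    }

    // Binary search for the last range whose lower bound is <= c.
    bool contains(char32_t c) const {
        auto it = std::upper_bound(
            ranges_.begin(), ranges_.end(), c,
            [](char32_t value, const std::pair<char32_t, char32_t>& r) {
                return value < r.first;
            });
        if (it == ranges_.begin()) return false;
        --it;
        return c <= it->second;
    }

    std::size_t range_count() const { return ranges_.size(); }

private:
    // Sort, then merge ranges that overlap or touch.
    void normalize() {
        std::sort(ranges_.begin(), ranges_.end());
        std::vector<std::pair<char32_t, char32_t>> merged;
        for (const auto& r : ranges_) {
            if (!merged.empty() && r.first <= merged.back().second + 1) {
                merged.back().second = std::max(merged.back().second, r.second);
            } else {
                merged.push_back(r);
            }
        }
        ranges_.swap(merged);
    }

    std::vector<std::pair<char32_t, char32_t>> ranges_;
};
```

For example, `[a-zA-Z]` plus the Greek capitals U+0391 to U+03A9 is stored as three ranges, regardless of how many code points the alphabet has.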
BTW, I have always thought that there was room for multiple regex engines in
Boost that would offer progressively fewer features in exchange for better
worst-case performance.
I suppose I should have tried harder to separate the parser from the back-end
state machine format, so that different engines could be plugged in at will,
but there are only so many times I think I can stand to rewrite this stuff
:-/
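A separation like the one described above might look like the following minimal sketch. The parser emits an engine-neutral expression tree, and each back-end implements a common interface that compiles that tree into its own format. All names (`Node`, `Engine`, `BacktrackingEngine`, `lit`, `cat`, `alt`, `star`) are hypothetical and do not describe the actual Boost.Regex design. The only back-end shown is a simple recursive backtracking matcher over a four-operator tree, and the tree is built by hand rather than by a parser.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical engine-neutral expression tree produced by a front-end parser.
enum class Kind { Literal, Concat, Alternation, Star };

struct Node {
    Kind kind = Kind::Literal;
    char ch = 0;                              // used only by Literal
    std::vector<std::shared_ptr<Node>> kids;  // operands of the other kinds
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make_node(Kind k, char c, std::vector<NodePtr> kids) {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->ch = c;
    n->kids = std::move(kids);
    return n;
}
NodePtr lit(char c) { return make_node(Kind::Literal, c, {}); }
NodePtr cat(NodePtr a, NodePtr b) { return make_node(Kind::Concat, 0, {a, b}); }
NodePtr alt(NodePtr a, NodePtr b) { return make_node(Kind::Alternation, 0, {a, b}); }
NodePtr star(NodePtr a) { return make_node(Kind::Star, 0, {a}); }

// Hypothetical plug-in point: each back-end turns the shared tree into its own
// state-machine format and matches against it.
class Engine {
public:
    virtual ~Engine() = default;
    virtual void compile(const NodePtr& tree) = 0;
    virtual bool full_match(const std::string& input) const = 0;
};

// One possible back-end: a recursive backtracking matcher that works directly
// on the tree, using continuations to try alternatives in order.
class BacktrackingEngine : public Engine {
public:
    void compile(const NodePtr& tree) override { tree_ = tree; }

    bool full_match(const std::string& input) const override {
        return match(*tree_, input, 0, [&](std::size_t end) { return end == input.size(); });
    }

private:
    using Cont = std::function<bool(std::size_t)>;

    bool match(const Node& n, const std::string& s, std::size_t i, const Cont& k) const {
        switch (n.kind) {
        case Kind::Literal:
            return i < s.size() && s[i] == n.ch && k(i + 1);
        case Kind::Concat:
            return match(*n.kids[0], s, i,
                         [&](std::size_t j) { return match(*n.kids[1], s, j, k); });
        case Kind::Alternation:
            return match(*n.kids[0], s, i, k) || match(*n.kids[1], s, i, k);
        case Kind::Star:
            // Greedy: try one more iteration first, but only accept iterations
            // that consume input, so patterns like (a*)* cannot loop forever.
            if (match(*n.kids[0], s, i,
                      [&](std::size_t j) { return j > i && match(n, s, j, k); }))
                return true;
            return k(i);
        }
        return false;
    }

    NodePtr tree_;
};
```

A backtracking back-end like this is the kind that can later be extended with features such as backreferences, at the cost of exponential worst-case time. A back-end based on NFA state-set simulation could implement the same `Engine` interface with fewer features and time linear in the input length for a fixed pattern, which is the trade-off discussed in this thread.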
John.