Boost logo

Boost :

From: John Maddock (john_at_[hidden])
Date: 2004-11-03 06:31:35


> http://groups.yahoo.com/group/boost/files/anfa-regex/

Thanks, I think I can see where you're coming from now, but it's going to
take me a while to digist this.

> EN> ... xpressive's
> EN> compile-time design allows for multiple back-ends (NFA, backtracking
> EN> NFA, DFA) without paying for ones that aren't used. However, xpressive
> EN> is hard-coded to use a backtracking NFA for now, and there is no clean
> EN> interface for plugging in a different back-end...
>
> Maybe it's good time to officially separate regex components from
> each other and define the way they should interact, thus allowing
> users to connect them at their preference?
>
> I can name three of them: syntax parser, state machine, and character
> sets. The first two are interacting through either tree data
> structures (passive/declarative way), or callbacks imitating the
> parser shift-reduce activity (active/imperative way; the better one,
> I think). The other two interact through "bool contains( char_t c )"
> (and, possibly, additional functions to deal with more subtle stuff).
> I understand that some algorithms could prefer to treat those three
> as if they were closely related... well, whether it will become a
> problem is a question of design. Opinions on this, as well as on the
> whole subject, are welcome.

The character set functionality is already separated out into the regex
traits classes: the new style traits class design in the regex5 code is a
lot better defined and simplified compared to the old design BTW.

The problem with separating the parser out, is that it can often benefit
from linkage to the state machine format: it needs to know what "kinds" of
states are supported: whether the state machine uses a "complex" or
"reduced" instruction set (for example whether to expand +, ?, and {a,b}
into just | and * operations: as most DFA implementations require. I
haven't looked to see if yours does though).

I'm not saying it's impossible, just difficult; it's the details that really
get you ;-)

John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk