|
Boost : |
From: Vladimir Pozdyayev (ardatur_at_[hidden])
Date: 2004-11-04 02:17:53
John Maddock wrote:
JM> The problem with separating the parser out, is that it can often benefit
JM> from linkage to the state machine format: it needs to know what "kinds" of
JM> states are supported: whether the state machine uses a "complex" or
JM> "reduced" instruction set
This is why I suggested that the better way (I do not claim it's the best
way, though) of connecting parser to state machine is by having
regex_creator to perform the shift-reduce activity. These actions must
be present somewhere in some form in any case, and moving them to
regex_creator allows us to get by without creating additional
structures (read: trees) if we don't need them explicitly.
The following sample creates a tree as the parser calls creator's
feature functions. Can your regex compilation process be represented
this way?
class abstract_regex_creator {
public:
void charset( const charset & ) { throw exception_unsupported; }
void repeat( int, int, bool ) { throw exception_unsupported; }
void cat() { throw exception_unsupported; }
// ...
};
class sample_regex_creator: public abstract_regex_creator {
protected:
// declare node_stack here
public:
void charset( const charset &cs ) {
// push the charset node onto the stack
}
void repeat( int lo, int hi, bool lazy ) {
// pop the top node N, repeat it (expanding into whatever
// subexpression is necessary), and push the resulting node back
}
void cat() {
// pop two nodes, push the cat node back
}
// and so on; if any regex feature is not supported,
// just don't mention it here, so it will throw an exception
};
JM> (for example whether to expand +, ?, and {a,b}
JM> into just | and * operations: as most DFA implementations require. I
JM> haven't looked to see if yours does though).
Types of regex elements that are directly supported are defined in
"anfa-nodes.h": null/charset leafs; +, +?, ?, ??, () unary
operations; concatenation and alternation.
* (*?) is expanded into + and ? (+? and ??).
{a,b} is not in the source yet; it has to be expanded into a sequence
of ?'s.
-- Best regards, Vladimir Pozdyayev.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk