Boost logo

Boost :

From: Vladimir Pozdyayev (ardatur_at_[hidden])
Date: 2004-11-04 02:17:53


John Maddock wrote:

JM> The problem with separating the parser out, is that it can often benefit
JM> from linkage to the state machine format: it needs to know what "kinds" of
JM> states are supported: whether the state machine uses a "complex" or
JM> "reduced" instruction set

This is why I suggested that the better way (I do not claim it's the best
way, though) of connecting parser to state machine is by having
regex_creator to perform the shift-reduce activity. These actions must
be present somewhere in some form in any case, and moving them to
regex_creator allows us to get by without creating additional
structures (read: trees) if we don't need them explicitly.

The following sample creates a tree as the parser calls creator's
feature functions. Can your regex compilation process be represented
this way?

class abstract_regex_creator {
public:
    void charset( const charset & ) { throw exception_unsupported; }
    void repeat( int, int, bool ) { throw exception_unsupported; }
    void cat() { throw exception_unsupported; }
    // ...
};

class sample_regex_creator: public abstract_regex_creator {
protected:
    // declare node_stack here
public:
    void charset( const charset &cs ) {
        // push the charset node onto the stack
    }
    void repeat( int lo, int hi, bool lazy ) {
        // pop the top node N, repeat it (expanding into whatever
        // subexpression is necessary), and push the resulting node back
    }
    void cat() {
        // pop two nodes, push the cat node back
    }
    // and so on; if any regex feature is not supported,
    // just don't mention it here, so it will throw an exception
};

JM> (for example whether to expand +, ?, and {a,b}
JM> into just | and * operations: as most DFA implementations require. I
JM> haven't looked to see if yours does though).

Types of regex elements that are directly supported are defined in
"anfa-nodes.h": null/charset leafs; +, +?, ?, ??, () unary
operations; concatenation and alternation.

* (*?) is expanded into + and ? (+? and ??).

{a,b} is not in the source yet; it has to be expanded into a sequence
of ?'s.

-- 
Best regards,
Vladimir Pozdyayev.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk