Boost :
From: Vladimir Pozdyayev (pvv_at_[hidden])
Date: 2004-12-22 03:18:29
John Maddock wrote:
> This is an area I want to explore though, if I can get this next lot of code
> out the door, then I'll create a cvs branch to experiment with this, if you
> want to suggest / experiment with a design for the abstract creator in the
> meantime, then go for it.
Finally I got down to refactoring the ANFA code. Take a look at
http://groups.yahoo.com/group/boost/files/anfa-regex/anfa091.zip
(once again, this is not a full-scale regex library... yet)
The DESIGN file content is appended to this message.
--
Best regards,
Vladimir Pozdyayev.

----------------------------------------------------------------------

-= Regex Design Issues =-

The core classes.

* charset
  Provides "bool operator()( character )". Nothing much to say apart from that, but do see the Charset Issues section.

* charset creator
  Supports arbitrary charset expressions (within the limits of a given set of possible operations). Implementations, however, are not required to provide _all_ the declared functionality; calls for unsupported features should result in appropriate exceptions.
  Also provides the "void create( charset & )" function, which is used to initialize a charset newly created by the "matcher creator".
  The "abstract_charset_creator" class provides stubs for all expression elements that regex parsers may request. Implementations with limited functionality can inherit from it and redefine only those functions that should actually do something useful.

* matcher
  Provides the low-level matching functionality, say, finding the first occurrence of bla-bla-bla. Replacing all occurrences, on the other hand, is a high-level action, since it consists of (1) finding them and (2) creating a modified string, so it belongs in the "regex" class. (On the other other hand, if replacement can be done on the fly while searching, it becomes a low-level action. I don't know whether that can be done in a sufficiently general way, however.)

* matcher creator
  Like the charset creator, only for matchers.

* parser
  The syntax parser. Takes the input string as a begin/end iterator pair and issues a sequence of charset/matcher creator calls ending with "matcher_creator::create" (or an exception). In essence, it simply provides the function "void parse( matcher &m, iterator begin, iterator end )". A parser must be consistent with the properties of the "creator" classes.

* regex
  A wrapper for the "matcher" class. Provides the high-level creation and utilization routines.

How they are connected.
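As a rough illustration of how a parser, a charset creator and a charset talk to each other, here is a minimal C++ sketch. It is not the code in anfa091.zip: apart from "abstract_charset_creator" and "create", every name in it (unsupported_feature, charset, limited_charset_creator, single, range, complement, unite) is a hypothetical stand-in, and representing a charset as a std::function predicate is an assumption made for brevity. The abstract class throws for every element; the limited creator overrides only what it supports, keeps partial expressions on a stack, and pops the finished one in create() so it can be reused.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exception for expression elements a creator does not support.
struct unsupported_feature : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// A charset only has to answer "is this character a member?".
template <class Char>
class charset {
public:
    bool operator()(Char c) const { return test_ && test_(c); }
    void assign(std::function<bool(Char)> test) { test_ = std::move(test); }
private:
    std::function<bool(Char)> test_;
};

// Stubs for every element a parser might request; each throws by default.
template <class Char>
class abstract_charset_creator {
public:
    virtual ~abstract_charset_creator() = default;
    virtual void single(Char) { fail("single character"); }
    virtual void range(Char, Char) { fail("character range"); }
    virtual void complement() { fail("complement"); }
    virtual void unite() { fail("union"); }
    virtual void create(charset<Char>&) { fail("create"); }
protected:
    [[noreturn]] static void fail(const std::string& what) {
        throw unsupported_feature("unsupported charset element: " + what);
    }
};

// Hypothetical limited creator: single characters and unions only.
template <class Char>
class limited_charset_creator : public abstract_charset_creator<Char> {
public:
    using predicate = std::function<bool(Char)>;

    void single(Char c) override {
        stack_.push_back([c](Char x) { return x == c; });
    }
    void unite() override {
        if (stack_.size() < 2) throw std::logic_error("union needs two operands");
        predicate rhs = std::move(stack_.back()); stack_.pop_back();
        predicate lhs = std::move(stack_.back()); stack_.pop_back();
        stack_.push_back([lhs, rhs](Char x) { return lhs(x) || rhs(x); });
    }
    // Expects exactly one top-level node; pops it so the creator
    // can be reused for the next expression.
    void create(charset<Char>& cs) override {
        if (stack_.size() != 1) throw std::logic_error("expected one top-level node");
        cs.assign(std::move(stack_.back()));
        stack_.pop_back();
    }
private:
    // std::vector destroys half-built expressions if a parser throws midway.
    std::vector<predicate> stack_;
};
```

A parser handling, say, [ab] would call single('a'), single('b'), unite() and then create(); asking this limited creator for a range such as [0-9] throws unsupported_feature, which is the "appropriate exception" behaviour described for charset creators.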
A sample from "regex.cpp":

    typedef basic_regex<
        basic_simple_parser<
            basic_charset_creator< wchar_t >,
            basic_anfa_matcher_creator< basic_charset< wchar_t > >,
            basic_anfa_matcher< basic_charset< wchar_t > >
        >,
        basic_anfa_matcher< basic_charset< wchar_t > >
    > regex;

On "creators" and "create" functions.

The name is somewhat misleading, since they fill target objects with compiled data rather than create them. Still, "charset compiler" sounds a bit weird... or does it? Anyway.

The "creators" are subject to the following uses and requirements.

* They must be able to destruct themselves gracefully even if the expression they are being fed is only halfway done (in case someone has thrown an "unsupported" exception).

* The "create" function must do an implicit "pop" from the expression stack, so that the "creator" object can be reused.

* The "create" function can assume there's only one top-level expression tree node on the stack.

-= Charset Issues =-

(Should I rename them to "character sets" for consistency with the full-names style?)

All the above templates have a great deal of freedom in intermixing different character types, let alone different character encodings. E.g., the sample program has all regex templates instantiated with wide characters, but the regex string itself is char-typed. This clearly needs to be controlled.
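One possible way to control the character-type mixing, sketched in C++ under the assumption that the parser front end is parameterized directly on its character type: reject pattern iterators of a different character type at compile time, and make any char-to-wchar_t conversion an explicit call. The names pattern_matches, checked_parser and widen_ascii are hypothetical, and accepting only 7-bit ASCII when widening is an assumption that sidesteps the encoding question rather than answering it.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>
#include <type_traits>

// True when the pattern iterator yields exactly the regex's character type.
template <class Char, class Iterator>
struct pattern_matches
    : std::is_same<typename std::iterator_traits<Iterator>::value_type, Char> {};

// Hypothetical parser front end that refuses to mix character types silently.
template <class Char>
class checked_parser {
public:
    template <class Iterator>
    void parse(Iterator begin, Iterator end) {
        static_assert(pattern_matches<Char, Iterator>::value,
                      "pattern character type must match the regex character type");
        // A real parser would drive the charset/matcher creators here.
        pattern_.assign(begin, end);
    }
    const std::basic_string<Char>& pattern() const { return pattern_; }
private:
    std::basic_string<Char> pattern_;
};

// Explicit, opt-in widening; only 7-bit ASCII is accepted, so no
// encoding guess is made.
inline std::wstring widen_ascii(const std::string& s) {
    std::wstring out;
    for (unsigned char c : s) {
        if (c > 0x7F) throw std::invalid_argument("non-ASCII byte in pattern");
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

With this arrangement, passing a std::string's iterators to checked_parser<wchar_t>::parse fails to compile instead of quietly feeding char data into wchar_t charsets; the caller has to widen the pattern deliberately first.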