Subject: Re: [boost] template-defined regexp proposal
From: Vit Stepanek (vit.stepanek_at_[hidden])
Date: 2011-01-31 10:57:46


On Mon, 2011-01-31 at 09:27 -0500, Edward Diener wrote:
> On 1/30/2011 6:30 PM, Vit Stepanek wrote:
> > Hi,
> >
> > I've implemented the basic regexp functionality using few simple
> > template classes. Any regexp can be created by inserting the template
> > classes one into another in the required order.
> > Although it's in the design state, I'd like to find out if there's any
> > interest in providing this functionality.
>
> snipped...
>
> I like your idea but you need to provide more extensive documentation of
> what you are doing.
>

OK, I didn't want to overload you from the start...

My basic idea was to avoid runtime interpretation (hence the template
classes) and to avoid depending on any particular regexp syntax or rules.
Therefore I made a class for each of the most frequently used regexp
actions, instead of creating overloaded operators or otherwise simulating
the regexp syntax. The result is a function built at compile time, with a
simple structure and simple usage... But that's just my view.

To be more specific, let's look at some of the implementation. I admit
some things can be improved, but that can be done at any time.

Every template class takes some template arguments (or none), depending
on the action type. The action is executed through operator ().
Classes that contain other sub-matches call operator () of the
underlying classes.
(Currently it's implemented to work with C strings, but I intend to make
it work with any iterator.)

For example - the string matching class looks like this:

    template< typename T_STRHOLDER >
    struct StrMatch
    {
        template< typename T_CHAR >
        bool operator( ) ( const T_CHAR& str )
        {
            const T_CHAR s = str;
            const T_CHAR p = m_str( );
            // comparing here
            // (...)
            // return true/false, depending on the comparison result
        }

        T_STRHOLDER m_str;
    };

Any function object or function can be passed as the parameter; it is
called during execution to obtain the value to compare against.
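As a minimal sketch of how such a holder might plug in: the HelloHolder
type, the prefix-style comparison and the starts_with_hello helper below
are assumptions made for illustration and are not taken from the actual
implementation, whose comparison code is elided above.

```cpp
// Hypothetical holder: a function object that returns the text to match.
struct HelloHolder
{
    const char* operator( ) ( ) const { return "hello"; }
};

template< typename T_STRHOLDER >
struct StrMatch
{
    // T_CHAR is deduced as a pointer type such as const char*.
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        T_CHAR s = str;
        T_CHAR p = m_str( );   // ask the holder for the pattern
        // Assumed semantics: succeed when the whole pattern is found
        // at the start of the input (a prefix comparison).
        while( *p != '\0' && *s == *p ) { ++s; ++p; }
        return *p == '\0';
    }

    T_STRHOLDER m_str;
};

// Hypothetical convenience wrapper used only by this example.
inline bool starts_with_hello( const char* text )
{
    StrMatch< HelloHolder > re;
    return re( text );
}
```

Because the holder is a type, the pattern is fixed when the regexp type is
written down, and nothing has to be parsed at run time.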

The "IsIn" matching class is similar: it compares one item against a set
of given values.
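One way such an "IsIn" class could look is sketched below. The post only
names the class, so the holder-based interface, the VowelSet type and the
starts_with_vowel helper are all assumptions made for illustration.

```cpp
// Hypothetical holder: returns the set of accepted characters.
struct VowelSet
{
    const char* operator( ) ( ) const { return "aeiou"; }
};

template< typename T_SETHOLDER >
struct IsIn
{
    // Tests the single character at the current position against the set.
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        if( *str == '\0' )
            return false;               // end of input never matches
        for( T_CHAR p = m_set( ); *p != '\0'; ++p )
        {
            if( *p == *str )
                return true;            // found the character in the set
        }
        return false;
    }

    T_SETHOLDER m_set;
};

// Hypothetical convenience wrapper used only by this example.
inline bool starts_with_vowel( const char* text )
{
    IsIn< VowelSet > re;
    return re( text );
}
```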

The Or-match is nothing more than this:

    template< typename T_MATCH1, typename T_MATCH2 >
    struct OrMatch
    {
        template< typename T_CHAR >
        bool operator( ) ( const T_CHAR& str )
        {
            return m_match1( str ) || m_match2( str );
        }

        T_MATCH1 m_match1;
        T_MATCH2 m_match2;
    };

It looks quite simple, but with a few other classes the comparison is
clear and self-descriptive.
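To illustrate the nesting, here is a small sketch that composes OrMatch
with a single-character matcher. CharMatch, the AOrBOrC typedef and the
starts_with_a_b_or_c helper are hypothetical names introduced for this
example only; OrMatch is the class shown above.

```cpp
// Hypothetical single-character matcher, not part of the original post.
template< char T_VALUE >
struct CharMatch
{
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        return *str == T_VALUE;
    }
};

template< typename T_MATCH1, typename T_MATCH2 >
struct OrMatch
{
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        // The second alternative is tried only if the first one fails.
        return m_match1( str ) || m_match2( str );
    }

    T_MATCH1 m_match1;
    T_MATCH2 m_match2;
};

// Roughly the regexp [abc], spelled as nested types.
typedef OrMatch< CharMatch< 'a' >,
        OrMatch< CharMatch< 'b' >, CharMatch< 'c' > > > AOrBOrC;

// Hypothetical convenience wrapper used only by this example.
inline bool starts_with_a_b_or_c( const char* text )
{
    AOrBOrC re;
    return re( text );
}
```

The whole expression is a single type, so the compiler sees every nested
call and is free to inline them.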

Basically there are 2 kinds of classes, let's call them
- control classes (OrMatch, Quantity - those which control the way the
underlying comparison is done) and
- matching classes (perform the comparing).
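As a rough sketch of what a control class such as Quantity might do: the
template parameters (T_MIN, T_MAX), the greedy counting and the
DigitMatch helper are assumptions, since the post does not show the real
Quantity. The sketch also assumes the wrapped matcher tests exactly one
character at the given position.

```cpp
// Hypothetical matching class: accepts one decimal digit.
struct DigitMatch
{
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        return *str >= '0' && *str <= '9';
    }
};

// Hypothetical control class: requires at least T_MIN consecutive matches
// of T_MATCH and consumes at most T_MAX of them.
template< typename T_MATCH, unsigned T_MIN, unsigned T_MAX >
struct Quantity
{
    template< typename T_CHAR >
    bool operator( ) ( const T_CHAR& str )
    {
        unsigned count = 0;
        T_CHAR s = str;
        // Advance one character per successful sub-match.
        while( count < T_MAX && *s != '\0' && m_match( s ) )
        {
            ++s;
            ++count;
        }
        return count >= T_MIN;
    }

    T_MATCH m_match;
};

// Roughly the regexp ^[0-9]{2,4}; a hypothetical wrapper for the example.
inline bool has_two_to_four_leading_digits( const char* text )
{
    Quantity< DigitMatch, 2, 4 > re;
    return re( text );
}
```

The control class never inspects characters itself; it only decides how
often the underlying matching class is called.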

Some more advanced classes, like LazyMatch, make it possible to build
more complex matching structures.

Execution is invoked by calling operator () on the regexp object:

    bool res = re( str );

Any questions / ideas?

To Mathias:

> You are also building an engine through composition of template
> objects.

Well, yes. Just let me explain what I have; the differences may show (or
not - we'll see). Also, my little tool is far from finished.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk