Boost logo

Boost :

From: Matt Austern (austern_at_[hidden])
Date: 2004-11-03 14:35:01


On Tue, 2 Nov 2004 10:11:45 +0300, Vladimir Pozdyayev <ardatur_at_[hidden]> wrote:
> To Eric Niebler and John Maddock on the subject of "something
> completely different":
>
> Oh well. Mea culpa. Thanks.
>
> And now for something more completely different.
>
> JM> I forgot to say: without understanding the algorithm you are using, and the
> JM> data it needs, it's hard to know whether or not Boost.Regex can support it,
> JM> I think you may have been asked this already, but do you have a description
> JM> anywhere of the algorithm you are using?
>
> http://groups.yahoo.com/group/boost/files/anfa-regex/
>
> EN> ... xpressive's
> EN> compile-time design allows for multiple back-ends (NFA, backtracking
> EN> NFA, DFA) without paying for ones that aren't used. However, xpressive
> EN> is hard-coded to use a backtracking NFA for now, and there is no clean
> EN> interface for plugging in a different back-end...
>
> Maybe it's good time to officially separate regex components from
> each other and define the way they should interact, thus allowing
> users to connect them at their preference?
>
> I can name three of them: syntax parser, state machine, and character
> sets. The first two are interacting through either tree data
> structures (passive/declarative way), or callbacks imitating the
> parser shift-reduce activity (active/imperative way; the better one,
> I think). The other two interact through "bool contains( char_t c )"
> (and, possibly, additional functions to deal with more subtle stuff).
> I understand that some algorithms could prefer to treat those three
> as if they were closely related... well, whether it will become a
> problem is a question of design. Opinions on this, as well as on the
> whole subject, are welcome.

For what it's worth, I think that's a really promising idea. As is, the
library is a black box. Nobody can introduce a new syntax parser
without access to the library internals; this is directly contrary to
the way that the STL separates containers and algorithms.

The question is whether it's possible to define an interface, or a
family of interface, for creating and executing NDFAs. (Or maybe
even DFAs in some simple cases?) Certainly the fact that better
algorithms are possible for regexes that don't have backreferences
than for regexes that do is interesting; to me, at least, this suggests
that we might want to provide both algorithms and let users choose
whichever one is appropriate.

      --Matt


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk