Boost logo

Boost :

From: Greg Colvin (gcolvin_at_[hidden])
Date: 1999-09-05 12:22:34


From: Dave Abrahams <abrahams_at_[hidden]>
> >>Came across an excellent reg expr package at
> > http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm
> >
> > Can be used at many levels - from raw API to posix compatible interface to
> > a
> > more C++ approach using locales etc.<
> >
> > I hope you don't mind me jumping in here, but I've been lurking here for a
> > few days after a "heads up" from Paul.
> >
> > The boost project is certainly an interesting idea, and I've no doubt that
> > group partisipation results in stronger code, so I've been wondering about
> > offering regex++ for integration with boost. I can forsee many problems
> > (differing namespaces and header file extensions amoung the easier to
> > solve) and there is currently some duplication of effort (file/directory
> > iterators, timer classes), so the question is what do you guys think? Is
> > it even remotely workable/desirable, or too much work?
> >
> > I look forward to your comments :-)
>
> I like the idea in principle, but:
>
> I directed a friend of mine who is a regex guru at your package. He said it
> was interesting, but, in his words "it uses POSIX syntax but unfortunately
> doesn't have POSIX semantics. He says he saw the words 'leftmost longest' in
> the POSIX documentation but didn't know what they meant so he implemented
> what Perl does". Now maybe my friend was misinterpreting what you wrote, but
> what he says POSIX does is to build a DFA for the expression which
> recognizes ALL alternatives simultaneously. The Perl style of interpreting
> regular expressions leads to backtracking and all sorts of other
> inefficiencies, including failure to recognize strings which match certain
> expressions or combinatoric explosion. I am strongly in favor of a DFA
> implementation.

Both NFA and DFA approaches have advantages. As I recall, and it's been
a while, a DFA uses time linear in the amount of text to search, but
potentially exponential space to store the DFA, whereas the NFA uses
space linear in the size of the search pattern, but potentially exponential
time. I've used NFAs for small query parsers, and DFAs for searching
across big collections of text.

I very much welcome John's participation -- what is most important is a
clean syntax -- the semantics can be fixed to be whatever we need them
to be.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk