
Boost :

From: Scott Woods (scottw_at_[hidden])
Date: 2005-09-14 23:40:40


----- Original Message -----
From: "Eric Niebler" <eric_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Thursday, September 15, 2005 3:05 PM
Subject: Re: [boost] [Review] xpressive

> Answers inline...

Thanks.

> > 1. What is the benefit of providing the complete match in the
> > first entry of the results? e.g. "what[0]". While this is consistent
> > with a long tradition in RE, after some time with STL its
> > presence at position zero wasn't as comfortable as I expected.
>
>
> I'm curious, what did your experience with STL lead you to expect?
>
> I did it this way because TR1 regex does it that way. Although xpressive
> is not a fully compliant TR1 regex implementation, minimizing gratuitous
> differences can only help.

Yep, agreed.

Going back to the "what[0] / STL" thing and starting with your (snipped)
example;

    #include <boost/xpressive/xpressive.hpp>
    #include <iostream>
    #include <string>

    using namespace boost::xpressive;

    int main()
    {
        std::string hello( "hello world!" );

        sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
        smatch what;

        if( regex_match( hello, what, rex ) )
        {
            std::cout << what[0] << '\n'; // whole match
            std::cout << what[1] << '\n'; // first capture
            std::cout << what[2] << '\n'; // second capture
        }
        return 0;
    }

"what[0]" is the odd one out; it does not map implicitly to a manifest
sub-expression. To RE-philes (I think my first exposure to $0 was in
"vi"?) it's de rigueur. To C++ developers who were born more recently
but are familiar with STL, it's a wrinkle. Does processing of "what"
always involve "++what.begin()" only because "what.complete()" fails to
compete with tradition? Please don't take my quoted code snippets
literally. Or imagine I side with the next generation :-)
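For comparison, the standard library's std::regex (the C++11 descendant of TR1 regex) follows the same convention, so code that wants only the captures has to start at index 1. A minimal sketch; the helper name `captures_only` is hypothetical and belongs to neither xpressive nor the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return only the capture groups of a full match,
// skipping index 0, which always holds the whole match.
std::vector<std::string> captures_only(const std::string& text,
                                       const std::regex& rex)
{
    std::vector<std::string> out;
    std::smatch what;
    if (std::regex_match(text, what, rex)) {
        // what[0] is the whole match; the captures are 1 .. size()-1.
        for (std::size_t i = 1; i < what.size(); ++i) {
            out.push_back(what[i].str());
        }
    }
    return out;
}
```

With the pattern "(\\w+) (\\w+)!" and the input "hello world!", this returns the two captures "hello" and "world" and never the whole match.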

> >
> > 2. Why the backslash syntax in dynamic regex? The resulting
> > requirement for a double backslash is fairly ugly. It may be consistent
> > with something (Perl/ECMA/..?) but on balance is it worth it?
>
> I'm following the lead of every other regex package for C and C++ out
> there. Anything else, and there would be riots in the streets. I agree
> that the double backslashes are hard on the eyes, though. (So use static
> regexes instead. :-)
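Since this thread, C++11 has added raw string literals, which remove most of the doubling for dynamic patterns. This postdates the discussion and is a language feature, not something xpressive provides. A small sketch using std::regex; `matches_greeting` is an illustrative name:

```cpp
#include <cassert>
#include <regex>
#include <string>

// The same pattern written two ways: an ordinary literal needs "\\w"
// so that the regex engine receives "\w"; a raw literal R"(...)" passes
// backslashes through unchanged.
const std::string escaped = "(\\w+) (\\w+)!";
const std::string raw     = R"((\w+) (\w+)!)";

bool matches_greeting(const std::string& text)
{
    static const std::regex rex(raw);
    return std::regex_match(text, rex);
}
```

Both strings hold identical characters, so the choice only affects how the source reads.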

Ha, cool.

> >
> > 3. Why ">>" and not "," (comma)? Did the "set" facility
> > take priority or does the low precedence of comma just result in
> > a different ugliness (sorry, not really the word I want to use :-).
> >
> As Joel already said, operator precedence. Also, I completely stole
> Spirit's choice of operators, lock, stock and barrel. That's a conscious
> decision (made after much debate and hand-wringing) to ease any future
> unification, and so that Spirit users can be productive with xpressive
> with a minimum of fuss.
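The precedence argument can be seen with a toy type. In a regex, sequencing binds tighter than alternation ("ab|c" means "(ab)|c"). C++'s >> outranks |, so `a >> b | c` groups correctly, while the comma is the lowest-precedence operator, so `a, b | c` groups as `a, (b | c)`. This sketch only records how C++ parses each expression; `Expr` is hypothetical and is not how xpressive or Spirit represent patterns:

```cpp
#include <cassert>
#include <string>

// A toy expression type that records how C++ grouped an expression.
struct Expr {
    std::string s;
};

// Sequencing with >> (binds tighter than |).
Expr operator>>(const Expr& a, const Expr& b) { return {"(" + a.s + " " + b.s + ")"}; }

// Sequencing with a comma (binds looser than everything else).
Expr operator,(const Expr& a, const Expr& b) { return {"(" + a.s + " " + b.s + ")"}; }

// Alternation.
Expr operator|(const Expr& a, const Expr& b) { return {"(" + a.s + "|" + b.s + ")"}; }
```

Here `(a >> b | c).s` yields "((a b)|c)" while `(a, b | c).s` yields "(a (b|c))", so a comma-based syntax would force extra parentheses around every sequence inside an alternation.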

Yep, sorry to have missed the evolution of Spirit. I'm a fairly recent
Booster that only bothered to search my archives for xpressive before
writing the review. Seems kinda dumb now; hope to do better next time.

> >
> > 5. There didn't appear to be much specific thought given to file
> > processing. Is this another "not yet implemented"? In particular,
> > elegant integration with any async I/O facility arising from the
> > sockets and file I/O initiatives.
> >
> xpressive works generically with iterators. Spirit has a file iterator.
> That would be the way to go, IMO.
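As a rough illustration of the iterator-based approach, std::regex_search (like xpressive's algorithms) accepts any bidirectional iterator range, so a buffer that is not a std::string works just as well. Note that std::istreambuf_iterator is only an input iterator and cannot be handed to the matcher directly; a multi-pass file iterator such as Spirit's presumably fills that gap. A sketch; `first_word_pair` is a hypothetical name:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: load a stream into a std::vector<char>, then
// search it through its iterators rather than as a std::string.
std::string first_word_pair(std::istream& in)
{
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

    using It = std::vector<char>::const_iterator;
    std::match_results<It> what;
    std::regex rex("(\\w+) (\\w+)");
    if (std::regex_search(buf.cbegin(), buf.cend(), what, rex)) {
        return what[0].str();  // whole match
    }
    return {};
}
```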

For "normal" file processing this is fine. Well, actually it's marvelous.
But for another circumstance see below.

> >
> > 6. Very interested in the future of "semantic actions". Actions and file
> > processing probably go together?
> >
> They're orthogonal, AFAICT.
> >

Yes they are. But I need to be clearer. I was associating files of input
with semantic actions because processing of a file with xpr has a good
chance of involving a complex xpr. And getting the right code to run at
the right time with such an xpr, without embedded actions, involves
contortions (even unnecessary CPU cycles?). I'm sure you are fully aware
of all this. Sorry, it was an idle association.
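Without embedded actions, the usual workaround is to match first and then inspect which groups took part in order to decide what code to run. A minimal sketch with std::regex; the two-command grammar and the `dispatch` function are hypothetical:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical two-command grammar. Without semantic actions, the code
// that "belongs" to each alternative runs only after the match, by
// testing which capture groups participated.
std::string dispatch(const std::string& line)
{
    static const std::regex rex("GET (\\w+)|PUT (\\w+) (\\w+)");
    std::smatch what;
    if (!std::regex_match(line, what, rex)) {
        return "error";
    }
    if (what[1].matched) {
        return "get:" + what[1].str();
    }
    // Otherwise the PUT alternative matched, so groups 2 and 3 are set.
    return "put:" + what[2].str() + "=" + what[3].str();
}
```

With a larger grammar, this after-the-fact group testing grows with every alternative, which is the contortion embedded actions would avoid.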

Also, a recurring problem with related tools such as lex, flex, yacc and
bison is that they are architected to be "superior" to the "subordinate"
input/buffering scheme. On one hand, this is great because in a
traditional parser it hides a significant sub-system and often does an
efficient job of it. OTOH it is often difficult or impossible to present
data blocks to such a parser in an async fashion. A role reversal is
required. Borrowing your example again:

    // Sometime before establishing a TCP connection

    sregex rex = sregex::compile( "(\\w+) (\\w+)\\n" ); // Two words per line
    smatch what;

    // On an FD_READ:
    // load the available bytes into char buffer[] and

    // regex_accumulate is a proposed function; xpressive has no such call
    while( regex_accumulate( buffer, what, rex ) )
    {
        // The pattern has been matched.
        // This loop body may be entered 0
        // or more times for each FD_READ.

        std::string command = what[ 1 ];
        std::string argument = what[ 2 ];
    }

Structured this way, the application processing the commands is
completely impervious to changing MTUs and block sizes.

But something would need to carry the xpressive state between invocations
of "regex_accumulate". Hell, would the xpr lib work as is!?
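One way to carry that state is to keep it outside the regex engine: an object that buffers incoming bytes and hands only complete lines to the matcher. This is a sketch of the proposed regex_accumulate idea, not an existing xpressive facility; the `LineAccumulator` class is hypothetical, and it works only because the example pattern ends at a newline. A general pattern would need partial-match support, which Boost.Regex offers through its `match_partial` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical accumulator: feed it arbitrary chunks (e.g. one per
// FD_READ); it keeps partial lines between calls and matches each
// complete line against "two words per line".
class LineAccumulator {
public:
    // Appends a chunk and returns the (command, argument) pairs from
    // every complete line now available.
    std::vector<std::pair<std::string, std::string>>
    feed(const std::string& chunk)
    {
        pending_ += chunk;
        std::vector<std::pair<std::string, std::string>> out;
        std::size_t nl;
        while ((nl = pending_.find('\n')) != std::string::npos) {
            const std::string line = pending_.substr(0, nl);
            pending_.erase(0, nl + 1);  // keep only the unfinished tail
            std::smatch what;
            if (std::regex_match(line, what, rex_)) {
                out.emplace_back(what[1].str(), what[2].str());
            }
            // Lines that do not match are silently dropped here.
        }
        return out;
    }

private:
    // The '\n' is stripped before matching, so the pattern omits it.
    std::regex rex_{"(\\w+) (\\w+)"};
    std::string pending_;  // bytes received but not yet ended by '\n'
};
```

Because the leftover bytes live in the object, a command split across two reads (say "GET fo" and then "o\n") is still reported once, as ("GET", "foo").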

Cheers.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk