Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-16 08:01:19
[Yechezkel Mett]
>
> ,\s*(),
>
> means find a ',' followed by any number of spaces followed by a ','
> and capture an empty string.
Yes, now I see. Thank you, Yechezkel.
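
For anyone reading along, here is a minimal sketch of that point. It uses Python's re module as a stand-in for Boost.Regex (an assumption on my part; the sample input and variable names are only illustrative). The empty group () takes part in the match but captures nothing:

```python
import re

# ',' then optional whitespace, an empty capture group, then another ','.
pattern = re.compile(r',\s*(),')

m = pattern.search('a,  ,b')
matched_text = m.group(0)   # the whole match, both commas included
captured = m.group(1)       # what the empty group () captured
print(repr(matched_text), repr(captured))
```

The whole match spans both commas, while the capture itself is the empty string, which is what lets it stand for an empty field.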
>
> The others are similar.
>
> >
> > r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
> >
> > empty,,,fields, , , like this
> > [empty][][fields][][like][this]
> > ,,,
> > [][]
> >
> > There are 2 empty tokens in between each 3 contiguous ',' but only one
> > for each is detected.
>
> Yes, that's a mistake. When matching ,, as an empty field the second
> ',' is eaten and can no longer be used as the beginning of the next
> field.
>
> "([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$
>
> should work. (?=) is a lookahead, it checks that the pattern (',' in
> this case) matches at this point, but doesn't eat any input.
>
Yes, its behavior is exactly as you expected.
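
A rough sketch of the difference, again assuming Python's re module behaves like Boost.Regex for these particular patterns (the helper names tokenize and show are hypothetical, not part of any library). Both patterns are copied from the thread; only the empty-field branches differ:

```python
import re

# Original: the second ',' of an empty field is consumed by the match.
ORIGINAL = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
# Fixed: the second ',' is only checked by a lookahead, not consumed.
LOOKAHEAD = r'"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$'

def tokenize(pattern, text):
    """Return, for each match, the one capture group that took part in it."""
    tokens = []
    for m in re.finditer(pattern, text):
        tokens.append(next(g for g in m.groups() if g is not None))
    return tokens

def show(tokens):
    """Format tokens the way they are printed in the thread."""
    return ''.join('[' + t + ']' for t in tokens)

line = 'empty,,,fields, , , like this'
original_out = show(tokenize(ORIGINAL, line))
lookahead_out = show(tokenize(LOOKAHEAD, line))
commas_out = show(tokenize(ORIGINAL, ',,,'))

print(original_out)    # [empty][][fields][][like][this]
print(lookahead_out)   # [empty][][][fields][][][like][this]
print(commas_out)      # [][]
```

With the original pattern, each run of three commas yields one empty token because the middle ',' is eaten as the end of the first empty field. With the lookahead, that ',' is still available to start the next field, so both empty tokens are found.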
> >
> > Likewise, for (2), I get:
> >
> > r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
> >
> > empty,,,fields, , , like this
> > [empty][fields][like][this]
> >
> > This time, the behavior is no different than the 'original' version.
>
> I get the same results as the first version. Perhaps it wasn't escaped
> properly?
Yes, you are right. My different result came from my own unintentionally
incorrect escaping.
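
As a quick check of the "same results" observation, here is a small sketch, once more using Python's re module as an assumed stand-in for Boost.Regex (tokenize is a hypothetical helper). The patterns are written as raw strings so that no backslash is lost to the string-literal layer. In a C++ string literal each backslash and double quote in the pattern would need its own escape; which escape actually went wrong in my case is not shown here.

```python
import re

# First version from the thread, and the (?:^|,) ... (?:$|,) variant.
FIRST = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
VARIANT = r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)'

def tokenize(pattern, text):
    """Return, for each match, the one capture group that took part in it."""
    return [next(g for g in m.groups() if g is not None)
            for m in re.finditer(pattern, text)]

line = 'empty,,,fields, , , like this'
first_tokens = tokenize(FIRST, line)
variant_tokens = tokenize(VARIANT, line)

print(first_tokens)
print(variant_tokens)
print(first_tokens == variant_tokens)
```

On this input both patterns give the same token list, [empty][][fields][][like][this] in the notation used above.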
B/Rgds
Max
P.S.
I've found some 'complete' reference books on RE. However, it is this thread
of discussion that has really triggered a leap in my understanding of RE.
I have also revisited SPIRIT.Qi, though not very deeply, following
Michael's direction. (Qi, along with its siblings, is a power tool I believe
I will definitely use.)
Now I'm able to comprehend quite 'complex' expressions, including those
that appeared in this thread.
Thank you Michael, Yechezkel, Stephan for your kind help!