Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-10 08:46:03
> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]]
> On Behalf Of Yechezkel Mett
> Sent: Thursday, February 10, 2011 5:41 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] [Tokenizer]Usage and documentation
> ^|[\s,]
> means _either_ the beginning of the line _or_ a space or comma. In
> other words the field starts either at the beginning of the line or
> after a space or comma.
> Likewise
> $|[\s,]
> The field ends either at the end of the line or before a space or comma.
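To make the quoted explanation concrete, here is a minimal sketch. It uses std::regex rather than Boost.Regex (an assumption made to keep the example dependency-free; Boost.Regex should accept the same constructs). The helper name has_field and the sample strings are hypothetical, and the word is assumed to contain no regex metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `word` occurs in `line` as a whole field.
// The field must start at the beginning of the line or after a space/comma,
// and end at the end of the line or before a space/comma.
// Assumption: `word` contains no regex metacharacters.
bool has_field(const std::string& line, const std::string& word) {
    const std::regex re("(?:^|[\\s,])" + word + "(?:$|[\\s,])");
    return std::regex_search(line, re);
}
```

With this helper, has_field("cat,dog", "cat") and has_field("dog cat", "cat") are true, while has_field("concat", "cat") and has_field("category,dog", "cat") are false, because neither the start-of-field nor the end-of-field alternative can match there.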
I had never realized that ^ and $ could be combined with | in that way.
Then again, I don't use regular expressions very often.
> > One more question - with your code, any empty 'token' between two
> > contiguous ',' is ignored. What if someday I'd like to pick them up?
> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
> I'm presuming an empty line should count as no tokens; if you don't
> mind an empty line being one token it can be simplified to
I have the three versions of the regex sitting side by side, trying to
figure out the differences between them.
> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$ // (1)
> "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,) // (2)
> "([^"]*)"|([^\s,"]+) // (3) original version offered by Stephen
Unfortunately, I still cannot fully grasp the meaning of (1) and (2).
Testing (1) with Stephen's code, I get:
r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
empty,,,fields, , , like this
There are two empty tokens within each run of three contiguous ',', but only
one is detected for each run.
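One likely explanation for the missing empty tokens: the alternative `,\s*(),` consumes both commas, so the next search starts after the second comma and that comma can no longer open another empty field. The sketch below reproduces this with std::regex (an assumption; Boost.Regex may differ in details) and compares (1) with a hypothetical variant that only peeks at the closing comma via the lookahead `(?=,)`. The tokenize helper is my own illustration, not Stephen's code, and the variant has not been checked against other inputs such as lines with leading or trailing commas.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not Stephen's code): one token per match, taken from
// the first capture group that participated in that match.
std::vector<std::string> tokenize(const std::string& line,
                                  const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end;
         it != end; ++it) {
        for (std::size_t g = 1; g < it->size(); ++g) {
            if ((*it)[g].matched) {          // an empty () still counts as matched
                tokens.push_back((*it)[g].str());
                break;
            }
        }
    }
    return tokens;
}

// The test line from this message.
const std::string sample_line = "empty,,,fields, , , like this";

// (1) as posted: the trailing ',' of an empty field is consumed by the match.
const std::string re1 =
    R"re("([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$)re";

// Hypothetical variant: the trailing ',' is only looked at, not consumed,
// so it can still start the next empty field.
const std::string re1_lookahead =
    R"re("([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$)re";
```

With std::regex, tokenize(sample_line, re1) should yield six tokens (empty, "", fields, "", like, this), while tokenize(sample_line, re1_lookahead) should yield eight, with two empty tokens in each run of commas.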
Likewise, for (2), I get:
r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
empty,,,fields, , , like this
This time, the behavior is no different from the 'original' version.
Thank you, Yechezkel, for your help.
BTW, it seems like by reading
I cannot get a full view of the regex grammar. Maybe I need a whole book on
it? :-)
Is there any *complete* introduction available on the net?