Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-10 08:46:03


> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]]
> On Behalf Of Yechezkel Mett
> Sent: Thursday, February 10, 2011 5:41 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] [Tokenizer]Usage and documentation
>
> ^|[\s,]
>
> means _either_ the beginning of the line _or_ a space or comma. In
> other words the field starts either at the beginning of the line or
> after a space or comma.
>
> Likewise
>
> $|[\s,]
>
> The field ends either at the end of the line or before a space or comma.

I had indeed never realized that ^ and $ could be combined with | in that
way. I don't use regular expressions that often, though.
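
To make the two delimiter patterns above concrete, here is a minimal sketch.
It uses Python's re module as a stand-in for Boost.Regex (an assumption; the
constructs involved behave the same way in both as far as I know), and the
field pattern \w+ and the sample text are hypothetical, chosen only for
illustration.

```python
import re

text = 'one two,three'

# A field starts at the beginning of the line OR after a space or comma.
# \w+ is a hypothetical field pattern; only the delimiter part is from the thread.
starts = re.findall(r'(?:^|[\s,])(\w+)', text)

# A field ends at the end of the line OR before a space or comma.
ends = re.findall(r'(\w+)(?:$|[\s,])', text)

print(starts)  # ['one', 'two', 'three']
print(ends)    # ['one', 'two', 'three']
```

The first field is found through the ^ branch and the last one through the $
branch; every other field is found through the [\s,] branch.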

>
> > One more question - with your code, any empty 'token' between two
> > contiguous ',' is ignored, what if someday I'd like to pick them up?
>
> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
>
> I'm presuming an empty line should count as no tokens; if you don't
> mind an empty line being one token it can be simplified to

I have 3 versions of the REs sitting side by side, trying to figure out the
differences between them.

> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$   // (1)
> "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)       // (2)
> "([^"]*)"|([^\s,"]+)                           // (3) original version offered by Stephen

Unfortunately, I still cannot fully grasp the meaning of (1) and (2).
Testing (1) with Stephen's code, I get:

r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$

empty,,,fields, , , like this
[empty][][fields][][like][this]
,,,
[][]

There are 2 empty tokens within each run of 3 contiguous ',' but only one
is detected per run.
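
One likely reason, sketched below in Python's re module as a stand-in for
Boost.Regex (an assumption, not Stephen's code): matches do not overlap, and
the alternative ,\s*(), consumes both commas around an empty field, so the
second comma is no longer available to open the next empty field. The
RE_LOOKAHEAD pattern and the tokens() helper are hypothetical and not from
this thread; the pattern only peeks at the closing delimiter with a lookahead.
How Boost.Regex treats its zero-width match at the start of a line has not
been checked here.

```python
import re

# Pattern (1) from the message, copied verbatim.
RE_1 = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'

# Hypothetical variant (not from the thread): the trailing delimiter is only
# peeked at with a lookahead, so it can still start the next field.
RE_LOOKAHEAD = r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?=,|$)'


def tokens(pattern, text):
    """Return (token, span) pairs for every non-overlapping match."""
    result = []
    for m in re.finditer(pattern, text):
        # Exactly one capture group takes part in each alternative.
        token = next(g for g in m.groups() if g is not None)
        result.append((token, m.span()))
    return result


line = 'empty,,,fields, , , like this'
for name, pattern in (('(1)', RE_1), ('lookahead', RE_LOOKAHEAD)):
    print(name)
    for token, span in tokens(pattern, line):
        # The span shows which characters each match consumed.
        print('  [%s] at %s' % (token, span))
```

With (1), the first empty token spans positions 5 to 7, that is, two of the
three commas after "empty", which leaves only one comma before "fields" and
so no second empty token. With the lookahead variant each empty token
consumes a single comma, and the same line yields two empty tokens per run.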

Likewise, for (2), I get:

r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)

empty,,,fields, , , like this
[empty][fields][like][this]

This time, the behavior is no different from that of the 'original' version (3).

Thank you, Yechezkel, for your help.

BTW, it seems that by reading
http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
I cannot get a full view of the regex grammar. Maybe I need a whole book on
it? :-)

Is there any *complete* introduction available on the net?

B/Rgds
Max

