Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-10 08:46:03

> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]]
> On Behalf Of Yechezkel Mett
> Sent: Thursday, February 10, 2011 5:41 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] [Tokenizer]Usage and documentation
> ^|[\s,]
> means _either_ the beginning of the line _or_ a space or comma. In
> other words the field starts either at the beginning of the line or
> after a space or comma.
> Likewise
> $|[\s,]
> The field ends either at the end of the line or before a space or comma.

I had indeed never realized that ^ and $ could be combined with | in
that way.
I don't use regular expressions that often, though.
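
As a quick check of the point about anchors inside an alternation, here is a minimal sketch in Python, whose `re` module is used only as a stand-in for Boost.Regex's Perl-style syntax. The pattern and the sample string are made up for illustration and do not come from the thread; the closing alternation is written as a lookahead so that the separator stays available to start the next field.

```python
import re

# Hypothetical pattern, not taken from the thread: a field starts at the
# beginning of the line or after a space/comma, and ends at the end of
# the line or before a space/comma.
FIELD = re.compile(r'(?:^|[\s,])([^\s,]+)(?=$|[\s,])')

fields = FIELD.findall("alpha,beta gamma")
print(fields)  # ['alpha', 'beta', 'gamma']
```

The first field is found through the `^` branch, the other two through the `[\s,]` branch.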

> > One more question - with your code, any empty 'token' between two
> > contiguous ',' is ignored; what if someday I'd like to pick them up?
> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
> I'm presuming an empty line should count as no tokens; if you don't
> mind an empty line being one token it can be simplified to

I have three versions of the RE sitting side by side and am trying to
figure out the difference between them.

> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$ // (1)
> "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,) // (2)
> "([^"]*)"|([^\s,"]+) // (3) original version offered by Stephen

Unfortunately, I still cannot fully grasp the meaning of (1) and (2).
Testing (1) with Stephen's code, I get:

r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$

empty,,,fields, , , like this

Each run of three contiguous ',' contains two empty tokens, but only one
per run is detected.
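
The missing token can be made visible by printing the span of every match. This is a minimal sketch, using Python's `re` as a stand-in for Boost.Regex and assuming that the tokenizing code simply walks the matches from left to right (Stephen's code is not reproduced here):

```python
import re

# Expression (1) from the message, unchanged.
RE1 = re.compile(r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$')
line = "empty,,,fields, , , like this"

# (start, end, matched text) for each successive match
spans = [(m.start(), m.end(), m.group(0)) for m in RE1.finditer(line)]
for s in spans:
    print(s)
```

The second match covers both the comma at index 5 and the one at index 6, so the third comma at index 7 has no partner left and is skipped. The same happens with `, , ,` later in the line.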

Likewise, for (2), I get:

r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)

empty,,,fields, , , like this

This time, the behavior is no different than the 'original' version.
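
One possible way to pick up both empty tokens is to turn the closing `(?:$|,)` of (2) into a lookahead, so that the comma ending one empty field can also open the next. This is a sketch and not something proposed in the thread; `tokenize` and `RE2_LOOKAHEAD` are hypothetical names, and Python's `re` stands in for Boost.Regex, whose Perl syntax also supports `(?=...)`.

```python
import re

# Expression (2) from the message, and a hypothetical variant whose
# final group only peeks at the closing ',' or end of line.
RE2 = re.compile(r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)')
RE2_LOOKAHEAD = re.compile(r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?=$|,)')

def tokenize(regex, line):
    """Return, for each match, the one capture group that participated."""
    return [next(g for g in m.groups() if g is not None)
            for m in regex.finditer(line)]

line = "empty,,,fields, , , like this"
print(tokenize(RE2, line))
print(tokenize(RE2_LOOKAHEAD, line))
```

Under Python's `re`, the unmodified (2) yields one empty token per run of commas, the same as (1), while the lookahead variant yields two. The result reported above for (2) may therefore depend on how the tokenizing code reads the capture groups; that is only a guess, since that code is not shown here.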

Thank you, Yechezkel, for your help.

BTW, it seems like by reading
I cannot get a full view of the regex grammar. Maybe I need a whole book on
it? :-)

Is there any *complete* introduction available on the net?

