Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Yechezkel Mett (ymett.on.boost_at_[hidden])
Date: 2011-02-13 06:44:17
On Thu, Feb 10, 2011 at 3:46 PM, Max <more4less_at_[hidden]> wrote:
> I have 3 versions of the RE sitting side by side, attempting to figure out
> the difference between them.
>
>> "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$    // (1)
>> "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)        // (2)
>> "([^"]*)"|([^\s,"]+)                            // (3) original version offered by Stephen
>
> But, unfortunately, I still cannot fully grasp the meaning of (1) and (2).
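,\s*(),
means: find a ',' followed by any amount of whitespace (possibly none),
followed by another ',', and capture an empty string.
The others are similar: ^\s*(), handles an empty field at the start of
the input and ,\s*()$ handles one at the end.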
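
A minimal sketch of what that sub-pattern matches, written with Python's
re module as a stand-in for Boost.Regex (an assumption: the constructs
used here are expected to behave the same way in both). The sample input
'a, ,b' is made up for illustration.

```python
import re

# Hypothetical sample input, not taken from the thread.
m = re.search(r',\s*(),', 'a, ,b')

# The whole match covers both commas and the whitespace between them...
print(repr(m.group(0)))   # ', ,'
print(m.span())           # (1, 4)

# ...while the capture group itself is the empty string.
print(repr(m.group(1)))   # ''
```

Note that the second ',' is part of the match, so it has been consumed
by the time the search continues.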
>
> r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
>
> empty,,,fields, , , like this
> [empty][][fields][][like][this]
> ,,,
> [][]
>
> There are 2 empty tokens in between each 3 contiguous ',' but only one for
> each is detected.
Yes, that's a mistake. When matching ,, as an empty field the second
',' is eaten and can no longer be used as the beginning of the next
field.
"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$
should work. (?=...) is a lookahead: it checks that the pattern (',' in
this case) matches at this point, but doesn't consume any input.
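
The difference can be seen by running both patterns over the sample line
from the question. This is a sketch using Python's re module rather than
Boost.Regex (an assumption: lookahead and the other constructs here are
expected to behave the same in both); the helper name tokens() is made up
for the example.

```python
import re

ORIGINAL = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
FIXED = r'"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$'

def tokens(pattern, text):
    """Return the captured token for each successive match in text."""
    result = []
    for m in re.finditer(pattern, text):
        # Each alternative has exactly one capture group, so exactly one
        # group is set per match; the others are None.
        result.append(next(g for g in m.groups() if g is not None))
    return result

line = 'empty,,,fields, , , like this'

# The trailing ',' of each empty field is consumed, so every second
# empty field is missed.
print(tokens(ORIGINAL, line))
# ['empty', '', 'fields', '', 'like', 'this']

# With the lookahead the trailing ',' stays available for the next match.
print(tokens(FIXED, line))
# ['empty', '', '', 'fields', '', '', 'like', 'this']
```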
>
> Likewise, for (2), I get:
>
> r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
>
> empty,,,fields, , , like this
> [empty][fields][like][this]
>
> This time, the behavior is no different than the 'original' version.
With (2) I get the same results as with (1). Perhaps the pattern wasn't
escaped properly in your code?
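
A quick way to compare the two patterns, sketched with Python's re module
as a stand-in for Boost.Regex (an assumption, as is the helper name
tokens()). Raw strings are used so that the backslashes reach the regex
engine unchanged; in a C++ string literal each backslash would have to be
doubled (\\s), which is one possible source of an escaping problem.

```python
import re

V1 = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
V2 = r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)'

def tokens(pattern, text):
    """Return the captured token for each successive match in text."""
    result = []
    for m in re.finditer(pattern, text):
        # Only the group belonging to the alternative that matched is set.
        result.append(next(g for g in m.groups() if g is not None))
    return result

line = 'empty,,,fields, , , like this'

print(tokens(V2, line))
# ['empty', '', 'fields', '', 'like', 'this']

# Both versions tokenize the sample line identically.
print(tokens(V1, line) == tokens(V2, line))   # True
```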
Yechezkel Mett