Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-09 08:22:44


Thank you Yechezkel.

I have indeed tried Boost.Regex, though by a slightly different route: I
was using boost::regex_search instead.

One drawback of the regex approach, IMO, is that the code feels a little
rigid and inflexible. It is not quite what I feel it should be, even though
I cannot say exactly in what respect.

Thanks for yet another regex approach. I'm trying to rewrite your regex

"([^"]*)"|(?:^|[[:space:],])+([^[:space:],]+)(?:$|[[:space:],])+

in a form that I'm more familiar with:

"([^"]*)"|(?:^|[\s,])+([^\s,]+)(?:$|[\s,])+

But I still cannot understand it, even after reading through
http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
The parts I could not interpret are:

^|[\s,]

And

$|[\s,]

:-( (That is not part of the regex; it is just my facial expression.)

Thanks.

Max
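
For readers puzzling over the same fragments: ^|[\s,] is an alternation that
accepts either the start of the text (^, which consumes nothing) or one
whitespace-or-comma character, and $|[\s,] is its mirror image for the end of
the text. The alternatives let a field at the very beginning or end of a line
match even though no separator surrounds it. The sketch below illustrates
this. It uses std::regex with the default ECMAScript grammar as a stand-in
for boost::regex (an assumption, made only to keep the example
self-contained), and both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the first field found by the unquoted-field part of the pattern,
// or an empty string if nothing matches.
// Assumption: std::regex (ECMAScript grammar) stands in for boost::regex.
std::string first_unquoted_field(const std::string& text) {
    // (?:^|[\s,])+  start of the text, or separator characters, one or more times
    // ([^\s,]+)     the field itself, captured as group 1
    // (?:$|[\s,])+  end of the text, or separator characters, one or more times
    static const std::regex with_anchors(R"((?:^|[\s,])+([^\s,]+)(?:$|[\s,])+)");
    std::smatch m;
    return std::regex_search(text, m, with_anchors) ? m[1].str() : std::string();
}

// The same pattern with the ^ and $ alternatives removed, for comparison:
// a field now needs a real separator character on both sides.
bool matches_without_anchors(const std::string& text) {
    static const std::regex no_anchors(R"([\s,]+([^\s,]+)[\s,]+)");
    return std::regex_search(text, no_anchors);
}
```

With the anchors, a lone field such as abc matches; without them it does not,
because there is no separator on either side of it.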

> -----Original Message-----
> From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]]
> On Behalf Of Yechezkel Mett
> Sent: Wednesday, February 09, 2011 7:42 PM
> To: boost_at_[hidden]
> Subject: Re: [boost] [Tokenizer]Usage and documentation
>
> On Tue, Feb 8, 2011 at 3:13 PM, Max <more4less_at_[hidden]> wrote:
> > I'm using boost::tokenizer to do some simple parsing of data file in a
> > format specified by the following rules:
> >
> > -          One record of several fields in a single line
> >
> > -          Adjacent data fields in a record separated by space chars (space
> > or tab), with or without ","
> >
> > -          String without space(s), with or without quotation marks
> >
> > -          String with space(s), with quotation marks
> >
> >
> > One example of a 4-field-per-record file is like:
> >
> > "string  2"   3  4        5  4.3
> >
> > "String",     2,  3.04    4  3
> >
> > AnyOtherText, 2,  3.04    4  3
>
> I normally use boost.regex's regex_token_iterator for this sort of task.
> Try the following regex:
>
> "([^"]*)"|(?:^|[[:space:],])+([^[:space:],]+)(?:$|[[:space:],])+
>
> and tell regex_token_iterator to extract matches 1 and 2.
>
> The above regex has a couple of quirks: "a""b" will be taken as two
> fields, "a" and "b". a,,b will be taken as two fields, not three.
>
> To read the file line by line, simply use std::getline.
>
> Yechezkel Mett
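
The regex_token_iterator plus std::getline approach described above could
look roughly like the following sketch. It is not code from the thread:
std::sregex_token_iterator stands in for the Boost iterator, the helper names
split_record and read_records are hypothetical, and the pattern is a
simplified one chosen for this sketch, matching only the field text and never
the separators around it.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Splits one record into fields. Group 1 captures the inside of a quoted
// string, group 2 captures a run of characters that are not separators.
// Assumption: simplified pattern, std::regex in place of boost::regex.
std::vector<std::string> split_record(const std::string& line) {
    static const std::regex field_re(R"re("([^"]*)"|([^\s,"]+))re");
    static const std::vector<int> groups{1, 2};  // extract sub-matches 1 and 2
    std::vector<std::string> fields;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(line.begin(), line.end(), field_re, groups);
         it != end; ++it) {
        // Each match yields both groups; only one of them took part in it.
        if (it->matched) fields.push_back(it->str());
    }
    return fields;
}

// Reads a stream line by line with std::getline, skipping blank lines.
std::vector<std::vector<std::string>> read_records(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        records.push_back(split_record(line));
    }
    return records;
}
```

On the design choice: the pattern quoted in the message also matches the
separators after an unquoted field, which, as far as I can tell, leaves none
for the leading group of the next unquoted field. Matching only the field
text sidesteps that. The quirks mentioned above still apply: "a""b" gives two
fields, and a,,b gives two fields, not three.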

