Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Yechezkel Mett (ymett.on.boost_at_[hidden])
Date: 2011-02-09 06:41:32

Next message: Boris Brodski: "[boost] Library proposal: Boost.JniTools - The easy way to call Java methods from C++ (build on top of JNI)"
Previous message: Artyom: "Re: [boost] Subject: Formal Review of Proposed Boost.Process library starts tomorrow"
In reply to: Max: "[boost] [Tokenizer]Usage and documentation"
Next in thread: Max: "Re: [boost] [Tokenizer]Usage and documentation"
Reply: Max: "Re: [boost] [Tokenizer]Usage and documentation"

On Tue, Feb 8, 2011 at 3:13 PM, Max <more4less_at_[hidden]> wrote:
> I'm using boost::tokenizer to do some simple parsing of a data file in a
> format specified by the following rules:
>
> - One record of several fields in a single line
>
> - Adjacent data fields in a record separated by whitespace characters
> (space or tab), with or without ","
>
> - String without space(s), with or without quotation marks
>
> - String with space(s), with quotation marks
>
>
> One example of a 4-field-per-record file is like:
>
> "string 2" 3 4 5 4.3
>
> "String", 2, 3.04 4 3
>
> AnyOtherText, 2, 3.04 4 3

I normally use Boost.Regex's regex_token_iterator for this sort of task.
Try the following regex:

"([^"]*)"|(?:^|[[:space:],])+([^[:space:],]+)(?:$|[[:space:],])+

and tell regex_token_iterator to extract sub-expressions 1 and 2 (the
two capture groups; only one of them participates in any given match).

The above regex has a couple of quirks: "a""b" will be taken as two
fields, "a" and "b". a,,b will be taken as two fields, not three.
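
A minimal sketch of the token-iterator approach, under some loudly stated
assumptions: it uses std::regex and std::sregex_token_iterator as a
stand-in for Boost.Regex (the standard iterator closely mirrors the Boost
one), the function name split_record is hypothetical, and the pattern is a
simplified variant rather than the exact regex above: it matches only the
field text and leaves the separators unconsumed. It shows the same two
quirks ("a""b" yields two fields, a,,b yields two fields).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split one record (one line) into its fields.
// Group 1 captures the inside of a quoted field, group 2 an unquoted field.
std::vector<std::string> split_record(const std::string& line) {
    // Simplified pattern (an assumption, not the regex from the post):
    //   "([^"]*)"            quoted field, quotes stripped
    //   ([^[:space:],"]+)    run of characters that are not separators
    static const std::regex field_re("\"([^\"]*)\"|([^[:space:],\"]+)");

    // Ask the iterator for sub-expressions 1 and 2 of every match.
    const std::vector<int> submatches{1, 2};

    std::vector<std::string> fields;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(line.begin(), line.end(),
                                       field_re, submatches);
         it != end; ++it) {
        // For each match only one of the two groups participated;
        // the other is reported as an unmatched sub_match and is skipped.
        if (it->matched) {
            fields.push_back(it->str());
        }
    }
    return fields;
}
```

Checking it->matched rather than testing for an empty string keeps an
explicitly empty quoted field ("") as a field of its own.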

To read the file line by line, simply use std::getline.

Yechezkel Mett
