Boost Users :

From: Larry (lknain_at_[hidden])
Date: 2007-12-15 17:23:23


This is more of a brute-force approach that I came up with when I first
started using Boost a few years ago. There are probably better and/or more
efficient ways to do it, but it was sufficient for what I was doing.

//-----------------------------------------------------------------
// Using tokenizer
//
// char_separator returns each "," as a token of its own (kept delimiter),
// so empty fields can be counted. It does not handle quoted fields.

#include <boost/tokenizer.hpp>
#include <string>

using namespace boost;

typedef char_separator<char> CharTokens;
typedef tokenizer<CharTokens> CsvTokenizer;
typedef tokenizer<CharTokens>::iterator CsvIterator;

CharTokens cs("", ",", boost::keep_empty_tokens); // No dropped delimiters, keep ","
std::string str; // This has CSV input line
int field_number = 0; // Index of the current field
CsvIterator eti;

CsvTokenizer et(str, cs);

for (eti = et.begin(); eti != et.end(); ++eti) {
    if (*eti == ",") { // See if this is a separator
        field_number++;
    } else {
        // *eti is a value, which could be an empty field
        // field_number is the field in the list
    }
}

//-----------------------------------------------------------------
// Using Spirit
//
// Result is a vector of items much like split() - including empty
// strings in the vector for empty fields
//
// Probably could be used with any<>

using namespace boost::spirit;

char *plist_csv = new char[4096]; // Fill with the NUL-terminated CSV line before parsing

rule<> list_csv, list_csv_item;
std::vector<std::string> vec_item, vec_list;
parse_info<> result;

list_csv_item =
       confix_p('\"', *c_escape_ch_p, '\"')
       | longest_d[real_p | int_p | *(alnum_p | ch_p('_'))]
    ;

list_csv =
        list_p(
            (!list_csv_item)[append(vec_item)],
            ',') [append(vec_list)]
    ;

result = parse(plist_csv, list_csv);

if (result.hit) { // Got at least part
    if (result.full) {
        // All present
    }
}

----- Original Message -----
From: "Christian Henning" <chhenning_at_[hidden]>
Newsgroups: gmane.comp.lib.boost.user
To: <boost-users_at_[hidden]>
Sent: Saturday, December 15, 2007 1:38 PM
Subject: Re: [boost-users] tokenizer vs string algorithm split.

> Hi Larry, can you share the code which can handle empty fields?
>
> Thanks,
> Christian
>
> On Dec 15, 2007 1:32 PM, Larry <lknain_at_[hidden]> wrote:
>> If your CSV has empty fields (e.g., data,data,,data.....) the only way I
>> found to handle the empty field was to handle the separators yourself
>> with the tokenizer otherwise the tokenizer would skip the field (a la
>> strtok()).
>>
>> For CSVs I tried Spirit and came up with a scheme (with lots of help I
>> would add) that seemed to work. Not many lines of code. It takes more
>> time than I was interested in spending to figure it out.
>>
>> Larry
>>
>> ----- Original Message -----
>> From: "Edward Diener" <eldiener_at_[hidden]>
>> Newsgroups: gmane.comp.lib.boost.user
>> To: <boost-users_at_[hidden]>
>> Sent: Saturday, December 15, 2007 9:44 AM
>> Subject: Re: [boost-users] tokenizer vs string algorithm split.
>>
>>
>> Bill Buklis wrote:
>> > This may not matter for the CSV file you're parsing, but at least for a
>> > more general solution for CSV processing, you'd also have to handle
>> > fields that are surrounded by quotes and may even contain embedded
>> > commas. I don't know if split or tokenizer can handle that.
>>
>> Tokenizer's escaped_list_separator handles quotes and embedded commas
>> properly.
>>


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net