Boost Users :

From: Christian Henning (chhenning_at_[hidden])
Date: 2007-12-15 18:16:07


Thanks Larry.

On Dec 15, 2007 5:23 PM, Larry <lknain_at_[hidden]> wrote:
> This was more of a brute-force approach that I took when I first started using
> Boost a few years ago. There are probably better and/or more efficient
> ways to do it, but it was sufficient for what I was doing.
>
> //-----------------------------------------------------------------
> // Using tokenizer
>
> using namespace boost;
>
> // char_separator is the separator that accepts kept delimiters and
> // keep_empty_tokens; escaped_list_separator has no such constructor.
> typedef char_separator<char> CharTokens;
> typedef tokenizer<CharTokens> CharTokenizer;
> typedef tokenizer<CharTokens>::iterator CharIterator;
>
> CharTokens cs("", ",", boost::keep_empty_tokens); // drop nothing, keep ","
> std::string str; // This has CSV input line
> int field_number = 0;
> CharIterator eti;
>
> CharTokenizer et(str, cs);
>
> for (eti = et.begin(); eti != et.end(); ++eti) {
>     if (*eti == ",") { // See if this is a separator
>         field_number++;
>     } else {
>         // *eti is a value, which could be an empty field
>         // field_number is the field in the list
>     }
> }
>
>
> //-----------------------------------------------------------------
> // Using Spirit
> //
> // Result is a vector of items much like split() - including empty strings in
> // the vector for empty fields
> //
> // Probably could be used with any<>
>
> using namespace boost::spirit;
>
> char *plist_csv = new char[4096];
>
> rule<> list_csv, list_csv_item;
> std::vector<std::string> vec_item, vec_list;
> parse_info<> result;
>
> list_csv_item =
>     confix_p('\"', *c_escape_ch_p, '\"')
>     | longest_d[real_p | int_p | *(alnum_p | ch_p('_'))]
>     ;
>
> list_csv =
>     list_p(
>         (!list_csv_item)[append(vec_item)],
>         ',')[append(vec_list)]
>     ;
>
> result = parse(plist_csv,list_csv);
>
> if (result.hit) { // Got at least part
>     if (result.full) {
>         // All present
>     }
> }
>
>
> ----- Original Message -----
> From: "Christian Henning" <chhenning_at_[hidden]>
> Newsgroups: gmane.comp.lib.boost.user
> To: <boost-users_at_[hidden]>
>
> Sent: Saturday, December 15, 2007 1:38 PM
> Subject: Re: [boost-users] tokenizer vs string algorithm split.
>
>
> > Hi Larry, can you share the code which can handle empty fields?
> >
> > Thanks,
> > Christian
> >
> > On Dec 15, 2007 1:32 PM, Larry <lknain_at_[hidden]> wrote:
> >> If your CSV has empty fields (e.g., data,data,,data.....) the only way I
> >> found to handle the empty field was to handle the separators yourself with
> >> the tokenizer; otherwise the tokenizer would skip the field (a la
> >> strtok()).
> >>
> >> For CSVs I tried Spirit and came up with a scheme (with lots of help, I
> >> would add) that seemed to work. Not many lines of code, but it took more
> >> time than I was interested in spending to figure it out.
> >>
> >> Larry
> >>
> >> ----- Original Message -----
> >> From: "Edward Diener" <eldiener_at_[hidden]>
> >> Newsgroups: gmane.comp.lib.boost.user
> >> To: <boost-users_at_[hidden]>
> >> Sent: Saturday, December 15, 2007 9:44 AM
> >> Subject: Re: [boost-users] tokenizer vs string algorithm split.
> >>
> >>
> >> Bill Buklis wrote:
> >> > This may not matter for the CSV file you're parsing, but at least for a
> >> > more general solution for CSV processing, you'd also have to handle
> >> > fields that are surrounded by quotes and may even contain embedded
> >> > commas. I don't know if split or tokenizer can handle that.
> >>
> >> Tokenizer's escaped_list_separator handles quotes and embedded commas
> >> properly.
> >>
> >>
> >> _______________________________________________
> >> Boost-users mailing list
> >> Boost-users_at_[hidden]
> >> http://lists.boost.org/mailman/listinfo.cgi/boost-users
> >>
>
>

