

Subject: Re: [Boost-users] [spirit] parsing csv files with embedded quotes/line breaks
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-09-24 22:53:33


On Thu, Sep 24, 2009 at 7:44 PM, Sean Farrow
<sean.farrow_at_[hidden]> wrote:
> Hi:
>
> I know there is a Spirit example that parses CSV or any other
> list-separated file. What I need to do is to be able to parse files with
> embedded “,” characters, quotes and line breaks in them.
>
> How would I go about doing this?

It is usually better to ask on the Spirit list, but perhaps something like this will do:
  (('"' >> *(('\\' >> char_) | ~char_('"')) >> '"') | *~char_(",\r\n")) % ',' % eol

It will parse into a vector< vector< string > >.

That would parse this:
1,2,3,4
hi,1,y,o,,8
hi,1,y,o,\n,8
hi,1,"y,o",,8
hi,"multi
line and embedded\"quote",1

Into this:
vector< vector< string > >(
  vector<string>("1","2","3","4"),
  vector<string>("hi","1","y","o","","8"),
  vector<string>("hi","1","y","o","\\n","8"),
  vector<string>("hi","1","y,o","","8"),
  vector<string>("hi","multi\nline and embedded\"quote","1")
)

Basically, it parses fields separated by commas, with an end of line
terminating a row. A field surrounded by quotes is taken verbatim
(commas and line breaks included), except that \" is taken as a "
(well, more generally \c, where c is *any* character, becomes c, so \\
becomes \, \g becomes g, and so forth).
Simple, but it works.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net