Boost Users :

Subject: Re: [Boost-users] Interested in parsing tools
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-09-12 17:59:25


On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera <ramon_at_[hidden]> wrote:
> OvermindDL1 wrote:
>>>
>>> One of my main current interests is parsing. Trying to decide among the
>>> choices:
>>>
>>> - Regex
>>> - Spirit
>>> - Xpressive
>>
>> Depends on what you are wanting to parse.  If you want to do, say, a
>> search and replace in a file, Xpressive is best, if you want to parse
>> data structures and you want the absolute best speed and a completely
>> unambiguous grammar, Spirit2.1 for sure.  Do not bother with Regex
>> itself as Xpressive can do everything Regex can, but more and better.
>>
>
> Thanks so much!, OvermindDL1...
>
> Allow me to describe my target data. I initially had a bunch of files with
> lines like this:
>
> Variable Name = Variable Value
>
> These are some examples:
>
> --------------------------------------------------------------------
> My Favorite Baseball Player = George Herman "Babe" Ruth
>
> What did you do on Christmas = I rested, computed the % mortgage and visited
> my brother + sister.
>
> (the above should be in a single line)
>
> Favorite Curse = That umpire is a #&*%!
> --------------------------------------------------------------------
>
> I quickly solved the above parsing with Regex like this:
>
> string variable = "([A-Za-z0-9][\\w\\h\\(\\)\\-\\.,/&]*)";
> char equal_sign = '=';
> string value    = "(.+)";
> assignment      = variable + equal_sign + value;
>
> After retrieving the LHS and the RHS I store them for subsequent use in a
> map<string, string> data structure.
>
> My data, however, just became a bit more challenging. It is now divided into
> blocks:
>
> [Unique ID 1]
> Variable Name = Variable Value
> Variable Name = Variable Value
> Variable Name = Variable Value
>
> [Unique ID 2]
> Variable Name = Variable Value
> Variable Name = Variable Value
> Variable Name = Variable Value
>
> [Unique ID 3]
> Variable Name = Variable Value
> Variable Name = Variable Value
> Variable Name = Variable Value
>
> (etc.)
>
> Again, I would like to store the new format in a map, using the Unique ID as
> key to retrieve the block of lines underneath each ID.

Actually, that kind of thing is very easy to do in Spirit2.1 (available
in the Boost trunk or in Boost 1.41). It can auto-fill your structures
and everything, and it is very fast.
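
For comparison, the block format can also be read with plain
line-by-line regex matching. The following is only a minimal sketch of
that approach, using std::regex from the C++11 standard library rather
than Spirit2.1, Boost.Regex, or Xpressive. The names Block, BlockMap,
trim, and parse_blocks are made up for illustration, and the sketch
assumes each assignment sits on a single line and that the variable
name ends at the first '='.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <string>

// Hypothetical type names, not taken from the original post.
typedef std::map<std::string, std::string> Block;  // variable name -> value
typedef std::map<std::string, Block> BlockMap;     // unique ID -> its block

// Strip leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Read "[Unique ID]" headers followed by "Name = Value" lines.
BlockMap parse_blocks(std::istream& in) {
    // A header line: everything between the square brackets is the ID.
    static const std::regex header("\\[([^\\]]+)\\]");
    // An assignment line: the name stops at the first '=',
    // the value is the rest of the line (it may contain '=' itself).
    static const std::regex assignment("([^=]+)=(.+)");

    BlockMap result;
    std::string line, current_id;
    std::smatch m;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;  // skip blank separator lines
        if (std::regex_match(line, m, header)) {
            current_id = trim(m[1].str());
            result[current_id];  // create the block even if it stays empty
        } else if (!current_id.empty() &&
                   std::regex_match(line, m, assignment)) {
            result[current_id][trim(m[1].str())] = trim(m[2].str());
        }
        // Anything else (including assignments before the first header)
        // is silently ignored in this sketch.
    }
    return result;
}
```

Feeding it a std::ifstream or std::istringstream gives a map in which
result["Unique ID 1"]["Variable Name"] is the corresponding value. A
Spirit2.1 grammar would describe the same structure declaratively
instead of looping over lines by hand.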

On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera <ramon_at_[hidden]> wrote:
> At this stage, I am wondering whether to continue using true and tried (and
> learned!) Regex, or get my feet wet into more powerful tools, such as the
> one recommended by Overmind (Xpressive).

As stated, Xpressive can do everything Regex can do, but it also lets
you write static regexes (written as C++ expressions and built at
compile time, much faster than a string regex). Spirit2.1 would still
be a lot faster overall, though (it has been timed against a lot of
things, and it blows even Xpressive's static regexes away).

On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera <ramon_at_[hidden]> wrote:
> How does Xpressive compare with ANTLR? I am torn between them.

Xpressive and ANTLR are two different things. ANTLR is like a
not-as-powerful-and-slower Spirit2.1, a full grammar parser, whereas
Xpressive is just a regex engine.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net