Subject: Re: [Boost-users] Interested in parsing tools
From: Ramon F Herrera (ramon_at_[hidden])
Date: 2009-09-12 15:40:12


OvermindDL1 wrote:
>> One of my main current interests is parsing. Trying to decide among the
>> choices:
>>
>> - Regex
>> - Spirit
>> - Xpressive
>
> Depends on what you are wanting to parse. If you want to do, say, a
> search and replace in a file, Xpressive is best, if you want to parse
> data structures and you want the absolute best speed and a completely
> unambiguous grammar, Spirit2.1 for sure. Do not bother with Regex
> itself as Xpressive can do everything Regex can, but more and better.
>

Thanks so much, OvermindDL1!

Allow me to describe my target data. I initially had a bunch of files
with lines like this:

Variable Name = Variable Value

These are some examples:

--------------------------------------------------------------------
My Favorite Baseball Player = George Herman "Babe" Ruth

What did you do on Christmas = I rested, computed the % mortgage and visited my brother + sister.

Favorite Curse = That umpire is a #&*%!
--------------------------------------------------------------------

I quickly solved this with Regex, building the pattern like this:

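// Name: an alphanumeric, then word chars, horizontal space, or ( ) - . , / &
string variable = "([A-Za-z0-9][\\w\\h\\(\\)\\-\\.,/&]*)";
char equal_sign = '=';
string value = "(.+)";  // everything after the '='
string assignment = variable + equal_sign + value;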

After retrieving the LHS and the RHS I store them for subsequent use in
a map<string, string> data structure.
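
For illustration, here is a minimal sketch of that match-and-store step. It is not the code from my program: it assumes std::regex (C++11) in place of Boost.Regex, so the \h class is written as space and tab, and the helper name parse_assignment is made up for this example. The pattern also trims whitespace around the '=' sign, which the pattern above does not do.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: splits "Name = Value" into name and value.
// Returns false when the line is not an assignment.
bool parse_assignment(const std::string& line,
                      std::string& name, std::string& value)
{
    // Name: an alphanumeric, then word chars, space/tab, or ( ) . , / & -
    // The lazy quantifiers keep surrounding whitespace out of both captures.
    static const std::regex assignment(
        "[ \\t]*([A-Za-z0-9][\\w \\t().,/&\\-]*?)[ \\t]*=[ \\t]*(.+?)[ \\t]*");

    std::smatch m;
    if (!std::regex_match(line, m, assignment))
        return false;
    name  = m[1].str();
    value = m[2].str();
    return true;
}

// Stores one parsed line in the map; lines that do not match are ignored.
void store_line(const std::string& line,
                std::map<std::string, std::string>& vars)
{
    std::string name, value;
    if (parse_assignment(line, name, value))
        vars[name] = value;
}
```

Calling store_line once per input line fills the map, so vars["Favorite Curse"] would then hold the text to the right of the '=' sign.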

My data, however, just became a bit more challenging. It is now divided
into blocks:

[Unique ID 1]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value

[Unique ID 2]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value

[Unique ID 3]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value

(etc.)

Again, I would like to store the new format in a map, using the Unique
ID as key to retrieve the block of lines underneath each ID.
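
One possible shape for that structure is a map from each Unique ID to its own map<string, string>. The sketch below is only an assumption about how this could be done: it uses plain std::string operations instead of Regex, Xpressive, or Spirit, and the names Block, trim, and parse_blocks are hypothetical. Lines that appear before the first [ID] header, or that contain no '=', are silently skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// One block: the "Variable Name = Variable Value" pairs under a single ID.
using Block = std::map<std::string, std::string>;

// Hypothetical helper: strips leading and trailing spaces, tabs, and CRs.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Reads "[Unique ID]" headers followed by assignment lines.
std::map<std::string, Block> parse_blocks(std::istream& in)
{
    std::map<std::string, Block> blocks;
    Block* current = nullptr;  // block being filled; null before any header
    std::string raw;
    while (std::getline(in, raw)) {
        const std::string line = trim(raw);
        if (line.empty())
            continue;
        if (line.front() == '[' && line.back() == ']') {
            // New block: the text between the brackets becomes the key.
            current = &blocks[trim(line.substr(1, line.size() - 2))];
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos || current == nullptr)
            continue;  // not an assignment, or no header seen yet
        (*current)[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return blocks;
}
```

With this layout, blocks["Unique ID 2"] retrieves the whole block for that ID, and blocks["Unique ID 2"]["Variable Name"] retrieves a single value.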

At this stage, I am wondering whether to continue using tried and true
(and learned!) Regex, or to get my feet wet with more powerful tools, such
as the one recommended by OvermindDL1 (Xpressive).

How does Xpressive compare with ANTLR? I am torn between them.

TIA,

-Ramon

