Boost Users :

Subject: [Boost-users] Batch job processing with Boost Spirit parser
From: Michael Levine (shmuel.levine_at_[hidden])
Date: 2014-09-04 16:27:44


I have been using Boost Spirit as a parser for a project that I have been
working on lately. At this point, I have been trying to expand the
software and, in doing so, have had the nagging feeling that there is
something wrong with my overall design. In the best case, I am not
making the best use of my tools, and in the worst case, I am concerned
that the design/code is becoming overly brittle. As an aside, I don’t
have a huge amount of programming experience with this type of application.

A word of background / context :

Essentially, the program performs batch ‘jobs’ which are specified in a
text file (not dissimilar in that sense from a scheduler like Condor).
Spirit parses these ‘Job Description’ files into ‘specification objects’
(my vocabulary) that are used with a builder pattern and factories to
create the appropriate objects. I have some concerns about this which I
will describe later.

The job description (henceforth: JD) file should have 3 parts:
1. Data
2. Tools
3. Operations

The Data section is just a list of data that is to be operated on. At
this stage, this is just a vector of std::pair<std::string, std::string>
containing an identifier and the path of the file. The ‘Tools’ and
‘Operations’ sections are likely composed of nested specifications (the
resulting objects use either a decorator pattern or composite pattern –
depending on the type of tool) – which was the main reason that I
started using Spirit altogether.
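
To illustrate the nested specifications described above, here is a minimal sketch of turning a parsed specification tree into objects through a factory registry. All names are hypothetical: `Spec` is a cut-down stand-in for a parsed specification, and `Tool`, `Step`, `registry`, `build`, and the `scale`/`offset` tool types are invented for the example. It uses only the standard library (C++17), not Boost.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, cut-down specification node as a parser might produce it.
struct Spec {
    std::string type;             // tool type name, e.g. "scale"
    std::vector<double> options;  // positional parameters
    std::vector<Spec> children;   // nested specifications
};

using Step = std::function<double(double)>;

// Hypothetical composite tool: runs its children in order, then its own step.
class Tool {
public:
    Tool(Step step, std::vector<std::unique_ptr<Tool>> children)
        : step_(std::move(step)), children_(std::move(children)) {
        assert(step_ && "every tool needs a step function");
    }
    double apply(double x) const {
        for (const auto& child : children_) x = child->apply(x);
        return step_(x);
    }
private:
    Step step_;
    std::vector<std::unique_ptr<Tool>> children_;
};

using StepFactory = std::function<Step(const std::vector<double>&)>;

// Registry mapping a type name to a factory that builds the tool's step.
const std::map<std::string, StepFactory>& registry() {
    static const std::map<std::string, StepFactory> table = {
        {"scale", [](const std::vector<double>& o) -> Step {
             const double k = o.at(0);
             return [k](double x) { return k * x; };
         }},
        {"offset", [](const std::vector<double>& o) -> Step {
             const double d = o.at(0);
             return [d](double x) { return x + d; };
         }},
    };
    return table;
}

// Recursively build a tool (and its children) from a specification.
std::unique_ptr<Tool> build(const Spec& spec) {
    const auto it = registry().find(spec.type);
    if (it == registry().end())
        throw std::invalid_argument("unknown tool type: " + spec.type);
    std::vector<std::unique_ptr<Tool>> children;
    for (const Spec& child : spec.children) children.push_back(build(child));
    return std::make_unique<Tool>(it->second(spec.options), std::move(children));
}
```

With these definitions, a `scale` spec with option 2 wrapping a child `offset` spec with option 1 maps the input 3 to (3 + 1) * 2 = 8, and an unregistered type name is rejected with an exception.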

My question is really a request for some guidance – to better utilize
the tools available:

I have just received the requirement for section #3 (Operations), to be
included in the JD file (previously it was assumed that this would be
provided in a different manner). So – at this time, I have a working
parser for the Data and Tools portions. I have concerns with the
‘Tools’ portion that I would like to correct and not duplicate in the
‘Operations’ section that I am to be working on next.

At present, I am parsing the JD file mostly into strings, and vectors of
boost::variant<int,double> -- the latter being a list of parameters in a
completely arbitrarily imposed order. As I’ve previously mentioned, I
have a few problems with this approach.
- The parameters must be entered in a fixed but arbitrarily chosen order.
- Different ‘Tool’s and ‘Operation’s have different required and/or
optional parameters.
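
To make these two problems concrete, here is a minimal sketch of one possible alternative: named parameters checked against a per-tool schema, so that order no longer matters and each tool declares what is required or optional. This is an assumption for illustration, not the current design. `Schema`, `Params`, and `resolve` are hypothetical names, and `std::variant` (C++17) stands in for `boost::variant<int, double>`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <variant>

using Value = std::variant<int, double>;  // stand-in for boost::variant<int, double>
using Params = std::map<std::string, Value>;

// Hypothetical per-tool schema: parameter name -> default value.
// An empty optional means the parameter is required.
using Schema = std::map<std::string, std::optional<Value>>;

// Check the supplied parameters against the schema and fill in defaults.
Params resolve(const Schema& schema, const Params& given) {
    // Reject names the tool does not know about.
    for (const auto& entry : given)
        if (schema.count(entry.first) == 0)
            throw std::invalid_argument("unknown parameter: " + entry.first);

    Params result;
    for (const auto& entry : schema) {
        const std::string& name = entry.first;
        const auto it = given.find(name);
        if (it != given.end())
            result[name] = it->second;        // supplied by the user
        else if (entry.second)
            result[name] = *entry.second;     // optional: use the default
        else
            throw std::invalid_argument("missing required parameter: " + name);
    }
    assert(result.size() == schema.size());   // every schema entry was resolved
    return result;
}
```

For example, with a schema in which `window` is required and `tolerance` defaults to `1e-6`, supplying only `window` yields both values, while supplying nothing throws.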

For the past weeks (I only work on this project on a very part-time
basis), I have been going in circles trying to figure out a better way
to do this. I have a strong suspicion that Fusion can be used for this.
I am also concerned about over-complicating the design, but am wary
of leaving the design too simplistic. I.e., I know that I can treat the
parameters as a std::pair<std::string, boost::variant<int, double>> and
then parse key-value pairs with a given delimiter. I am just not
convinced that this is the best way to do this.
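
For reference, the key-value approach mentioned above can be sketched without Spirit as follows. The `=` separator, the `|` delimiter, and the names `parse_value` and `parse_params` are assumptions made for the example, and `std::variant` (C++17) stands in for `boost::variant<int, double>`. A value is treated as an `int` only if the whole token is an integer; otherwise it is tried as a `double`. Range checking of the integer is omitted for brevity.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using Value = std::variant<int, double>;  // stand-in for boost::variant<int, double>
using Param = std::pair<std::string, Value>;

// Convert one token to int if it is entirely an integer, else to double.
Value parse_value(const std::string& text) {
    if (text.empty()) throw std::invalid_argument("empty value");
    char* end = nullptr;
    const long i = std::strtol(text.c_str(), &end, 10);
    if (*end == '\0') return static_cast<int>(i);
    const double d = std::strtod(text.c_str(), &end);
    if (*end == '\0') return d;
    throw std::invalid_argument("not a number: " + text);
}

// Split "key=value|key=value" into (key, value) pairs, keeping input order.
std::vector<Param> parse_params(const std::string& line, char delim = '|') {
    assert(delim != '=' && "delimiter must differ from the key/value separator");
    std::vector<Param> params;
    std::istringstream in(line);
    std::string item;
    while (std::getline(in, item, delim)) {
        const auto eq = item.find('=');
        if (eq == std::string::npos || eq == 0)
            throw std::invalid_argument("expected key=value, got: " + item);
        params.emplace_back(item.substr(0, eq), parse_value(item.substr(eq + 1)));
    }
    return params;
}
```

Parsing `window=5|tolerance=0.25` this way gives an `int` 5 for `window` and a `double` 0.25 for `tolerance`.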

I am able to use any Boost Library, and have no restrictions about
compilers (I’m using a recent version of Clang, primarily, right now).

The following is a selection of the data structures and Spirit grammar
that I am using here:

struct Tool_Spec;
typedef std::vector<boost::variant<int, double> > Tool_Options_t;
typedef std::vector<Tool_Spec > Children_Tool_t;
typedef std::pair<std::string, std::string> Data_Spec;

struct Job_Request {
        Data_Spec data_spec;
        Tool_Spec model_spec;
        Operation_Spec operation_spec;
        boost::optional<std::string> description;
};

struct Tool_Spec {
        std::string type;
        std::string data_designation;
        Tool_Options_t options;
        Children_Tool_t children;
        boost::optional<std::string> designation;
};
                
struct Operation_Spec{
/*
   Unknown at this time. Need help
   */
};

                
Datafile %= lit("@START")
>> *Data_Description
>> Tool_Description
>> Operation_Description
>> lit("@END")
        ;
        
Data_Description %= lit('%')
>> Datasource
>> lit(';')
        ;

Datasource =
        Designator
>> lit(':')
>> Designator
        ;
        
Designator %= +(char_("0-9a-zA-Z/._") | char_('-') );
Comment_Designator %= +(char_("0-9a-zA-Z/._, ()") | char_('-'));

Tool_Description %=
        Designator
>> ':'
>> ('@' >> Designator)
>> '['
>> +Options
>> ']'
>> -('{' >> *Child_Tool >> '}')
>> -qi::lexeme[Comment_Designator]
>> ';'
        ;

Child_Tool %=
        Designator
>> ':'
>> ('@' >> Designator)
>> '['
>> +Options
>> ']'
>> -('{' >> *Child_Tool >> '}')
>> -qi::lexeme[Comment_Designator]
>> ';'
        ;
        
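// A strict real parser is tried first so that input such as 1.5 is not
// consumed by int_ as the integer 1, which would leave ".5" unparsed.
qi::real_parser<double, qi::strict_real_policies<double> > strict_double;
Options %= (strict_double | int_) % '|';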

