Boost Users :

Subject: [Boost-users] BOOST Spirit Qi and lexer parsing of tokens
From: Howard, Tony (tony.howard_at_[hidden])
Date: 2014-02-07 13:54:36


Hi,
I am starting a project that will use Spirit Qi and a lexer. I am having difficulty getting something simple to parse correctly without adding lexer states, and I want to make sure I am going about it the right way.

I have a label that should parse the following example (where the label starts/ends with a single quote):
'this is my BOOLEAN 123label 12as'
This label is followed by zero or more blanks and then a number enclosed in parentheses:
( 123 )

There may be any number of blanks between the number and the enclosing parentheses.
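
To pin down the input described above, here is a minimal sketch in plain C++ using std::regex. It is not Spirit code and not a proposed solution to the lexer question; it only states the expected shape of the input. The names LabeledNumber and parse_labeled_number are hypothetical, and it assumes a label never contains a single quote and that "blank" means space or tab.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct LabeledNumber {
    std::string label;   // text between the quotes, blanks preserved
    std::string digits;  // the number, kept as text
};

// Accepts: 'label' <blanks> ( <blanks> digits <blanks> )
// Assumption: the label itself contains no single quote.
inline bool parse_labeled_number(const std::string& input, LabeledNumber& out) {
    static const std::regex pattern(
        R"re('([^']*)'[ \t]*\([ \t]*([0-9]+)[ \t]*\))re");
    std::smatch m;
    if (!std::regex_match(input, m, pattern)) {
        return false;  // the whole input must match, not just a prefix
    }
    assert(m.size() == 3);  // full match plus the two capture groups
    out.label = m[1].str();
    out.digits = m[2].str();
    return true;
}
```

Blanks inside the quotes are kept as part of the label, while blanks around the parentheses and the number are skipped. That is exactly the distinction that is awkward to express when a lexer ignores blanks globally.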

I am ignoring blanks and whitespace in the default state of the lexer (assume blanks and white spaces are defined in the patterns referenced):
this->self +=
    lex::token_def<>("{BLANK}")[ lex::_pass = lex::pass_flags::pass_ignore ];
this->self +=
    lex::token_def<>("{WS}")[ lex::_pass = lex::pass_flags::pass_ignore ];

The first problem I encountered was that, with blanks ignored in the initial state, I could not parse a label containing blanks that I want to keep. So I created a label state, to which the blank token is added, and I switch to that state at the start using in_state[...]. When the label is done, we return to the initial state, where blanks are once again ignored. This worked.
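
The effect of the two states can be illustrated with a small hand-written tokenizer. This is only a sketch of the mechanism in plain C++, not Spirit.Lex code: the Mode, Kind and Token types and the tokenize and lex_number functions are hypothetical names, and the quote characters are used here as the triggers for entering and leaving the label mode.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; these are not Spirit.Lex types.
enum class Kind { Label, LParen, RParen, Number };
struct Token { Kind kind; std::string text; };

// Default mode skips blanks; Label mode keeps them.
enum class Mode { Default, Label };

inline bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
inline bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Reads a run of digits starting at in[i]; leaves i on the last digit.
inline std::string lex_number(const std::string& in, std::size_t& i) {
    assert(i < in.size() && is_digit(in[i]));  // caller guarantees a digit here
    std::string digits(1, in[i]);
    while (i + 1 < in.size() && is_digit(in[i + 1])) digits += in[++i];
    return digits;
}

inline std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    Mode mode = Mode::Default;
    std::string label;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (mode == Mode::Label) {
            if (c == '\'') {               // closing quote: back to default mode
                out.push_back({Kind::Label, label});
                label.clear();
                mode = Mode::Default;
            } else {
                label += c;                // blanks and digits are part of the label
            }
        } else if (c == '\'') {            // opening quote: enter label mode
            mode = Mode::Label;
        } else if (is_space(c)) {
            // ignored in default mode
        } else if (c == '(') {
            out.push_back({Kind::LParen, "("});
        } else if (c == ')') {
            out.push_back({Kind::RParen, ")"});
        } else if (is_digit(c)) {
            out.push_back({Kind::Number, lex_number(in, i)});
        }
        // Other characters, and an unterminated label, are dropped in this sketch.
    }
    return out;
}
```

In this sketch the same character (a blank or a digit) is treated differently depending on the current mode, which is the behaviour the label state provides.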

The next problem came with the digits. Since the digit token was already defined in the label state, I could not also recognize it in the initial state. So I added a digit state, where the digit token is now defined, and the label state has to transition to this state while parsing letters, symbols and digits (and whatever else I decide may make up a label). After leaving the label state, my rule parses an opening parenthesis, enters the digit state, and then parses the closing parenthesis. This works correctly for what I need now.

My concern is looking ahead. What if a rule needs lower-case letters only? I would have to create a state for them, remove them from the label state, and so on. I would have to do this for every core pattern, and it seems to me there should be an easier way.

Any suggestions, comments, concerns? I have code snippets if my approach is not clear.

Thanks for your help.

Tony.


