Subject: Re: [Boost-users] Boost.Spirit's Greedy Fundamental Type Parsers
From: Lee Clagett (forum_at_[hidden])
Date: 2016-03-31 18:19:49


On Thu, 31 Mar 2016 23:02:05 +0200
Daniel Hofmann <daniel_at_[hidden]> wrote:

> Suppose I want to parse a list of ";"-separated floating point pairs
> with "," being the pair separator as in "1,2;3,4". Following this list
> comes a string literal representing a file extension, such as ".txt".
>
> Therefore I want to successfully parse input like the following:
>
> 1,2;3,4.txt
>
> (For the record, the input could also be 1.1,2.2;3.3,4.4.txt)
>
>
> The parser I came up with is a 1:1 translation of the above
> description into the Spirit DSL and shows Spirit's expressive power:
>
> ((double_ % ",") % ";") >> ".txt"
>
> Unfortunately, the parser fails on the input with the integral values
> above. Why? Because the fundamental parser for double_ greedily
> matches on the "4." in "4.txt". Changing the "4" to "4.0" as in
>
> 1,2;3,4.0.txt
>
> parses successfully (but is not an option, as it requires the user to
> always add a trailing ".0" in case the last value is integral).
>
>
> I read about Spirit's DSL mapping to Parsing Expression Grammar (PEG)
> with the choice operator | being evaluated in order. So the next
> logical step for me was to try making use of it and adapting the
> parser:
>
> (((int_ | double_) % ",") % ";") >> ".txt"
>
> which works on
>
> 1,2;3,4.txt
>
> but no longer on
>
> 1,2;3,4.0.txt
>
> Is there a way to adapt the parser to handle both cases?
>
>
> I asked this on IRC and got the answer to try a solution based on
>
> ((double_ >> ".") | (int_ >> ".")) >> "txt"
>
> but when I use this to parse "4.txt" into a std::vector<double>
> via
>
> parse(first, last, ((double_ >> ".") | (int_ >> ".")) >> "txt", into);
>
> the vector contains: {4, 4} and its size() is 2, which I can make no
> sense of at all (but this may be a different problem).
>

That was me on IRC. I assumed you would be using `variant<double, int>`
or `double` as your attribute type, not `std::vector<double>`. If this
is part of a larger expression and you need a std::vector for some
reason, look into the hold directive [0]:

    (hold[double_ >> "."] | (int_ >> ".")) >> "txt"

The sequence operator calls push_back as soon as its left-hand
expression (`double_`) succeeds, so the value stays in the vector even
if the rest of that alternative then fails. `hold` works on a copy of
the vector and swaps it in iff everything inside the directive
succeeds. If you use a `variant` or a `double` as your attribute, the
attribute is simply overwritten by `int_` and the `hold` is not needed.

I am not sure why you want to use a `double` in this situation, but

   std::vector<unsigned> out;
   parse(first, last, (+(uint_ >> '.') >> "txt"), out);

or

   unsigned one = 0;
   boost::optional<unsigned> two;
   parse(
       first, last,
       (uint_ >> '.' >> -(uint_ >> '.') >> "txt"),
       one, two);

will reject inputs that contain '-', as well as the other forms that
the real parser [1] accepts. The uint_parser template behind uint_ [2]
can also be instantiated with a minimum and maximum number of digits,
which might be useful in your situation.

Lee

[0] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/directive/hold.html
[1] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/numeric/real.html
[2] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/numeric/uint.html

