

Subject: Re: [Boost-users] Boost.Spirit's Greedy Fundamental Type Parsers
From: Daniel Hofmann (daniel_at_[hidden])
Date: 2016-04-01 03:07:30


On 04/01/2016 12:19 AM, Lee Clagett wrote:
> On Thu, 31 Mar 2016 23:02:05 +0200
> Daniel Hofmann <daniel_at_[hidden]> wrote:
>
>> Suppose I want to parse a list of ";"-separated floating point pairs
>> with "," being the pair separator as in "1,2;3,4". Following this list
>> comes a string literal representing a file extension, such as ".txt".
>>
>> Therefore I want to successfully parse input like the following:
>>
>> 1,2;3,4.txt
>>
>> (For the record, the input could also be 1.1,2.2;3.3,4.4.txt)
>>
>>
>> The parser I came up with is a 1:1 translation of the description
>> above into the Spirit DSL and shows Spirit's expressive power:
>>
>> ((double_ % ",") % ";") >> ".txt"
>>
>> Unfortunately, the parser fails on the input with the integral values
>> above. Why? Because the fundamental parser for double_ greedily
>> matches on the "4." in "4.txt". Changing the "4" to "4.0" as in
>>
>> 1,2;3,4.0.txt
>>
>> parses successfully (but is not an option, as it requires the user to
>> always add a trailing ".0" when the last number is integral).
>>
>>
>> I read about Spirit's DSL mapping to Parsing Expression Grammar (PEG)
>> with the choice operator | being evaluated in order. So the next
>> logical step for me was to try making use of it and adapting the
>> parser:
>>
>> (((int_ | double_) % ",") % ";") >> ".txt"
>>
>> which works on
>>
>> 1,2;3,4.txt
>>
>> but no longer on
>>
>> 1,2;3,4.0.txt
>>
>> Is there a way to adapt the parser to handle both cases?
>>
>>
>> I asked this on IRC and got the answer to try a solution based on
>>
>> ((double_ >> ".") | (int_ >> ".")) >> "txt"
>>
>> but when I use this to parse "4.txt" into a std::vector<double>
>> via
>>
>> parse(first, last, ((double_ >> ".") | (int_ >> ".")) >> "txt", into);
>>
>> the vector contains: {4, 4} and its size() is 2, which I can make no
>> sense of at all (but this may be a different problem).
>>
>
> That was me in IRC. I assumed you would be using `variant<double, int>`
> or `double` as your attribute type, and not `std::vector<double>`. If
> this is part of a larger expression and you need to use a std::vector
> for some reason look into the hold directive [0]:
>
> (hold[double_ >> "."] | (int_ >> ".")) >> "txt"
>
> The sequence operator will immediately call push_back if the left side
> expression (`double_`) succeeds. `hold` creates a copy of the vector,
> and swaps iff everything in the directive returns true. If you use a
> `variant` or a `double` as your attribute, then the attribute is
> overwritten by `int_` and the `hold` is not needed.

I see, so parsers immediately push_back into the vector and in case of
failure the items remain in the vector, unless I'm using hold. This
perfectly explains what I'm seeing here.

> I am not sure why you want to use a `double` in this situation, but
>
> std::vector<unsigned> out;
> parse(first, last, (+(uint_ >> '.') >> "txt"), out);
>
> or
>
> unsigned one = 0;
> boost::optional<unsigned> two;
> parse(
> first, last,
> (uint_ >> '.' >> -(uint_ >> '.') >> "txt"),
> one, two);
>
> will prevent inputs that contain '-' or the various inputs that the
> real parser [1] accepts. uint_ [2] can also be specialized to have a
> min,max number of digits which might be useful to your situation.

I'm parsing into a std::vector<double> since I want both

1,2;3,4.txt

as well as

1.1,2.2;3.3,4.4.txt

to succeed. With a uint_ based parser as you suggest, I get a vector of
{1,1,..} for the second example, which neither represents the input nor
lets me reconstruct it.

Looking at strict_real_policies<double> I was under the impression that
the default real policy should work for both inputs above, being able to
parse both inputs into a vector of {1.0, 2.0, 3.0, 4.0} and {1.1, 2.2,
3.3, 4.4} respectively.

> Lee
>
> [0] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/directive/hold.html
> [1] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/numeric/real.html
> [2] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/reference/numeric/uint.html
>

