Subject: Re: [boost] Parsing commands with Spirit
From: Hartmut Kaiser (hartmut.kaiser_at_[hidden])
Date: 2010-02-02 09:01:51
> >> lit("command1") >> +char_[ref(my_str) = _1] >> lit("separator") >>
> >> +char_[ref(my_str2) = _1] |
> >> lit("command2") >> +alnum_[ref(my_str) = _1]
> >
> > typedef iterator_range<char const*> range_type;
> > typedef std::pair<range_type, range_type> result_type;
> > rule<char const*, result_type()> r =
> > "command1" >> raw[+(!lit("separator") >> char_)] >> "separator" >>
> > "command2" >> raw[+alnum];
> >
> > Does exactly as the above except it returns two pairs of pointers to the
> > arguments of your commands. Just call it as:
> >
> > char const* begin = ...;
> > char const* end = ...;
> > result_type rt;
> > parse(begin, end, r, rt);
> >
> > allowing you to access your pairs of pointers from rt.
> > Voila! No memory allocation at all!
>
> Awesome! Well now I *have to* test this. ;) - in the long run having a
> real parser will do us good.
>
> Then I have an additional question, if you don't mind. As I said, the
> second problem we have is that my command parameter may contain the
> separator; we get around this by parsing right to left. This is not
> something we can do much about, unless we encode all the command
> parameters, which we cannot realistically do.
>
> Let me give you an example:
>
> command1/aaa/bbb/ccc/ddd
>
> We currently extract "aaa/bbb/ccc" and "ddd".
>
> The way I want to do this in Spirit is extract "aaa/bbb/ccc/ddd" and
> manually extract "ddd" from it. The other possibility is to extract "aaa",
> "bbb", "ccc" and "ddd" and since I have a pair of pointers just merge the
> pair "aaa" => "ccc".
>
> What would be a more sensible approach to this?
Well, it all depends (as usual) on:
a) whether you know how many separators you have overall; in this case you
   can build the grammar accordingly.
b) whether you already know where the end of the string is; otherwise you
   have to scan the string once to find the end anyway.
If you don't know the string length up front (you have to scan it once to
find eos), then I'd remember the position of the last separator along the
way and parse the two parts separately. If you do know the eos without
scanning, you could do two parse steps as well: use reverse iterators
(rbegin/rend) to recognize command2 from the end, which gives you the last
separator, and then use the begin/lastsep iterators to recognize command1.
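To make the second variant concrete, here is a minimal sketch of the
"find the last separator from the end, then handle the two parts
separately" idea. It deliberately uses only the standard library instead
of Spirit parse steps, so it shows the iterator bookkeeping rather than a
grammar. The helper names split_at_last and as_string are hypothetical,
and the single-character separator '/' is an assumption taken from the
example above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>
#include <string>
#include <utility>

// A half-open range [first, second) of pointers into the input buffer.
typedef std::pair<const char*, const char*> range_type;
typedef std::pair<range_type, range_type> result_type;

// Hypothetical helper (not part of Spirit): splits [begin, end) at the LAST
// occurrence of sep. On success, out.first is everything before that
// separator and out.second is everything after it. Returns false if sep
// does not occur. No memory is allocated; only pointers are stored.
inline bool split_at_last(const char* begin, const char* end, char sep,
                          result_type& out)
{
    typedef std::reverse_iterator<const char*> rev;

    // Searching in reverse: the first hit is the last separator.
    rev rit = std::find(rev(end), rev(begin), sep);
    if (rit == rev(begin))
        return false;

    // base() of a reverse iterator points one past the element it refers to,
    // i.e. at the first character after the separator.
    const char* after_sep = rit.base();
    out.first = range_type(begin, after_sep - 1);
    out.second = range_type(after_sep, end);
    return true;
}

// Copies a range into a std::string; only meant for inspection or printing.
inline std::string as_string(const range_type& r)
{
    return std::string(r.first, r.second);
}
```

Called on the parameter part "aaa/bbb/ccc/ddd" with '/' as the separator,
this yields the ranges for "aaa/bbb/ccc" and "ddd". Each range could then
be handed to its own parse step if the parts need further checking.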
HTH
Regards Hartmut
---------------
Meet me at BoostCon
www.boostcon.com