Subject: Re: [Boost-users] [Spirit-general] Pattern matching with boost
From: Tkil (tkil_at_[hidden])
Date: 2011-11-10 01:59:53


On Nov 9, 2011, at 21:49, Alec Taylor wrote:

> Here is a really simple explanation I just figured out to explain the
> problem I am trying to solve:
>
> std::string s1=garbagetext1+number1+name1+garbagetext4;
> std::string s3=garbagetext2+(number1+2)+name1+garbagetext5;
> std::string s5=garbagetext3+(number1+4)+name1+garbagetext6;
>
> If this pattern is found:
>
> return s1.substr(number1+name1);

Ok, now you've changed the requirements again: every previous message talked about four consecutive messages.

> How can I do this using boost [or other] libraries?

Have you even tried my code?

   http://article.gmane.org/gmane.comp.lib.boost.user/71262

It seems a bit rude to ask for suggestions and then [apparently] ignore them.

If it doesn't work, do you understand it well enough that you could try to alter it? (Hint: look at str_pat, although see point [2] below)

If you don't understand it, what don't you understand? I'll be happy to try to make it more obvious. I'll be less happy to keep helping, though, while you don't seem able to provide a reasonable and consistent set of requirements.

Some corner cases to think about:

1. Can you distinguish "garbage text" from a "name"? If so, how? Character set used? Spaces? Predefined set of names?

2. Can you get multiple candidates in each string? E.g.,

      s1 = "foo 1 apple 2 pear bar";
      s2 = "baz 3 orange 3 pear quux";

3. Is it 4 strings, or 3 strings (skipping intermediate strings), or...?

4. How large are these strings? If they're particularly large, then efficiency might have to trump elegance/readability. What counts as "particularly large" varies with time; on modern hardware, I wouldn't start worrying until the strings are around a MiB each, depending on how many sets of 3 (or 4 or 5 or 2 or...) we have to match.
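
To make points 1 and 2 concrete, here is a minimal sketch under assumptions the thread has not confirmed: a "name" is a run of ASCII letters, it follows its number after whitespace, and the numbers step by 2 across exactly three strings. The helper names (find_candidates, match_sequence) and the "<number> <name>" return format are made up for illustration, and std::regex stands in for Boost.Regex only to keep the example dependency-free.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// A candidate is a (number, name) pair found somewhere in a string.
typedef std::pair<long, std::string> Candidate;

// Hypothetical helper: collect every "<digits> <letters>" pair in s.
// Assumption: names are ASCII letters only and numbers fit in a long
// (std::stol throws std::out_of_range otherwise).
std::vector<Candidate> find_candidates(const std::string& s)
{
    static const std::regex pat("(\\d+)\\s+([A-Za-z]+)");
    std::vector<Candidate> out;
    for (std::sregex_iterator it(s.begin(), s.end(), pat), end; it != end; ++it)
        out.push_back(Candidate(std::stol((*it)[1].str()), (*it)[2].str()));
    return out;
}

// Hypothetical helper: return "<number> <name>" for the first candidate in
// s1 whose name recurs in s2 with number+step and in s3 with number+2*step.
// Returns an empty string when no candidate qualifies.
std::string match_sequence(const std::string& s1,
                           const std::string& s2,
                           const std::string& s3,
                           long step = 2)
{
    const std::vector<Candidate> c1 = find_candidates(s1);
    const std::vector<Candidate> c2 = find_candidates(s2);
    const std::vector<Candidate> c3 = find_candidates(s3);

    for (std::size_t i = 0; i < c1.size(); ++i) {
        const Candidate second(c1[i].first + step, c1[i].second);
        const Candidate third(c1[i].first + 2 * step, c1[i].second);
        if (std::find(c2.begin(), c2.end(), second) != c2.end() &&
            std::find(c3.begin(), c3.end(), third) != c3.end())
            return std::to_string(c1[i].first) + " " + c1[i].second;
    }
    return std::string();
}
```

With these assumptions, "foo 1 apple 2 pear bar" yields two candidates, (1, apple) and (2, pear), and match_sequence("foo 1 apple 2 pear bar", "baz 3 orange 4 pear quux", "zip 5 kiwi 6 pear zap") returns "2 pear". Keeping every candidate per string, rather than only the first, is what lets the second pair win here; whether that is the desired behaviour is exactly the kind of thing the requirements need to say.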

Finally, try "test driven development":

A. Provide one or more sets of strings that are supposed to match, and what the correct output would be;

B. Provide as many sets as you can think of that *shouldn't* match. Look for corner cases: empty strings, repetitions, off-by-one errors, "name" matching but number not in sequence, in sequence but names not matching, partial matches, upper/lower case, etc.
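
Steps A and B can be written down as a small table of cases before any matching code exists. The sketch below is only an illustration: the Case struct, the "<number> <name>" result format, the convention that an empty string means "no match", and the decision that "Apple" and "apple" are different names are all assumptions that the real requirements would have to confirm or replace. The stand-in matcher deliberately never matches, so the one positive case fails until a real matcher is plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One test case: three input strings and the expected result.
// Assumption: an empty expected string means "no match expected".
struct Case {
    std::string s1, s2, s3;
    std::string expected;
};

// Any function with this shape can be checked against the table.
typedef std::function<std::string(const std::string&,
                                  const std::string&,
                                  const std::string&)> Matcher;

// Hypothetical table: one set that should match, several that should not.
std::vector<Case> make_cases()
{
    return {
        // A. Should match: same name, numbers 1, 3, 5.
        {"aa 1 apple bb", "cc 3 apple dd", "ee 5 apple ff", "1 apple"},
        // B. Name matches, but numbers are not in sequence.
        {"aa 1 apple bb", "cc 2 apple dd", "ee 5 apple ff", ""},
        // B. Numbers in sequence, but names do not match.
        {"aa 1 apple bb", "cc 3 pear dd", "ee 5 apple ff", ""},
        // B. Empty strings.
        {"", "", "", ""},
        // B. Upper/lower case differs (assumed to be a mismatch).
        {"aa 1 apple bb", "cc 3 Apple dd", "ee 5 apple ff", ""},
    };
}

// Run every case through the matcher and count the disagreements.
std::size_t count_failures(const Matcher& match, const std::vector<Case>& cases)
{
    std::size_t failures = 0;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        const Case& c = cases[i];
        if (match(c.s1, c.s2, c.s3) != c.expected)
            ++failures;
    }
    return failures;
}

// Stand-in matcher that never finds anything; replace with real code.
std::string never_matches(const std::string&, const std::string&, const std::string&)
{
    return std::string();
}
```

Calling count_failures(never_matches, make_cases()) reports one failure, the positive case. Each candidate solution from the thread can be passed in the same way, and the table then shows which interpretation of the requirements it actually implements.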

At this point, I feel that you have a handful of suggestions, each of which is "correct" for some interpretation of the requirements you've given. But until you've gone through those solutions and determined what works (and what doesn't), it's not clear that we can help you much more.

That is, this sounds like a case where you need to sit down and make your requirements a *lot* more stringent. It doesn't have much to do with which library you use.

Best regards,
Tony

p.s. Apologies if this message is not formatted well -- this is not my normal/preferred mail interface.
