On Tue, Jan 29, 2013 at 2:24 PM, Neil Sutton <neilmsutton@gmail.com> wrote:
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers.
So far, my program connects to a certain URL and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.

The data is in the following format, though of course the date and numbers vary:

26-Jan-2013,2,6,21,29,34,47,11,X,X

Note that, at this stage, I am not interested in the last two fields, represented by X,X. I am only interested in the first seven numbers following the date.

So I figured that it should be easy to write a regular expression to match this pattern:

boost::regex pattern("\\d\\d\\d\\d,\\<(\\d{1,2})\\>,");
I do not know regex well enough to say whether a regex can provide the basis for the 'fastest' implementation. (I know from some of my experiments that there can be an order-of-magnitude difference in performance between the fastest and slowest algorithms that do the same thing, subject to the caveat that they all satisfy the functional requirements correctly.) But if the only consideration right now is to get it working, why not examine Boost more thoroughly?

It already has a tokenizer (http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/) that, once you know how to use it, may eliminate the need for you to roll your own. It also has a split function in the string algorithms library (http://www.boost.org/doc/libs/1_52_0/doc/html/string_algo.html). In both cases, you would just split your example string on the comma: the first element extracted would be your date, and the rest would be your numbers.

HTH

Ted