Thank you, John. Right, the difference is in the header. However, your regular expression doesn't make sense to me: where is the part that matches the sub-total header?

On 8/2/06, John Maddock <john@johnmaddock.co.uk> wrote:
Winson Yung wrote:
>> Hello all, I have the following table:
>>
>> OPERATING REVENUES:
>> publishing
>> $   42,419         $   44,754         $  46,203
>> collegiate marketing and production services
>> 97
>> ASSOCIATION MANAGEMENT SERVICES
>>
>> 16
>> wireless
>> 8,883              8,129               7,507
>>
>> 51,302             52,883             53,823
>>
>> All I know is that OPERATING REVENUES: will always be there. The
>> question is how to write a regular expression to capture the total
>> (which is 51,302 here). There might be more or fewer than four rows
>> in the table. I would really appreciate any good suggestions on
>> this.

I'm assuming that the difference between the sub-totals and the totals is
that the sub-totals always have a header?  If so, then off the top of my head
(caution: untried!) something like:

"OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"  // tag line
"(?:"                                        // group sub-totals
"\\s*[^\\d$][^\r\n]*[\r\n]+[^\r\n]+[\r\n]+" // sub-total=two lines
")*"                                         // close group and repeat
"\\s+\\$?([\\d,.]+)"                         // capture total
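
For anyone wanting to try the expression, here is a minimal sketch of how it could be wrapped in a function. It uses std::regex (ECMAScript syntax) rather than Boost.Regex so that it builds with just the standard library, and it closes the character class in the capture group. The names extract_total and kSampleTable are hypothetical, not from this thread, and the sample input is assumed to end at the total line with a blank line in front of it; text following the total may well need a tighter pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical sample input modelled on the quoted table (assumption:
// the text ends at the total line, which is preceded by a blank line).
const std::string kSampleTable =
    "OPERATING REVENUES:\n"
    "publishing\n"
    "$   42,419         $   44,754         $  46,203\n"
    "wireless\n"
    "8,883              8,129               7,507\n"
    "\n"
    "51,302             52,883             53,823\n";

// Hypothetical helper: returns the first figure on the total line that
// follows the "OPERATING REVENUES:" tag, or an empty string if the
// expression does not match.
std::string extract_total(const std::string& text)
{
    static const std::regex re(
        "OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"  // tag line
        "(?:"                                        // group sub-totals
        "\\s*[^\\d$][^\r\n]*[\r\n]+[^\r\n]+[\r\n]+" // header line + figures line
        ")*"                                         // close group and repeat
        "\\s+\\$?([\\d,.]+)");                       // capture total

    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();   // first capture group holds the total
    return std::string();
}
```

On this sample, extract_total(kSampleTable) should give 51,302. The sub-total header is the part matched by [^\\d$][^\r\n]*: a line whose first character is neither a digit nor a dollar sign, followed by one line of figures.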

HTH, John.

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users