From: Tommy McClung (tmcclung_at_[hidden])
Date: 2007-03-26 09:58:37


Here's my situation.

We've got some software that uses the Boost.Regex library, compiled
and linked with ICU support enabled. We need this for UTF-8 support.

Our platforms are Windows (XP and Vista) and OS X.

I have a regular expression that parses a UTF-8 encoded HTML page.
It's a rather complex expression, but I've made sure to add anchors
so that I don't run into catastrophic backtracking. I'm calling
u32regex_search with the flags match_default | match_partial. I have
to use match_partial because the input data is long and I've seen
memory exhausted even after increasing BOOST_REGEX_MAX_BLOCKS (I
probably have more work to do on the expression to reduce its
complexity).

On Windows I have no problems. Partial matches are returned and I
eventually get full matches and my code runs great.

On OS X, u32regex_search returns true and indicates it has found a
partial match (what[0].matched == false). But what[0].second is set
to the end of my input string, so no further matching takes place and
my loop ends. This is very different from the behavior I'm seeing on
Windows.

I know this email is vague, but I can provide any details that would
be helpful in solving this issue. Why is there a difference between
Windows and OS X when I've compiled the Boost libraries with the same
compiler options on both platforms? Is it an ICU build issue?

Thanks,

Tommy McClung
IMSafer, Inc. Founder
tmcclung_at_[hidden]

