
Boost Users :

From: Mike Marchywka (marchywka_at_[hidden])
Date: 2007-10-04 14:32:06


>What is the text you are matching against? If you can give me a concrete
>example I can test it here, but "it hangs" isn't very useful I'm afraid ;-)
>

[ Sorry if the editing is awful. I kept simplifying and now believe I have
a tractable test case; I didn't expect data dependence earlier. I now have
a single regex that works or fails depending on a simple data change, as
shown below. And, again, it runs with tens of longer data strings against
thousands of SIMPLER regexes, and the results match exactly (by diff)
against the GRETA results. ]

Thanks. If you have an easy way to test this, the scenario is as follows:

I have a file containing multiple strings (gene sequences, FWIW) and a set
of rules (regexes) that describe interesting features in the string. The
normal sequence is to apply the entire rule set to each string and return
a vector of hits per string. I played around with one of these files to
find the simplest thing I could that causes the error.
In these traces, line 114 is the query and 115 is the sample (no
whitespace/CRLF etc.). Unlike my first example with the assertion support,
this seems to make it through all the rules at least once.
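
To make the workflow above concrete, here is a minimal sketch of "apply
the entire rule set to each string and return a vector of hits per
string". It is not the rules_annotater code: std::regex stands in for
boost::regex so it builds without Boost, and the names RuleHit and
annotate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one hit; not taken from the original program.
struct RuleHit {
    std::size_t rule;    // index of the rule that matched
    std::size_t offset;  // where the match starts in the sequence
    std::size_t length;  // how many characters the match covers
};

// Apply every rule to every sequence; result[i] holds the hits for sequences[i].
// Assumption: std::regex (ECMAScript syntax) stands in for boost::regex.
std::vector<std::vector<RuleHit>> annotate(const std::vector<std::string>& rules,
                                           const std::vector<std::string>& sequences) {
    // Compile each rule once rather than once per sequence.
    std::vector<std::regex> compiled;
    compiled.reserve(rules.size());
    for (const std::string& r : rules) compiled.emplace_back(r);

    std::vector<std::vector<RuleHit>> result(sequences.size());
    for (std::size_t s = 0; s < sequences.size(); ++s) {
        // Work on a raw character range, as the original code uses char*.
        const char* first = sequences[s].data();
        const char* last = first + sequences[s].size();
        for (std::size_t r = 0; r < compiled.size(); ++r) {
            // The iterator yields successive non-overlapping matches.
            for (std::cregex_iterator it(first, last, compiled[r]), done; it != done; ++it) {
                assert(it->position() >= 0);  // a negative location would be spurious
                result[s].push_back({r, static_cast<std::size_t>(it->position()),
                                     static_cast<std::size_t>(it->length())});
            }
        }
    }
    return result;
}
```

Called with the rules ATG(...)*?(TAG|TAA|TGA) and TATAA.*?ATAAA and a few
short sequences, it returns one vector per sequence, so an empty inner
vector simply means no rule matched that sequence.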

This hangs (I thought it might be a repetition problem, but it reliably
hits on this data/regex combination on the first pass):

$ $progpath/rules_annotater -clean -boost -doall -fastas q_fasta -debug -rules $progpath/boost_edit_rulesx > asdf

myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
myboost.cpp115 in
GAGATATTCACCTCTCATTGCCTTTTCCAGAGGTTGTTGAACTTAGTGGCCTGAGCATTTTA
TCTGCAAAATGACTAGCAATTTTTTTTTAAGTTTCAGGCTTTTTTAATGCCCTAAATACAGTTGATCCATTACCGAGTGT
GTTACATGCATAGGAATTTACTGATCTTTTCTTTTCCCCCTAGCTAGTTTTAAAGTTACTGAGCATAACGAGCTTTAAAA
ATTCTTCAGAATACAAATAAATGAATAGATAAAAGACTACCTCCATTTGATAAATCATTCAAGAAAAAGAAAAAAAAACT
TGAGCAAGCTAAGAAAGTCATTAACAGCCATATTTCTGATGGAACTAATGTxGATACCTACTCAAGCTAxCACTxGAATC
TAATAATCTGTGAGAGAAGAAATGGGAAAAGGTATGAAAGC
myboost.cpp121 looking for subexpr 0
myboost.cpp139

This does NOT hang (note that it depends on removing the "X"s):
$ $progpath/rules_annotater -clean -boost -doall -fastas q_fasta -debug -rules $progpath/boost_edit_rulesx > asdf
myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
myboost.cpp115 in
GAGATATTCACCTCTCATTGCCTTTTCCAGAGGTTGTTGAACTTAGTGGCCTGAGCATTTTA
TCTGCAAAATGACTAGCAATTTTTTTTTAAGTTTCAGGCTTTTTTAATGCCCTAAATACAGTTGATCCATTACCGAGTGT
GTTACATGCATAGGAATTTACTGATCTTTTCTTTTCCCCCTAGCTAGTTTTAAAGTTACTGAGCATAACGAGCTTTAAAA
ATTCTTCAGAATACAAATAAATGAATAGATAAAAGACTACCTCCATTTGATAAATCATTCAAGAAAAAGAAAAAAAAACT
TGAGCAAGCTAAGAAAGTCATTAACAGCCATATTTCTGATGGAACTAATGTxGATACCTACTxCAAGCTAxCACTxGAAT
CTAATAATCTGTGAGAGAAGAAATGGGAAAAGGTATGAAAGC
myboost.cpp121 looking for subexpr 0
myboost.cpp139

Administrator_at_TESTBED01 /cygdrive/e/new/temp/canis/known/grade_R/misc_tgf
$
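
Since the rule ATG(...)*?(TAG|TAA|TGA) describes a start codon followed by
whole triplets up to the first in-frame stop codon, a regex-free scan is
one way to get an independent result to diff against, in the same spirit
as the GRETA comparison. The sketch below is a swapped-in technique, not
part of the program above; Orf and scan_orfs are hypothetical names, and
it assumes the regex reports leftmost, non-overlapping, shortest matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one open reading frame.
struct Orf {
    std::size_t offset;  // index of the 'A' in ATG
    std::size_t length;  // ATG through the stop codon, inclusive
};

// True if the three characters at s[i..i+3) form a stop codon.
static bool is_stop(const std::string& s, std::size_t i) {
    assert(i + 3 <= s.size());
    return s.compare(i, 3, "TAG") == 0 || s.compare(i, 3, "TAA") == 0 ||
           s.compare(i, 3, "TGA") == 0;
}

// Scan left to right: at each ATG, step in triplets until the first stop codon.
// Any three characters count as a codon, mirroring "..." in the regex.
std::vector<Orf> scan_orfs(const std::string& s) {
    std::vector<Orf> hits;
    std::size_t i = 0;
    while (i + 6 <= s.size()) {  // need room for ATG plus one stop codon
        bool found = false;
        if (s.compare(i, 3, "ATG") == 0) {
            for (std::size_t j = i + 3; j + 3 <= s.size(); j += 3) {
                if (is_stop(s, j)) {
                    hits.push_back({i, j + 3 - i});
                    i = j + 3;  // resume after this hit, like the regex loop
                    found = true;
                    break;
                }
            }
        }
        if (!found) ++i;  // no in-frame stop from here; try the next position
    }
    return hits;
}
```

The scan looks at each character a bounded number of times per start
codon and cannot backtrack, so if it finishes on a sample where the regex
run does not, that points at the matcher or the build rather than the data.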

All of these are char*, not std::string FWIW.

boost::regex expression(query);

boost::match_results<const ChTy*> what;
boost::match_flag_type flags = boost::match_default;
while (regex_search(start, end, what, expression, flags))
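
For anyone wanting to try the loop in isolation, here is a self-contained
sketch of the same shape over a const char* range. It is an assumption
about how the loop body advances, since the body is not shown above;
std::regex stands in for boost::regex, and Hit and find_all are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <regex>
#include <vector>

// Hypothetical record: offset from the start of the subject, and match length.
struct Hit {
    std::size_t offset;
    std::size_t length;
};

// Collect all non-overlapping matches of `query` in [start, end).
// Assumption: std::regex stands in for boost::regex; the loop shape is the same.
std::vector<Hit> find_all(const char* query, const char* start, const char* end) {
    assert(query && start && end);
    const char* const base = start;  // remembered so offsets are absolute
    std::regex expression(query);
    std::cmatch what;  // i.e. std::match_results<const char*>
    std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
    std::vector<Hit> hits;
    while (start != end && std::regex_search(start, end, what, expression, flags)) {
        hits.push_back({static_cast<std::size_t>(what[0].first - base),
                        static_cast<std::size_t>(what[0].length())});
        // Resume after the match; step past an empty match so the loop always advances.
        start = what[0].second;
        if (what[0].length() == 0 && start != end) ++start;
        // Characters before `start` now exist, so ^ should not match at the resume point.
        flags = flags | std::regex_constants::match_prev_avail;
    }
    return hits;
}

// Convenience overload for NUL-terminated subjects.
std::vector<Hit> find_all(const char* query, const char* text) {
    return find_all(query, text, text + std::strlen(text));
}
```

The explicit step past an empty match matters for rules that can match
nothing, such as the "|^" alternative in the first rule of the list quoted
below; without it a loop of this shape could spin forever on the same
position.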

>From: "John Maddock" <john_at_[hidden]>
>Reply-To: boost-users_at_[hidden]
>To: <boost-users_at_[hidden]>
>Subject: Re: [Boost-users] follow up on regex questions
>Date: Thu, 4 Oct 2007 17:37:22 +0100
>
>Mike Marchywka wrote:
> > Hi,
> > Thanks for your help in the past. I would normally drop the issue at
> > this point until I get my build environment cleaned up
> > (" My build is messed up, I haven't read the documentation. What is
> > wrong with YOUR library?" LOL),
> > but I do have one more question which I believe is related to boost
> > regex processing.
> > If someone has a known good regex test program or can point to an
> > obvious problem, it may be helpful.
>
>You mean libs/regex/test/regress/*.cpp ?
>
>It would be a good idea to build and run this to verify the sanity of your
>setup at least: I still have a suspicion that the binaries you are using
>are not compatible with your build options or regex headers, but I can't
>be sure.
>
> > Again, this code seems to work with Microsoft's
> > greta and boost gives identical results on a longer list of SIMPLER
> > regexes so I reasonably believe that the problem is due to handling
> > of more complicated expressions ( One caveat, to be complete, is that
> > greta did seem to return some spurious results, things like negative
> > location, but they are easily filtered programmatically, and the
> > plausible ones that I have checked manually are right).
> > However, on this sequence of regexes (regexi?) I get either an abort
> > OR the program hangs later on nonsensical execution (I know, "Gee,
> > you have a build problem and the stack is messed up?").
> >
> > myboost.cpp114 (GU.*?TACTAAC.{20,40}AG|^)(.*?)(GU.*?TACTAAC.{20,40}AG)
> > myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
> > myboost.cpp114 TATAA.*?ATAAA
> > myboost.cpp114 (GU.*?TACTAAC.{20,40}AG|^)(.*?)(GU.*?TACTAAC.{20,40}AG)
> > myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
> >
> > ( program hangs in my code or had been core dumping in boost::regex )
>
>What is the text you are matching against? If you can give me a concrete
>example I can test it here, but "it hangs" isn't very useful I'm afraid ;-)
>
>John.