
Boost Users :

From: hallouina-ml_at_[hidden]
Date: 2007-11-26 11:22:55


Hello,

I am trying to extract URLs from a web page. It is almost working, but it is completely unoptimised.

At first I tried a regex iterator, but I don't understand the documentation. I spent too much time on that approach, so I tried another way. I fetch the web page with libcurl, then I replace every " " with a "\n" like this:

    string::size_type i = 0;
    while (( i = page_a_analyser.find(' ', i ) ) != (string::npos))
    {
        page_a_analyser.replace(i++, 1, "\n" );
    }

Then I apply this regex:

    boost::regex rexp(".*(http:\\/\\/.+)\"*.*");

and I get this result:

    http://www.nolife-tv.com/"
    http://www.nolife-tv.com">
    http://www.nolife-tv.com/images/stories/noiz/1.jpg"
    http://www.nolife-tv.com/component/option,com_poll/task,results/id,16/Itemid,47/';"
    http://www.joomla.org"
    http://www.google-analytics.com/urchin.js"
    http://www.omniture.com

and so on...

I want to get only the URL, without the trailing " or '. Why does this regex capture the " as well? I put the closing parenthesis before the ", so why? I already tried \\" instead of \". I also tried (\"|')" to mean " or ', but that doesn't work either.

So I did it another way:

    I fetch the web page with libcurl,
    then I replace every " " with "\n",
    then I replace every " with \n,
    then I replace every ' with \n,
    then I apply the regex.

I had to do the replacement with three while loops rather than one, because putting the three conditions in a single while did not work:

    while ( (( i = page_a_analyser.find(' ', i ) ) != (string::npos))
         or ( i = page_a_analyser.find('"', i ) ) != (string::npos)
         or ( i = page_a_analyser.find('\'', i ) ) != (string::npos) )

So how can I improve the regex so that it extracts just the URL? I would like to do something like: replace " " with "\n", then apply the right regex. I don't want to use a regex iterator again. The regex iterator has worn out my patience; three days on it is enough for me.

Thanks for your attention.
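For illustration, the approach described above can be sketched as follows. In the posted pattern the `.+` inside the capture group is greedy and happily matches quotes, while the trailing `\"*` is allowed to match nothing, so the quote ends up inside the capture. A negated character class that excludes whitespace, quotes, and angle brackets avoids that. This is only a minimal sketch under stated assumptions: it uses `std::regex` in place of `boost::regex` so that it compiles without Boost, the function names `split_on_separators` and `extract_urls` are hypothetical, and the set of characters treated as URL terminators is a guess rather than a full URL grammar.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: replace every space, double quote, or single quote
// with '\n' in a single pass. find_first_of searches for any of the listed
// characters, which is what the combined "or" loop was trying to express.
void split_on_separators(std::string& text) {
    std::string::size_type i = 0;
    while ((i = text.find_first_of(" \"'", i)) != std::string::npos) {
        text[i++] = '\n';
    }
}

// Hypothetical helper: collect every http:// URL found in text.
// The negated class [^\s"'<>] stops the match at the first whitespace,
// quote, or angle bracket, so no trailing delimiter is captured.
std::vector<std::string> extract_urls(const std::string& text) {
    static const std::regex url_re("http://[^\\s\"'<>]+");
    std::vector<std::string> urls;
    std::smatch m;
    std::string::const_iterator begin = text.begin();
    // Repeated regex_search calls, each one resuming after the last match.
    while (std::regex_search(begin, text.end(), m, url_re)) {
        urls.push_back(m[0].str());
        begin = m[0].second;
    }
    return urls;
}
```

With a pattern like this, the separate replacement pass is probably unnecessary, since `extract_urls` can be applied to the raw page text directly. `split_on_separators` is shown only because it does in one loop what the three separate loops were doing.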


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net