
Boost Users :

From: hallouina-ml_at_[hidden]
Date: 2007-11-26 12:07:21


No, I had not seen this example. I was reading the documentation and examples for regex_iterator, not regex_token_iterator. I searched other websites but did not find this example. I will study it.

Thanks, John, for always answering, and for always answering so quickly. I was completely discouraged. Thanks again!

----- Original Message ----
From: John Maddock <john_at_[hidden]>
To: boost-users_at_[hidden]
Sent: Monday, 26 November 2007, 17:50:31
Subject: Re: [Boost-users] extract url with boost::regex

hallouina-ml_at_[hidden] wrote:
> Hello;
>
> I am trying to extract URLs from a web page. It is almost done, but
> completely unoptimised:
>
> Before that I tried a regex iterator, but I don't understand the
> documentation. :-(

Did you see this example: http://www.boost.org/libs/regex/example/snippets/regex_token_iterator_eg_2.cpp

It does exactly what you want - it extracts all the URLs from an HTML file.

> boost::regex rexp(".*(http:\\/\\/.+)\"*.*");
>
>
> and I get this result:
>
> http://www.nolife-tv.com/"
> http://www.nolife-tv.com">
> http://www.nolife-tv.com/images/stories/noiz/1.jpg"
> http://www.nolife-tv.com/component/option,com_poll/task,results/id,16/Itemid,47/';"
> http://www.joomla.org"
> http://www.google-analytics.com/urchin.js"
> http://www.omniture.com
>
> and so on...
>
> I want to cut this down and get only the URL, without the " or '.
> Why does this regex capture the " with it? I put the closing bracket
> before the ", so why? I already tried \\" rather than \".

Because the .* on the end of the expression will match whatever text follows the ". The grouping construct (...) spits out a *sub-expression*, which you can access via the match_results::operator[] or match_results::str(i) methods.

HTH, John.

[...]
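For readers following the thread, here is a minimal sketch of the token-iterator approach. It is not the code from the linked Boost example. It assumes std::regex (C++11) in place of boost::regex so that it builds without Boost, and the helper name extract_urls and the pattern are illustrative choices. In the original pattern, .+ is greedy and \"* is allowed to match zero quotes, so the capture can run past the closing quote. The negated character class below stops the match at the first quote, whitespace, or angle bracket instead.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// returns every http:// URL found in the text, without trailing quotes.
std::vector<std::string> extract_urls(const std::string& html)
{
    // The character class excludes ", ', whitespace, < and >, so the match
    // ends just before the delimiter that closes the attribute value.
    static const std::regex url_re(R"re(http://[^"'\s<>]+)re");

    std::vector<std::string> urls;

    // Submatch index 0 asks the token iterator for the whole match
    // of each occurrence, one after another.
    std::sregex_token_iterator it(html.begin(), html.end(), url_re, 0);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
        urls.push_back(it->str());

    return urls;
}
```

Calling extract_urls on the page text yields one string per URL. With Boost.Regex the same loop should carry over by switching to boost::regex and boost::sregex_token_iterator, though that variant was not built for this sketch.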


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net