Hi folks,<br><br>Sorry for posting a question unrelated to Boost. I know Boost has perfectly good solutions for this, but I am working on a legacy system that uses Hpricot with Ruby on Rails, so Hpricot-specific suggestions only, please. Thank you.<br>
<br>In my HTML parser, I can parse an HTML file with the following Hpricot commands:<br>(1) doc = open( &quot;MyFileToParse.html&quot; ) { |f| Hpricot(f) }<br>(2) elements = (doc.search(&quot;/html/body/table/tr/td/table/tr/td/font&quot;) )<br>
(3) puts (elements[13]).inner_html<br><br>to get the following output:<br><br>Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91. [&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=12140549&amp;dopt=Abstract&quot; target=&quot;_blank&quot;&gt;PubMed&lt;/a&gt;]<br><br>How can I proceed to get the following results (3) and (4) respectively?<br>
(3) Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91.<br><br>(4) <a href="http://www.ncbi.nlm.nih.gov/pubmed/12140549?dopt=Abstract">http://www.ncbi.nlm.nih.gov/pubmed/12140549?dopt=Abstract</a><br>
<br>NOTE: to get (4), I need two more steps on top of the &quot;normal&quot; HTML parsing: (5) replace &quot;&amp;&quot; with &quot;?&quot; and (6) replace &quot;PubMed&quot; with &quot;pubmed&quot; (this might be trivial, but how?).<br>
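<br>To make the target concrete, here is a minimal sketch of the post-processing in plain Ruby. It works on the string that step (3) prints and does not call Hpricot at all. The helper name split_citation is hypothetical. Instead of the literal replacements in (5) and (6), it rebuilds the URL from the query parameters, assuming the link always carries db, list_uids and dopt:<br>

```ruby
require "uri"

# Hypothetical helper (not part of Hpricot): split the inner_html string
# from step (3) into the citation text and the rewritten PubMed URL.
def split_citation(inner_html)
  # Everything before the bracketed link is the citation text.
  citation = inner_html.sub(/\s*\[<a\b.*\z/m, "").strip

  # Pull the href value out of the first <a> tag.
  href = inner_html[/<a\s[^>]*href="([^"]*)"/i, 1]
  return [citation, nil] if href.nil?

  # The attribute is HTML-escaped, so turn "&amp;" back into "&".
  href = href.gsub("&amp;", "&")

  # Rebuild the URL from the query parameters (assumed to be present).
  uri    = URI.parse(href)
  params = URI.decode_www_form(uri.query.to_s).to_h
  db     = params["db"].to_s.downcase   # "PubMed" -> "pubmed"
  url    = "#{uri.scheme}://#{uri.host}/#{db}/#{params['list_uids']}?dopt=#{params['dopt']}"

  [citation, url]
end

inner_html = 'Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91. [<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=12140549&amp;dopt=Abstract" target="_blank">PubMed</a>]'

citation, url = split_citation(inner_html)
puts citation
puts url
```

In the real script the input would be elements[13].inner_html rather than a literal string. I believe Hpricot can also return the link attribute directly via something like elements[13].at(&quot;a&quot;)[&quot;href&quot;], but treat that as an assumption to check against the Hpricot documentation.<br>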
<br>Thanks a lot in advance.<br>Robert<br>