Subject: [Boost-commit] svn:boost r68459 - in sandbox/tools/auto_index: doc doc/html doc/html/autoindex src
From: john_at_[hidden]
Date: 2011-01-26 13:51:50


Author: johnmaddock
Date: 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
New Revision: 68459
URL: http://svn.boost.org/trac/boost/changeset/68459

Log:
Escape terms to valid xml.
Print out the number of index entries at the end.
Update docs to match.
Text files modified:
   sandbox/tools/auto_index/doc/auto_index.qbk | 12 +++++++++---
   sandbox/tools/auto_index/doc/html/autoindex/script_ref.html | 16 ++++++++--------
   sandbox/tools/auto_index/doc/html/autoindex/tut.html | 21 ++++++++++++++-------
   sandbox/tools/auto_index/doc/html/autoindex/xml.html | 8 ++++----
   sandbox/tools/auto_index/doc/html/index.html | 4 ++--
   sandbox/tools/auto_index/src/auto_index.cpp | 1 +
   sandbox/tools/auto_index/src/file_scanning.cpp | 34 +++++++++++++++++++++++++++++++---
   7 files changed, 69 insertions(+), 27 deletions(-)

Modified: sandbox/tools/auto_index/doc/auto_index.qbk
==============================================================================
--- sandbox/tools/auto_index/doc/auto_index.qbk (original)
+++ sandbox/tools/auto_index/doc/auto_index.qbk 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -387,6 +387,12 @@
 [pre Indexing 990 terms... ]
    
 If you don't see that, or if it's indexing 0 terms then something is wrong!
+
+Likewise when index generation is complete, auto-index will emit another message:
+
+[pre 38 Index entries were created.]
+
+Again if you see that 0 entries were created then something is wrong!
    
 [h4 Step 7: Iterate]
 
@@ -621,9 +627,9 @@
 for the most part, you can assume that you're indexing plain text when writing regular expressions.
 * Named XML entities for &, ", ', < or > are converted to their corresponding characters before indexing
 a section of text. However, decimal or hex escape sequences are not currently converted.
-* Index terms are inserted into the XML sequence just as they are, and no attempt is made to
-escape them to valid XML. Normally these are C++ identifiers anyway so that's not an issue, but
-you should take care not to define scanners that create index terms containing &, ", ', < or >.
+* Index terms are assumed to be plain text (whether they originate from the script file
+or from scanning source files) and the characters &, ", < and > will be escaped to
+&amp; &quot; &lt; and &gt; respectively.
 
 [endsect]
 

Modified: sandbox/tools/auto_index/doc/html/autoindex/script_ref.html
==============================================================================
--- sandbox/tools/auto_index/doc/html/autoindex/script_ref.html (original)
+++ sandbox/tools/auto_index/doc/html/autoindex/script_ref.html 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -23,7 +23,7 @@
       The following elements can occur in a script:
     </p>
 <a name="autoindex.script_ref.comments_and_blank_lines"></a><h5>
-<a name="id981429"></a>
+<a name="id1016394"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.comments_and_blank_lines">Comments and
       blank lines</a>
     </h5>
@@ -32,7 +32,7 @@
       with a '#'.
     </p>
 <a name="autoindex.script_ref.simple_inclusions"></a><h5>
-<a name="id981446"></a>
+<a name="id1016410"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.simple_inclusions">Simple Inclusions</a>
     </h5>
 <pre class="programlisting"><span class="identifier">term</span> <span class="special">[</span><span class="identifier">regular</span><span class="special">-</span><span class="identifier">expression1</span> <span class="special">[</span><span class="identifier">regular</span><span class="special">-</span><span class="identifier">expression2</span> <span class="special">[</span><span class="identifier">category</span><span class="special">]]]</span>
@@ -99,7 +99,7 @@
 </dl>
 </div>
 <a name="autoindex.script_ref.source_file_scanning"></a><h5>
-<a name="id981655"></a>
+<a name="id1016619"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.source_file_scanning">Source File Scanning</a>
     </h5>
 <pre class="programlisting"><span class="special">!</span><span class="identifier">scan</span> <span class="identifier">source</span><span class="special">-</span><span class="identifier">file</span><span class="special">-</span><span class="identifier">name</span>
@@ -128,7 +128,7 @@
       </p></td></tr>
 </table></div>
 <a name="autoindex.script_ref.directory_and_source_file_scanning"></a><h5>
-<a name="id981733"></a>
+<a name="id1016697"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.directory_and_source_file_scanning">Directory
       and Source File Scanning</a>
     </h5>
@@ -157,7 +157,7 @@
 </dl>
 </div>
 <a name="autoindex.script_ref.excluding_terms"></a><h5>
-<a name="id981857"></a>
+<a name="id1016822"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.excluding_terms">Excluding Terms</a>
     </h5>
 <pre class="programlisting"><span class="special">!</span><span class="identifier">exclude</span> <span class="identifier">term</span><span class="special">-</span><span class="identifier">list</span>
@@ -170,7 +170,7 @@
       of things to index.
     </p>
 <a name="autoindex.script_ref.rewriting_section_names"></a><h5>
-<a name="id981913"></a>
+<a name="id1016877"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.rewriting_section_names">Rewriting Section
       Names</a>
     </h5>
@@ -217,7 +217,7 @@
       all index entries - thus preventing lots of entries under "The" etc!
     </p>
 <a name="autoindex.script_ref.defining_or_changing_the_file_scanners"></a><h5>
-<a name="id982065"></a>
+<a name="id1017030"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.defining_or_changing_the_file_scanners">Defining
       or Changing the File Scanners</a>
     </h5>
@@ -320,7 +320,7 @@
       scanner may find in the documentation.
     </p>
 <a name="autoindex.script_ref.debugging"></a><h5>
-<a name="id982568"></a>
+<a name="id1017532"></a>
       <a class="link" href="script_ref.html#autoindex.script_ref.debugging">Debugging</a>
     </h5>
 <p>

Modified: sandbox/tools/auto_index/doc/html/autoindex/tut.html
==============================================================================
--- sandbox/tools/auto_index/doc/html/autoindex/tut.html (original)
+++ sandbox/tools/auto_index/doc/html/autoindex/tut.html 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -20,7 +20,7 @@
 <a name="autoindex.tut"></a><a class="link" href="tut.html" title="Getting Started and Tutorial">Getting Started and Tutorial</a>
 </h2></div></div></div>
 <a name="autoindex.tut.step_1__build_the_tool"></a><h5>
-<a name="id975919"></a>
+<a name="id1010871"></a>
       <a class="link" href="tut.html#autoindex.tut.step_1__build_the_tool">Step 1: Build the tool</a>
     </h5>
 <p>
@@ -62,7 +62,7 @@
       is accepted into Boost.
     </p>
 <a name="autoindex.tut.step_2__configure_boost_build"></a><h5>
-<a name="id976083"></a>
+<a name="id1011036"></a>
       <a class="link" href="tut.html#autoindex.tut.step_2__configure_boost_build">Step 2: Configure
       Boost.Build</a>
     </h5>
@@ -328,7 +328,7 @@
 <span class="special">}</span>
 </pre>
 <a name="autoindex.tut.step_3__add_indexes_to_your_documentation"></a><h5>
-<a name="id976693"></a>
+<a name="id1011645"></a>
       <a class="link" href="tut.html#autoindex.tut.step_3__add_indexes_to_your_documentation">Step
       3: Add indexes to your documentation</a>
     </h5>
@@ -422,7 +422,7 @@
         &lt;xsl:param&gt;index.on.type=1
 </pre>
 <a name="autoindex.tut.step_4__create_the_script_file"></a><h5>
-<a name="id976995"></a>
+<a name="id1011948"></a>
       <a class="link" href="tut.html#autoindex.tut.step_4__create_the_script_file">Step 4: Create
       the script file</a>
     </h5>
@@ -495,7 +495,7 @@
 <pre class="programlisting"><span class="special">!</span><span class="identifier">rewrite</span><span class="special">-</span><span class="identifier">name</span> <span class="string">"(?i)(?:A|The)\s+(.*)"</span> <span class="string">"\1"</span>
 </pre>
 <a name="autoindex.tut.step_5__add_manual_index_entries___optional"></a><h5>
-<a name="id977249"></a>
+<a name="id1012202"></a>
       <a class="link" href="tut.html#autoindex.tut.step_5__add_manual_index_entries___optional">Step
       5: Add Manual Index Entries - Optional</a>
     </h5>
@@ -511,7 +511,7 @@
       index itself, with the exception of the "type" attribute.
     </p>
 <a name="autoindex.tut.step_6__build_the_docs"></a><h5>
-<a name="id981239"></a>
+<a name="id1016192"></a>
       <a class="link" href="tut.html#autoindex.tut.step_6__build_the_docs">Step 6: Build the Docs</a>
     </h5>
 <p>
@@ -536,8 +536,15 @@
 <p>
       If you don't see that, or if it's indexing 0 terms then something is wrong!
     </p>
+<p>
+ Likewise when index generation is complete, auto-index will emit another message:
+ </p>
+<pre class="programlisting">38 Index entries were created.</pre>
+<p>
+ Again if you see that 0 entries were created then something is wrong!
+ </p>
 <a name="autoindex.tut.step_7__iterate"></a><h5>
-<a name="id981350"></a>
+<a name="id1016314"></a>
       <a class="link" href="tut.html#autoindex.tut.step_7__iterate">Step 7: Iterate</a>
     </h5>
 <p>

Modified: sandbox/tools/auto_index/doc/html/autoindex/xml.html
==============================================================================
--- sandbox/tools/auto_index/doc/html/autoindex/xml.html (original)
+++ sandbox/tools/auto_index/doc/html/autoindex/xml.html 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -35,10 +35,10 @@
           decimal or hex escape sequences are not currently converted.
         </li>
 <li>
- Index terms are inserted into the XML sequence just as they are, and no
- attempt is made to escape them to valid XML. Normally these are C++ identifiers
- anyway so that's not an issue, but you should take care not to define scanners
- that create index terms containing &amp;, ", ', &lt; or &gt;.
+ Index terms are assumed to be plain text (whether they originate from the
+ script file or from scanning source files) and the characters &amp;, ",
+ &lt; and &gt; will be escaped to &amp;amp; &amp;quot; &amp;lt; and &amp;gt;
+ respectively.
         </li>
 </ul></div>
 </div>

Modified: sandbox/tools/auto_index/doc/html/index.html
==============================================================================
--- sandbox/tools/auto_index/doc/html/index.html (original)
+++ sandbox/tools/auto_index/doc/html/index.html 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -21,7 +21,7 @@
 </h3></div></div></div>
 <div><p class="copyright">Copyright &#169; 2008 John Maddock</p></div>
 <div><div class="legalnotice">
-<a name="id975631"></a><p>
+<a name="id1010583"></a><p>
         Distributed under the Boost Software License, Version 1.0. (See accompanying
         file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
       </p>
@@ -41,7 +41,7 @@
 </div>
 </div>
 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
-<td align="left"><p><small>Last revised: January 26, 2011 at 18:09:33 GMT</small></p></td>
+<td align="left"><p><small>Last revised: January 26, 2011 at 18:45:43 GMT</small></p></td>
 <td align="right"><div class="copyright-footer"></div></td>
 </tr></table>
 <hr>

Modified: sandbox/tools/auto_index/src/auto_index.cpp
==============================================================================
--- sandbox/tools/auto_index/src/auto_index.cpp (original)
+++ sandbox/tools/auto_index/src/auto_index.cpp 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -623,6 +623,7 @@
    std::ofstream os(outfile.c_str());
    os << header << std::endl;
    boost::tiny_xml::write(*xml, os);
+   std::cout << index_entries.size() << " Index entries were created." << std::endl;
 
    }
    catch(boost::exception& e)

Modified: sandbox/tools/auto_index/src/file_scanning.cpp
==============================================================================
--- sandbox/tools/auto_index/src/file_scanning.cpp (original)
+++ sandbox/tools/auto_index/src/file_scanning.cpp 2011-01-26 13:51:49 EST (Wed, 26 Jan 2011)
@@ -107,6 +107,34 @@
    }
 }
 //
+// Helper to convert string from external source into valid XML:
+//
+std::string escape_to_xml(const std::string& in)
+{
+   std::string result;
+   for(std::string::size_type i = 0; i < in.size(); ++i)
+   {
+      switch(in[i])
+      {
+      case '&':
+         result.append("&amp;");
+         break;
+      case '<':
+         result.append("&lt;");
+         break;
+      case '>':
+         result.append("&gt;");
+         break;
+      case '"':
+         result.append("&quot;");
+         break;
+      default:
+         result.append(1, in[i]);
+      }
+   }
+   return result;
+}
+//
 // Scan a source file for things to index:
 //
 void scan_file(const char* file)
@@ -153,7 +181,7 @@
          try
          {
             index_info info;
-            info.term = i->format(pscan->term_formatter);
+            info.term = escape_to_xml(i->format(pscan->term_formatter));
             info.search_text = i->format(pscan->format_string);
             info.category = pscan->type;
             if(!pscan->section_filter.empty())
@@ -387,7 +415,7 @@
          while(i != j)
          {
             index_info info;
-            info.term = unquote(*i);
+            info.term = escape_to_xml(unquote(*i));
             // Erase all entries that have a category in our scanner set,
             // plus any entry with no category at all:
             index_terms.erase(info);
@@ -412,7 +440,7 @@
             // in order for the term to be indexed (optional)
             // what[4] is the index category to place the term in (optional).
             index_info info;
-            info.term = unquote(what.str(1));
+            info.term = escape_to_xml(unquote(what.str(1)));
             std::string s = unquote(what.str(2));
             if(s.size())
                info.search_text = boost::regex(s, boost::regex::icase|boost::regex::perl);
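
For readers who want to see what the new escaping does to typical index terms, here is a minimal standalone sketch. The escape_to_xml body follows the helper added to file_scanning.cpp in this revision. The check_escape_to_xml function and its sample terms are hypothetical and are not part of auto_index. The last check rests on the documented assumption that terms are plain text, so input that already contains an entity is escaped a second time.

```cpp
#include <cassert>
#include <string>

// Same rule as the helper in this revision: only &, <, > and " are replaced.
// Every other character, including the apostrophe, is copied unchanged.
std::string escape_to_xml(const std::string& in)
{
   std::string result;
   for(std::string::size_type i = 0; i < in.size(); ++i)
   {
      switch(in[i])
      {
      case '&':
         result.append("&amp;");
         break;
      case '<':
         result.append("&lt;");
         break;
      case '>':
         result.append("&gt;");
         break;
      case '"':
         result.append("&quot;");
         break;
      default:
         result.append(1, in[i]);
      }
   }
   return result;
}

// Hypothetical self-check with illustrative terms (not taken from auto_index).
void check_escape_to_xml()
{
   // C++ identifiers that contain markup characters.
   assert(escape_to_xml("operator<<") == "operator&lt;&lt;");
   assert(escape_to_xml("std::vector<bool>") == "std::vector&lt;bool&gt;");
   // Double quotes and ampersands are escaped; single quotes are left alone.
   assert(escape_to_xml("\"a\" & 'b'") == "&quot;a&quot; &amp; 'b'");
   // Terms are treated as plain text, so an existing entity is escaped again.
   assert(escape_to_xml("&lt;") == "&amp;lt;");
}
```

A term such as operator<< could previously have produced malformed XML if a scanner emitted it. With the three call sites above routed through escape_to_xml, such a term reaches the XML output as operator&lt;&lt;.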


Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk