[Boost-docs] Trac syntax highlighting for QuickBook, first attempt

Subject: [Boost-docs] Trac syntax highlighting for QuickBook, first attempt
From: Kai Brüning (kai_at_[hidden])
Date: 2007-08-16 15:17:52


The included files show my results so far towards Trac syntax
highlighting for QuickBook using Pygments:

* tutorial.html: result of highlighting tutorial.qbk of the Boost
Python Library.

* __init__.py: the Pygments lexer used to create the highlighting


* I am not too happy with the styles. I tried to copy some colors from
Matias' Kate highlighting
(tools/quickbook/extra/katepart/syntax/boost_hs_quickbook.xml) while
leaving the default Pygments styles in Trac for C++ and Python
alone. In any case, selecting the styles is independent of lexing.

* The following token types are detected so far:
- Operator.Markup ([, ], `, ``, ''', *, /, =, _, #)
- Keyword.Markup (known markup keywords like 'section', 'c++' etc.)
- Comment.Multiline
- Code (passed to appropriate code lexers)
- Text (everything else)

* Source mode is correctly detected and the existing C++ and Python
lexers of Pygments are used accordingly for the code.
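
A minimal sketch of how the token types and source-mode handling
described above could be expressed as a Pygments RegexLexer. This is
not the lexer from __init__.py: the class name, the keyword subset,
the two-state layout and the choice of C++ as the starting mode are
all assumptions made for illustration.

```python
import re

from pygments.lexer import RegexLexer, bygroups, include, using
from pygments.lexers import CppLexer, PythonLexer
from pygments.token import Comment, Keyword, Operator, Text

# Token types for a "[keyword]" source-mode switch.
_SWITCH = bygroups(Operator.Markup, Keyword.Markup, Operator.Markup)


class QuickBookSketchLexer(RegexLexer):
    """Illustrative sketch only; all names here are hypothetical."""

    name = 'QuickBook (sketch)'
    flags = re.MULTILINE | re.DOTALL

    tokens = {
        # Rules shared by both source modes. Nesting is not tracked.
        'common': [
            (r'\[/.*?\]', Comment.Multiline),
            # Assumed keyword subset, not the full QuickBook list.
            (r'(\[)(section|endsect|table|pre)\b',
             bygroups(Operator.Markup, Keyword.Markup)),
            (r"'''|[\[\]*/=_#`]", Operator.Markup),
            (r"[^\[\]`'*/=_#]+", Text),
            (r"'", Text),
        ],
        # Assumption: the document starts in C++ source mode.
        'root': [
            (r'(\[)(python)(\])', _SWITCH, 'python'),
            (r'(\[)(c\+\+)(\])', _SWITCH),
            # `inline` or ``block`` code goes to the C++ lexer.
            (r'(``?)(.+?)(\1)',
             bygroups(Operator.Markup, using(CppLexer), Operator.Markup)),
            include('common'),
        ],
        'python': [
            (r'(\[)(c\+\+)(\])', _SWITCH, '#pop'),
            (r'(\[)(python)(\])', _SWITCH),
            # Same code markup, but delegated to the Python lexer.
            (r'(``?)(.+?)(\1)',
             bygroups(Operator.Markup, using(PythonLexer), Operator.Markup)),
            include('common'),
        ],
    }


if __name__ == '__main__':
    sample = "[python] `def f(): pass` [c++] `int x;`"
    for token_type, value in QuickBookSketchLexer().get_tokens(sample):
        print(token_type, repr(value))
```

Keeping one state per source mode means the lexer stays flat: the
only thing remembered between rules is which code lexer is active.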

* I tried to avoid actually parsing the source; that is, I do not
follow the nesting of markup in most cases. The exception is currently
the doc info part, because the allowed markup keywords there
differ from the rest of the document.

As a result,
- [pre .. ] does not work correctly so far.
- It is not possible to highlight (some) marked text differently,
like specifically styled text or table headers.
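
To illustrate why [pre .. ] is affected, here is a small sketch
comparing a non-nesting regex with a bracket-depth scan. The function
name and the sample text are hypothetical, and the scan is plain
Python rather than a Pygments rule.

```python
import re


def find_matching_close(text, open_index):
    """Return the index of the ']' closing the '[' at open_index, or -1."""
    depth = 0
    i = open_index
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2          # skip the backslash and the escaped character
            continue
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


if __name__ == '__main__':
    text = "[pre some [*bold] text] after"
    # A non-nesting pattern stops at the first ']' it sees.
    flat_end = re.match(r'\[pre.*?\]', text).end() - 1
    nested_end = find_matching_close(text, 0)
    print(text[:flat_end + 1])
    print(text[:nested_end + 1])
```

The flat pattern ends the block at the bracket that closes the inner
markup, which is the kind of mismatch a lexer without nesting runs
into.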

* The reasons for not tracking the nesting are twofold:
- The existing Pygments lexers are mostly kept very simple, with
far fewer states than I have created so far. I'd like to stick with
this style as much as possible.
- It would probably be hard (even impossible with regexes?) to
correctly track escaped markup characters. Currently I handle the
simple case (like \[) in most cases, but not multiple escape
characters (like \\[ -> markup, \\\[ -> no markup). Although such
input is unlikely in practice, I do not much like the idea of
nesting tracking that can easily be broken.
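
One possible way to handle the multiple-escape case with a single
regex, sketched under the assumption that backslashes pair up exactly
as in the examples above (an even number of backslashes before [
leaves it as markup, an odd number escapes it). The pattern and
function names are hypothetical and are not taken from __init__.py.

```python
import re

# A '[' preceded by an even number (including zero) of backslashes.
# The lookbehind forces the match to start at the beginning of the
# backslash run, so the pairs are counted from the correct position.
UNESCAPED_BRACKET = re.compile(r'(?<!\\)(?:\\\\)*(\[)')


def markup_bracket_positions(text):
    """Return the indices of every '[' that is not escaped."""
    return [m.start(1) for m in UNESCAPED_BRACKET.finditer(text)]


if __name__ == '__main__':
    for sample in (r'\[', r'\\[', r'\\\[', 'a[b'):
        print(repr(sample), markup_bracket_positions(sample))
```

Whether a pattern like this fits the simple style of the existing
Pygments lexers is a separate question; it only shows that the
parity check itself can be written as a regex.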


* Am I moving in the right direction?

* What kind of tokens should be detected (additionally)? Should any
of the non-structure text (besides code) be highlighted specifically?

* What do you think about the nesting issue? The Kate highlighting
tracks nesting (as far as I can see from the sources).

