Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2005-09-12 17:34:00

Thomas Witt wrote:
> What is the rationale for having the regex type bound to the iterator
> type used when matching/searching ?
> Thomas

Good question. From the regex proposal, John writes:

"There is one other restriction worth pointing out; the Boost interface
[parameterized on Char instead of Iterator] does not easily allow an
implementation to generate self-modifying machine code for the state
machine. That would require the state machine to be fixed to a finite
number of iterator types, and in practice there appear to be no regex
implementations that make this design choice in any case."

xpressive is such an implementation. At least, static xpressive is.
Since the the full regex expression *and* the iterator type are known at
compile time, the compiler is used to generate machine code to match a
regex to a data sequence. Each static regex generates its own small
custom pattern matching engine. For these regexes, there are no
op-codes, jump tables, virtual calls or other indirections.

If the regex type were parameterized on Char instead of Iterator, then
the compiled form of a regex would have to be Iterator-neutral. You
would have to separate the internal representation of the regex from the
code that evaluates the regex against a data sequence. This requires an
indirection somewhere -- in the case of Boost.Regex, it's op codes and a
jump table. (John can correct me if I'm wrong here.) Arrays of function
pointers flummox optimizers.

In short, the answer is performance. In order to take full advantage of
the compile-time information of static regexes, the regex type must be
parameterized on Iterator, not Char. In case you're wondering if it
matters or not, static regex are ~13% faster than their dynamic
equivalents on average (on VC7.1). Perf results here:


Eric Niebler
Boost Consulting

Boost list run by bdawes at, gregod at, cpdaniel at, john at