
Boost :

From: Eric Niebler (neric_at_[hidden])
Date: 2002-10-23 20:45:46


> If the regular expressions support (via templates) any iterators and any
> character type, yet I use them only with char* and char, then I should get
> about as much mileage as from a library designed for char* and char. Is that
> a reasonable assumption?

The OP is referring to compiling regex++ into a separate shared lib. If you
want a lib, you have to pay for every feature you could possibly use.
Certainly if you're willing to go with the inclusion model, then you can do
much better. But the inclusion model is expensive because regex
implementations are non-trivial. It takes gcc a full 5 minutes (on a dual
PIII 550) to compile GRETA's implementation at -O3. Putting all that in a
header is just not an option.
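
For contrast with the inclusion model, here is a minimal sketch of the separately compiled alternative: the header declares the template, the implementation lives in one translation unit, and that unit explicitly instantiates the character types the library ships with. The names basic_matcher and contains are hypothetical (not regex++ or GRETA API), the "engine" is a plain substring search, and the header and source parts are shown in one file only so the example compiles on its own.

```cpp
#include <string>
#include <utility>

// ---- matcher.hpp (hypothetical): declarations only, no member bodies. ----
template <class CharT>
class basic_matcher {
public:
    explicit basic_matcher(std::basic_string<CharT> literal);
    // True if the stored literal occurs anywhere in text.
    bool contains(const std::basic_string<CharT>& text) const;

private:
    std::basic_string<CharT> literal_;
};

// ---- matcher.cpp (hypothetical): compiled once, into the library. ----
template <class CharT>
basic_matcher<CharT>::basic_matcher(std::basic_string<CharT> literal)
    : literal_(std::move(literal)) {}

template <class CharT>
bool basic_matcher<CharT>::contains(const std::basic_string<CharT>& text) const {
    return text.find(literal_) != std::basic_string<CharT>::npos;
}

// Explicit instantiations: these are the only CharT types a client that
// sees just the header can use, and a shared build of the library carries
// object code for both even if a given client only ever uses char.
template class basic_matcher<char>;
template class basic_matcher<wchar_t>;
```

The client never compiles the member bodies, which is what keeps build times down; the price is that the library decides up front which instantiations exist.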

> The hundreds of KB still put both libraries in the hugely expensive range.
> At 500 KB, it would frankly border on unacceptability for many real
> projects.

That's for a "kitchen sink" configuration. I just compiled a little regex
exe on my machine that uses GRETA with just 1 instantiation, and it weighs
in at 150 KB. (There are some things I can do to cut that in half, but
there are only so many hours in a day. :-) Likewise with regex++, you can
put the implementation files into your project, or use a statically linked
lib, and have the linker throw out dead code. You don't always have to pay
for everything.

> The barebones library regexp<char> offers grouping, ranges, and Kleene
> closure. Then you may want to support negated regular expressions (something
> that I recall is not trivial to do) so you ask for regexp<char, negation>.
> Then you want negation AND infinite lookahead, so you say: regexp<char,
> negation, inf_lookahead>. Or maybe you don't care for negated languages but
> you do want infinite lookahead, so you say regexp<char, no_negation,
> inf_lookahead>. And so on.

Definitely possible, if you're willing to write and maintain many different
regex engines and all the interactions between the different policies. And
you could intelligently partition your header files so that you only
included the code you needed. But if you need to do something non-trivial,
you would be including thousands of lines of template code. It's a trade-off.
GRETA has a (less than?) half-way solution that puts the implementation in a
separate file and lets you control instantiation with a macro. Maybe exported
templates (the export keyword) would help here.
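
The quoted regexp<char, negation> idea maps naturally onto policy tags. Below is a minimal sketch, assuming a toy literal-only engine in place of a real one; regexp, negation and no_negation follow the quoted names but are otherwise hypothetical, and only one policy is shown. Negation here is just the complement of a whole-string match, which is the easy case; embedding negation inside larger expressions is where the real difficulty lies. Adding the lookahead policy would double the number of combinations to define and test, which is the maintenance cost described above.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical policy tags, named after the quoted proposal.
struct negation {};
struct no_negation {};

template <class CharT, class NegationPolicy = no_negation>
class regexp {
public:
    explicit regexp(std::basic_string<CharT> literal) : literal_(std::move(literal)) {}

    // Whole-string match. The "engine" is just literal comparison, a
    // stand-in for grouping, ranges and Kleene closure in a real library.
    bool match(const std::basic_string<CharT>& s) const { return s == literal_; }

    // Complement of the pattern's language: accepts exactly what match()
    // rejects. The body is instantiated only if called, and the
    // static_assert rejects the call unless the negation policy was chosen.
    bool match_complement(const std::basic_string<CharT>& s) const {
        static_assert(std::is_same<NegationPolicy, negation>::value,
                      "negation not enabled: use regexp<CharT, negation>");
        return !match(s);
    }

private:
    std::basic_string<CharT> literal_;
};
```

Calling match_complement on a plain regexp<char> is a compile-time error rather than dead code, so a user who never asks for negation never instantiates it.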

> Another approach to an alternative regex library would be similar to Spirit:
> use expression templates and C++ operators to emulate regex syntax. ...
> The main limitation is that you can't specify
> patterns at runtime, which most re users don't need anyway.

IMO, this is critical functionality. Without it, you couldn't use a regex
engine to write a grep-like utility, or a text editor with regex search, or
a log parsing tool. These are all uses people have found for GRETA, and for
regex++ too, I'm sure.
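
A minimal grep-like sketch of why runtime patterns matter: the pattern arrives as an ordinary string, for example from the command line, and nothing about it is known at compile time. It uses std::regex, which was standardized in C++11 (well after this thread, and modeled on Boost.Regex), purely as a stand-in engine; grep_lines and grep_text are hypothetical helper names.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns every line of `in` that contains a match for `pattern`.
// The pattern is an ordinary runtime string (for instance argv[1]), which
// a compile-time-only front end has no way to accept.
// Throws std::regex_error if the pattern is malformed.
std::vector<std::string> grep_lines(std::istream& in, const std::string& pattern) {
    const std::regex re(pattern);  // parsed at run time, ECMAScript grammar by default
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, re)) {  // "contains a match", like grep
            hits.push_back(line);
        }
    }
    return hits;
}

// Convenience overload for text already held in memory.
std::vector<std::string> grep_text(const std::string& text, const std::string& pattern) {
    std::istringstream in(text);
    return grep_lines(in, pattern);
}
```

In a real tool, main() would call grep_lines(std::cin, argv[1]) and print each hit; a malformed pattern typed by the user surfaces as an exception at run time instead of a compile error.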

There is usability to consider, also. Specifying a regex as a string is much
easier and more terse than writing an expression template, particularly if
you have to choose policies to "turn on" certain features of your regex engine
and know which headers you need to include to get those features.

What would be ideal is to have an engine that came with two parsers -- one
runtime string parser, and one compile-time expression-template parser
(assuming you didn't have to pay for one if you didn't use it). This is a
direction I have been planning on taking GRETA, but I've only just begun to
play with the code. It's a tricky problem, but I'm convinced it's doable,
and worth the effort.
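
The two-parser idea can be sketched at toy scale: one matching engine over a node tree, fed either by a runtime string parser or by an operator builder with Spirit-style >> (sequence) and prefix * (Kleene star). This is an illustration under heavy assumptions, not GRETA's design: every name is hypothetical, the syntax covers only literals, '.' and '*', and the operator front end builds the same tree at run time rather than being a true expression-template parser that encodes the pattern in types. It shows only that both front ends can share one engine.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// ---- Shared engine: a node tree plus a small backtracking matcher. ----
struct Node {
    enum Kind { Lit, Any, Star, Seq };
    Kind kind = Lit;
    char ch = 0;                                    // used by Lit only
    std::vector<std::shared_ptr<const Node>> kids;  // Star: 1 child, Seq: n
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr make(Node::Kind kind, char ch = 0, std::vector<NodePtr> kids = {}) {
    auto n = std::make_shared<Node>();
    n->kind = kind;
    n->ch = ch;
    n->kids = std::move(kids);
    return n;
}

// Continuation: called with the input position after a successful submatch.
using Cont = std::function<bool(std::size_t)>;

bool match_node(const Node& n, const std::string& s, std::size_t i, const Cont& k);

bool match_seq(const Node& n, std::size_t idx, const std::string& s,
               std::size_t i, const Cont& k) {
    if (idx == n.kids.size()) return k(i);
    return match_node(*n.kids[idx], s, i,
                      [&](std::size_t j) { return match_seq(n, idx + 1, s, j, k); });
}

bool match_node(const Node& n, const std::string& s, std::size_t i, const Cont& k) {
    switch (n.kind) {
    case Node::Lit: return i < s.size() && s[i] == n.ch && k(i + 1);
    case Node::Any: return i < s.size() && k(i + 1);
    case Node::Seq: return match_seq(n, 0, s, i, k);
    case Node::Star:
        // Greedy: try one more repetition that consumes input, else stop here.
        return match_node(*n.kids[0], s, i,
                          [&](std::size_t j) { return j > i && match_node(n, s, j, k); }) ||
               k(i);
    }
    return false;
}

// True if the whole of s matches the pattern.
bool full_match(const NodePtr& re, const std::string& s) {
    return match_node(*re, s, 0, [&](std::size_t j) { return j == s.size(); });
}

// ---- Front end 1: runtime string parser ('.' = any char, postfix '*'). ----
NodePtr parse(const std::string& pat) {
    std::vector<NodePtr> seq;
    for (char c : pat) {
        if (c == '*') {
            if (seq.empty()) throw std::invalid_argument("'*' has nothing to repeat");
            seq.back() = make(Node::Star, 0, {seq.back()});
        } else {
            seq.push_back(make(c == '.' ? Node::Any : Node::Lit, c));
        }
    }
    return make(Node::Seq, 0, std::move(seq));
}

// ---- Front end 2: operator builder with Spirit-style >> and prefix *. ----
struct Pat { NodePtr node; };
Pat lit(char c) { return {make(Node::Lit, c)}; }
Pat any_char() { return {make(Node::Any)}; }
Pat operator>>(const Pat& a, const Pat& b) { return {make(Node::Seq, 0, {a.node, b.node})}; }
Pat operator*(const Pat& a) { return {make(Node::Star, 0, {a.node})}; }
```

Both front ends produce equivalent trees, so parse("ab*c.") and lit('a') >> *lit('b') >> lit('c') >> any_char() accept the same strings; a user who only needs one of them could, in a real library, include only that front end's header.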

Eric


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk