From: Andrei Alexandrescu (andrewalex_at_[hidden])
Date: 2002-10-23 18:56:31


> Before you complain about how "bloated" libraries like regex++ and GRETA
> are compared to libs like PCRE, it's important to consider differences in
> the feature sets. GRETA and regex++ are *generic* template libraries that
> work with any bidirectional iterator type, with narrow or wide characters.
> PCRE (as far as I know) only works on char*.

This argument does not hold. We're not in the dark ages of the OO craze when
everything inherited something with the code bloat and the vtables and
everything. I'm so glad those days are behind us, but these days are bad as
well, just for different reasons! :o) Too many emperors walking down the
streets naked, and too many people do their best to act as if that's oh so
cool. (That doesn't relate to the topic by the way.)

If the regular expressions support (via templates) any iterators and any
character type, yet I use them only with char* and char, then I should get
about as much mileage as from a library designed for char* and char. Is that
a reasonable assumption?
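
One language-level reason the assumption looks plausible: member functions of a class template are only instantiated when they are actually used. Below is a minimal sketch of that mechanism; basic_matcher and its members are made-up names for illustration, not anything from regex++ or GRETA.

```cpp
#include <forward_list>
#include <string>

// Hypothetical matcher, generic over the iterator type.
template <class Iter>
struct basic_matcher {
    // Needs only forward iteration.
    static bool starts_with(Iter first, Iter last, const std::string& lit) {
        for (std::string::const_iterator p = lit.begin(); p != lit.end(); ++p, ++first) {
            if (first == last || *first != *p) return false;
        }
        return true;
    }

    // Needs bidirectional iteration (uses --). This body is compiled only
    // for iterator types with which it is actually called.
    static bool ends_with_char(Iter first, Iter last, char c) {
        if (first == last) return false;
        --last;
        return *last == c;
    }
};

// Works with a forward-only iterator, because ends_with_char is never
// instantiated for it.
inline bool forward_only_demo(const std::forward_list<char>& text) {
    typedef basic_matcher<std::forward_list<char>::const_iterator> matcher;
    return matcher::starts_with(text.cbegin(), text.cend(), "2002");
}
```

A program that only ever uses basic_matcher<const char*> gets only the char* versions of the members it calls. This says nothing about the non-template parts of a real library, which is where much of the size could still come from.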

> Also, regex++ has an extremely rich
> feature set, and GRETA is getting there. regex++ can work with C++ locales.
> PCRE can't do that. You buy a lot of functionality with all that "bloat".

No; I buy a lot of bloat with that functionality ***I might not need***.

> That said, I think GRETA is about as big as regex++.

The hundreds of KB still put both libraries in the hugely expensive range.
At 500 KB, it would frankly border on unacceptability for many real
projects.

Most of the time, I (and I believe other people) need to do the simplest
parsing tasks, such as date/time, numbers, stuff. Nothing out of the
ordinary. To get that done, you need to link in a huge library which offers
you all that esoteric stuff you might not need, or just do your own crappy
little stuff. I end up doing the latter. Not good.

Any chance we get rid of this obese code bloat?

For example, I'm thinking of a layered design that allows the user to choose
what level of functionality they need by specifying policies.

The barebones library regexp<char> offers grouping, ranges, and Kleene
closure. Then you may want to support negated regular expressions (something
that I recall is not trivial to do) so you ask for regexp<char, negation>.
Then you want negation AND infinite lookahead, so you say: regexp<char,
negation, inf_lookahead>. Or maybe you don't care for negated languages but
you do want infinite lookahead, so you say regexp<char, no_negation,
inf_lookahead>. And so on.

If a pattern is seen that there's no support for, regexp can throw an
exception.
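
To make the idea concrete, here is a rough sketch of what the interface could look like. The policy names follow the ones above; everything else is an assumption made for illustration: the allows_* members, the no_inf_lookahead default, the accepts helper, and the toy syntax ('!' for negation, "(?=" for lookahead, backslash as escape). It only checks patterns against the selected policies and throws on unsupported constructs; there is no matching engine behind it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy tags; the members are assumed for this sketch.
struct no_negation      { static const bool allows_negation  = false; };
struct negation         { static const bool allows_negation  = true;  };
struct no_inf_lookahead { static const bool allows_lookahead = false; };
struct inf_lookahead    { static const bool allows_lookahead = true;  };

template <class Char,
          class Negation  = no_negation,
          class Lookahead = no_inf_lookahead>
class regexp {
public:
    // Throws std::invalid_argument if the pattern uses a feature that
    // was not selected through the policies.
    explicit regexp(const std::basic_string<Char>& pattern)
        : pattern_(pattern) {
        check_supported();
    }

    const std::basic_string<Char>& pattern() const { return pattern_; }

private:
    void check_supported() const {
        const std::size_t n = pattern_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const Char c = pattern_[i];
            if (c == Char('\\')) { ++i; continue; }  // skip escaped character
            if (c == Char('!') && !Negation::allows_negation)
                throw std::invalid_argument("negation not compiled in");
            const bool lookahead = c == Char('(') && i + 2 < n
                && pattern_[i + 1] == Char('?')
                && pattern_[i + 2] == Char('=');
            if (lookahead && !Lookahead::allows_lookahead)
                throw std::invalid_argument("lookahead not compiled in");
        }
    }

    std::basic_string<Char> pattern_;
};

// Helper for trying a pattern against a given configuration.
template <class Regexp>
bool accepts(const std::string& pattern) {
    try { Regexp r(pattern); (void)r; return true; }
    catch (const std::invalid_argument&) { return false; }
}
```

With this, regexp<char> rejects "!abc" at construction time, while regexp<char, negation> accepts it. In a real design the policies would also decide which parts of the engine get instantiated at all, which is where the size savings would have to come from.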

Would this be workable?

And by the way, how's Spirit doing in that respect? If I can easily do with
Spirit what I can do with regex, and if it compiles in only what I actually
use, then I'd gladly use Spirit. I know the power is there, but I want it
only when I need it!

Another approach to an alternative regex library would be similar to Spirit:
use expression templates and C++ operators to emulate regex syntax. I did
participate in developing such a lib at a corporate job, but didn't have
time to take it too far. The main limitation is that you can't specify
patterns at runtime, which most regex users don't need anyway.

That way indeed you pay as you go, because only the used operators will be
instantiated.
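
A toy illustration of the expression-template approach, to show what "only the used operators will be instantiated" means in practice. All names here (expr, ch, range, matches, and the choice of >>, | and unary * as operators) are assumptions for the sketch, not Spirit's or any other library's actual interface. The matching is greedy with no backtracking, so it behaves more like a simple recursive-descent recognizer than a full regex engine.

```cpp
#include <string>

// CRTP base so the operators below apply only to pattern expressions.
template <class D>
struct expr {
    const D& self() const { return static_cast<const D&>(*this); }
};

// Each node's match() returns the end of the match, or nullptr on failure.
struct ch_ : expr<ch_> {
    char c;
    explicit ch_(char c_) : c(c_) {}
    const char* match(const char* f, const char* l) const {
        return (f != l && *f == c) ? f + 1 : nullptr;
    }
};

struct range_ : expr<range_> {
    char lo, hi;
    range_(char lo_, char hi_) : lo(lo_), hi(hi_) {}
    const char* match(const char* f, const char* l) const {
        return (f != l && lo <= *f && *f <= hi) ? f + 1 : nullptr;
    }
};

template <class A, class B>
struct seq_ : expr<seq_<A, B> > {
    A a; B b;
    seq_(const A& a_, const B& b_) : a(a_), b(b_) {}
    const char* match(const char* f, const char* l) const {
        const char* m = a.match(f, l);
        return m ? b.match(m, l) : nullptr;
    }
};

template <class A, class B>
struct alt_ : expr<alt_<A, B> > {
    A a; B b;
    alt_(const A& a_, const B& b_) : a(a_), b(b_) {}
    const char* match(const char* f, const char* l) const {
        const char* m = a.match(f, l);
        return m ? m : b.match(f, l);
    }
};

template <class A>
struct star_ : expr<star_<A> > {
    A a;
    explicit star_(const A& a_) : a(a_) {}
    const char* match(const char* f, const char* l) const {
        for (;;) {
            const char* m = a.match(f, l);
            if (!m || m == f) return f;  // stop on failure or empty match
            f = m;
        }
    }
};

template <class A, class B>
seq_<A, B> operator>>(const expr<A>& a, const expr<B>& b) {
    return seq_<A, B>(a.self(), b.self());
}
template <class A, class B>
alt_<A, B> operator|(const expr<A>& a, const expr<B>& b) {
    return alt_<A, B>(a.self(), b.self());
}
template <class A>
star_<A> operator*(const expr<A>& a) { return star_<A>(a.self()); }

inline ch_ ch(char c) { return ch_(c); }
inline range_ range(char lo, char hi) { return range_(lo, hi); }

// True if the whole string matches the pattern.
template <class E>
bool matches(const expr<E>& e, const std::string& s) {
    const char* f = s.data();
    const char* l = f + s.size();
    return e.self().match(f, l) == l;
}
```

With range_ digit = range('0', '9'), the expression digit >> digit >> ch('-') >> digit >> digit accepts "10-23". A program that never writes | never instantiates alt_, so that code is not compiled in.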

Andrei

--
All new!  THE C++ Seminar: Oct. 28-30 in Vancouver, WA.
http://www.thecppseminar.com/

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk