Boost logo

Boost :

Subject: Re: [boost] [regex, xpressive] interesting(?) perf benchmark
From: John Bytheway (jbytheway+boost_at_[hidden])
Date: 2010-06-08 16:28:24


On 08/06/10 14:13, Eric Niebler wrote:
> On 6/8/2010 4:22 AM, John Bytheway wrote:
>> I once wrote a compile-time lexer compiler, which seems to be the sort
>> of thing you're suggesting. It worked like this:
>>
>> typedef lexer<
>> rule<literal<'k','e','y','w','o','r','d'>, construct<KEYWORD> >,
>
> mpl::string could help you here.

Indeed. I think it didn't exist when I wrote this code.

>> The library will take this lexer specification and turn it into an NFA,
>> then transform that into a DFA, then encode that as a transition table
>> in an array, *all at compile time*.
>
> Wow! This is something like what I had in mind.

Well, I'm glad you seem impressed. It wasn't trivial to make this work :).

> <snip>
>> The down side with this approach, of course, is the absolutely massive
>> compile-time cost. Compiling the above example with g++ 4.4.3 takes
>> over 6 minutes and over 1.3GiB of memory (and this is after I've put in
>> effort to optimize it). The time I could live with, but the memory
>> requirements are absurd. Any lexer of real-world complexity would be
>> quite impossible to compile.
>
> Ouch, that's a serious problem. Maybe there is a more efficient way to
> get to a DFA. Or maybe the world isn't ready for compile-time DFAs.

I suspect there's a decent constant factor that could be squeezed out,
but I don't think there are any serious algorithmic improvements. OTOH,
metaprogramming algorithmic complexity is confusing at the best of
times, so I may be wrong. Once g++ supports C++0x constexpr functions I
intend to revisit this; they might well provide substantial improvement.
 That said, I'll try clang now and see how that fares...

>> However, it might just be possible to use it for simple regexes, rather
>> than full-blown lexers. Anyone think that might be worth investigating?
>
> If the compile times are that bad, I wouldn't be able to use it. I can't
> use it as-is regardless, because static xpressive doesn't have
> compile-time access to the string literals; they're still char*.

Yeah, that is an annoyance. I don't think there's anything to be done
about it, though.

John Bytheway


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk