Boost:
From: Eric Niebler (eric.niebler_at_[hidden])
Date: 2006-11-22 15:38:35
Hartmut Kaiser wrote:
>
> David Abrahams wrote:
>
>>> Yes, and Slex is the other one
>> Not to mention XPressive?
>
> Xpressive is not really usable as a lexer, and Eric is aware of that. I have
> a Wave lexer implemented with Xpressive here on my hard disk, and it
> functions well; it is only three orders of magnitude slower than, for
> instance, the re2c-based one. The main reasons are:
>
> - no optimization across the different regexes used to represent tokens
> (no internal NFA/DFA generation)
> - no way to tell which alternative matched when using regexes containing
> alternatives
>
> The first rules out using separate regexes, one for each token; the second
> prevents us from using one giant regex with alternatives...
>
> Both are probably just natural restrictions stemming from the fact that
> Xpressive is a regex library, not a lexer generator. The same issues would
> probably occur if we tried to use Boost.Regex for this task.
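To make the first restriction concrete, here is a minimal sketch of the "one regex per token" approach. It uses std::regex rather than xpressive so that it stands alone, and the names TokenRule, Token and lex are hypothetical. At each position every rule is tried independently and the longest match wins, so the work per token grows with the number of rules; nothing is shared between them, which is exactly what a generated NFA/DFA would avoid.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// A token kind paired with the regex that recognizes it (hypothetical names).
struct TokenRule {
    std::string name;
    std::regex pattern;
};

struct Token {
    std::string kind;
    std::string text;
};

// Naive "one regex per token" lexer: at each position every rule is tried
// separately and the longest match wins (earlier rules win ties).
// Each step costs one match attempt per rule; no work is shared between rules.
std::vector<Token> lex(const std::string& input, const std::vector<TokenRule>& rules) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == ' ') { ++pos; continue; }  // skip blanks
        std::size_t best_len = 0;
        const TokenRule* best_rule = nullptr;
        for (const TokenRule& rule : rules) {
            std::smatch m;
            // match_continuous forces the match to start exactly at pos.
            if (std::regex_search(input.cbegin() + pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous)) {
                std::size_t len = static_cast<std::size_t>(m.length(0));
                if (len > best_len) { best_len = len; best_rule = &rule; }
            }
        }
        if (best_rule == nullptr)
            throw std::runtime_error("no rule matches at position " + std::to_string(pos));
        tokens.push_back({best_rule->name, input.substr(pos, best_len)});
        pos += best_len;
    }
    return tokens;
}
```

With rules such as `{"keyword", std::regex("if")}` listed before `{"identifier", std::regex("[A-Za-z_][A-Za-z0-9_]*")}`, the input `if` lexes as a keyword (tie, earlier rule wins) while `iffy` lexes as an identifier (longer match wins).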
Ah, yes. I remember now. And I was going to implement a special matcher
that was a trie for token literals, to reduce the need for so many
alternates.
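A trie of the kind mentioned here can be sketched in plain C++. This is not xpressive's internal matcher interface; LiteralTrie, insert and longest_match are hypothetical names. Literals that share a prefix share trie nodes, so finding the longest operator or keyword at a position takes a single walk over the input instead of trying each literal as a separate alternative.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Trie over token literals (operators, keywords). Each node stores the id of
// the literal ending there, or -1 if no literal ends at that node.
class LiteralTrie {
public:
    void insert(const std::string& literal, int token_id) {
        Node* node = &root_;
        for (char c : literal) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->token_id = token_id;
    }

    // Walks the trie from input[pos] and returns the longest literal matching
    // there as {token_id, length}, or {-1, 0} if none matches.
    std::pair<int, std::size_t> longest_match(const std::string& input, std::size_t pos) const {
        const Node* node = &root_;
        std::pair<int, std::size_t> best{-1, 0};
        for (std::size_t i = pos; i < input.size(); ++i) {
            auto it = node->children.find(input[i]);
            if (it == node->children.end()) break;  // no literal continues this way
            node = it->second.get();
            if (node->token_id != -1) best = {node->token_id, i - pos + 1};
        }
        return best;
    }

private:
    struct Node {
        int token_id = -1;
        std::map<char, std::unique_ptr<Node>> children;
    };
    Node root_;
};
```

Note that on input `<<x` with literals `<` and `<<=` registered, the walk passes through the non-accepting `<<` node and falls back to `<`, the last literal that actually ended.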
Could you send me your code that integrates xpressive and Wave? It seems
unlikely that I'd be able to do better than re2c with xpressive, but it
might be interesting nonetheless.
What would be nice is a DSEL that generates optimal DFA-based lexers.
But given the sheer number of DFA states some lexers generate, I wonder
if an expression template approach is even viable.
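For comparison, the core of a DFA-based lexer such as the ones re2c produces can be approximated by hand. The sketch below is a hypothetical, hand-written, switch-based DFA for four token kinds (identifiers, integers, `+` and `++`), not generated code and not a proposed DSEL. It applies maximal munch: the automaton runs until it dies, and the lexer emits the token for the last accepting state it passed through.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// States of a small hand-built DFA (a generator would derive these from regexes).
enum State { Start, InIdent, InNumber, Plus, PlusPlus, Dead };

// Transition function: one step of the DFA on character c.
State step(State s, char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    bool alpha = std::isalpha(uc) || c == '_';
    bool digit = std::isdigit(uc) != 0;
    switch (s) {
        case Start:
            if (alpha) return InIdent;
            if (digit) return InNumber;
            if (c == '+') return Plus;
            return Dead;
        case InIdent:  return (alpha || digit) ? InIdent : Dead;
        case InNumber: return digit ? InNumber : Dead;
        case Plus:     return c == '+' ? PlusPlus : Dead;
        default:       return Dead;
    }
}

// Token kind for each accepting state; "" marks a non-accepting state.
const char* accepting(State s) {
    switch (s) {
        case InIdent:  return "identifier";
        case InNumber: return "number";
        case Plus:     return "plus";
        case PlusPlus: return "increment";
        default:       return "";
    }
}

struct Token { std::string kind, text; };

// Maximal munch: run the DFA as far as it goes, remember the last accepting
// position, emit that token, and restart from just after it.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == ' ') { ++pos; continue; }  // skip blanks
        State s = Start;
        std::size_t last_len = 0;
        std::string last_kind;
        for (std::size_t i = pos; i < input.size(); ++i) {
            s = step(s, input[i]);
            if (s == Dead) break;
            std::string kind = accepting(s);
            if (!kind.empty()) { last_len = i - pos + 1; last_kind = kind; }
        }
        if (last_len == 0) {  // unrecognized character: emit it as an error token
            out.push_back({"error", input.substr(pos, 1)});
            ++pos;
            continue;
        }
        out.push_back({last_kind, input.substr(pos, last_len)});
        pos += last_len;
    }
    return out;
}
```

Each input character is examined by a single automaton rather than by one regex per token kind, so `+++` lexes as `++` followed by `+` without trying any alternatives. Even this toy needs a distinct state for every partial token, so a realistic token set multiplies the state count quickly.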
-- Eric Niebler Boost Consulting www.boost-consulting.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk