From: Daniel Berlin (dan_at_[hidden])
Date: 2001-06-08 22:45:58
"joel de guzman" <isis-tech_at_[hidden]> writes:
>> > Some questions:
>> > 1. Why is compilation times bounded by lexical analysis?
>> Sheer number of tokens to be processed. That is why most compilers can't
>> afford to use table-generated lexers and end up using hand-crafted code
>> Well, at least until template instantiation came along as a compilation
>> this was true ;-)
> So lexers are basically of the form: t1 | t2 | ..... tn
> in a loop while skipping white spaces?
No, lexers are generally just as powerful as parsers.
Well, then again, it depends on what you define a lexer as.
I've actually seen a C++ grammar implemented in FLEX (using the state
system it provides).
A lexer, in reality, operates on characters, and tokenizes them. A
parser operates on tokens.
Most people will say a lexer is done with regular expressions, or
whatever, but this isn't strictly true. ANTLR's lexers are LL(k), for
The real difference is what they operate on, and what they produce.
> Indeed that is slow;
> a lexer cannot predict what will come next from the
> input stream without knowledge of the grammar.
Nothing can predict what will come next in the input stream. It can
only say what is *supposed* to come next.
I know what you meant though, but this is only because of lex's
implementation of lexers being table based.
Try re2c, which will generate direct C code for your regular
expressions and whatnot.
It'll blow flex out of the water. I'd be *very* surprised if you
could beat re2c with hand coding.
> It is basically understood that lexers speed up parsing.
> Shouldn't we do some rethinking now?
He's actually wrong. Due to precompiled headers (most of the tokens
are repeated or unused. So you get a huge win from not processing
them.), lexical analysis
isn't the bounding factor of compilation time anymore, optimization
is. Except for compilers like Pro64, where SSA is used as basically
*the* intermediate form, making most of the optimization algorithms
linear. They still probably dominate timewise.
However, except in the case of small files (where parsing actually
dominates), gcc is optimization time bounded, particularly bounded by
-- "I filled out an application that said, "In Case Of Emergency Notify". I wrote "Doctor"... What's my mother going to do? "-Steven Wright
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk