Boost logo

Boost :

From: David Abrahams (abrahams_at_[hidden])
Date: 2001-06-09 07:00:27


----- Original Message -----
From: "joel de guzman" <isis-tech_at_[hidden]>

>
> Sorry if I wasn't clear. What I meant was that a lexer basically
> gobbles the input stream in a linear fashion, working on the character
> level and extracting tokens t1..tn. Every step of the way, the lexer
> scans the current input and decides what token [t1..tn] it is.

maybe you mean "decides what token [c1..cn] is?"
A lexer has state, so it doesn't need to scan the current input every step
of the way.

> It then outputs a linear stream of tokens. Thus I see it as having
> the form: t1 | t2 | ..... tn, where t1..n are the rules for the all
> the expected tokens.

Not too different from the syntax rules for a statement in C++.

> The way I see it is that the lexer sees a linear stream of
> characters (tokens in the point of view of the lexer) and
> converts this to another linear stream of tokens (tokens
> in the point of view of the parser). The bottleneck I see
> is that the lexer cannot do any reasonable prediction as
> to what the next token (parser point of view) is because
> it does not know anything about the grammar.

How much can a C++ parser predict after it sees a semicolon?
That's not the bottleneck; as I said before the bottleneck is simply that
the lexer has more items to process than the parser does.

> For
> instance it doesn't know that an identifier should follow
> 'class'.

That generally doesn't make anything faster (except in a very greedy world),
because you can generally delay a decision about what you've actually seen
until you've seen it.

-Dave


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk