
From: joel de guzman (isis-tech_at_[hidden])
Date: 2001-06-08 23:36:56


----- Original Message -----
From: "David Abrahams":

> > So lexers are basically of the form: t1 | t2 | ..... tn
> > in a loop while skipping white spaces?
>
> I don't understand what you wrote, which leads me to suspect that you
> didn't understand what I wrote. A token, to a lexer, is a character. A
> token, to a parser, is often made up of many characters. Usually, the
> lexer needs to process tokens that are not even a part of any parser
> token (whitespace, comments). Ipso facto, the lexer must process many
> more tokens than the parser.
>
> -Dave
>

Sorry if I wasn't clear. What I meant was that a lexer basically
gobbles the input stream in a linear fashion, working at the character
level and extracting tokens t1..tn. At every step, the lexer scans the
current input and decides which token [t1..tn] it is, then outputs a
linear stream of tokens. Thus I see it as having the form:
t1 | t2 | ... | tn, where t1..tn are the rules for all the
expected tokens.
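
To make that shape concrete, here's a minimal sketch of the idea in
C++ (not Spirit code; the token kinds and the lex() driver are made up
for illustration): a loop that skips whitespace, then tries the token
rules t1 | t2 | ... | tn on the current input:

#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds -- a real lexer has one per rule t1..tn.
enum Kind { identifier_tok, number_tok, symbol_tok };

struct Token { Kind kind; std::string text; };

// The loop: skip whitespace, then try t1 | t2 | ... | tn in turn.
std::vector<Token> lex(std::string const& in)
{
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char c = in[i];
        if (std::isspace(c)) { ++i; continue; }       // skip whitespace

        std::size_t start = i;
        if (std::isalpha(c))                          // t1: identifier
        {
            while (i < in.size()
                && std::isalnum((unsigned char)in[i])) ++i;
            out.push_back({identifier_tok, in.substr(start, i - start)});
        }
        else if (std::isdigit(c))                     // t2: number
        {
            while (i < in.size()
                && std::isdigit((unsigned char)in[i])) ++i;
            out.push_back({number_tok, in.substr(start, i - start)});
        }
        else                                          // tn: single symbol
        {
            out.push_back({symbol_tok, std::string(1, in[i++])});
        }
    }
    return out;
}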

The way I see it is that the lexer sees a linear stream of
characters (tokens from the lexer's point of view) and
converts it to another linear stream of tokens (tokens
from the parser's point of view). The bottleneck I see
is that the lexer cannot make any reasonable prediction
about what the next token (parser point of view) will be,
because it knows nothing about the grammar. For
instance, it doesn't know that an identifier should follow
'class'.
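
For example, feeding the sketch above a fragment like "class Foo"
just yields two identifier tokens; nothing in t1..tn records that the
first one predicts the second:

// uses the lex() sketch above; hypothetical, for illustration only
#include <cassert>

int main()
{
    std::vector<Token> t = lex("class Foo");
    assert(t.size() == 2);
    assert(t[0].text == "class");  // just another identifier to the lexer
    assert(t[1].text == "Foo");    // the grammar link is invisible here
}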

-Joel

