Boost logo

Boost :

From: John Max Skaller (skaller_at_[hidden])
Date: 2001-06-08 16:14:39


joel de guzman wrote:

> 1. Why is compilation times bounded by lexical analysis?

        Because finding a lexeme requires scanning characters.
There are lots of them. Even a very fast lexer has some
per character overhead.

> 2. What are the performance bottle-necks typically encountered?

        Old fashioned lexers use buffers. It is very expensive
to test for end of buffer. On modern equipment, you can just
load the whole file into memory and add a 'null' at the end,
this speeds up advancing to the next character a huge amount.

        Still, you have to do a lookup into a rather large
transition table. That typically guarrantees a physical
memory access (clobbers the cache). So a 100Mhz processor
is suddenly running at 10Mhz.

        For example, the Felix lexer:

        ocamllex.opt src/flx_lex.mll
        208 states, 3174 transitions, table size 13944 bytes

Obviously, 13K is bigger than the primary processor cache.

> 3. What can be done about it?

        Often, a hand written lexer will be faster than
anything else.
 
> I am asking this because in principle (I haven't done this yet)
> Spirit can be a lexer and a parser.

        Well, just an idea .. bootstrap it.

        First, you write your lexer and/or parser using
Spirit. You don't care if it is fast, you're just prototyping.

        Now, feed the Spirit C++ encoding into a
lexer/parser generator to produce production quality
lexer/parser tables.

        The front end of the lexer/parser generator
has to lex and parse Spirit. The obvious language to
write the lexer/parser in is .. Spirit.

        An easy technique here is to generate
lex/yacc output from Spirit input. Obviously,
the user is going to have to accept a language/grammar
which is 'LR(1) & Spirit'.

-- 
John (Max) Skaller, mailto:skaller_at_[hidden]
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk