Boost logo

Boost :

From: John Max Skaller (skaller_at_[hidden])
Date: 2001-06-09 17:38:53


Daniel Berlin wrote:

> > For example, the Felix lexer:
> >
> > ocamllex.opt src/flx_lex.mll
> > 208 states, 3174 transitions, table size 13944 bytes
> >
> > Obviously, 13K is bigger than the primary processor cache.
>
> Try a scanner generator that generates directly executable
> code, like re2c.

        Impossible for me. Note I'm using ocamllex.
It generates _ocaml_ code, not C.

> >> 3. What can be done about it?
> >
> > Often, a hand written lexer will be faster than
> > anything else.
> Same answer.
> Just because flex is slow, doesn't mean everything else is.

        Same answer. I'm not using flex.
 
> You can make scanner generators that generate directly executable
> scanners (IE output the real C code, rather than tables and an
> interpreter),

        I doubt it. A table driven lexer can process
text with one random and one sequential memory access.
Everything else can be done in registers. With any luck,
the whole of the code can fit into the instruction cache,
it only takes half a dozen machine instructions
(until you hit the semantic actions :-)

        As David Abrahams said, this is the bottom line:
there are a lots of characters.

> unless you know something you can't tell the scanner generator that
> helps you.
>
> This is rare.

        A lot of the lexing I've done
involved weird things like recognizing nested comments,
indentation (python), and things like processing string literals,
warning about leading 0's being decimal not octal,
processing include files ... my lexers are typically much more
complicated than my parsers :-)

-- 
John (Max) Skaller, mailto:skaller_at_[hidden]
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk