|
Boost : |
From: John Max Skaller (skaller_at_[hidden])
Date: 2001-06-10 09:23:16
Daniel Berlin wrote:
> > It generates _ocaml_ code, not C.
> Yeah, but it appears it's still table driven, no?
Yes, it is table driven.
> If you hand code it in ocaml, is it smaller?
Haven't tried.
> Making re2c output ocaml code wouldn't be all *that* difficult.
Agree.
> > I doubt it. A table driven lexer can process
> > text with one random and one sequential memory access.
>
> Errr, except compilers can't optimize this well at all,
You're probably right :-)
> Except you forgot the memory load and store latency from reading this
> table, which is so much more than the instructions in the engine it's
> not funny.
>
There is no storing. There is a single table read,
which is effectively random, and reading the character,
which should be in the cache. So the performance bound
for the fastest possible lexer is a single random memory access.
> Why exactly do you think hand coded lexers are often faster?
Because often, you can elide the table lookup,
and do _everything_ in registers/using constants in the code.
For example, to lex an identifier, you can use range comparisons for
a-z, A-Z, 0-9
> Because they aren't table driven, and avoid the extra memory accesses
> inside what should be a tight loop.
Correct. However, while it is possible to do this
by hand, most mechanically generated lexers building code
would be unlikely to spot this kind of 'smart' encoding.
> > As David Abrahams said, this is the bottom line:
> > there are a lots of characters.
>
> Sure. And having to do 3 memory accesses for each one instead of 1
> memory access (just the next character) kills you when you do it a
> million times.
I think this is the wrong way to count:
I see one random read, and one sequential one.
The random read dominates, since it is likely to
be a cache miss every time.
For an 'executable' lexer, it is possible to
reduce this time to one sequential read, which is almost
certainly going to be a cache hit: in this case,
such a lexer would fly, since it could use the cache
very effectively.
So I don't disagree with you, I just have doubts :-)
-- John (Max) Skaller, mailto:skaller_at_[hidden] 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 checkout Vyper http://Vyper.sourceforge.net download Interscript http://Interscript.sourceforge.net
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk