|
Boost : |
From: Daniel Berlin (dan_at_[hidden])
Date: 2001-06-09 20:04:34
John Max Skaller <skaller_at_[hidden]> writes:
> Daniel Berlin wrote:
>
>> > For example, the Felix lexer:
>> >
>> > ocamllex.opt src/flx_lex.mll
>> > 208 states, 3174 transitions, table size 13944 bytes
>> >
>> > Obviously, 13K is bigger than the primary processor cache.
>>
>> Try a scanner generator that generates directly executable
>> code, like re2c.
>
> Impossible for me. Note I'm using ocamllex.
> It generates _ocaml_ code, not C.
Yeah, but it appears it's still table driven, no?
If you hand code it in ocaml, is it smaller?
Making re2c output ocaml code wouldn't be all *that* difficult.
>
>> >> 3. What can be done about it?
>> >
>> > Often, a hand written lexer will be faster than
>> > anything else.
>> Same answer.
>> Just because flex is slow, doesn't mean everything else is.
>
> Same answer. I'm not using flex.
>
>> You can make scanner generators that generate directly executable
>> scanners (IE output the real C code, rather than tables and an
>> interpreter),
>
> I doubt it. A table driven lexer can process
> text with one random and one sequential memory access.
Errr, except compilers can't optimize this well at all, not that it
would help, with the two extra memory accesses required.
> Everything else can be done in registers. With any luck,
> the whole of the code can fit into the instruction cache,
> it only takes half a dozen machine instructions
> (until you hit the semantic actions :-)
Except you forgot the memory load and store latency from reading this
table, which is so much more than the instructions in the engine it's
not funny.
Why exactly do you think hand coded lexers are often faster?
Because they aren't table driven, and avoid the extra memory accesses
inside what should be a tight loop.
>
> As David Abrahams said, this is the bottom line:
> there are a lots of characters.
Sure. And having to do 3 memory accesses for each one instead of 1
memory access (just the next character) kills you when you do it a
million times.
Just on the GCC list, for instance, it was noticed that the stepanov
benchmark numbers for powerpc were *way* higher (lower is better) than
the numbers for 2.95.
The cause?
A single dead store in the inner loop not getting cleaned up.
One extra instruction, storing 4 bytes to memory.
>
>> unless you know something you can't tell the scanner generator that
>> helps you.
>>
>> This is rare.
>
> A lot of the lexing I've done
> involved weird things like recognizing nested comments,
> indentation (python), and things like processing string literals,
> warning about leading 0's being decimal not octal,
> processing include files ... my lexers are typically much more
> complicated than my parsers :-)
I'm used to writing things like C++ parsers.
Yes, C++ parsers.
Shoot me now.
Or rather, shoot Bjarne.
So my lexers and my parsers are hopelessly intertwined.
--Dan
-- "This is my impression of a bowling ball... (Drags the mike along the floor, then lifts it...) Gutter... "-Steven Wright
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk