From: David Abrahams (dave_at_[hidden])
Date: 2007-09-02 14:26:21


on Sat Sep 01 2007, Chris Lattner <clattner-AT-apple.com> wrote:

>>> for example, and efficient buffer management (at least in our
>>> context) means that the input to the lexer isn't useful as an
>>> iterator interface.
>>
>> Well, the kind of input sequence is exactly one thing I would
>> templatize.
>
> To what benefit?

So people don't have to pay the price of copying their sequence into a
null-terminated memory buffer.

> In practice, clang requires its input to come from a nul terminated
> memory buffer (yes, we do correctly handle embedded nul's in the
> input buffer as whitespace). Here are the pros and cons:
>
> Pros: clang is designed for what we perceive to be the common case.
> In particular, mmap'ing in files almost always implicitly null
> terminates the buffer (if a file is not an even multiple of a page
> size, most major OS's null fill to the end of the page) so we get
> this invariant for free in most cases. Memory buffers and many
> others are also easy to handle in this scheme.
>
> Further, knowing that we have a sequential memory buffer as an input
> makes various optimizations really trivial: for example our block
> comment skipper is vectorized on hosts that support SSE or Altivec.
> Having the nul terminator at the end of the file means that the lexer
> doesn't have to check for "end of buffer" condition in *many* highly
> performance sensitive lexing loops (e.g. lexing identifiers, which
> cannot have a nul in them).

The ability to provide specialized algorithm implementations that take
advantage of special knowledge of the data structure is a strength of
generic programming.
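
A minimal sketch of what that could look like, assuming C++11. The
names is_ident_char, skip_identifier and nul_terminated are hypothetical,
made up for illustration, and are not taken from clang or any Boost
library. The generic version accepts any iterator range; the overload
uses the nul-terminator guarantee to drop the end-of-buffer check.

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <list>
#include <string>

// Hypothetical helper: true for characters that may appear in an identifier.
// Note that '\0' is never an identifier character.
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Generic version: works for any input range (std::string, std::list<char>,
// stream iterators, ...), but must compare against `last` on every step.
template <class Iter>
Iter skip_identifier(Iter first, Iter last) {
    while (first != last && is_ident_char(*first))
        ++first;
    return first;
}

// Hypothetical wrapper type: the caller promises that `p` points into a
// buffer that ends with '\0'.
struct nul_terminated {
    const char* p;
};

// Specialized overload: no end-of-buffer check is needed, because the
// terminating '\0' fails is_ident_char and stops the loop by itself.
inline const char* skip_identifier(nul_terminated buf) {
    const char* p = buf.p;
    while (is_ident_char(*p))
        ++p;
    return p;
}
```

A caller with a nul-terminated buffer writes
skip_identifier(nul_terminated{ptr}) and gets the tight loop; everyone
else calls skip_identifier(first, last) on whatever sequence they
already have, with no copy.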

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
