
Boost-Build :

From: Alexey Pakhunov (alexeypa_at_[hidden])
Date: 2005-09-03 11:47:09


João Abecasis wrote:

> Maybe such a rule would better fit a MMAP_FILE builtin:
> rule MMAP_FILE ( file : offset = 0 ? : bytes = 1 MB ? )

'CAT' is simply shorter. :-)

> , instead of a CAT with a file size limit. The file size limit should be
> looked at as a protection mechanism, nothing more.

It should be a 'block size limit' first and only then a 'file size limit'
(if any). But you are right, some kind of protection is required. The
result of 'malloc' is not checked throughout the bjam sources, so running
out of memory will lead to a catastrophic failure.
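
As a minimal sketch of the kind of protection meant here, an allocation wrapper could check the result of 'malloc' and stop with a message instead of crashing later. The name `checked_malloc` is hypothetical and does not come from the bjam sources; exiting on failure is just one possible policy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of bjam): allocate `size` bytes or
   terminate with a diagnostic instead of returning NULL. */
void *checked_malloc(size_t size)
{
    /* malloc(0) may legally return NULL, so request at least one byte. */
    void *p = malloc(size ? size : 1);

    if (p == NULL) {
        fprintf(stderr, "out of memory (requested %lu bytes)\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Callers would use it exactly like 'malloc' and release the block with 'free'.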

> Large file support is another problem altogether -- AFAIK stdio in C has
> no support for large files (>2GB) out-of-the-box. Adding something like
> this to bjam requires a portability/abstraction layer on top of
> platform-specific implementations (then again, it might just be POSIX +
> Windows...). I'd rather stay off those grounds for now.

OK.

> Hmm... I don't think that's how it works. IIUC, with stdio I get to see
> up to the first 2GB (?) of a file. So the filesize I can determine with
> fseek + ftell is never greater than that. (Note: there could be other
> issues from my use of long -> size_t conversions, which I believe I have
> fixed in my local copy).

I guess you are right.
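
For reference, the fseek + ftell approach discussed above could be sketched as follows. The function name `file_size` is hypothetical. Because 'ftell' returns a 'long', the size it can report is bounded by LONG_MAX, which is the 2GB ceiling on platforms with a 32-bit 'long'. Seeking to SEEK_END on a binary stream is not strictly guaranteed by the C standard, so this assumes a POSIX or Windows C library.

```c
#include <stdio.h>

/* Hypothetical helper: store the size of `path` in *out and return 0,
   or return -1 if the size cannot be determined. */
int file_size(const char *path, size_t *out)
{
    FILE *f = fopen(path, "rb");
    long end;

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    end = ftell(f);               /* -1L on failure */
    fclose(f);

    if (end < 0)
        return -1;
    /* Guard the long -> size_t conversion on platforms where
       size_t is narrower than long. */
    if ((unsigned long)end > (size_t)-1)
        return -1;
    *out = (size_t)end;
    return 0;
}
```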

> Sure, right now my quest is CAT and possibly GREP. I'm not against
> adding large file support. I'd be willing to use a portability layer
> someone else contributes ;-)
> Again, I think this is independent of CAT.

OK.

> I also thought of implementing a grep-like rule that'd use streaming and
> avoid mapping entire files to memory:
>
> rule GREP ( regexp : files * : recursive ? )

Yeah. It can be even more useful than 'CAT'. I'd change the prototype to
the following:

rule GREP ( regexp * : files * : options * )

Using several regexps is really handy.
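
A rough sketch of how a streaming match against several patterns might look in C is given below. It uses POSIX <regex.h> purely for illustration, which is an assumption and not necessarily the regexp engine bjam would use. The name `grep_count`, the pattern cap, and the line buffer size are all hypothetical.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

enum { MAX_PATTERNS = 8 };   /* arbitrary cap for this sketch */

/* Hypothetical helper: count the lines of `in` that match at least one
   of `npat` extended regexps. Returns -1 if there are too many patterns
   or one of them fails to compile. Lines longer than the buffer are
   examined in chunks. */
long grep_count(FILE *in, const char *const *patterns, size_t npat)
{
    regex_t re[MAX_PATTERNS];
    char line[4096];
    size_t i, compiled = 0;
    long hits = 0;

    if (npat > MAX_PATTERNS)
        return -1;
    for (i = 0; i < npat; ++i) {
        if (regcomp(&re[i], patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            break;
        ++compiled;
    }
    if (compiled != npat) {
        hits = -1;
    } else {
        /* Only one line is held in memory at a time. */
        while (fgets(line, sizeof line, in) != NULL) {
            line[strcspn(line, "\n")] = '\0';   /* drop trailing newline */
            for (i = 0; i < npat; ++i) {
                if (regexec(&re[i], line, 0, NULL, 0) == 0) {
                    ++hits;
                    break;                      /* count each line once */
                }
            }
        }
    }
    for (i = 0; i < compiled; ++i)
        regfree(&re[i]);
    return hits;
}
```

Each line is tested against the patterns in order and discarded afterwards, so memory use does not depend on the file size.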

> - querying the size of a file (so we can decide whether and how to CAT).
> (it'd be worthwhile to return "this is a large file" ;-)

Agreed.

> I think, line-by-line reading is ideal for something like a grep command
> where you can inspect the lines and discard them afterwards. But to put
> stuff up in memory I thought it'd be better to have it all in one place.

Agreed.
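
For the "all in one place" case, a CAT-like read into a single buffer with a size limit as the protection mechanism might be sketched like this. The name `read_all` and its interface are hypothetical; the limit is whatever the caller chooses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of `path` into one NUL-terminated
   heap buffer, refusing files larger than `limit` bytes. Returns NULL on
   any error; on success stores the byte count in *len. The caller frees
   the buffer. */
char *read_all(const char *path, size_t limit, size_t *len)
{
    FILE *f;
    char *buf;
    size_t n;

    if (limit == (size_t)-1)           /* limit + 1 would overflow */
        return NULL;
    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    buf = malloc(limit + 1);           /* +1 for the terminating NUL */
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    n = fread(buf, 1, limit, f);
    /* Any byte left after `limit` bytes means the file is too large. */
    if (ferror(f) || fgetc(f) != EOF) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    buf[n] = '\0';
    *len = n;
    return buf;
}
```

This version allocates the full limit up front, which is simple but wasteful for small files; sizing the buffer from the file size first would be an obvious refinement.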

Best regards/Venlig hilsen,
Alexey Pakhunov.

 

