
Boost-Build :

From: Joao Abecasis (jpabecasis_at_[hidden])
Date: 2005-09-04 07:13:32

Rene Rivera wrote:
> João Abecasis wrote:
>>Alexey Pakhunov wrote:
>>>I think the limit will not solve all problems. Some kind of
>>>streaming support should be implemented instead. For example, each time
>>>'CAT' is called it would read only a single block, a single line, or a
>>>block of lines.
> That seems like the best approach.

I have to admit it looks more in line with how bjam handles files and
strings in general.

>>I also thought of implementing a grep-like rule that'd use streaming and
>>avoid mapping entire files to memory:
>> rule GREP ( regexp : files * : recursive ? )
> The problem with a GREP solution is that it limits what one can do with
> the results. For example it would not help in the implementation of the
> current doc support as it changes what it greps for contextually. So it
> would end up doing what it currently does of "reading" in the files with
> an initial grep of "^(.*)$" and doing further greps internally.

Makes sense. That strengthens the case for your READ proposal, and GREP
could then be built on top of it.
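To illustrate the layering being suggested, here is a hedged C-level sketch (not bjam's actual implementation; `grep_file` and its interface are invented for illustration) of a GREP that streams a file line by line, so only one line is in memory at a time, and records which lines match a POSIX extended regex:

```c
/* Sketch of a streaming grep built on line-by-line reads.
 * All names here are hypothetical, not part of bjam. */
#include <regex.h>
#include <stdio.h>

/* Stores the 0-based numbers of matching lines into `hits`.
 * Returns the number of matches, or -1 on open/compile failure. */
static int grep_file(const char *path, const char *pattern,
                     int *hits, int max_hits)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (!f) {
        regfree(&re);
        return -1;
    }

    char buf[512];
    int line = 0, n = 0;
    while (fgets(buf, sizeof buf, f)) {   /* one line in memory at a time */
        if (n < max_hits && regexec(&re, buf, 0, NULL, 0) == 0)
            hits[n++] = line;
        line++;
    }

    fclose(f);
    regfree(&re);
    return n;
}
```

A contextual grep like the doc-support case would instead call the underlying line reader directly and change the pattern between calls.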

> What is the "recursive" argument?

It was meant to be a flag for recursing into directories.

>>>- Support of line-by-line reading.
> How about something like:
> rule READ ( file : first-and-last-line * : regexp ? )
> # file, path to file
> # first-and-last-line, range of lines to read, default ( 0 1 )
> # regexp, optional regex to apply to each line before it is returned
> #
> # returns: ( first-line-read last-line-read strings * )
> That allows for considerable flexibility in how much and how one reads
> in a file. For example reading in a line at a time:
> local r = [ READ "somefile.txt" ] ;
> while $(r[1]) < $(r[2])
> {
>     ECHO Line #$(r[1]) - $(r[3-]) ;
>     r = [ READ "somefile.txt"
>         : [ CALC $(r[1]) + 1 ] [ CALC $(r[2]) + 1 ] ] ;
> }
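For reference, the streaming core of such a READ builtin could look roughly like the following C sketch (hypothetical names, not bjam's actual engine code): it returns the lines of a file in the half-open range [first, last) using plain stdio, so the whole file is never held in memory.

```c
/* Hedged sketch of a line-range reader for the proposed READ rule.
 * Not bjam's implementation; the interface is invented for illustration. */
#include <stdio.h>
#include <string.h>

/* Reads lines [first, last) of `path` into `out` (one string per line,
 * trailing newline stripped). Returns the number of lines stored, or
 * -1 if the file cannot be opened. */
static int read_lines(const char *path, int first, int last,
                      char out[][256], int max_lines)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char buf[256];
    int line = 0, stored = 0;
    while (line < last && stored < max_lines && fgets(buf, sizeof buf, f)) {
        if (line >= first) {
            buf[strcspn(buf, "\n")] = '\0';    /* strip trailing newline */
            strncpy(out[stored], buf, sizeof out[stored] - 1);
            out[stored][sizeof out[stored] - 1] = '\0';
            stored++;
        }
        line++;
    }
    fclose(f);
    return stored;
}
```

The optional regexp argument from the proposal would then be applied to each line before it is returned.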

Some issues regarding the implementation of a READ command:

* How are multiple reads on the same file to be handled? Is the file
repeatedly opened? Does it remain open for the duration of a bjam run?

FWIW, map_file_* could still be used as a back-end to cache file data in
memory. A possible issue with my implementation is that it naively
assumes files don't change; if they do change, they must be explicitly
unmapped or remapped.

* Are (file, line) pairs tracked? Or do we fopen/fseek on repeated
reads? What about when sequentially calling READ a line at a time?
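The sequential case is where tracking (file, line) pairs pays off: keeping the FILE* open together with the current line number makes reading line n+1 after line n O(1) instead of a rescan from the start. A hedged sketch of such a cursor (names invented for illustration):

```c
/* Sketch of a (file, line) cursor for sequential READ calls.
 * Illustrative only; not bjam's actual bookkeeping. */
#include <stdio.h>
#include <string.h>

struct read_cursor {
    FILE *f;
    int   next_line;   /* 0-based index of the line fgets returns next */
};

/* Fetches line `n` (0-based) into `out`, newline stripped.
 * Returns 1 on success, 0 when `n` is past end of file. */
static int cursor_get_line(struct read_cursor *c, int n,
                           char *out, size_t out_size)
{
    if (n < c->next_line) {          /* going backwards: rewind and rescan */
        rewind(c->f);
        c->next_line = 0;
    }
    while (c->next_line <= n) {
        if (!fgets(out, (int)out_size, c->f))
            return 0;
        c->next_line++;
    }
    out[strcspn(out, "\n")] = '\0';
    return 1;
}
```

Only backward jumps pay the rescan cost; forward reads, including the line-at-a-time loop in the READ example above, just continue from the current position.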

Hmm... Then again, perhaps I'm letting implementation details cloud my
vision of the big picture ;-)




Boost-Build list run by bdawes at, david.abrahams at, gregod at, cpdaniel at, john at