From: Joao Abecasis (jpabecasis_at_[hidden])
Date: 2005-09-04 07:13:32
Rene Rivera wrote:
> João Abecasis wrote:
>>Alexey Pakhunov wrote:
>>>I think the limit will not solve all problems; some kind of
>>>streaming support should be implemented instead. For example, each time
>>>'CAT' is called it would read only a single block, line, or block of lines.
> That seems like the best approach.
I have to admit it looks more in line with how bjam handles files and
strings in general.
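To make the block-at-a-time idea concrete, here is a small Python sketch (purely illustrative; the real implementation would live in bjam's C core, and the name cat_blocks is my invention) of streaming a file in fixed-size blocks so no more than one block is ever held in memory:

```python
def cat_blocks(path, block_size=4096):
    """Yield a file's contents one fixed-size block at a time, so the
    caller never holds more than one block in memory."""
    with open(path) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block
```

A caller would consume the generator block by block instead of receiving the whole file at once.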
>>I also thought of implementing a grep-like rule that'd use streaming and
>>avoid mapping entire files to memory:
>> rule GREP ( regexp : files * : recursive ? )
> The problem with a GREP solution is that it limits what one can do with
> the results. For example it would not help in the implementation of the
> current doc support as it changes what it greps for contextually. So it
> would end up doing what it currently does of "reading" in the files with
> an initial grep of "^(.*)$" and doing further greps internally.
Makes sense. This gives strength to your READ proposal, and GREP could
be built on top of it.
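A GREP layered on line-by-line reading might look like the following Python sketch (illustrative only; the function name grep and its signature are my invention, not bjam's): it streams the file a line at a time, never mapping the whole file into memory.

```python
import re

def grep(pattern, path):
    """Scan a file one line at a time and return the matching lines,
    without ever holding the whole file in memory."""
    rx = re.compile(pattern)
    matches = []
    with open(path) as f:
        for line in f:  # iterating a file object streams it line by line
            if rx.search(line):
                matches.append(line.rstrip("\n"))
    return matches
```

The same loop could equally sit on top of a line-oriented READ built-in rather than on direct file iteration.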
> What is the "recursive" argument?
It was meant to be a flag for recursing into directories.
>>>- Support of line-by-line reading.
> How about something like:
> rule READ ( file : first-and-last-line * : regexp ? )
> # file, path to file
> # first-and-last-line, range of lines to read, default ( 0 1 )
> # regexp, optional regex to apply to each line before it is returned
> # returns: ( first-line-read last-line-read strings * )
> That allows for considerable flexibility in how much and how one reads
> in a file. For example reading in a line at a time:
> local r = [ READ "somefile.txt" ] ;
> while $(r[1]) <= $(r[2])
> {
>     ECHO Line #$(r[1]) - $(r[3-]) ;
>     r = [ READ "somefile.txt"
>         : [ CALC $(r[2]) + 1 ] [ CALC $(r[2]) + 1 ] ] ;
> }
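Modeled in Python for illustration (the real built-in would be C inside bjam, and details like the regexp acting as a filter are my assumption), the proposed return convention of (first-line-read, last-line-read, lines) can be sketched as:

```python
import re

def read(path, first=0, last=1, regexp=None):
    """Model of the proposed READ: return (first-line-read,
    last-line-read, lines) for the requested 0-based line range,
    optionally filtering each line through a regex.  Returns None
    once the requested range is past the end of the file."""
    with open(path) as f:
        lines = [l.rstrip("\n") for l in f]
    if first >= len(lines):
        return None  # past end of file
    chunk = lines[first:last + 1]
    if regexp is not None:
        rx = re.compile(regexp)
        chunk = [l for l in chunk if rx.search(l)]
    return (first, min(last, len(lines) - 1), chunk)
```

A caller walks the file a line at a time by re-invoking read with both bounds advanced past the last line read, exactly as in the Jam loop above.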
Some issues regarding the implementation of a READ command.
* How are multiple reads on the same file to be handled? Is the file
repeatedly opened? Does it remain open for the duration of a bjam run?
FWIW, map_file_* could still be used as a back-end to cache file data in
memory. A possible issue with my implementation is that it naively
assumes files don't change; if they do change, they must be explicitly
unmapped or remapped.
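One way to keep such a cache honest is to key each entry on the file's modification time and reread (remap) only when it changes. A minimal Python sketch of that idea (the cache layout and names are my invention, not map_file_*'s actual interface):

```python
import os

_cache = {}  # path -> (mtime, contents)

def cached_read(path):
    """Return the file's contents, rereading only when the file's
    modification time has changed since the last read.  Note that
    mtime resolution is coarse, so rapid rewrites can be missed."""
    mtime = os.stat(path).st_mtime
    entry = _cache.get(path)
    if entry is not None and entry[0] == mtime:
        return entry[1]  # cache hit: file unchanged
    with open(path) as f:
        data = f.read()
    _cache[path] = (mtime, data)
    return data
```

This removes the need for explicit unmapping at the cost of a stat per access.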
* Are (file, line) pairs tracked? Or do we fopen/fseek on repeated
reads? What about when sequentially calling READ a line at a time?
Hmm... Then again, perhaps I'm letting implementation details cloud my
vision of the big picture ;-)
Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk