From: Scott Woods (scottw_at_[hidden])
Date: 2005-02-07 22:17:55
----- Original Message -----
From: "Jonathan Turkanis" <technews_at_[hidden]>
Sent: Tuesday, February 08, 2005 11:05 AM
Subject: [boost] Re: Re: A library: out-of-core containers and
algorithms (a job for boost::iostreams)
> > The current proposal to require native file access methods I think is
> > too limiting. I would propose instead writing a new version of fstream
> > which operates on vectors of files. This can be done relatively easily
> > using the boost::iostreams library by Jonathan Turkanis, I have posted
> > very preliminary prototype code (i.e. untested, naive, non-portable) at
> > http://www.cdiggins.com/big_file.hpp just to give a glimpse of one
> > possible approach. The code is modeled after the
> > concept (
> This is an interesting idea; I'd like to be convinced that it's necessary
> before implementing it.
> I think there will be some tricky issues similar to the ones encountered
> when implementing temp files. Specifically, you can't assume that the names
> file.1, file.2, ... will always be available; when you need a new file, you have
> to search for a name which is not used and create the file atomically. Also, the
> naming convention should be customizable.
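That atomic search-and-create step can be sketched roughly as follows, assuming
POSIX open() with O_EXCL; the helper name and "prefix.N" convention are just
illustrative, not taken from any actual implementation:

```cpp
// Sketch: find an unused "prefix.N" name and create it atomically.
// O_CREAT | O_EXCL makes the existence check and the creation a
// single atomic step, so two processes cannot grab the same name.
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

int create_next_stripe(const std::string& prefix, std::string& name_out)
{
    for (int i = 0; ; ++i) {
        name_out = prefix + "." + std::to_string(i);
        int fd = ::open(name_out.c_str(),
                        O_CREAT | O_EXCL | O_RDWR, 0644);
        if (fd != -1)
            return fd;            // created atomically; name was free
        if (errno != EEXIST)
            return -1;            // real error, give up
        // EEXIST: this ordinal is taken, try the next one
    }
}
```

Making the "prefix" argument a parameter is one simple way to keep the naming
convention customizable, as suggested above.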
> > If Jonathan heeds our call, he could probably finish what I started
> > in less than an hour. ;-)
I have been working on something similar for a while. Maybe some
experiences along the way are relevant (helpful?).
The functional requirements were in the area of network logging. The ability
to speedily collect and randomly access huge amounts of data was essential.
Huge files were a detail issue, i.e. how do you store and access over 2Gb in
a normal OS file? Over 4Gb? More?
Huge solitary files have a reputation for unexpectedly bad performance. In
testing I have found that huge files are bad. But understanding the true cause
of that trivial sample is time-consuming, and that's before you consider all
the OS and filesystem variations involved.
Pretty early on I moved to a striping strategy, i.e. a single virtual storage
file comprising a sequence of OS files. I also went as far as file hierarchies
(i.e. folders of files), as eventually folders have a reputation for performance
problems beyond certain numbers of entries.
(NB: striping has turned out to be rather convenient. It was quite simple to
go on to develop a "sliding" version - FIFO)
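The core of such a striping scheme is just a mapping from a virtual 64-bit
offset to (stripe ordinal, offset within stripe). A minimal sketch, with an
assumed 64Mb stripe size (the actual size used is not stated here):

```cpp
// Sketch: map a virtual offset into (which OS file, offset inside it).
// The 64Mb stripe size is an assumption for illustration only.
#include <cstdint>
#include <utility>

const std::uint64_t stripe_size = 64 * 1024 * 1024; // 64Mb per OS file

std::pair<std::uint32_t, std::uint64_t>
locate(std::uint64_t virtual_offset)
{
    return { static_cast<std::uint32_t>(virtual_offset / stripe_size),
             virtual_offset % stripe_size };
}
```

The "sliding" FIFO variant then falls out naturally: to discard the oldest
data you simply delete the lowest-numbered stripe file, without rewriting
anything else.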
Having sorted out the mass storage issue I still had to deal with the "huge
integers" thing. I suspect I have done a sidestep that may eventually turn
around and byte me <ahem>, but so far local requirements have been fulfilled
even with the following limitations;
* there is no knowledge of bytes consumed; instead I only remember Mb
* the only "addressing" is by ordinal, e.g. log[ 68755 ], so my maximum address
space is a function of 32-bit integers (the ordinal) and the bytes consumed by
each log entry.
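As back-of-envelope arithmetic for that limitation (the average entry size
below is an assumed figure, not from my logs):

```cpp
// Sketch: total addressable storage under ordinal-only addressing is
// (number of 32-bit ordinals) * (average bytes per log entry).
#include <cstdint>

std::uint64_t capacity_bytes(std::uint64_t max_ordinals,
                             std::uint64_t avg_entry_bytes)
{
    return max_ordinals * avg_entry_bytes;
}
// e.g. 2^32 ordinals at an assumed 512 bytes per entry gives
// 2^41 bytes, i.e. about 2Tb of logical address space.
```

So even with 32-bit ordinals, the scheme runs out of entries long before it
runs out of bytes for typical log-entry sizes.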
I have a GUI accessing millions of logging entries over Gbs of data
and getting constant performance.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk