From: Scott Woods (scottw_at_[hidden])
Date: 2005-02-07 22:17:55


Hi Roman,

----- Original Message -----
From: "Jonathan Turkanis" <technews_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Tuesday, February 08, 2005 11:05 AM
Subject: [boost] Re: Re: A library: out-of-core containers and algorithms (a job for boost::iostreams)

<snip>

> > The current proposal to require native file access methods I think is
> > too limiting. I would propose instead writing a new version of fstream
> > which operates on vectors of files. This can be done relatively easily by
> > using the boost::iostreams library by Jonathan Turkanis. I have posted
> > some very preliminary prototype code (i.e. untested, naive, non-portable)
> > at http://www.cdiggins.com/big_file.hpp just to give a glimpse of one
> > possible approach. The code is modeled after the
> > boost::iostreams::seekable concept (
> > http://home.comcast.net/~jturkanis/iostreams/libs/iostreams/doc/concepts/seekable_device.html
> > )
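For reference, a device satisfying that Seekable concept only needs a couple
of typedefs and three member functions. A bare skeleton (untested, my own
class name, not taken from big_file.hpp) looks roughly like this:

    #include <boost/iostreams/categories.hpp>   // seekable_device_tag
    #include <boost/iostreams/positioning.hpp>  // stream_offset
    #include <ios>

    class big_file_device
    {
    public:
        typedef char                                  char_type;
        typedef boost::iostreams::seekable_device_tag category;

        // Read up to n characters into s; return the count, or -1 at EOF.
        std::streamsize read(char* s, std::streamsize n);

        // Write up to n characters from s; return the count written.
        std::streamsize write(const char* s, std::streamsize n);

        // Reposition within the virtual (multi-file) character sequence.
        std::streampos seek(boost::iostreams::stream_offset off,
                            std::ios_base::seekdir way);
    };

Wrapping that in boost::iostreams::stream<big_file_device> gives a standard
std::iostream interface on top of whatever file layout sits behind those
three functions.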
>
> This is an interesting idea; I'd like to be convinced that it's necessary
> before implementing it.
>
> I think there will be some tricky issues similar to the ones encountered
> when implementing temp files. Specifically, you can't assume that the names
> file.1, file.2, ... will always be available; when you need a new file, you
> have to look for a name which is not used and create the file atomically.
> Also, the naming convention should be customizable.
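The atomic create is the fiddly bit. On POSIX the usual trick is open() with
O_CREAT | O_EXCL, which folds "does the name exist?" and "create it" into one
step. A hedged, POSIX-only sketch (create_next_stripe is my own name):

    #include <fcntl.h>      // ::open, O_CREAT, O_EXCL, O_RDWR
    #include <cerrno>
    #include <sstream>
    #include <string>

    // Claim the next free name in the sequence base.0, base.1, ...
    // O_CREAT | O_EXCL guarantees two processes can never both believe
    // they created the same file.
    int create_next_stripe(const std::string& base, int& index)
    {
        for (index = 0; ; ++index)
        {
            std::ostringstream name;
            name << base << '.' << index;
            int fd = ::open(name.str().c_str(),
                            O_CREAT | O_EXCL | O_RDWR, 0644);
            if (fd >= 0)
                return fd;      // this name is ours
            if (errno != EEXIST)
                return -1;      // a real error, not just a name collision
        }
    }

A customizable naming convention would then just be a functor or pattern
passed in instead of the hard-coded base + '.' + index.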
>
> > If Jonathan heeds our call, he could probably finish what I started
> > in less than an hour. ;-)

I have been working on something similar for a while. Maybe some
experiences along the way are relevant (helpful?).

The functional requirements were in the area of network logging. Speedy
collection of huge amounts of data, and random access to it, were the
fundamental goals.

Huge files were a detail issue, i.e. how do you store and access over 2GB in
a normal OS file? Over 4GB? More?

Huge solitary files have a reputation for unexpectedly bad performance. In my
own testing huge files did indeed perform badly, but understanding the true
significance of that small sample is time-consuming, and that's before you
consider all platforms. Pretty early on I moved to a striping strategy, i.e.
a single virtual storage file comprising a sequence of OS files. I also went
as far as file hierarchies (i.e. folders of folders of files), as folders
themselves have a reputation for performance problems beyond a certain number
of entries.

(NB: striping has turned out to be rather convenient; it was quite simple to
go on and develop a "sliding" FIFO version.)
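The core of it is only a mapping from a virtual position to a (stripe index,
offset within that stripe) pair. A hedged sketch with made-up names, assuming
fixed-size stripes named base.0, base.1, ...:

    #include <string>
    #include <sstream>

    // Where does a given position in the virtual storage file live?
    struct stripe_location
    {
        long long   index;    // which OS file (base.0, base.1, ...)
        long long   offset;   // byte offset within that file
        std::string path;     // its name on disk
    };

    stripe_location locate(const std::string& base,
                           long long stripe_size,
                           long long virtual_pos)
    {
        stripe_location loc;
        loc.index  = virtual_pos / stripe_size;
        loc.offset = virtual_pos % stripe_size;
        std::ostringstream path;
        path << base << '.' << loc.index;
        loc.path = path.str();
        return loc;
    }

The sliding/FIFO behaviour then amounts to deleting the lowest-numbered
stripe once it falls outside the retention window, and the folders-of-folders
layout is just a matter of splitting the index into directory and file
components when building the path.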

Having sorted out the mass storage issue I still had to deal with the "huge
integers" thing. I suspect I have done a sidestep that may eventually turn
around and byte me <ahem>, but so far local requirements have been fulfilled
even with the following limitations:

* there is no exact count of bytes consumed; I only keep track of whole MB
* the only "addressing" is by ordinal, e.g. log[ 68755 ], so the maximum
addressable space is a function of a 32-bit integer (the ordinal) and the
bytes consumed by each log entry (see the arithmetic sketched below)
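To put rough numbers on the second point (the 100-byte entry size below is
purely illustrative, not my actual record format):

    #include <iostream>

    int main()
    {
        // A 32-bit ordinal gives 2^32 addressable entries.
        const unsigned long long max_entries     = 4294967296ULL; // 2^32
        const unsigned long long bytes_per_entry = 100;           // illustrative

        unsigned long long total = max_entries * bytes_per_entry;
        std::cout << total / (1024ULL * 1024 * 1024)
                  << " GiB addressable\n";                        // prints 400
    }

So even with modest entry sizes the ordinal scheme covers hundreds of GB
before the 32-bit limit starts to bite.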

I have a GUI accessing millions of logging entries spread over GBs of data,
with constant performance.

Cheers.

