

From: Beman Dawes (beman_at_[hidden])
Date: 1999-12-20 13:57:10


At 10:21 PM 12/19/99 -0500, Dave Abrahams wrote:

>Beman wrote:
>>Dave wrote:
>>>My view has always been that it should be treated as a streaming problem,
>>>by using a special kind of ostream that writes in a portable binary format
>>>instead of as text. A highly efficient variant would use a simple form of
>>>compression.
>>
>> Yes, that would be very space efficient. But not very fast if you have
>> lots of data such as in a database, because there has to be a call to a
>> (probably complex, probably non-inline) stream function for each integer
>> read or written.
>
>That's not what I had in mind. If you have lots of zero bytes and small
>numbers, as is likely in a database, a simple protocol which writes 7 bits
>at a time and uses the high bit to say "end of number" would be excellent,
>and could be inlined easily.

Ah, yes. I use that technique very effectively in a compact database
format where space is at a premium and variable-length data doesn't make
direct-access seeks a problem.
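A minimal C++ (C++11 or later) sketch of the 7-bits-at-a-time protocol Dave describes. It assumes unsigned 32-bit values, the least significant 7-bit group written first, and the high bit set only on the last byte of each number; the thread specifies none of these details, and `put_varint`/`get_varint` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends v using 7 data bits per byte. Assumption: low-order group first,
// high bit SET on the final byte to mark "end of number".
inline void put_varint(std::vector<unsigned char>& out, std::uint32_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<unsigned char>(v & 0x7F)); // more bytes follow
        v >>= 7;
    }
    out.push_back(static_cast<unsigned char>(v | 0x80));     // last byte
}

// Decodes one number starting at pos and advances pos past it.
inline std::uint32_t get_varint(const std::vector<unsigned char>& in,
                                std::size_t& pos) {
    std::uint32_t v = 0;
    unsigned shift = 0;
    for (;;) {
        assert(pos < in.size() && shift < 32);  // truncated or malformed input
        unsigned char b = in[pos++];
        v |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) return v;                 // high bit: end of number
        shift += 7;
    }
}
```

Values 0 through 127 take one byte and 300 takes two, so small numbers stay small. Because the encoded length varies, though, the position of the nth number can no longer be computed by arithmetic, which is the trade-off for direct-access files.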

>> It is actually worse than that, because there would often
>> first have to be a call to a manipulator to specify the length.
>
>That's _really_ not what I had in mind. Why would one bother with that?

To identify the data lengths. bin/ubin's are of fixed lengths, and
that is important for fixed length data records (which allow access
without any subsidiary index since records can be located by address
calculation.)
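To illustrate the address calculation: with fixed-length records, record n starts at byte n * record_size, so a reader can seek straight to it without consulting an index. This C++ (C++11 or later) sketch assumes a hypothetical 7-byte record (for instance a 3-byte field plus a 4-byte field) held in a `std::array<char, 7>`; `record_offset` and `read_record` are illustrative names, not part of any Boost library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical fixed record size: e.g. a 3-byte field plus a 4-byte field.
constexpr std::size_t kRecordSize = 7;
using RawRecord = std::array<char, kRecordSize>;

// Fixed-length records need no index: record n starts at n * kRecordSize.
inline std::streamoff record_offset(std::size_t n) {
    return static_cast<std::streamoff>(n * kRecordSize);
}

// Seeks straight to record n and fetches it with one binary read.
// Returns false if the record is missing or truncated.
inline bool read_record(std::istream& in, std::size_t n, RawRecord& rec) {
    in.clear();                       // forget any earlier EOF/failure
    in.seekg(record_offset(n));
    in.read(rec.data(), static_cast<std::streamsize>(rec.size()));
    return static_cast<bool>(in);
}
```

For example, record 2 of a file of 7-byte records begins at offset 14, and fetching it costs one seek and one read.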

>> The beauty of the bin_ubin.hpp classes is that underneath they are just
>> chars, so they can be read or written in bulk. Thus something like a
>> B-tree page with a thousand of these objects is read or written with a
>> single binary read or write.
>
>Well, sure, but first you have to get them all lined up in memory next to
>each other. That's probably not the format of your database, is it (long
>linear arrangements of these bin/ubin types)?

Yes, that is the usual arrangement. Internal and external formats
are exactly the same - "long linear arrangements of these bin/ubin
types."

>The process of getting them lined up that way is equivalent to the
>streaming process I was referring to.

You "line them up" at design time, not at run time.
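One way to read "underneath they are just chars" is a hypothetical `ubin_sketch<N>` that stores an unsigned value as N big-endian bytes and has no other data members. This is an illustration in C++ (C++11 or later) under stated assumptions, not the actual bin_ubin.hpp interface; signed bin types would also need sign extension, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical unsigned N-byte integer kept as raw bytes, most significant
// byte first, so its memory image is the same on every host.
template <std::size_t N>
class ubin_sketch {
    static_assert(N >= 1 && N <= 8, "ubin_sketch holds 1 to 8 bytes");
    unsigned char bytes_[N];  // the only data member
public:
    ubin_sketch(std::uint64_t v = 0) { *this = v; }

    ubin_sketch& operator=(std::uint64_t v) {
        assert(((v >> (8 * N - 1)) >> 1) == 0);  // value must fit in N bytes
        for (std::size_t i = N; i-- > 0; v >>= 8)
            bytes_[i] = static_cast<unsigned char>(v & 0xFF);
        return *this;
    }

    operator std::uint64_t() const {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < N; ++i) v = (v << 8) | bytes_[i];
        return v;
    }
};
```

Because the byte order is fixed by the type rather than taken from the host, a `ubin_sketch<3>` holding 0x123456 is stored as the bytes 0x12 0x34 0x56 everywhere, and its `sizeof` is typically 3, so a struct of such fields has its layout settled when the struct is written.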

>I think we ought to be really, really cautious about adding code to boost
>that is guaranteed to work in portable C++ unless there's a very clear way
>to parameterize for a particular platform.

I think you meant "be really, really cautious about adding code to
boost that is *not* guaranteed to work in portable C++." Yes, I
agree with you that this is an area of concern. I will think some
more about ways to ensure portability, although past attempts were
both uglier and slower.
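One possible guard for the portability concern, not something proposed in this thread, is to build each record only from `unsigned char` arrays and let the compiler reject any platform that pads it. `BalanceRecordImage` is a hypothetical stand-in for a struct of ubin24/bin32 fields, and the `static_assert` checks require C++11 or later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record built only from unsigned char arrays, so no member
// needs alignment beyond 1.
struct BalanceRecordImage {
    unsigned char account_number[3];  // 24-bit unsigned value, big-endian bytes
    unsigned char balance[4];         // 32-bit signed value, big-endian bytes
};

// The standard does not forbid padding here, so check instead of assuming:
// a compiler that pads the record fails to build rather than silently
// producing a different external format.
static_assert(sizeof(BalanceRecordImage) == 3 + 4,
              "BalanceRecordImage must contain no padding");
static_assert(offsetof(BalanceRecordImage, balance) == 3,
              "balance must start immediately after account_number");
static_assert(alignof(BalanceRecordImage) == 1,
              "records must be placeable at any byte offset in a page");
```

The checks cost nothing at run time; they only turn a silent layout mismatch into a compile error on an unusual platform.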

>>>Still, I don't think they are likely to be placed as desired in an
>>>array. Just how _are_ you supposed to use these?
>>
>> The usual is to put them in a structure used for I/O:
>>
>> struct BalanceRecord {
>>   boost::ubin24 account_number;
>>   boost::bin32  balance;
>>   ...
>> } rec;
>>
>> some_binary_ifstream.read( reinterpret_cast<char*>(&rec), sizeof(rec) );
>> ...
>
>Are you proposing to use that structure as the internal format of your
>database, or is it just an intermediate step on the way to/from the I/O
>system?

The internal format and external format are identical. That is why
you can read/write/seek the struct as a bucket of chars.
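A sketch of that bulk I/O in C++ (C++11 or later), assuming a hypothetical `Rec` made only of `unsigned char` arrays and a hypothetical page of 1000 records: the whole page moves with one `write` and one `read`. Note that converting a struct pointer to `char*` needs `reinterpret_cast`; `static_cast` is rejected between unrelated pointer types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical char-only record (stand-in for ubin24/bin32 fields). Because
// every member is a char array, the struct's bytes are the external format.
struct Rec {
    unsigned char account_number[3];
    unsigned char balance[4];
};

// Hypothetical page size; a B-tree page might hold many such records.
constexpr std::size_t kPageRecords = 1000;

// One binary write for the whole page: the array is treated as raw chars.
inline bool write_page(std::ostream& out, const Rec (&page)[kPageRecords]) {
    out.write(reinterpret_cast<const char*>(page), sizeof(page));
    return static_cast<bool>(out);
}

// One binary read fills the whole page; no per-field stream calls.
inline bool read_page(std::istream& in, Rec (&page)[kPageRecords]) {
    in.read(reinterpret_cast<char*>(page), sizeof(page));
    return static_cast<bool>(in);
}
```

However many records a page holds, transferring it costs a single library call, which is the speed argument for keeping the internal and external formats identical.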

> In any case, is it really likely that the struct is big enough to
>make a real speed difference?

Certainly on slow processors like embedded systems. The test case I
am most concerned with is a classic B+tree. Maybe I can put together
a test program.

--Beman

