From: Dave Abrahams (abrahams_at_[hidden])
Date: 1999-12-19 22:21:54


Beman wrote:
>Dave wrote:
>>My view has always been that it should be treated as a streaming problem, by
>>using a special kind of ostream that writes in a portable binary format
>>instead of as text. A highly efficient variant would use a simple form of
>>compression.
>
> Yes, that would be very space efficient. But not very fast if you have
> lots of data such as in a database, because there has to be a call to a
> (probably complex, probably non-inline) stream function for each integer
> read or written.

That's not what I had in mind. If you have lots of zero bytes and small
numbers, as is likely in a database, a simple protocol that writes 7 bits at
a time and uses the high bit to say "end of number" would be excellent, and
could be inlined easily.
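
A minimal sketch of the kind of protocol described above, assuming unsigned
32-bit values and low-order groups written first. The names write_packed and
read_packed are hypothetical and do not come from any Boost header; the only
detail taken from the text is that each byte carries 7 bits and the high bit
marks the last byte of a number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `value` to `out`, 7 bits per byte,
// low-order group first. The high bit is set only on the final byte,
// where it means "end of number".
inline void write_packed(std::vector<unsigned char>& out, std::uint32_t value)
{
    while (value >= 0x80u) {
        out.push_back(static_cast<unsigned char>(value & 0x7Fu)); // more groups follow
        value >>= 7;
    }
    out.push_back(static_cast<unsigned char>(value | 0x80u));     // last group
}

// Hypothetical helper: decode one number starting at `pos` and advance
// `pos` past it. Assumes well-formed input (terminated, at most 5 bytes).
inline std::uint32_t read_packed(const std::vector<unsigned char>& in, std::size_t& pos)
{
    std::uint32_t value = 0;
    unsigned shift = 0;
    for (;;) {
        assert(pos < in.size() && shift < 32);
        const unsigned char byte = in[pos++];
        value |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
        if (byte & 0x80u)
            return value;                                          // end-of-number marker seen
        shift += 7;
    }
}
```

With this layout any value below 128, including zero, costs a single byte, and
both functions are short enough to be inlined.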

> It is actually worse than that, because there would often
> first have to be a call to a manipulator to specify the length.

That's _really_ not what I had in mind. Why would one bother with that?

> The beauty of the bin_ubin.hpp classes is that underneath they are just
> chars, so can be read or written in bulk. Thus something like a B-tree
> page with a thousand of these objects is read or written with a single
> binary read or write.

Well, sure, but first you have to get them all lined up in memory next to
each other. That's probably not the format of your database, is it (long
linear arrangements of these bin/ubin types)? The process of getting them
lined up that way is equivalent to the streaming process I was referring to.

> On the other hand, these are low level classes. Applications where maximum
> binary speed and space efficiency aren't required would be better off to
> just use ASCII text streams for data interchange.
>
>>The classes as implemented look like you have to make some assumptions about
>>class layout if you want to use them for the purpose as I understand it.
>>This may be OK because they're borderline PODs, but I'm not checking to see
>>whether that's in fact true.
>
> My understanding is that a compiler would actually be permitted to insert
> padding bytes between bin/ubin objects in a structure, thus destroying
> portability. Use of classes like these (and earlier C language
> equivalents) stretches back over 17 years, however, without problems. Use
> has included everything from giant mainframes to small embedded systems,
> dozens of operating systems, and applications both large and small.

I think we ought to be really, really cautious about adding code to Boost
that is not guaranteed to work in portable C++ unless there's a very clear
way to parameterize for a particular platform.
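
One way to make the layout assumption explicit, sketched here with
static_assert from later C++ (C++11), is to have compilation fail on any
platform where padding appears. The types ubin24_like, bin32_like, and
BalanceRecordLike are hypothetical stand-ins built from plain byte arrays;
they are not the actual classes in bin_ubin.hpp.

```cpp
#include <cstddef>

// Hypothetical stand-ins: each type is nothing but an array of bytes.
struct ubin24_like { unsigned char bytes[3]; };
struct bin32_like  { unsigned char bytes[4]; };

struct BalanceRecordLike {
    ubin24_like account_number;
    bin32_like  balance;
};

// The language does not promise these; the checks turn a silent layout
// difference into a compile-time error on platforms that insert padding.
static_assert(sizeof(ubin24_like) == 3, "padding inside ubin24_like");
static_assert(sizeof(bin32_like) == 4, "padding inside bin32_like");
static_assert(offsetof(BalanceRecordLike, balance) == 3,
              "padding between record members");
static_assert(sizeof(BalanceRecordLike) == 7, "padding at end of record");
```

This does not make the code portable. It only ensures that a platform where
the assumption fails is noticed at build time instead of through corrupted
files.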

>> Still, I don't think they are likely to be
>>placed as desired in an array. Just how _are_ you supposed to use these?
>
> The usual is to put them in a structure used for I/O:
>
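> struct BalanceRecord {
>     boost::ubin24 account_number;
>     boost::bin32 balance;
>     ...
> } rec;
>
> // reinterpret_cast is needed here: static_cast cannot convert
> // BalanceRecord* to char*.
> some_binary_ifstream.read( reinterpret_cast<char*>(&rec), sizeof(rec) );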
> ...

Are you proposing to use that structure as the internal format of your
database, or is it just an intermediate step on the way to/from the I/O
system? In any case, is it really likely that the struct is big enough to
make a real speed difference?

