Boost logo

Boost :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-04-20 08:26:32


Here's my review of the library. It's based on reading docs and source for
several hours, as well as using almost every previous version of the library
on my project.

I think the library should be accepted. I voted "no" the previous time, but I
think library was considerably improved. I'd also like to thank Robert for
pushing this effort forward!

Detailed comments are below. Some of them are marked as "critical" -- it means
I believe they should be addressed before the library is added to official
distribution.

** General issues **

-- (critical) The primary issue with the library is lack of reference
documentation -- exact description of semantics for every method (preferrable
in C++ std style) and overview for all important classes. The library is
non-trivial, and without such documentation user will have to experiment to
find out that exact semantic.

Better internal docs (comments) are also needed. For example, I'm looking at
save_access::end_preamble and don't know what it does. Comment for
basic_oarchive::end_preamble says what this method is for, but does not say
when this method is called, which makes it hard to do anything. Again,
implementation is non-trivial. Lack of comments make it somewhat dangerous to
use the library -- if I hit a bug while Robert is on vacation, I'm not sure
I'll be able to quickly fix it locally.

As an illustraction, I've already had a couple of bugs each costing several
hours to debug.

-- (critical) While BOOST_CLASS_EXPORT is a big step forward, I think it can
be further improved. Now, if I export a class I need to include some archive
headers before invoking BOOST_CLASS_EXPORT. The class can be only serialized
to archives which were included. This means that to really export a class, I
need to include *all* archive headers. That's not good. What I suggest is

     - create new class 'polymorphic_archive', which will have virtual
       functions for saving/loading and create templated
      'polymorphic_wrapper', which can be used to add this polymorphic
      interface to existing archive.
     - always register all exported classes with polymorphic archive. When
       serializing in a specific archive, check if class is registered with
       that archive. If not, serialize it via polymprphic wrapper.

In fact, I'd be willing to help with polymorphic archive implementation, but
some changes inside serialization library will be required.

-- (critical) I suspect BOOST_CLASS_EXPORT does not play nice with templates.
Say I'm implementing RPC system:

      class function_call_base { virtual ~function_call_base() {} } ;
      template<class Arg>
      class function_call1 : public function_call_base {};

      // in some code
      rpc_server s;
      s.register("foo", &foo);

      function_call_base* c;
      while(archive >> c)
                (*c)();

The idea is the caller will serialize instances of function_call1 and the
callee will deserialize it and call the proper function. To make this work,
we have to BOOST_CLASS_EXPORT all used instantinations of function_call1. We
can assume that all functions (and therefore all argument types) are passed
to the 'register' function and can function can invoke BOOST_CLASS_EXPORT.

Now the problem: BOOST_CLASS_EXPORT can be used only at top-level of a
translation unit. There's not way 'register' and automatically export a
class.

The solution I'd like is:

   template<class T>
   class export_class {
   public:
                static void instantiate();
   };

So that the object of this type can be created anywhere.

-- One usage of the serialization library I have in mind is taking "debug
dumps" for an application. After the run, a separate tool can be used to look
at those dumps. The size of data can potentially be rather large (e.g.
100*640*480 or maybe more), so I'd rather not to load all data, but build
some index. So the question is: how is it possible to access specific part of
archive without reading it all. I understand it's not possible in general,
but in some cases (e.g. BOOST_CLASS_EXPORT for everything) it should work.

-- The documentation of 'make_binary_object' is very incomplete and does not
tell the exact semantic. As far as I can tell, the function can be only used
to load/store objects which have fixed size and will crash when
saving/loading new[]-d array -- which seemed an obvious use-case for me.

I would suggest that we have two functions:

   make_pod_wrapper -- which is templated, has one parameter -- object, and
   stores sizeof(T) bytes. Another name -- make_static_array_wrapper.

   make_dynarray_wrapper -- which can be used to store/load arrays allocated
   by new[]

-- The library requires exact match between static types for saving and
loading. For example:

     D* d = new D;
     ia << d;
     B* b = new B;
     ia >> b;

does not work. Moreover, it does not work even if 'D' is
BOOST_CLASS_EXPORT-ed. I'd suggest that this case should work, since it's
very easy to save derived classes using pointer to derived class, without
casting to base class.

-- It appears that XML archive require member/variable names in all cases. I'm
not sure that's reasonable -- given that the library requires exact
correspondence between loading and saving order even for XML archives. It
would load archive back even if I change all the names manually in XML file
right? Then why requiring names?

-- I have some concerns about XML archives. The point of XML, in general is
that it's somewhat readable by humans, and readable by other applications
written using different libraries or in different languages. However, when I
look at:

    <c2 class_id_reference="1" object_id="_2">
           <i>10</i>
           <j>4</j>
           <base object_id="_3"></base>
    </c2>

it's not clear if any of those benefits are there. This actually corresponds
to a class "C". To find out this, one has to find "class_id" attribute with
value of "1" somewhere above. If the above was:

    <C name="c" ...>

finding all "C" elements would be trivial with XPath, or even with more simple
methods. As it stands now, even finding all instances of "C" is hard. There
may be a number of similiar issues. If the goal is to produce XML which can
be processed by other programs, then at least the class name should become
element name.

-- I believe that library should not change codecvt facet of the stream. There
are two reasons. First, if user has already imbued some facet (say utf-8), he
might have good reasons to do so (say, he believes utf-8 is the most
space-efficient for his data). So the serializaton library should not
silently change codecvt to ucs-2. Second, the whole locale stuff is complex,
and it's not even clear if it's possible to change codecvt facet on the fly,
and when the change will take effect, and so on. I believe requiring user to
'imbue' proper facet himself is better than some unexpected internal effects.

-- I've tried to play with "implementation level" and created the example at

    http://zigzag.cs.msu.su:7813/tracking.cpp

It stores a pointer to a single class with two data members and a single base
class. I get the following archive content.

   1
   1 C
   0 10 4 1 0
   1

Is seems too much for a class with implementation level of "object
serializable". 10 and 4 is the data itself. "1 C" is probably class name, but
what other data is?

** Implementation issues **

-- I suggest that all MPL expressions in the library are refactored for better
readability. For example, the expression in save_non_pointer_type::invoke is
28 lines long. I think it should be possible to extract some of the
subexressions and create a typedef with good name -- looks like
instantination of those expression in all cases should not cause compile
errors.

** Custom archive creation **

I've tried to create a custom archive class (in fact, polymorphic_oarchive),
and got some comments.

- Documentation says the derived class must provide operator<< and method
'load'. It also says there common_iarchive::operator<<. I wonder why the
derived class should provide operator<<. Can't base operator call 'load'
directly?

- Even if not, why derived operator<< should have

     return * This();

not plain "return *this"? After all, we're in most derived object

- It's needed to add

  friend class boost::serialization::load_access;

to class declaration. However, it's not stated where this type is declared,
and in fact in declared in *internal* header

    boost/archive/detail/iserializer.hpp

And, finally, the right syntax seems to be:

  friend class boost::archive::load_access;

(Note the namespace!)

- The example uses:

        boost::serialization::load(* This(), t);

Again, what header is this function in? The answer is that it's in the same
header as above, and the line should be:

        boost::archive::load(* This(), t);

- The documentation should really state the minimal set of 'load'/'save'
overloads which will make archive usable. For example, it's probably not
necessary to provide separate overload for 'bool', right?

- (critical) It should be explained how the newly-created class can be tested.
As it stands, one should either do superficial tests or try to integrate the
new archive into existing tests. As for latter -- I don't know how to do
that, and also, I think it's not necessary to run all the tests with the new
archive. For example, serialization of vector and map is not likely to be
broken with the new type. What I'm looking for is a test which saves/loads
all possible types and verifies results.

** Various comments **

-- The BOOST_CLASS_EXPORT macros defines an explicit instantiation of a
function. gcc rejests this when BOOST_CLASS_EXPORT is used inside some
namespace. I've asked on comp.std.c++:

http://groups.google.com/groups?q=Explicit+instantiation&hl=en&lr=&ie=UTF-8&safe=off&selm=u5boi1-7te.ln1%
40zigzag.lvk.cs.msu.su&rnum=3

and got a reply directing me to:

    http://std.dkuug.dk/jtc1/sc22/wg21/docs/cwg_defects.html#275

Which means gcc is mostly right. So, the macro should either be changed
somehow, or at least the restriction should bes documented -- using
BOOST_CLASS_EXPORT outside of any namespaces is OK.

-- Docs and code use "const unsigned long int version" everywhere. Is that
really necessary, as opposed to "unsigned version"?

-- The library saves char* as arrays of integers. The rationale section says:

    input of char * and wchar_t * could be altered to create a buffer overrun
    and potential security problem.

I don't understand this point at all. How storing the elements as integer
values helps with the buffer overrun? Moreover, library stores char*
internally (e.g. for archive signature). What if that string is modified?
Will that cause buffer overrun?

-- The following program (yes, of two lines), causes a compile error:

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/export.hpp>

It's because export.hpp uses 'instantiate_pointer_iserializer' but does not
include the header where that class is defined. 'text_oarchive.hpp' does not
include it either. The error goes away if I include

    #include <boost/archive/text_iarchive.hpp>

but a better diagnostic is desirable.

- Volodya


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk