Boost logo

Boost :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-04-21 02:08:42


Robert Ramey wrote:

>> -- (critical) The primary issue with the library is lack of reference
>> documentation
...
>> As an illustraction, I've already had a couple of bugs each costing
>> several hours to debug.
>>
>
> I think this is a little harsh. I've had lots of things I had spend
> several
> hours on - including iterator_adaptors, spirit, and mpl. I'm not against
> adding things to the documentation but in general I think it as good as or
> better than any other boost documentation.

I did not intend to be harsh, but I still think the reference documentation
must be present. Even if some boost library does not have it, I think
serialization is a very complex and important library, so it definitely
should have such documentation.
 
> Having said that, I do incrementally clarify bits of documentation as
> people ask me questions.
>
> There's another point that has me a little bit stumped. Most of the boost
> libraries are code that one invokes with a documented signature. The list
> of invocations, syntax and semantics is what I think Vladimir is referring
> to here with "reference documentation". However, the serialization
> library is more "upside down" in that using it consists almost entirely of
> filling
> in stubs called by the serialization algorithm. So its not clear to me
> how the "reference documentation" would look compared to the current
> documents.

For those stubs, requirements docs that Dave mentioned would be nice.
However, there's code that user calls, and I'd like to see docs for that,
too.

For example:

   binary_object make_binary_object(void* ptr, unsigned size);
   Returns: binary_object(ptr, size);

   binary_object::save(Archive& ar, unsigned version)
   Effects: Calls ar.save_binary(m_ptr, m_size);

   binary_object::load(Archive& ar, unsigned version)
   Effects: Calls ar.load_binary(m_ptr, m_size);

That's boring, but makes the semantic 100% clear. Another possible example:

   BOOST_CLASS_EXPORT(CLASS);
   Preconditions: The implementation_level of 'CLASS' is at least
       "object_serializable"
   Effects: cause all saves of pointers to "CLASS" via a pointer to base
   class to include identification string. Enables all loading of classes
   with that identification strings to work.
   Notes: This macro can be invoked only in global scope.

> For those cases not addressed by the above solution - i.e. those a virtual
> interface will be required. This can be addressed by creating of a
> "polymorphic_archive" without requiring changes to the internals to the
> library. This is not difficult if done carefully, but it hasn't been done
> yet. It entails definition of an interface class. The concrete archive
> class would inherit and from both this interface as well as one of the
> current implementations. Its exactly the way the Microsoft COM is
> implemented. I have yet to do this.

I think we agree on this approach. Again, I think it's rather important and
is willing to help, if needed.

>> Say I'm implementing RPC system:
>>
>> class function_call_base { virtual ~function_call_base() {} } ;
>> template<class Arg>
>> class function_call1 : public function_call_base {};
>>
>> // in some code
>> rpc_server s;
>> s.register("foo", &foo);
...
>> Now the problem: BOOST_CLASS_EXPORT can be used only at top-level of a
>> translation unit. There's not way 'register' and automatically export a
>> class.
>>
>> The solution I'd like is:
>>
>> template<class T>
>> class export_class {
>> public:
>> static void instantiate();
>> };
>>
>> So that the object of this type can be created anywhere.
>>
>
> Doesn't including BOOST_CLASS_EXPORT in the header fill corresponding to
> the exported data type perform the equivalent function? It was certainly
> my
> intention that that was how it would be used. In this way the "export"
> would be addressed once and for all. That way all types used - and only
> those types would be instantiated.

It would be nice to avoid asking the user to do BOOST_CLASS_EXPORT for all
possible argument types. What's desirable is:

  template<class T>
  void register_rpc_function(const char* name, void (*f)(const T&))
  {
        functions[name] = ... ;
        boost::serialization::export_class< function_call_1<T>
>::instantiate();
  }

I can't put BOOST_CLASS_EXPORT in register_rpc function now.

>> -- One usage of the serialization library I have in mind is taking "debug
>> dumps" for an application. After the run, a separate tool can be used to
>> look
>> at those dumps. The size of data can potentially be rather large (e.g.
>> 100*640*480 or maybe more), so I'd rather not to load all data, but build
>> some index. So the question is: how is it possible to access specific
>> part of
>> archive without reading it all. I understand it's not possible in
>> general, but in some cases (e.g. BOOST_CLASS_EXPORT for everything) it
>> should work.
>>
>
> I think the could fairly easily be addressed by making a derivation from
> the desired archive type and making an index on class_id and object_id or
> something like that.

But if I save data via pointers, would not random access break the
assumption that saving and loading is always in the same order?

> I wonder if there isn't an XML reader that opens
> just
> the toplevel and only opens the file as one "drills down". Then just
> serialize to XML and use that reader.

I'm afraid the no matter how smart XML reader is, it still would have to
scan the entire file.

>> -- The documentation of 'make_binary_object' is very incomplete and does
>> not
>> tell the exact semantic. As far as I can tell, the function can be only
>> used
>> to load/store objects which have fixed size and will crash when
>> saving/loading new[]-d array -- which seemed an obvious use-case for me.
>>
>> I would suggest that we have two functions:
>>
>> make_pod_wrapper -- which is templated, has one parameter -- object,
>> and
>> stores sizeof(T) bytes. Another name -- make_static_array_wrapper.
>>
>> make_dynarray_wrapper -- which can be used to store/load arrays
>> allocated
>> by new[]
>>
>
> I can expand the doc a bit. binary_object is just a wrapper around a size
> and pointer to permit them to be handled as pair. It presumes the pointer
> already points to allocated storage.

This last sentence is exactly what I'd like added to docs.

> It also presumes that the the size
> of thing its pointing to is the same on save and load. Its very
> lightweight.
>
> I don't see a need for the others, but of course any user can make the
> wrappers he needs.

The reason why I think the other is important, it that's it's actually
saving/loading support for plain C++ array -- which is rather basic thing.

>> -- The library requires exact match between static types for saving and
>> loading. For example:
>>
>> D* d = new D;
>> ia << d;
>> B* b = new B;
>> ia >> b;
>>
>
> ( presume you mean don't mean "new B" as the library creates a new B)
>> does not work. Moreover, it does not work even if 'D' is
>> BOOST_CLASS_EXPORT-ed. I'd suggest that this case should work, since it's
>> very easy to save derived classes using pointer to derived class, without
>> casting to base class.
>
> It doesn't work. Generally saving to one type and loading to another is
> not supported. In this particular case it could be made to work but it
> would
> significantly complicate the library. Also, supporting a practice such as
> the above, would lead to serialization which is much harder to debug. I
> don't believe that there is a situation in which doing the above is
> necessary. The appropriate coding of the above would be:
>
> B* b = new D;
> ia << d;
> B* b2;
> ia >> b2;
>
> I don't think that's an unreasonable burden.

As I've said previously, this is a bug which is easy to make and very hard
to debug. I think we can only wait for others to express an opinion, as we
fail to convince each other.

>> -- It appears that XML archive require member/variable names in all
>> cases. ... It
>> would load archive back even if I change all the names manually in XML
>> file right? Then why requiring names?
>
>>
>> -- I have some concerns about XML archives. ....
>> ... As it stands now, even finding all instances of "C" is hard.
>> There
>> may be a number of similiar issues. If the goal is to produce XML which
>> can
>> be processed by other programs, then at least the class name should
>> become element name.

> As its stands, I looked at, made my choices and hardly anyone has
> complained. I guess that its probably has limited utility. I think that
> everyone has assumed that someone else will find it indispensable.

Maybe, nobody has tried it *yet* and it's just a matter of time.

> BTW, one usage of XML archives did occur to me. By checking the name tag
> on input, we can implement a crude check that save and load operations are
> synchronized. This would effectively a debug mode for archives and might
> be useful.

But you won't catch a case where you save one type and load another.

>> -- I believe that library should not change codecvt facet of the stream.

> I've wrestled with that. My current thinking is that the system should
> not
> be changed. The reason I did this was that I found that the default
> codecvt on wchar_t files was to narrow the character and throw away the
> high order
> bits. Also UTF-8 is need for wchar_t XML. I included the ability to
> override this by specifying no_codecvt flag upon opening the archive which
> leaves the codecvt unchanged. So I'm happy with the current method which
> is to reflects the most common user intention and permits one to suppress
> this default behavior. I've come to conclude that that the only situation
> where the codecvt can't be changed on the fly is with the old dinkumware
> library. I'm still checking for this though.

Ok, I'm still of the opinion that anyone who uses wide string should know
about codecvt. But this is not very important.

>> -- I've tried to play with "implementation level" and created the example

> The easiest way to verify this is to use an XML archive which labels all
> the
> items. Current archives contain no extraneous data (except the name in
> the XML archive).
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="3">
> <c class_id="1" class_name="C" object_id="_0">
> <i>10</i>
> <j>4</j>
> <base class_id="0" tracking_level="1" object_id="_1"></base>
> </c>
> </boost_serialization>

So, the get zero overhead I need to tweak base class an disable tracking of
pointers. Let me try that... yea, the results are nice. One one extra
element (class id) per saved item.

BTW, how to I set tracking level and implementation level for a template
class. I think I can partially specialize 'tracking_level', but it should
be mentioned in the docs.

>> ** Implementation issues **
>>
>> -- I suggest that all MPL expressions in the library are refactored for
>> better
>> readability. For example, the expression in save_non_pointer_type::invoke
>> is
>> 28 lines long. I think it should be possible to extract some of the
>> subexressions and create a typedef with good name -- looks like
>
> I think the indentation is better than creating new names.

Opinions differ ;-)

>> instantination of those expression in all cases should not cause compile
>> errors.
>
> Hmm - please expand on this.

Ignore that, that was thinko.

>> ** Custom archive creation **
>>
>> I've tried to create a custom archive class (in fact,
>> polymorphic_oarchive),
>> and got some comments.
>>
>
>> - Documentation says the derived class must provide operator<< and method
>> 'load'. It also says there common_iarchive::operator<<. I wonder why the
>> derived class should provide operator<<. Can't base operator call 'load'
>> directly?
>>
>> - Even if not, why derived operator<< should have
>>
>> return * This();
>>
>> not plain "return *this"? After all, we're in most derived object
>>
>
>
>> - It's needed to add
>>
>> friend class boost::serialization::load_access;
>>
>> to class declaration. However, it's not stated where this type is
>> declared,
>> and in fact in declared in *internal* header
>
> I wasn't aware that such a friend class had to be declared. Recently one
> compiler has flagged with a warning which is annoying. So, we'll look at
> this if only to suppress a benign warning.

According to std::3.4.4/3, "if the name is qualified-id ... if this name
looking does not find a previously declared class-name, ... the
elaborate-type-specifier is ill-formed".

In this case, though, it appears that common_oarchive.hpp already brings the
necessary declaration, so there's just a typo in the docs.

>> boost/archive/detail/iserializer.hpp
>>
>> And, finally, the right syntax seems to be:
>>
>> friend class boost::archive::load_access;
>>
>> (Note the namespace!)
>>
>
> Hmm - I don't see that this is required given that its be called from
> within boost::archive::detail. Does any compiler complain?

Well, if you write

    friend class boost::serialization::load_access

gcc reports that there's no 'load_access' in 'serialization' namespace --
quite reasonable.

>> - The documentation should really state the minimal set of 'load'/'save'
>> overloads which will make archive usable. For example, it's probably not
>> necessary to provide separate overload for 'bool', right?
>
> The document states:
>
> " However, we're not quite done. The above code addresses serialization of
> all non-primitive types. To be complete, each primitive type must either
> be covered by a definition of template<typename T> void load(T & t); or an
> overload of the >> operator"

It also necessary to provide overload for char*. Does not count as primitive
type?

> " If all primitive types have been accounted for, any program with
> serialization defined should work with the new archive."
>
> Maybe I can expand upon that a little to something like:
>
> "Any program with serialization defined should work with the new archive
> as long as every primitive type has a matching save/load function
> prototype or template."

Can I define only one 'load' for unsigned int?

>> - (critical) It should be explained how the newly-created class can be
>> tested.
>> As it stands, one should either do superficial tests or try to integrate
>> the
>> new archive into existing tests. As for latter -- I don't know how to do
>> that, and also, I think it's not necessary to run all the tests with the
>> new
>> archive. For example, serialization of vector and map is not likely to be
>> broken with the new type. What I'm looking for is a test which
>> saves/loads all possible types and verifies results.
>>
>
> I've concluded this myself. Its pretty easy given the bjam setup. Maybe
> it can be made even easier with a bjam argument or environmental variable.
> It
> just needs to be explained.

Right.

>> -- Docs and code use "const unsigned long int version" everywhere. Is
>> that really necessary, as opposed to "unsigned version"?
>
> That's been fixed. I'm not sure you have version # 18.

I use the versions of docs from rrsd.com. The docs in #18 also use const
unsigned long int version, for example in "A Very Simple Case" section.

> The manual is wrong. I'll fix it. I commented out the code that
> serializaed (char *) and (wchar_t *) as strings. I was concerned about
> buffer overrun security problems. I don't remember my reasoning for this
> right now but I don't think implemention char * serialization as a
> primitive type is a good idea.

Do you mean you plan to always serialize char* as arrays and remove
'load'/'save' overloads for char*? Again, I'm not sure I understand the
motivation.

- Volodya


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk