Boost :

Date view	Thread view	Subject view	Author view

From: Gennadiy Rozental (gennadiy.rozental_at_[hidden])
Date: 2002-11-20 07:42:15

Next message: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Previous message: Roland Richter: "[boost] lexical_cast and bool"
Next in thread: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Reply: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Reply: Matthias Troyer: "Re: [boost] Serialization Library: review"

Hi,

After spending some time with library, looking through docs and code,
compiling it and comparing with my expectations I see following 5 major
issues with submitted library:

[Issue 1] Registration/reflection facility should be completely separated
from serialization implementation and became registration policy template
parameter

[Issue 2] Library should be subdivided into 2 *independent* library serving
purpose of serialization and deserialization.

[Issue 3] Library seems to hardcode important part of functionality that
users may want to overwrite. Here I refer in most part to archive/object
preamble.

[Issue 4] Presented solution for serialization of STL collections is
seriously flawed.

[Issue 5] Quality of code/implementation. I believe Boost should demonstrate
examples of solidly written code; requirement submitted library not always
comply to.

I was able to compile library and tests with MSVC6.5. test.cpp produce
couple runtime failures. See attached.

Now let take a look in more details:

Documentation
---------------

As it what mentioned by some other reviewers documentation is incomplete and
sometimes incorrect and most importantly missing very important topics.

Tutorial:
page 2: after // close archive you are closing ifs - should be ofs
page 3: in function load incorrect type of the second argument: int
file_version

Reference:
Section that is definitely missing is one describing in details (including
contract) how to write custom archives.

Operators for Pointers:
const * T should be const T * (or T const* but this comment later)
Why are you using "backward" logic while describing issues addressed
to properly save and restore pointers. Reader will read the document from
top to bottom and all these "must" "need" "have to be" are confusing.

Operators for Templates:
Operators you are describing in this section does not exist in reality
and STL containers serialization is implemented using different means (even
for compiler that support [partial specialization it would not look like
what you described). IMO you confusing readers and simple incorrect.

Serialization of Classes:
code snippet at the end and one before using array "a" that never declared.

Serialization of templates:

Passage about "not working as expected" simply incorrect. It does world if
partial specialization is supported. Following example successfully compiled
and with both gcc and bcc32 I have access to:

#include <iostream>

template <typename T>
struct A{};

template<typename T>
struct B { enum { value = 0 }; };

template <typename T>
struct B<A<T> > { enum { value = 1 }; };

int main() {
   typedef A<int> a_int;
   std::cout << B<a_int>::value;
   std::cout << B<int>::value;
}

In reality what should be discussed here is that library supports several
ways to specialize the serialization logic and user may choose any way he
prefer depends on it's preferences and used compiler. And BTW it's not
specific to the templates this 3-layer logic is used for *all" classes. What
I mean here (and it should be clear from docs that currently library is
using following 3 layer logic for overwriting serialization algorithm:

Layer 3: one could use partial ordering for specializing serialization for
type T by overloading function
void boost::serialization::sterilization_save( basic_oachive& ar, T&,
long );
by default it referee to the serialization<T>::save(...)

Note that I moved it out of details into serialization namespace.
(BTW it was one of the reviewer requested - it's already in)

Layer 2: one could provide partial/full specialization of traits class
specialization
by default if refer to the T::save(...)

Layer 1: one could provide the function save(...) as member of type T
accessible for serialization<T>

Moreover I think that it should be clearly specified that Layer 3 is
"temporary" and may be eliminated in a future so it is recommended to use
one of the layers 2 or 1.

Very large numbers of Small Objects:
In function serialization<RBG>::save_ptr extra "(" near BOOST_STATIC
_ASSERT.
vector data? doesn't vector need any template parameters?
When you serializing the struct line it's unclear how you save/load vector
size

Exception safety:
First paragraph: "this problem" What problem? It should be clearly stated or
rephrased.
Each scenario you are describing should be accompanied by the simple but
clear example code.

"Other cases" refer to the altered shared_count.hpp which permits the
serialization of shared_ptr. Does this means that the things will stay as it
is users won't be able to serialize instances of shared_ptr? Whose fault is
it?

Serialization Implementation included in the library:
"This header is automatically included in any program..."
This is incorrect decision and docs will need to be fixed together with
code. See below.

Advice:
Should be called "Advises"

Working around ...
"... compiler will attempt to declare an instance of an abstract base class
as part of type traits package". Why? Where? could not you provide a
workaround yourself.

Rationale:

quiet - should be quite

rationale for not making "serialization" member of boost namespace is weak
in IMO and decision was incorrect.
Design
-------

In general design of the submitted library is solid and clear. But there are
several very important issues.

Major [Issue 1]: I personally in my practice never used type_info and may be
willing to supply my own type registration system and accordingly would not
want any decision to be forced on me. In this sense "all" reference on type
type_info and operator typeid should be eliminated from core library.
RestrationPolicy should be supplied as template parameter and this policy
will define type_id type and function for getting type_id from value of type
T. Interface will need to be discussed in more details in a separate thread.
Library could supply one implementation if registration policy based in
type_info (and it will be the only place where type_info will be
referenced).

Major [Issue 2]: I believe it design error to couple both sides of
serialization together in one library. It should be separated . So that user
should be able to use only required part. It's not only eliminate not
required dependency, but also make code much more clear and easier to
navigate and compiled code smaller.

Major [Issue 3]: Submitted library is somewhat limited in a means to modify
what is written in archive header (I can change it but I will need to supply
full fledged specific archive cause currently this logic is in constructor).
And even more limited in a means to modify object header together with logic
bound to it. Let me give couple examples.

1. Serialization signature is 22 bytes. Let say I am serializing messages
into binary format for sending them through network. Size of my serialized
code is 10 bytes. I do not believe I could accept 22 bytes of "signature
information". The same about the version. I may or may not be interested in
versioning of serialization system.
2. Binary archive constructor save some "guard" information. I may or may
not want to do this. Or I may want to be even more careful.
3. Object save/load function contains pretty expensive logic for making sure
that object is stored/loaded only once. If for any reason (performance) I
could not afford constant searching but have external knowledge that all
objects are different I may want to skip lookup phase.
4. Let say I am sending data by contract. It could be fixed contract in
external file loaded during initialization or dynamic one sent before
sending real data. The purpose id to save the bandwidth and eliminate all
object/class ids from serialized stream. What should I do?

Also as mention in other reviews it's seems from library design that there
is no way to implement proper bracketing of the data for formats like XML.

Other issues mostly minor.

Object_id generation logic is transparent to the data type. You may save
both CPU and space if you choose to use type specific object identifiers.

version should be static constant member of the traits or user class and not
a function at all.

Implementation
---------------
Major [Issue 4]: support of serialization of stl data structures is flawed.
I would never agree to include header that will bring also: vector, list,
deque, map, hash_map, slist.
First I may not be using them at all. Second even if I do use vector I
should include <boost/serialization/vector.hpp>. The same with any other STL
data structure (only some of them are supported currently, while should be
all). IOW serialization library will need to repeat the structure of STL
(header-wise). And users of the library will need to include appropriate
header.

Library should be using following namespaces:

boost:
basic_..achive family only. The only part of public user interface

boost::serialization
struct traits (currently serialization class), used by used supply UDT
sterilization traits

boost::serialization::detail
rest of the code including void cast (which is better be named runtime_cast
cause the name should emphasize not void part but namely dynamic runtime
essence of it)

Accordingly directory structure should reflect this: archive.hpp should be
the only header under boost. Rest under serialization and
serialization/details directories.

Use of template set to keep all static information is questionable. In
several cases usage pattern looks like several inserts and a lot of
querying. sorted vector would work better in this case. See Matt Austern
article on this topic.

basic_[i|o]archive uses virtual functions for serializing intrinsic types.
In some rare cases it may be expensive
Why rare? because in majority of the cases price of double dispatching is
incomparable with i/o operations. But it may be the case when you
serializing directly in memory lots of intrinsics. When user may want to
eliminate double dispatching somehow. And even if it is not first priority
task (IMO) you may think in a future instead if hardcoding virtual functions
into basic_ classes to use mpl base generation facility that will allow
serialization users to use compile time polymorphism.

Code
------
I have serious concerns to the quality of the some part of the code
supplied. It's lacking proper comments, incur unreasonable dependencies
contains errors and not recommended styles. Let see the details:

void_cast.hpp: see attached file for reworked version
1. header guard named with _H at the end. I would prefer _HPP
2. <set> <functional> should go into implementation file together with set
variable
3. <typeinfo> should go
4. <boost/type_traits.hpp> <boost/type_traits/conversion_traits.hpp> should
go. use is_const and is_polymorphic that you are really using.
5. namespace boost::serialization::detail
6. collection is used only in implementation file. should be moved there -
no need to pollute the header with <set> dependency. less functor,
operator<, collection_imp go together. And become free functor.
7. direction_enum. I would prefer name cast_direction. It's obvious that
it's enum from the code.
8. inverse function never used
9. Why data members, constructors, virtual functions all messed up in seems
like an arbitrary order. And all public.
10. Why cast is the pure virtual function? It's the only reason why
void_caster_argument exist. Make return NULL; default implementation and
eliminate second class.
11. Why insert is static function with argument? Couldn't we just
self_register (see the code)?
12. void_cast_primitive, void_cast_register template parameters all
uppercase. It's recommended to use mixed case in this case.
13. static_caster, dynamic_caster template parameters names better be
decrypted from F, T to From, To
14. __declspec(noinline) doesn't work with MSVC6.5
15. void_cast_register need argument Base* dummy = NULL for it to work with
MSVC 6.5
16. names of the format function arguments are using leading _. It's
reserved for compilers. Better be trailing one.
17. void_cast register is extremely weird:
{
return expr1;
expr2;
}
It's unclear for me why it should work at all.
18. Why void_cast_primitive and void_cast_derived are using different logic
for instantiation. Why do you try to instantiate second on stack? It does
not buy you anything - the price is complicated logic.
19. For the symmetry I would name void_cast void_upcast
20. You do not need const_cast to convert from void* to void const*
21. Boost recommends to keep * and & close to type not to variable.
22. why direction_enum is the member of void_caster?
23. at the end of the file in a comment wrong header guard.

void_cast.cpp
1. After change in item 6 above you don't really need operator< any more.
It's logic will move into one free "compare" functor. You don't need else
clauses in this implementation.
2. void_cast implementation IMHO need some empty line separators.
3. at the end of void_cast implementation wrong namespace name in a comment

collection_imp.hpp is not really used it's copy/pasted into stl.hpp and
should not be part of the submission in it's current state. In fact in my
proposition it should exist and all vector.hpp map.hpp reuse it.

serialization_imp.hpp
1. eliminate type_traits header. use specific
2. set included but never used!!
3. #define typename is not only bad style it simply glitch cause it make
MSVC to fail badly, as it was reported already in some post. The intend was
solve MSVC dependent typename issue. Look through boost code for several
fixes for this one (I really think it should be part of MSVC configuration.
4 .class serialization comment I would start with "for each type..."
5. In the same comment: passage " Note that I chose to avoid..." IMO is
incorrect and alternative "failed" variant should be used instead.
6. type_info_extended_save/load hide constructors. which it should not be
doing at all cause it inherited from boost::noncopyable and have non-default
constructor already.
7. type_info_extended_save/load have repeated logic that could be shared
though the base class
8. From the code unclear why above classes exist at all. IIUC in attempt to
place an implementation into source file. Value is questionable considering
size of the functions.
9. save_pointer_type template parameter should be TPtr.
10. It you take a look onto instantiate_nothing implementation you will be
amused to see following:
template<class T>
struct instantiate_nothing
{
   static type_info_extended_save & instantiate(){
        BOOST_STATIC_ASSERT(false);
        type_info_extended_save *tix = NULL;
        return static_cast<type_info_extended_save &>(*tix);
   }
};

a. Why this code dereference NULL pointer?
b. Given BOOST_STATIC_ASSERT(false); why anything else exist at all?
c. If this is not supposed to be instantiated why not to check directly in
instantiate(...) that T is not fundamental and not enum?
The same with both save and load sides

11 assert_polymorphic: why exist? Why not use static assert? do we need this
check at all?

serialization.cpp
1. comment on top state that it implements Persistence class
2. I would prefer STL header to *follow* local ones. Not vice-versa.
3. cobject_arg - the same story as with void_cast_argument. It should be
eliminated by defining default behavior in base class.
4. near to the statement where find returns NULL there is worrying comment.
Should not it be reported to the user?

archive.hpp
register_type function need T* dummy = NULL argument
count == is.gcount needs static_cast

archive.cpp
why ARCHIVE_SIGNATURE is not std::string?
hack in line 96 should be ifdefed

Testing
-------

void_test_cast.cpp
Includes fstream and never use it. main had wrong specification. Did it work
for you?
I change it to use test/minimal.hpp and BOOST_CHECK See attached file

test.cpp
Should not it be names serialization_test.cpp
I would consider separating it into several simple tests. MSVC6.5 was
complaining about compiler limit reached.

Conclusion
-----------

In conclusion of this review I would like to thank Robert for his work that
has definite potential. But in a light of above issues I vote NO to include
the library now in boost. Once major issues will be resolved we will need to
return to the library to check for some more subtle issues. Especially with
registration. Once code is separated and cleaned a bit it will be easier to
analyze it also. Looking forward to see those fixes.

Regards,

Gennadiy.

Next message: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Previous message: Roland Richter: "[boost] lexical_cast and bool"
Next in thread: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Reply: Gennadiy Rozental: "[boost] Re: Serialization Library: review"
Reply: Matthias Troyer: "Re: [boost] Serialization Library: review"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk