
Boost Users :

Subject: Re: [Boost-users] [boost-users][serialization] Linking time problems with a class hierarchy
From: Robert Ramey (ramey_at_[hidden])
Date: 2010-09-10 12:55:51


Laszlo Orban wrote:
> Currently we are having a class hierarchy consisting of 50+ class
> definitions (all descendants of one class) along with some assistant
> structures; the typical you would expect from a 'real' application,
> and this base is growing each day. We are serializing and
> deserializing objects from these classes in a kind of client-server
> application. Everything works fine (testing on both linux and windows
> platform using gcc 4.4.1), however we are experiencing a rather weird
> issue. It started quite some time ago, but only starting to get
> really annoying nowadays: the compile and linking time of the
> projects that use these classes are starting to get ridiculous and
> we could trace this back to serialization. As usual for polymorphic
> archives, we are using the BOOST_CLASS_EXPORT_GUID macro for creating
> the necessary singletons for the class definitions. To narrow down
> the problem we have tried the following checks:
>
> 1.) If we comment out all the BOOST_CLASS_EXPORT_GUID macros, the
> problem is gone, linking and compiling time is reduced to some
> seconds, the length of the executable and the object files are
> 'normal', but of course the serialization doesn't work. If I put the
> macros back, the compiling of the source file that contains them
> skyrockets to 2-3 minutes, also the linking time to around the same
> time on a Core2Duo 2 GHz machine, and the length of the object file
> (and thus the executable) goes from several hundred Kbs to 15+ Mb-s.
> (A bit varied values on the same code depending if it is linux or
> windows, but the same behaviour nonetheless.)
>
> 2.) We have tried to mess around with the number of macros running,
> and it seems the increase of the compile time is linear, linking time
> is more like exponential.
>
> 3.) We have tried to 'clean' the class structure and reduced the
> members in the classes to only some basic booleans and integers (even
> purged the std::string-s), in most classes completely eliminating all
> members just let the 'skeleton' there, and still: no significant
> change in compile and linking time.
>
> 4.) We have tried to do all this in a 'clean' project with only a
> "void main() {}" method and the class hierarchy, to ensure that not
> somehow the usage of it causes the problem: negative, the problem
> still exists.
>
> Now it is not a great problem that the compile is so slow, as that
> causes a problem only when the class hierarchy is changed. Also the
> executable size is only a bit irritating, but not a serious problem.
> However the linking time is something we need to solve asap, as it is
> hindering our development a lot, especially as this class hierarchy
> is used in several applications (naturally, we wouldn't need
> serialization otherwise), thus the linking problem affects all of
> them. (And this will be worse later, as the class hierarchy is
> growing each day.)
>
> Any ideas about how to go on from here? Is there any way I can change
> the BOOST_CLASS_EXPORT_GUID macro to a more simple? effective? one?
> Or maybe all this is normal and we have to live with it?
>
> (This is a strongly reduced snippet on how we are using the
> polymorphic serialization with boost_1_42, but as I mentioned, it is
> working nicely:
> class Mess
> {
> private:
> friend class boost::serialization::access;
>
> template<class Archive>
> void serialize(Archive & ar, const unsigned int version)
> {
> ar & messageId;
> }
>
> protected:
> int messageId;
>
> public:
> (...)
> };
>
> class CMess : public Mess
> {
> private:
> friend class boost::serialization::access;
>
> template<class Archive>
> void serialize(Archive & ar, const unsigned int version)
> {
> ar & boost::serialization::base_object<Mess>(*this);
> ar & cma;
> }
>
> protected:
> int cma;
> (...)
> };
> (...)
> BOOST_CLASS_EXPORT_GUID(CMess, "CMess")
>
> Thanks for reading it!

Thanks for writing it!

I can't really "solve" the problem without having the whole project
to experiment on. But I can offer some suggestions. This is actually
better as all I have to do is talk rather than doing any actual work. So
here are some ideas to try.

a) Consider recoding your class to avoid inline functions, like this:

cmess.hpp

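#include ... // whatever the class itself needs, but no ?archive headers!
#include <boost/serialization/access.hpp>
#include <boost/serialization/export.hpp>

class CMess : public Mess
{
private:
    friend class boost::serialization::access;

    // declaration only - the definition lives in cmess.cpp
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version);

protected:
    int cma;
    (...)
};
(...)
// the two-argument form of the key macro is BOOST_CLASS_EXPORT_KEY2
BOOST_CLASS_EXPORT_KEY2(CMess, "CMess")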

cmess.cpp

#include <boost/archive/polymorphic_iarchive.hpp>
#include <boost/archive/polymorphic_oarchive.hpp>
#include <boost/serialization/base_object.hpp>

#include "cmess.hpp"

template<class Archive>
void CMess::serialize(Archive & ar, const unsigned int version)
{
    ar & boost::serialization::base_object<Mess>(*this);
    ar & cma;
}

BOOST_CLASS_EXPORT_IMPLEMENT(CMess)

// explicitly instantiate code for polymorphic_?archive
template void CMess::serialize<boost::archive::polymorphic_iarchive>(
    boost::archive::polymorphic_iarchive & ar, const unsigned int version);
template void CMess::serialize<boost::archive::polymorphic_oarchive>(
    boost::archive::polymorphic_oarchive & ar, const unsigned int version);

Now you've got 50 *.cpp files. Each one is compiled once and then
only again when it changes. Each compilation will be fast.
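
To show the mechanism in isolation, here is a minimal single-file sketch
without Boost. ToyTextArchive, ToyMess and save_toy_mess are made-up names
for illustration only, and the comments mark which part would live in the
header, the *.cpp file and the application. The member template is only
declared in the "header" part, and its code is generated exactly once by
the explicit instantiation in the "source" part.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for an archive type; this is NOT a Boost class.
struct ToyTextArchive {
    std::ostringstream out;
    ToyTextArchive & operator&(int value) {
        out << value << ' ';
        return *this;
    }
};

// ---- would live in toy_mess.hpp ----
class ToyMess {
public:
    explicit ToyMess(int id) : messageId(id) {}

    // Declaration only: code including the header never sees the body,
    // so it cannot instantiate (and re-compile) it implicitly.
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version);

private:
    int messageId;
};

// ---- would live in toy_mess.cpp ----
template<class Archive>
void ToyMess::serialize(Archive & ar, const unsigned int /*version*/) {
    ar & messageId;
}

// Explicit instantiation: the one place where code for this
// archive type is generated.
template void ToyMess::serialize<ToyTextArchive>(ToyTextArchive &,
                                                 const unsigned int);

// ---- would live in the application ----
std::string save_toy_mess(int id) {
    ToyMess m(id);
    ToyTextArchive ar;
    m.serialize(ar, 0); // resolved at link time in a multi-file build
    return ar.out.str();
}
```

In a real multi-file build, calling serialize with an archive type that was
not explicitly instantiated would show up as an unresolved symbol at link
time, which is the price of keeping the body out of the header.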

Now create a library with the 50 modules in it - call this mess.lib.

Now your main app code would look like:

#include <fstream>
#include <boost/archive/polymorphic_text_oarchive.hpp> // note: NOT polymorphic_oarchive
#include "cmess.hpp"
#include ... // other classes used

int main(int argc, char * argv[]){
    const CMess x; // assumes CMess has a user-provided default constructor
    std::ofstream ofs("filename");
    boost::archive::polymorphic_text_oarchive oa(ofs);
    oa << x;
}

This should compile in no time at all. Now link against your mess.lib
created above. I don't know how much time this will take. I can't
imagine that it should be all that long. But I could be wrong.

Now you've got:

a) an application which only re-compiles stuff that has changed.
b) No code bloat - all functionality is compiled/included ONLY
once.
c) something that works with all archive types; switching archive
type means recompiling only one module (main above).
d) The application will be slightly slower for not using inline code.
I doubt this will be measurable.
e) The application will be slightly (though measurably) slower
for using polymorphic_archives.
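
Points c) and e) come from the same mechanism, which can be sketched
without Boost as follows. All names here (ToyOArchive, ToyTextOArchive,
ToyXmlOArchive, ToyCMess) are hypothetical and only mimic the idea behind
the polymorphic archives; this is not how the library is implemented. The
serialize code is compiled once against an abstract interface, and each
primitive is saved through a virtual call, which is where the small
run-time cost comes from.

```cpp
#include <sstream>
#include <string>

// Hypothetical abstract interface, playing the role of polymorphic_oarchive.
class ToyOArchive {
public:
    virtual ~ToyOArchive() {}
    virtual void save(int value) = 0; // one virtual call per primitive
    ToyOArchive & operator&(int value) {
        save(value);
        return *this;
    }
};

// Two hypothetical concrete archives with different output formats.
class ToyTextOArchive : public ToyOArchive {
public:
    void save(int value) { out_ << value << ' '; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

class ToyXmlOArchive : public ToyOArchive {
public:
    void save(int value) { out_ << "<int>" << value << "</int>"; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct ToyCMess {
    int messageId;
    int cma;

    // Compiled once, against the interface only. It never needs to be
    // recompiled when a new concrete archive type is added.
    void serialize(ToyOArchive & ar) const { ar & messageId & cma; }
};

// Only these callers know which concrete archive is in use.
std::string save_as_text(const ToyCMess & m) {
    ToyTextOArchive ar;
    m.serialize(ar);
    return ar.str();
}

std::string save_as_xml(const ToyCMess & m) {
    ToyXmlOArchive ar;
    m.serialize(ar);
    return ar.str();
}
```

The same ToyCMess::serialize serves both output formats, so only the
module that picks the concrete archive has to change.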

That should do it for you. If you can do this, I believe you'll
find your application MUCH smaller and MUCH faster to build.

Variations on the above theme.

a) rather than instantiating for polymorphic_?archive, you could
instantiate for every archive type you plan to use. Larger mess.lib
and larger application, but somewhat faster code.

b) rather than creating mess.lib, you could create a
dynamic_mess.dll. This would include all your classes
in the DLL so the application would be really small. Linkage would
be almost instantaneous. However, this would be trickier
and require more careful organization of code to avoid ODR
violations.

c) You could create several DLLs, each containing a different
selection of classes. Even trickier than b) but doable.

Robert Ramey

 

