
Boost Users :

From: Bill Lear (rael_at_[hidden])
Date: 2007-03-09 14:45:40


On Friday, March 9, 2007 at 10:53:20 (-0800) Robert Ramey writes:
>...
>Define your own serialization for std::string and use it instead
>of the one in the serialization library. This is probably a bad
>idea as it would attribute your special behavior to a standard
>object and would make your archives and programs non-portable
>and harder to support if you want to ask us for help.

Definite downsides, true, but I'm not sure that it would be
non-portable, except perhaps that I have a different idea of "define
your own serialization for std::string". I have done what I
considered to be this, and posted it below.

>Define your own string class derived from std::string. This
>string class could be serialized using your own special sauce
>without losing portability. This could be formulated as
>a "serialization wrapper" as described in the manual so that
>your code would only have to use this "special string"
>in the process of serialization and not throughout your program.
>Look in the recent documentation and the "is_wrapper" type trait
>for more information.

Ok, I'll have a look at that --- sounds like a reasonable alternative
to what I've done.

>So now the problem boils down to how you're going to capture
>and restore the fact that these strings share underlying data.
>At first one would think that just letting your wrapper class
>use the default tracking behavior to eliminate duplicates would
>solve your problem. But I don't think so. As I said above,
>I don't think that you're serializing the SAME (see above)
>string one million times. I think you're serializing a million
>different strings which happen to contain the same data.
>
>It seems to me that you'll have to delve into the implementation
>of the string class you're using and gain access to the internals
>of the implementation and figure out how to capture the
>reference to the shared contents and serialize that.

The strings share data on assign, so:

string a = "foo";
string b = a;

means they share the underlying memory "foo", with a logical refcount
of 2 (the physical refcount, for implementation reasons, is actually
1). Once you muck with a or b, they get their own copy of the memory,
decremented ref count, etc. If I serialize a and b, and deserialize,
the load will "break" this ref count --- I get two "unshared" strings,
each with a block of memory "foo". Not the fault of the serialization
library, of course ...
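
To make the effect concrete, here is a minimal sketch that models the
share-on-assign behavior with std::shared_ptr instead of the real
std::string internals. The names shared_text, save, load and
still_shared_after_round_trip are made up for illustration, and the
"length, space, bytes" format only imitates the text archive; none of
this is the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a reference-counted string:
// copying a shared_text shares one underlying buffer.
typedef std::shared_ptr<const std::string> shared_text;

// Write length, a space, the bytes, and a trailing separator.
void save(std::ostream& os, const shared_text& t) {
    os << t->size() << ' ' << *t << ' ';
}

// Read one string back into a brand-new buffer. Nothing here knows
// about buffers created by earlier loads, so no sharing is restored.
shared_text load(std::istream& is) {
    std::size_t size = 0;
    is >> size;
    is.get();  // skip separating space
    std::string buf(size, '\0');
    if (size > 0)
        is.read(&buf[0], static_cast<std::streamsize>(size));
    return std::make_shared<std::string>(buf);
}

// Saves two strings that share a buffer, loads them back, and
// reports whether the loaded pair still shares one buffer.
bool still_shared_after_round_trip() {
    shared_text a = std::make_shared<std::string>("foo");
    shared_text b = a;             // shares the buffer
    assert(a.get() == b.get());
    assert(a.use_count() == 2);

    std::stringstream ss;
    save(ss, a);
    save(ss, b);

    shared_text a2 = load(ss);
    shared_text b2 = load(ss);
    assert(*a2 == "foo" && *b2 == "foo");  // contents survive
    return a2.get() == b2.get();           // but the sharing does not
}
```

In this model the contents round-trip correctly, yet each load
allocates its own buffer, which is the same effect I see with the
real strings.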

So, here is how I've coded this to test it out. The test I've just
completed shows that the memory bloat is completely removed --- this
is a major relief, as the bloat was literally expanding by 3-4
gigabytes a process that was already near our VM limit. Think of this
as just a proof-of-concept, if you like
(boost/archive/impl/text_iarchive_impl.ipp):

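#ifdef LL_STRING_DESERIALIZATION_CACHE
// Cache of every distinct string loaded so far. Only the keys matter;
// the bool value is a placeholder.
typedef std::map<std::string, bool> ll_cache;
static ll_cache ll_string_cache;

// Drop all cached strings (call once loading is finished).
void nuke_ll_string_cache() {
    ll_string_cache.clear();
}
#endif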

template<class Archive>
BOOST_ARCHIVE_DECL(void)
text_iarchive_impl<Archive>::load(std::string &s)
{
#ifndef LL_STRING_DESERIALIZATION_CACHE
    std::size_t size;
    * this->This() >> size;
    // skip separating space
    is.get();
    // borland de-allocator fixup
    #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101))
    if(NULL != s.data())
    #endif
        s.resize(size);
    is.read(const_cast<char *>(s.data()), size);
#else
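    std::size_t size;
    * this->This() >> size;
    // skip separating space
    is.get();
    // read the payload into a temporary rather than directly into s
    std::string input_string;
    input_string.resize(size);
    is.read(const_cast<char *>(input_string.data()), size);

    // look for a previously loaded string with identical contents
    ll_cache::iterator i = ll_string_cache.find(input_string);

    if (i == ll_string_cache.end()) {
        // first occurrence: store it as the canonical copy
        std::pair<ll_cache::iterator, bool> x =
            ll_string_cache.insert(std::make_pair(input_string, true));
        i = x.first;
    }

    // assigning from the cached key shares its buffer, given the
    // share-on-assign std::string described above
    s = i->first;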
#endif
}

If you have thoughts on how to make this cleaner, without hacking
into the Boost implementation details, that would be great (if this
is what you already suggested about a wrapper, just say so).
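
One direction that stays outside the Boost sources would be to keep
the cache in a small helper and pass each string through it after it
has been loaded. The following is only a sketch under that
assumption: string_pool, intern and pool_demo are hypothetical names,
not anything from the serialization library, and it uses a std::set
where my patch above uses a std::map with a dummy bool.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of Boost): keeps one canonical copy
// of each distinct string value.
class string_pool {
public:
    // Returns a reference to the pooled copy equal to s, inserting
    // it the first time that value is seen. References stay valid
    // until clear() is called, since std::set nodes do not move.
    const std::string& intern(const std::string& s) {
        return *pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

    // Same role as nuke_ll_string_cache(): drop everything.
    void clear() { pool_.clear(); }

private:
    std::set<std::string> pool_;
};

// Two separately "loaded" strings with equal contents end up
// referring to the same pooled object.
void pool_demo() {
    string_pool pool;
    std::string loaded1 = "foo";  // pretend these came from two loads
    std::string loaded2 = "foo";
    const std::string& c1 = pool.intern(loaded1);
    const std::string& c2 = pool.intern(loaded2);
    assert(&c1 == &c2);
    assert(pool.size() == 1);
}
```

After a load one would write something like s = pool.intern(s). That
only saves memory when std::string shares its buffer on assignment,
as ours does; with a string that always copies, the program would
have to hold on to the pooled reference (or a pointer to it) instead.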

Thanks for the help.

Bill

