Subject: Re: [boost] [serialization] deserializing asynchronously serialized types
From: Robert Ramey (ramey_at_[hidden])
Date: 2011-09-04 02:32:59


Andrew Hundt wrote:
> I have an unusual use case for boost.serialization, and I was wondering if
> it would be possible to adapt it to my needs:
>
> - I have a set of over 100 types, and instances of each are generated
> asynchronously then serialized to a file in that order.
> - The most interesting serialized data will be written just before
> the power is unexpectedly cut.
> - I need to load in and run on as much data as possible when reading
> the serialized data back, ignoring incomplete data at the end (due to
> a power cut).
> - The basic Boost serialization examples require you to know the type
> of the next piece of data to be loaded when reading. Since these types
> are generated asynchronously, they are not known in advance.
> - I need to write the data out immediately when it arrives because of
> the power issue.
> - Files will be getting up to around 150GB in size for binary archives,
> so it can't be marshaled in memory; it needs to be written immediately
> even if it is redundant.
>
> Is there a way to read in that serialized file using the facilities
> provided in boost.serialization?
>
> Here are my current ideas:
> - I tried using boost.variant, but it loses its will to compile when I
> increase the typelist limit to around 60 types on gcc 4.4, and I
> have more than a hundred.
> - Use preprocessor metaprogramming to do something equivalent to
> boost.variant, but I would very much prefer a more pleasant option.
> - Serialize an index or custom headers indicating the next type to
> appear.
> - One way of achieving some of these goals is writing one piece at a
> time using a binary archive to an fstream, with the index mentioned
> above separating data types.
>
> I don't know what aspects of my requirements will prove to be a
> problem, so if anyone can provide advice that would help me avoid a
> major pitfall, it would be greatly appreciated.
>
> Thanks for your thoughts.
>
> Cheers!
> Andrew Hundt
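A minimal sketch of the "custom headers" idea from the last two bullets, assuming a hypothetical record layout of a 32-bit type tag, a 32-bit payload length, and then the payload bytes. Because the length precedes the payload, a reader can tell a complete record from one cut short by the power failure and simply stop there. The names `write_record`, `read_record` and `kMaxRecord` are made up for illustration, and integers are written in host byte order, so the file is only readable on machines with the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sanity cap: a torn write can leave a garbage length field.
const std::uint32_t kMaxRecord = 1u << 26;

// Writes one framed record: [uint32 tag][uint32 length][payload bytes].
void write_record(std::ostream &os, std::uint32_t tag, const std::string &payload) {
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    os.write(reinterpret_cast<const char *>(&tag), sizeof tag);
    os.write(reinterpret_cast<const char *>(&len), sizeof len);
    os.write(payload.data(), len);
    // flush() only hands the bytes to the OS; surviving a power cut
    // also needs an OS-level sync, which is not shown here.
    os.flush();
}

// Reads one complete record. Returns false at end of data or when the
// last record is incomplete, so the caller simply stops there.
bool read_record(std::istream &is, std::uint32_t &tag, std::string &payload) {
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char *>(&tag), sizeof tag)) return false;
    if (!is.read(reinterpret_cast<char *>(&len), sizeof len)) return false;
    if (len > kMaxRecord) return false;
    payload.assign(len, '\0');
    if (len > 0 && !is.read(&payload[0], len)) return false;
    return true;
}
```

A reader loop is then `while (read_record(in, tag, payload)) { ... }`, dispatching on `tag`. Each payload could itself be the bytes of one object written by a `boost::archive::binary_oarchive` into a `std::ostringstream` (an untested assumption; the `boost::archive::no_header` flag would avoid repeating the archive header per record, as long as the reader uses it too).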

You could still make your own "special purpose" variant. Look at the
section "Serialization Wrappers" in the Boost.Serialization documentation.

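// A hand-rolled "variant": a type tag plus a pointer to one of the types.
// (A union cannot hold references, so it holds pointers instead.)
struct my_wrapper {
    unsigned m_i;             // type tag: 1 => type1, 2 => type2, ...
    union {
        type1 *m_t1;
        type2 *m_t2;
        // .... one member per remaining type
    };
    my_wrapper() : m_i(0), m_t1(0) {}
    my_wrapper(type1 &t1) : m_i(1), m_t1(&t1) {}
    my_wrapper(type2 &t2) : m_i(2), m_t2(&t2) {}
    // ... one constructor per remaining type
};

// my_wrapper uses separate save/load free functions
BOOST_SERIALIZATION_SPLIT_FREE(my_wrapper)

namespace boost { namespace serialization {

template<class Archive>
void save(Archive &ar, const my_wrapper &w, unsigned int /* version */){
    ar << w.m_i;              // write the tag first
    switch(w.m_i){
    case 1:
        ar << *w.m_t1;
        break;
    case 2:
        ar << *w.m_t2;
        break;
    // ....
    }
}

template<class Archive>
void load(Archive &ar, my_wrapper &w, unsigned int /* version */){
    ar >> w.m_i;              // the tag says which type follows
    switch(w.m_i){
    case 1:
        w.m_t1 = new type1;   // the caller owns (and must delete) the result
        ar >> *w.m_t1;
        break;
    case 2:
        w.m_t2 = new type2;
        ar >> *w.m_t2;
        break;
    // ....
    }
}

}} // namespace boost::serialization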
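The same tag-then-payload dispatch can be tried out without Boost. The sketch below is a stand-in under stated assumptions: `type1` and `type2` are hypothetical one-field placeholders for the real types, and raw binary writes through the hypothetical helpers `write_raw`/`read_raw` take the place of a Boost binary archive. On the reading side, the freshly loaded object is handed to a caller-supplied handler, since the reader only learns the type from the tag.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical stand-ins for two of the 100+ real types.
struct type1 { std::int32_t value; };
struct type2 { double value; };

// Raw binary helpers (host byte order), standing in for a binary archive.
template <class T> void write_raw(std::ostream &os, const T &v) {
    os.write(reinterpret_cast<const char *>(&v), sizeof v);
}
template <class T> bool read_raw(std::istream &is, T &v) {
    return static_cast<bool>(is.read(reinterpret_cast<char *>(&v), sizeof v));
}

// Writer side: every object is preceded by its type tag.
void save(std::ostream &os, const type1 &t) { write_raw<std::uint32_t>(os, 1); write_raw(os, t.value); }
void save(std::ostream &os, const type2 &t) { write_raw<std::uint32_t>(os, 2); write_raw(os, t.value); }

// Reader side: the tag selects which type to construct and load; the object
// is then passed to the matching overload of h. Returns false at end of data,
// on a truncated record, or on an unknown tag.
template <class Handler>
bool load_next(std::istream &is, Handler &h) {
    std::uint32_t tag = 0;
    if (!read_raw(is, tag)) return false;
    switch (tag) {
    case 1: { type1 t; if (!read_raw(is, t.value)) return false; h(t); return true; }
    case 2: { type2 t; if (!read_raw(is, t.value)) return false; h(t); return true; }
    default: return false;
    }
}
```

Passing the object to a handler, rather than returning it, keeps the reader free of any common base class or variant type: each overload of the handler sees the concrete type.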

So now you could just write

    ar << my_wrapper(t); // where t is any one of 100 types

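This is basically a poor man's variant. It doesn't use compile-time
(template) coding, so the number of types isn't bounded by a typelist
limit the way it is with boost.variant.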

Another idea, trickier, would be to use a variant of variants (a variant
whose members are themselves variants, each small enough to compile) to
get around the compiler limitations.
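A rough sketch of the variant-of-variants idea, written with C++17 `std::variant` in place of `boost::variant` (a substitution, not what the post proposes). The types `A1` through `B2` and the helpers `tag_of`, `make_alt`, `make_by_index` and `make_record` are hypothetical. Each inner variant stays small, and a record's tag becomes a pair: which group, then which type within the group. The reader can rebuild an empty object of the right type from that pair and then load its contents; serializing the contents themselves is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical stand-ins; in practice each group would hold a few dozen types.
struct A1 { int v; };
struct A2 { int v; };
struct B1 { int v; };
struct B2 { int v; };

using group_a = std::variant<A1, A2>;
using group_b = std::variant<B1, B2>;
using any_record = std::variant<group_a, group_b>;  // the variant of variants

// Two-level tag to write before each record: (group index, index within group).
std::pair<std::size_t, std::size_t> tag_of(const any_record &r) {
    std::size_t inner = std::visit([](const auto &g) { return g.index(); }, r);
    return {r.index(), inner};
}

// Builds a V holding a value-initialized alternative number idx.
template <class V, std::size_t... I>
V make_alt(std::size_t idx, std::index_sequence<I...>) {
    V out;
    ((I == idx ? (out.template emplace<I>(), 0) : 0), ...);
    return out;
}

template <class V>
V make_by_index(std::size_t idx) {
    if (idx >= std::variant_size_v<V>) throw std::out_of_range("bad variant index");
    return make_alt<V>(idx, std::make_index_sequence<std::variant_size_v<V>>{});
}

// Reader side: rebuild an empty record of the right type from its tag,
// ready for its contents to be loaded.
any_record make_record(std::size_t outer, std::size_t inner) {
    any_record r = make_by_index<any_record>(outer);
    std::visit([inner](auto &g) { g = make_by_index<std::decay_t<decltype(g)>>(inner); }, r);
    return r;
}
```

The point of the nesting is that no single variant has to list all 100+ types; the outer variant only lists the groups.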

Robert Ramey
