Boost Users :

From: Robert Ramey (ramey_at_[hidden])
Date: 2008-03-24 13:09:31


François Mauger wrote:
> Hi robert and all,
>
> On Sun, March 23, 2008 10:11 pm, Robert Ramey wrote:
>> if you don't like tracking - you can turn it off for the types you
>> want.
>
> see below.
>
>> to add your own characters between archives try something like this:
>>
>
> this is what I have done of course, to make it run.
> but I was considering that such behavior is the responsibility
> of the core lib, not the user (me), that's my point.

A very bad idea in my opinion. Now the user can embed
serialized data wherever he wants. The library needs
to be smaller rather than larger.

> As I use a 'serialization manager' class that implements
> some txt/xml/bin archive depending on the file extension
> provided by the users of my lib, I would find it more, say, 'elegant'
> not to treat text archives differently from xml/bin archives.
> but maybe it is a purist issue! and maybe not really relevant.
> of course, this is not a problem if I add this 'blank' char by hand.
> So it is fine for me.

As it will be for all users with their own special requirements.

>
> -- next point:
>
> About memory tracking: after a few investigations while
> considering my needs, I find it very powerful.
> It only requires some care while using it.
> As I use serialization by pointers on nested objects
> in my lib, tracking is very useful because it handles the links
> properly.

> The only problem I met is the following:
> - I have say 1000000 records (some instances of a class) to write
> in the archive.
> - Each record uses massively std::vector or std::list as internal
> members (subrecords with dynamically allocated memory)
> - Each record (with its internal subrecords)
> is more than 1000 bytes long --> so total storage for my data set
> is about 1 GB in a single output file!
> - Moreover, each record has pointer members in it, so tracking is
> activated and I NEED it.
>
> Now, when I write all the records using a kind of loop on my archive,
> I must not reuse the same memory addresses (as explained in the sample
> programs and docs in boost::serialization, I cannot use a temporary
> record instance in the loop).
> So I have to keep in the RAM of the machine
> the whole set of records (and dynamically allocated subrecords)
> at the same time before storing them in the archive (1 GB!).
> Typically I use a std::list for this.
> It implies running on a machine with 1 GB of available RAM,
> which cannot be guaranteed for all my users,
> even in our computing center...
>
> On the other hand, during the loop process,
> there is no way to erase (nor reduce the size of)
> previously written records while saving the current one
> in the archive. Of course this would
> reduce the running memory, but the freed memory
> could be reused at some point for later records
> (this is a rather random process that depends on
> the system),
> and then the tracking mechanism would misinterpret the new data
> as already serialized stuff reached through my pointers. Then
> the output will be corrupted.
>
> This is what I call a "long-range memory tracking effect":
>
> - within a single record (short range),
> memory tracking is fine because it makes it possible to
> maintain some connections between 'subrecords' through pointers
> in a very nice and 'storage saving' way.
> Moreover, deserialization works perfectly, without duplicate
> records/subrecords and memory leak issues.
>
> - for successive records (long range),
> I have no need of pointers to make links
> between objects (records and subrecords in it),
> so tracking is useless but it is still activated
> in the same program!

> Then it leads to some mixing of memory addresses among
> different records. This is a corruption case, unless, as I explained
> above, all records (and internal subrecords) are kept in memory
> till the end of the serialization process... then I need a machine
> with 1 GB of RAM!
>
> The only way I have found in my programs to break this long range effect
> is to use one archive per record. It then seems that
> memory tracking is confined within the limit of the current archive
> which is exactly what I want.
>
> Finally, what would be useful to enable my "per-record" serialization
> approach for a large data set is a kind of "memory tracking
> reset" function that could be invoked online while looping on the
> archive.
> I have no idea if it is possible to implement this, and if other
> people could find it useful, Robert first!
>
> Hope my point has been understood.
> At least, if you can confirm that tracking mechanism is confined
> within a single archive, I can use this strategy of
> multiple archives per file.
>
> Thanks a lot for your attention, advice and constructive criticism.
>
> And many many thanks to Robert
> for this very nice and elegant serialization
> library. This is really a great and useful work!

A couple of points.

a) First, you've got a special situation. This requires special
effort to understand the library in more detail than normal.
Thankfully, you've made the investment in this effort and
have been able to exploit the library to help in addressing
your situation. I recognize that this is not easy.

b) You've given a very clear explanation of your situation,
the reasons for it, the problems in addressing it and what
you've done about it. This is even harder and I appreciate
your efforts.

c) I see your method of making one file => one archive
per instance as being a very practical and viable solution.
I don't see it as overly inefficient given that your class instances
are so large to begin with. In thinking about this, I can see
how a change in the library might help, but I don't think it
would be more efficient than what you're doing now.
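
To make the "long-range" effect concrete, here is a toy model of
address-based tracking in plain C++. It is only a sketch and not how the
serialization library is implemented: ToyArchive, save_shared and
save_per_record are hypothetical names invented for this illustration, and
a "record" is reduced to a single int. It shows why a reused address is
mistaken for an already saved object when one archive spans all records,
and why giving each record its own archive avoids that.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an output archive that tracks objects by address.
// NOT the real library: it only mimics the idea that an address seen
// before is written as a back-reference instead of as new data.
class ToyArchive {
public:
    // Returns "new:<value>" the first time an address is seen,
    // and "ref:<id>" for every later save of the same address.
    std::string save(const int* p) {
        auto it = seen_.find(p);
        if (it != seen_.end()) {
            return "ref:" + std::to_string(it->second);
        }
        const std::size_t id = seen_.size();
        seen_.emplace(p, id);
        return "new:" + std::to_string(*p);
    }

private:
    std::map<const int*, std::size_t> seen_;  // address -> object id
};

// One archive for all records, with one reused temporary:
// every record after the first is written as a back-reference.
std::vector<std::string> save_shared(const std::vector<int>& values) {
    ToyArchive ar;
    std::vector<std::string> out;
    int record = 0;  // the reused temporary
    for (int v : values) {
        record = v;
        out.push_back(ar.save(&record));
    }
    return out;
}

// One archive per record: the tracking table is destroyed with each
// archive, so a reused address is always treated as a new object.
std::vector<std::string> save_per_record(const std::vector<int>& values) {
    std::vector<std::string> out;
    int record = 0;  // the same reused temporary
    for (int v : values) {
        record = v;
        ToyArchive ar;  // fresh tracking state for this record
        out.push_back(ar.save(&record));
    }
    return out;
}
```

With the input {10, 20, 30}, save_shared yields "new:10", "ref:0", "ref:0"
(the later values are lost), while save_per_record yields "new:10",
"new:20", "new:30". Pointers inside a single record would still be
de-duplicated, since they are saved through the same archive.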

One idea you might want to look into is the concept of
an "archive helper" which attaches special behavior to an
archive. The current trunk includes archives which have an
"archive helper" attached to handle the special requirements
of shared_ptr, which does not model the concept of
"serializable". Such a helper could be used to implement
your own custom tracking behavior - (Which is what the
helper for shared_ptr does).

Another idea would be to use BOOST_STRONG_TYPEDEF
to create a "wrapper" around types for which you want
to turn off tracking for certain instances. This technique may or may not
be useful to you.

Good Job and Good Luck.

Robert Ramey

