
From: troy d. straszheim (troy_at_[hidden])
Date: 2005-03-02 05:47:33


Robert Ramey wrote:

>
> I love hearing this - I always wanted to be associated with particle
> physics. LOL
>

Heh heh... *sigh*. Don't... don't get me started. :)

> Of course you know that is straightforward - and you have the xml archives
> that can be used as examples. If you just want to display the information
> and not load it, a bunch of tags like object id, etc. can be suppressed.
> Personally I would just use the xml_archive and concentrate my efforts on a
> program that displays XML in a convenient and perhaps customizable way. I
> suspect you could find or make a suitable program of that nature for free or
> for low cost. To re-iterate, I would factor the "pretty display" from the
> serialization and make it customizable according to the kind of display
> required.
>
> In fact, if I had nothing else to do, and had that much interest, I would
> make an enhanced version of xml_archive that would output TWO files, a) the
> xml_archive and b) an xml_schema which could be used by other programs to
> parse the xml_archive. Just random thoughts.

Sure, that much I've got: suppressing the object id tags, overriding
save() methods for pointers so that they get "skipped", all that. I
guess I'm talking more about factoring out the markup within the
serialization library itself. It would be easy to just copy/paste the
entire xml_ thing, rename the classes, and change the tags and so forth,
but of course this would be despicable nastiness. Better to do
something like an nvp_archive that referred to some kind of formatting
policy class, with xml_archive as nvp_archive<xml_formatting_policy>...
or something like that. This could also get you the ability to do
SpitAndDuctTapeNVPML, or, say, some kind of binary_nvp_archive,
basically the same as XML but without all the ascii bloat. We would
certainly find this handy, as we never know if we might be forced to
convert to XML at some point, but there's just so much data that we
can't afford the ascii bloat in our storage. But of course you can just
zip the xml stuff, and a binary_nvp_archive is a lot more work than just
factoring tags and indentation out of xml_archive... OK, I've gone off
on a tangent. Never mind. And your points about where to focus the
effort are well taken.
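To make the factoring idea concrete, here is a rough sketch. All the names here (nvp_oarchive, xml_formatting_policy, flat_formatting_policy) are made up for illustration, not Boost API, and a real archive would of course derive from the library's common archive base classes:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical formatting policy: the archive delegates all markup to it.
struct xml_formatting_policy {
    static void begin(std::ostream& os, const std::string& name) { os << '<' << name << '>'; }
    static void end(std::ostream& os, const std::string& name) { os << "</" << name << ">\n"; }
};

// A "SpitAndDuctTapeNVPML"-style policy: same values, different markup.
struct flat_formatting_policy {
    static void begin(std::ostream& os, const std::string& name) { os << name << '='; }
    static void end(std::ostream& os, const std::string&) { os << '\n'; }
};

// Minimal stand-in for the proposed nvp_archive<FormattingPolicy>.
template <class FormattingPolicy>
class nvp_oarchive {
    std::ostream& os_;
public:
    explicit nvp_oarchive(std::ostream& os) : os_(os) {}
    template <class T>
    nvp_oarchive& save(const char* name, const T& value) {
        FormattingPolicy::begin(os_, name);
        os_ << value;
        FormattingPolicy::end(os_, name);
        return *this;
    }
};
```

With something like this, xml_archive becomes nvp_oarchive<xml_formatting_policy> and each new flavor is one more policy class instead of a copy/paste of the whole archive.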

Anyway, the purpose isn't visualization after the program has run, it is
more like

    pretty(log_stream) << my_particle;

in the code itself. I'm catering to printf-style debugging. This is
what the Beakers (jargon for "Physicists"... think Dr. Bunsen Honeydew
and his assistant...) like to do, and since I'm ripping out the old
serialization method (from the root analysis toolkit, which involves
running your headers through a quasi-compiler which generates
serialization functions, and which pukes the moment it sees anything in
namespace boost...) and since the beakers will react violently to this
at first, it would be good to toss them a bone as well, like "you get
ToStream(ostream&) for free". Now that I see memoization_archive, I see
I can give them something else, too....

But to focus on reformatting as you suggest, I could conceivably do it
with some kind of xml-reformatting stream: make a convenience function
that wraps the insertion into xml_oarchive(stringstream) and then passes
the result through the tag-removing reformatter. Would be a good opportunity to
play with iostreams. Yeah, sounds good. OK.
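The tag-removing reformatter could start as something this crude: a hand-rolled filter over the serialized string (a real version would more likely be a Boost.Iostreams filter, and this one knows nothing about attributes or CDATA):

```cpp
#include <string>

// Crude XML-tag stripper: drops everything between '<' and '>',
// keeping only the character data in between.
std::string strip_tags(const std::string& xml) {
    std::string out;
    bool in_tag = false;
    for (char c : xml) {
        if (c == '<')      in_tag = true;
        else if (c == '>') in_tag = false;
        else if (!in_tag)  out += c;
    }
    return out;
}
```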

> I'm not sure I'm convinced of this.
>
> I recommend the following when you make a new archive
>
> a) run the code module for the new archive through Gimple LINT and fixup the
> obvious oversights.
> b) make a file similar to text_archive.hpp in the test directory for your
> new archive - new_archive.hpp
> c) modify the Jamfile in the serialization test directory to include your
> new archive
> d) invoke the batch/script file run_archive_test <compiler>
> <new_archive.hpp>
>
> This will run all the serialization tests against your new archive. It
> takes a while - but it's worth it.

Sure. I did this with variant, it works great.

> I recommend the following when you make a new serializable class.
>
> a) run the code module for the new serializable class through Gimple LINT
> and fixup the obvious oversights.
> b) using the other tests as a basis, make a new test for your new
> serializable class.
> c) in the course of this you may have to make additions to your new class
> such as operator== - or you might not. Perhaps a global
> operator==(const T &lhs, const T &rhs) might be added just to the test.
> d) add test for your new class to the Jamfile in serialization/test
> e) invoke batch/shell script runtest <compiler> to generate a table of all
> tests including your new one. These tests will run your new class against
> all currently defined archives. This is important as some archives are not
> sensitive to some errors. For example, tagged XML can recover from some
> errors whereas the more efficient native binary cannot.

I've already learned from experience to have the testsuites run on all
archive types automatically, if for no other reason than to catch places
where you've forgotten to use make_nvp(). I'm with you.

The random_iarchive is intended as a tool to be used in this process:
for instance, I won't sleep well until I have seen a terabyte's worth of
events get serialized in one run.... The tests have to be *big*,
stressful, lots of data.

> Even if you only use just one particular compiler for the application you
> ship, I would recommend building and running all tests on at least two
> good, fairly different compilers. For example, gcc 3.4? and VC 7.1 is a good
> combination. This will often uncover subtle ambiguities that would
> otherwise linger on for years inflicting programmer pain.
>
> I have to say the one single most important thing I've learned from boost is
> that it's cheaper to maintain the test suite and build for several compilers
> than it is to debug the application. bjam (which DOES drive me crazy) is a
> godsend for doing this kind of thing.

Sure, you don't have to convince me of this. There's nothing more
beautiful than a rigorous set of test suites. I'm a crusty UNIX guy
with abysmal debugger skills; I'm dependent on them.

We have a similar testing infrastructure that I've thrown together...
We're a "make" shop... I wasn't sold on bjam. And running classes
through all archive types, automatically, is obviously the only way to
do it: I put together a few macros to accomplish this in code rather
than in a bunch of build-system mechanics. One macro creates tests for
one class through all archives. Not sure if they would integrate with
Boost.Test so easily, though, and Boost.Test is surely more robust in
various ways in case of failure. I can post 'em if you're curious.
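For what it's worth, the macro trick can mostly be a function template underneath; a sketch with toy text archives standing in for the real ones (roundtrip_equal and the toy_* types are invented for illustration, not the actual macros):

```cpp
#include <sstream>

// Toy archives standing in for real Boost archive types.
struct toy_text_oarchive {
    std::ostream& os;
    template <class T>
    toy_text_oarchive& operator<<(const T& v) { os << v << ' '; return *this; }
};
struct toy_text_iarchive {
    std::istream& is;
    template <class T>
    toy_text_iarchive& operator>>(T& v) { is >> v; return *this; }
};

// One round trip through a given oarchive/iarchive pair; a macro can
// then stamp out one call per archive flavor for each serializable class.
template <class OArchive, class IArchive, class T>
bool roundtrip_equal(const T& original) {
    std::stringstream ss;
    OArchive oa{ss};
    oa << original;
    IArchive ia{ss};
    T restored{};
    ia >> restored;
    return restored == original;
}
```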

>>Another problem is that if a class contains vector<shared_ptr<Base> >,
>>you'd like to be able to populate this with shared_ptr<Derived>,
>>where Derived is randomly selected from the set of classes that
>>inherit from
>>Base. Since serialization requires these classes to be registered, it
>>seemed to me there might be a way to do this. But maybe its all
>>overkill.
>
>
> If you don't find the above sufficient, then it's not overkill. As I said
> the pain of writing the test is nothing compared to shipping a product with
> a bug.
>

I was wondering how to accomplish it. I am in, say,

    template <typename T>
    void random_iarchive::load_override(vector<shared_ptr<T> >&);

with T = Base.

My random_iarchive has had Base and several types Derived registered
with it already. Because I know what Base is (from T), I can easily
populate the vector with shared_ptr<Base>, but in order to populate it
with Base and a variety of classes Derived, I have to somehow ask the
archive what possibilities are registered and choose one... Forgive me
if I'm way off base. The whole business of type registration in the
archives is still pretty opaque to me, and my gut says that this is
either impossible or overkill.
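One way around the opacity, if asking the archive turns out to be impossible: the test archive keeps its own registry alongside the serialization registration and picks from that. Everything here (derived_registry, the sample types) is hypothetical; Boost's internal registration tables are not public API:

```cpp
#include <functional>
#include <memory>
#include <random>
#include <vector>

struct Base { virtual ~Base() = default; };
struct DerivedA : Base {};
struct DerivedB : Base {};

// Side registry of factories for everything derived from Base; the test
// archive consults it whenever it needs a random concrete type.
class derived_registry {
    std::vector<std::function<std::shared_ptr<Base>()>> makers_;
public:
    template <class D>
    void register_type() {
        makers_.push_back([] { return std::make_shared<D>(); });
    }
    std::shared_ptr<Base> make_random(std::mt19937& rng) const {
        std::uniform_int_distribution<std::size_t> pick(0, makers_.size() - 1);
        return makers_[pick(rng)]();
    }
};
```

Registering DerivedA, DerivedB, etc. once, next to the usual serialization registration, then lets load_override(vector<shared_ptr<Base> >&) fill the vector from make_random().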

>
>>Anyhow, this random_iarchive exists (except for the Base/Derived
>>thing, above), maybe it would make a good tutorial case for custom
>>serialization archives, maybe people want to use it for something.
>>I'd be more than glad to write up some tutorial material, I'm sure I'd
>>get a lot out of it.
>
>
> As I said, I'm not convinced that the random test data should be part of the
> archive class. But I'm certainly pleased that someone finds the
> serialization sufficiently useful and interesting to do stuff like this.

I have also created a root_oarchive which creates root "trees", in case
anybody is working with the ROOT analysis toolkit. The way one does
this "normally" is a real nightmare, and being able to wrap all that in
operator<< is a huge, huge win for cleanliness and maintainability.
Testament to the flexibility of the serialization library. One big
thing here is that the serialization library allows you to "flatten"
nested structures into tuples by keeping track of the nvp paths in a
deque inside the oarchive. Kind of like xml output, but without
start/end tags, and where each nvp has all of its parents prepended to
it separated by some path separator character. One could conceivably
create an iarchive for these things as well, I haven't bothered.

> So if you want to polish this up and add it to the Files section on source
> forge I think it would be great.

So the attempt is to factor out the business of populating classes with
random test data into an iarchive class, in an effort to thoroughly test
the "real" archive classes, and so that, as a user with a bunch of
serializable classes, I can fill them up with random stuff and serialize
them through all the various archive types until my CPU smokes,
without writing fill_with_random_data() routines by hand for every one
of them.

Actually, now that you mention the memoization_archive, it would be
ideal if there were an archive that could do a deep
*comparison*, thus eliminating the need to write all those
operator==()s. I had thought about this and deemed it impossible, but
if you're talking about deep copy.... Then you've got a real
full-of-data workout canned in a function for an arbitrary serializable
user class:

    // for each archive flavor A in { xml, text, binary }:
    MyHugeClass src, dst;
    random_iarchive ria;          ria >> src;  // src now swollen with data
    A_oarchive oa(somewhere);     oa << src;
    A_iarchive ia(somewhere);     ia >> dst;
    comparison_archive ca(src);   ca << dst;   // or however that looks

From your serialize(archive) method, you get xml/text/binary i/o,
comparison, and copy.
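A cheap approximation of comparison_archive, pending the real thing: serialize both objects and compare the bytes. This assumes the archive format is deterministic (no addresses or timestamps in the stream); the toy archive below is again a stand-in, not Boost's:

```cpp
#include <sstream>
#include <string>

// Toy text archive; stands in for any deterministic oarchive.
struct toy_oarchive {
    std::ostream& os;
    template <class T>
    toy_oarchive& operator<<(const T& v) { os << v << ' '; return *this; }
};

// Deep "comparison" via the serialize path: equal serialized bytes imply
// equal object graphs, with no hand-written operator== anywhere.
template <class T>
bool deep_equal(const T& a, const T& b) {
    std::ostringstream sa, sb;
    toy_oarchive oa{sa}, ob{sb};
    oa << a;
    ob << b;
    return sa.str() == sb.str();
}
```

A real comparison_archive could of course fail fast at the first differing field instead of buffering both streams.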

> e) memoization_archive - an archive adaptor which does a deep copy using the
> serialize templates. This also requires some extra help from
> extended_type_info.

This is big to us. I'll contact you...

Thanks again,

-t


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk