From: David Abrahams (dave_at_[hidden])
Date: 2005-11-18 20:19:26

Next message: Aaron Windsor: "Re: [boost] VC++ compiler error with boost graph library"
Previous message: Jan Stetka: "Re: [boost] c++/cli boost subproject?"
In reply to: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Next in thread: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Reply: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"

"Robert Ramey" <ramey_at_[hidden]> writes:

> David Abrahams wrote:
>> "Robert Ramey" <ramey_at_[hidden]> writes:
>
>> ,----
>>> For many archive formats and common datatypes there exist APIs that
>>> can quickly read or write contiguous sequences of those types all at
>>> once (**). Reading or writing such a sequence by separately reading
>>> or writing each element (as the serialization library currently
>>> does) can be an order of magnitude more expensive.
>> `----
>
> I have no problem with the above.
>
>> We want to be able to capitalize on the existence of those APIs, and
>> to do that we need a "hook" that will be used whenever a contiguous
>> sequence is going to be (de)serialized. No such hook exists in
>> Boost.Serialization.
>
> Whether or not such a hook is necessary is the crux of the issue.

Yes. Or more precisely, whether the consequences of not having the
hook in the serialization library itself are bad enough to warrant
creating it there. I will discuss those consequences after I present
our new design, which adds the hook, but only in our own extensions --
essentially a library built on top of the current serialization
library without modifying it.
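
To make the idea of such a hook concrete, here is a minimal sketch. It is not Boost.Serialization code and not the design referred to above; every name in it (`elementwise_archive`, `fast_archive`, `save_array`, `save_binary`) is hypothetical. It only illustrates a hook that lives in a layer on top of an existing archive: a default that saves each element separately, plus an overload that an extension archive can use to write the whole contiguous sequence at once.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical base archive: it appends raw bytes to a buffer and counts
// how many low-level writes were issued.
struct elementwise_archive {
    std::vector<char> bytes;
    std::size_t write_calls = 0;

    void save_binary(const void* p, std::size_t n) {
        assert(p != nullptr || n == 0);
        const char* c = static_cast<const char*>(p);
        bytes.insert(bytes.end(), c, c + n);
        ++write_calls;
    }
};

// Hypothetical extension archive layered on top of the base archive.
// The base archive is not modified.
struct fast_archive : elementwise_archive {};

// Default hook: any archive falls back to one write per element.
template <class Archive, class T>
void save_array(Archive& ar, const T* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        ar.save_binary(data + i, sizeof(T));
}

// Hook overload for the extension archive: one bulk write for the whole
// contiguous sequence. Only valid for trivially copyable element types.
template <class T>
void save_array(fast_archive& ar, const T* data, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "bulk write assumes trivially copyable elements");
    ar.save_binary(data, count * sizeof(T));
}
```

With a `std::vector<double>` of 1000 elements, `save_array` on an `elementwise_archive` issues 1000 calls to `save_binary`, while on a `fast_archive` it issues one, and both produce the same bytes. Note that the hook is only used by code that calls `save_array` explicitly.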

> I consider the submission a use case for archive creation and/or
> extension.

I don't understand what you're trying to say. I presume by "the
submission" you mean Matthias' proposed changes to your library. But
I don't understand what you mean about it being a "use case."

> As far as I could tell, that particular one didn't require
> any new hooks in the library.

Functionally speaking, that is correct. You /can/ do fast
serialization of contiguous arrays without changing the library. You
don't even have to write a whole new serialization library.

> Maybe the next iteration will be different - but that's how I see it
> now.

There are some negative consequences of creating the hooks outside
Boost.Serialization. Once you understand them, I'm pretty sure you
will think they are significant. Whether they will be significant
enough to induce you to make changes in Boost.Serialization is of
course an open question.

> Let me explain one place where our difference lies.

Having read everything that follows, I don't see any explanation of a
"place where our difference lies." The parts I understand (most of
it) sound like "motherhood and apple pie" -- good, common sense that's
hard to disagree with. Is it a thought that was never finished?
Would you care to try to put it more succinctly?

> The serialization library is basically three pieces
>
> a) serialization specifications for each data type to be serialized
> (serialize functions), which are independent of the archive. That
> is, these specifications depend only upon the requirements of
> the Saving Archive or Loading Archive concepts.
>
> b) archive classes which implement the Archive concept for
> different file formats. These archive classes have common
> implementation features factored out into common modules.
> Due to "practical" considerations like whether something should be
> pre-compiled in the library, whether it is dependent on a user's
> application type, minimization of code bloat, etc., this common
> implementation code might be included in one of the base classes or
> in the file i/oserializer.hpp. (The code in i/oserializer.hpp
> would normally be in one of the base classes, but I believe
> that template metaprogramming considerations related
> to less-conforming compilers prevented that.) These "common code"
> modules are designed to hold code applicable to all
> archives.
>
> c) Finally, the escape hatch: those serialization implementations
> which have to be dependent on the combination of archive
> type and datatype. The most obvious case is name-value
> pairs - nvp. nvp has its own default serialization which
> just serializes the value part. Within xml archives this is overridden
> with a special version for that archive type. This is the model
> by which I have always envisioned the library being extended.
> It is only in this way that the library can be extended without
> being complicated geometrically as time goes on.
>
> I realize that this design and, more importantly, its motivation
> might not be all that apparent from the documentation
> on archive implementation. Sorry about that.

No, it's perfectly clear what you're trying to do once you study the
library implementation. Your design philosophy makes good sense
AFAICT.
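
The "escape hatch" in (c) can be sketched as ordinary overloading on the combination of archive type and data type. This is a simplified illustration, not the library's actual implementation; `named_value`, `make_named`, `text_archive`, `xml_archive` and the free function `save` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical name-value pair, loosely modeled on the nvp described above.
template <class T>
struct named_value {
    const char* name;
    T value;
};

template <class T>
named_value<T> make_named(const char* name, const T& value) {
    assert(name != nullptr);
    return {name, value};
}

// Two hypothetical archives, each writing to an in-memory stream.
struct text_archive { std::ostringstream out; };
struct xml_archive  { std::ostringstream out; };

// Primitive save shared by every archive.
template <class Archive>
void save(Archive& ar, int x) { ar.out << x; }

// Default for any archive: ignore the name and save only the value part.
template <class Archive, class T>
void save(Archive& ar, const named_value<T>& nv) {
    save(ar, nv.value);
}

// Escape hatch: a version specific to the (xml_archive, named_value)
// combination that wraps the value in a tag named after the pair.
template <class T>
void save(xml_archive& ar, const named_value<T>& nv) {
    ar.out << '<' << nv.name << '>';
    save(ar, nv.value);
    ar.out << "</" << nv.name << '>';
}
```

Saving `make_named("answer", 42)` to a `text_archive` yields `42`; saving it to an `xml_archive` yields `<answer>42</answer>`. Each such special case is confined to one archive type and one data type, and no other archive needs to know about it.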

I am a bit surprised to hear you state flatly that there is only one
way to extend the library that can ever work. How can you possibly
know you've considered every possibility? I don't have the same
confidence, even about problems I've studied for years.

> As time goes on I would hope that this can be improved. But maybe
> this explains my reluctance to maintain parts of the library beyond
> the reach of those making other archives.

Other archives? Beyond reach? I don't understand what you're saying
here.

> This forms my main objection to the proposal.

Sorry, I don't have any clue what you are referring to. Regardless,
we are going to start from new code that doesn't change any part of
Boost.Serialization, so if possible, it might be better to try to
forget about what you've seen before.

> Of course I have/had lots of other objections to it and
> probably would have a lot more if I spent more time
> looking into it.

Fortunately, you won't have to. We're going to present new code.

> I suspect that the job of making a portable binary archive is much
> harder than it first appears.

Actually it's almost trivial (I did it over 10 years ago), but I don't
know what that has to do with what we're trying to accomplish.

> Making it so that it can exploit opportunities to be much faster
> while still being as "monkey-proof" is even harder still.

The speedups we're proposing don't have anything in particular to do
with portable binary archives.

> I didn't pursue this as I really don't want to discourage these
> kinds of efforts and they are (or should be) orthogonal to the
> library as it is currently implemented. If they can be implemented
> without altering the core - then I have no problem. If someone
> believes that modifying the core is unavoidable, then either he or I
> have made some sort of mistake and it will have to be resolved.

It's not unavoidable; as I've said before, it just has consequences
that we don't like, and we think you probably won't like either. If
you can hang on until we've presented what we think is the best design
that avoids altering the core, then we can look at the consequences.
Once you understand them, if you still don't want to make any changes
and you're willing to accept the consequences, we're not going to
press the issue any further.

> If they don't really have to alter the core, but the archive author
> thinks it would make his job easier - then we have a problem.

Let me be very clear about this, at least:

  ,----
  | Ease of archive implementation is unrelated to the motivation for
  | requesting core changes.
  `----

I hope that allays at least one of your concerns.

> I get a suggestion about once a month to modify the core of the
> library for this or that reason. Aside from bugs, it usually boils
> down to the suggestor looking at the code and seeing - "Oh I could
> fix this right there!" without considering all the repercussions and
> without considering the alternatives. (As you might guess, this is
> what I believe happened in this case).

Actually Matthias' considerations went much deeper than you give him
credit for. In my opinion, he just failed to communicate his
rationale properly, and since the details of his code seemed to you to
violate basic principles of your design, I'm sure it was all the more
difficult for you to understand the problems he is trying to avoid.
Working from new code that (I hope!) won't cause you any alarm, it
might be easier to understand the rationale.

> Another common occurrence is the attempt to use the serialization
> system to accomplish some end for which it is not suited. A typical
> idea is to use it to implement some externally defined file format.
> I know I drag my feet, I know it drives people crazy, but I truly
> believe that the success of the library is due in no small part to
> my reluctance to add in any more than is absolutely necessary.

Understood. It might be a good idea for you to clearly define the
intended scope of the library. What criteria distinguish an
appropriate application from an inappropriate one? I'm interested in
hearing your intention as the library author, rather than something
like "an appropriate application is one that works well with the
library as it is currently specified and/or implemented." Depending
on your answer, we might indeed be barking up the wrong tree.

> So, I look forward to seeing progress on the following:
>
> a) better handling of special optimization opportunities
> which obtain for certain combinations of data-types and archives.
> Hopefully, an elegant implementation will serve as a model
> for other people's pet additions.

I hope we'll be able to show you something elegant very soon.

> b) A portable binary implementation suitable for
> such things as MPI messages.

Portable binary archives and MPI have little relationship to one
another. You don't flatten your data into a portable format, ship it
in an MPI message that is just a sequence of bytes, and then
deserialize. MPI handles portability internally.

> I also expect these to take some time and hope they
> can be subjected to the boost "process" of public
> criticism and refinement. This will take more time
> but result in a better product. Hopefully, it will
> be less stressful as well - though I doubt it.
>
> I really am trying to wind down my involvement in the
> serialization library.

That's a bit alarming, actually. Have you got someone else lined up
to maintain it? It's important to us and to many others that the
library has a future. Without the involvement of the original author,
that would be in doubt.

> I do want to spend some more time
> on execution profiling and performance tweaks.
>
> I would like to see the documentation improved on how
> to do things like you and Matthias are attempting to do.
> The current documentation does have a section
> titled "case studies" which seems to me a handy place
> to put examples of this nature and at the same time
> show users how to exploit any "add-in" functionality.
>
> Good luck on this

Thanks.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
