Subject: Re: [boost] [modularization] Extract xml_archive from serialization
From: Robert Ramey (ramey_at_[hidden])
Date: 2014-09-17 12:36:12


Stephen Kelly-2 wrote
> The graph is showing public module dependencies. I think that's
> understood.

Not by me. The definition of "module dependency" is unclear to me. I presume
it's defined by the situation where, to build one thing, one has to build
other things. So if you start out with thing "A", that implies the
build/inclusion of some stuff from other libraries, and so on inductively
until one defines a closed set. I could buy this. But the problem is when
thing A is a module. Does building A refer to building the library, running
the tests, building the examples, or building one app? Clearly if I'm building
something which includes test_archive I have a different set of dependent
"modules" than if I'm building something that includes xml_archive.
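
To make the ambiguity concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the component names and the edge list are invented and are not Boost's real dependency data. It computes the inductively defined closed set described above, starting from two different targets that live in the same module.

```python
# Hypothetical component-level dependencies (illustrative only; these are
# NOT the real Boost dependencies).
DEPS = {
    "serialization/core": ["config"],
    "serialization/xml_archive": ["serialization/core", "spirit"],
    "serialization/test_archive": ["serialization/core", "test"],
    "spirit": ["config"],
    "test": ["config"],
    "config": [],
}


def closure(target, deps=DEPS):
    """Return the set of everything transitively required by `target`."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    for target in ("serialization/xml_archive", "serialization/test_archive"):
        print(target, "->", sorted(closure(target)))
```

Under these assumed edges the two targets produce different closed sets, so "the dependencies of the serialization module" has no single answer until one says which target is being built.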

I'm questioning the whole concept of "module dependency". To me it's
ill-defined and actually not definable outside of a more specific context.
Hence it can't be used to determine how a large body of code should
be (re)factored. We need something more precise - which has yet to
be articulated.

>> Consider another simple case - date time/serialization.hpp
>>
>> most date/time users don't use this - but a few do. Is serialization a
>> prerequisite for date/time? which users are we talking about? One can't
>> win here. If you distribute serialization with every use of date/time
>> you're distributing too much. If you don't, you'll be failing to ship
>> functionality which some users need. What is the solution here - make
>> two libraries out of date/time? or what?
>
> The solution is to make serialization low-cost to depend on, so that
> depending on it is not a problem. That is exactly what I am recommending.
> The current problem with serialization is that it is expensive in terms of
> needless dependencies. My recommendation does a lot to solve that for
> serialization.

I'm reluctant to propose specific courses of action too soon. I will almost
surely get it at least partially wrong. I'm going to overcome
my reluctance to address this specific case as an example and see
where it goes.

A user includes the date-time library in his code.
He is dependent on a set of Boost headers which don't include
Boost serialization. He can build his app without including
and/or linking serialization. He's happy about this. But he
has to install the whole serialization module, which under
current rules means he installs spirit and a whole lot
of other stuff. He's unhappy about this: "Damn! The date-time
library refers to serialization even though I don't use it, and it
means that my date-time DLL is a lot larger than it has to be.
This is really annoying to me. Also, when someone mucks
with the serialization library it might keep serialization from
building, which might keep my app from building even though
I don't use even one line of code from it! Very annoying."

Since most of the problem is xml_archive->spirit - we can
"fix" this by moving the xml_archive to ?. This will "solve"
the problem above. Of course this comes at the expense
of everyone who wants to ship serialization with support
for all of the archive classes in the package. They will now
have to link with some module other than serialization,
which is pretty non-obvious.

So the net improvement in utility of boost libraries is not
likely to be positive.

The "correct" solution to the above is for date-time to build
two modules: date-time and date-time-serialization. Now
the original app user above has only what he wants and is
not dependent upon boost serialization. Yet other users
of the serialization library have what they want - serialization
all in one place.

To summarize - the right thing to extract is the serialization
of date-time to a separate module. This kind of module has been
referred to as a "helper module" (or something like that; I don't
remember). Its place in a "module dependency" graph is unclear.

This means that the author of date-time has to refactor somewhat
to create two modules. Add support for auto-linking and this is
not quite as easy as it would first appear.

I know this as I have addressed this within the serialization library
itself. I did not want users to have to import the whole wide character code
when they weren't going to need it. Hence I created serialization.dll for
all the common code and wserialization.dll, which includes code specific
to wide character functionality. wserialization.dll calls into
serialization.dll for core functionality.

So we have the case where applications which don't use wide character
functionality don't have to pay for it. And those that do get this
functionality without having to do anything special - auto-link is fully
implemented.

Note that this refactoring/modularity is not at all visible in the "module
dependency" graph. Nevertheless, I think this approach and result are
consistent with your goal of "minimizing dependencies" (don't forget I
don't think this phrase is well defined).

At this point there would be a couple of things that would be possible.

a) require/encourage authors of "library helper" (bad term!) modules to
build them as separate DLLs/LIBs.

b) divide serialization (again) so that rather than wserialization and
serialization there would be four modules: serialization,
serialization_with_xml_archive, wserialization and
wserialization_with_xml_archive. And of course don't forget to support
auto-link.

Note that while either of these options would address the "problem" faced
by the user(s) above, the current "module dependency" graph would be the
same in all cases. That is, this graph cannot be used to distinguish those
cases where a problem exists from those where it doesn't. The graph is
interesting, but can't be used to make any real decisions.
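
A minimal sketch of this point, in Python and under invented assumptions: the binary name `date_time_serialization`, the `OWNER` mapping and all edges are hypothetical, chosen only to mirror option (a) applied to the date-time example. Each binary is assigned to the git module it lives in; collapsing binary-level edges into module-level edges gives the same graph for both layouts, even though what a plain date-time user has to link is different.

```python
# Which git module each (hypothetical) binary lives in.
OWNER = {
    "date_time": "date_time",
    "date_time_serialization": "date_time",
    "serialization": "serialization",
}

# Layout 1: one date_time binary that also carries the serialization glue.
SINGLE = {
    "date_time": ["serialization"],
    "serialization": [],
}

# Layout 2: the glue is split out into a helper binary in the same git module.
SPLIT = {
    "date_time": [],
    "date_time_serialization": ["date_time", "serialization"],
    "serialization": [],
}


def module_graph(binary_deps, owner=OWNER):
    """Collapse binary-level edges into edges between git modules."""
    edges = set()
    for src, dsts in binary_deps.items():
        for dst in dsts:
            if owner[src] != owner[dst]:
                edges.add((owner[src], owner[dst]))
    return edges


def link_closure(binary, binary_deps):
    """Binaries that must be present to use `binary`, itself included."""
    needed, stack = {binary}, [binary]
    while stack:
        for dep in binary_deps[stack.pop()]:
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    return needed


if __name__ == "__main__":
    print(module_graph(SINGLE) == module_graph(SPLIT))
    print(sorted(link_closure("date_time", SINGLE)))
    print(sorted(link_closure("date_time", SPLIT)))
```

In this toy model the module-level graph has the single edge date_time -> serialization in both layouts, while the plain date-time user needs the serialization binary only in the first one.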

Also note that neither of these options would require any changes to git
module organization. Only Boost Build scripts and module source code would
change. So it's my view that the current focus on "modularization" is somewhat
misguided. It needs to be considered in terms of what Boost policy
should be toward importing other Boost modules, granularity of modules,
implementation of auto-linking - things like that. And deciding these
things will take a level of consideration and effort that we haven't yet been
able to muster. Perhaps your advocacy will provide the necessary sense of
urgency to do this.

>> So the graph tells us something, but what?
>
> Module/package dependencies.
>
>> So - the degree of "modularization" cannot be determined or illustrated
>> or measured by examining the graph above.
>
> Disputed.

LOL - and what does that mean? Of course this is the source of our
disagreement. To you it seems clear what it means, to me it's undefined.
It will take a while to reconcile this.

>> So, taken to its logical conclusion, extracting xml_archive would lead
>> to extracting other components as well.
>
> Nope. No one has suggested that. Extracting xml_archive isolates the
> spirit dependency. There is no similar motivation to extract other parts. I
> looked a little bit into splitting all of the archive parts away from the
> serialization part, but that still ties all the rest of the archive parts
> needlessly to spirit.
>
> What I recommend isolates the cost of spirit to the code that uses it.
>
> There could be reason to try to split the rest of the archive stuff from
> serialization, but I didn't look into that, so I'm not recommending it.

I think the problem is more fundamental than just moving around a few
libraries/sublibraries. To me the current "problem" is an incidental
side effect of the lack of implementation of certain policies that we
have failed to define. So this "piecemeal" approach will lead to
unnecessary complexity and not really fix much. If we keep going down
this road there will always be something to (re)factor.

>> But the real questions are:
>> a) what do we want modularization to accomplish and is this a feasible
>> goal.
>
> This is where you are providing a lot of bad stop-energy. Were not these
> questions answered years ago?
>
> Tell me this: Why did boost migrate away from svn to 100 fractured (not
> modularized!) git repos?
>
>> c) Do we want to support deployment of boost subset? I think we do.
>
> This question was answered years ago.
>
> Why did boost migrate away from svn to 100 fractured (not modularized!)
> git repos?
>
>> My basic point is that these questions have to be addressed before the
>> notion of decoupling can be carried much further.
>
> Insisting that they are not already answered is not helpful.

Oh no!!! The reason we're having this problem is that we've never really
thought about it. Before modularized Boost, there wasn't much we could
do about it. Now we're looking at using modularized Boost to permit
Boost to be made a lot bigger. This in turn raises the issue of deployment
subsets, and for the first time we're starting to look seriously at this.
Up until now it was just an occasional grumbling.

You're suggesting I'm against doing anything. That's not true. I'm against
doing the wrong thing. These are not the same.

You're also suggesting that I don't think there is a problem. That's also
not true. But I don't buy the argument "something needs to be done, this is
something, therefore we must do this".

>> b) created as a separate library module
>
> This is the proposal.

I'm still not quite getting what you mean by creating a separate module.

Do you mean something similar to what I mentioned above as
serialization_xml_archive...
This wouldn't affect the "module dependency" graph, but it might
accidentally address the "subset deployment" issue.

Do you mean creating a separate module at the git level? This would make
the "module dependency" graph look more like what I think you want it to
look like. But I'm not convinced it would actually address the issue of users
importing code that they don't actually use - I'd have to think about this.

Or do you mean something else entirely?

My real point is that I believe it's premature to start investing in
"minimizing module dependencies" before really considering what it is we
want to achieve and the alternatives for achieving it. I believe my arguments
support the proposition that this is not an unreasonable request.

Robert Ramey

--
View this message in context: http://boost.2283326.n4.nabble.com/modularization-Extract-xml-archive-from-serialization-tp4667615p4667669.html
Sent from the Boost - Dev mailing list archive at Nabble.com.
