Subject: Re: [boost] [modularization] What is a module? What is a sub-module?
From: Bjørn Roald (bjorn_at_[hidden])
Date: 2014-09-22 17:20:10
On 09/22/2014 11:37 AM, Rob Stewart wrote:
> On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn_at_[hidden]> wrote:
>> On 09/21/2014 10:36 AM, Vicente J. Botet Escriba wrote:
>>> After the long threads concerning the modularization it seems clear
>>> to me that we are in an impasse.
>> Maybe most of the friction is more a case of a lack of clear
>> communication than of real disagreement. It could be that the goals
>> would be agreed on if they were clear to everyone. Some participants in
>> the threads seem to have clear goals in mind for what needs to be done
>> first, and just feel the need to proceed, while others are confused
>> about what is going on and why. The latter may need to understand the
>> "why" in how we get to an end result we want and what that result looks
>> like. The former group may be more concerned with what they "know" has
>> to be done before we get anywhere. They need to convince the skeptics
>> why that is the case. Neither side's statements and arguments are hard
>> to understand if you are willing to try to shift mindset for the sake of
>> understanding. Nevertheless, there needs to be some level of consensus
>> before this can proceed.
> You are very likely correct.
>> So how can consensus be achieved? I think starting with more concrete
>> meanings for the terminology used in discussions, proposals and
>> guidelines would be very helpful. Guessing what people mean by module,
>> sub-module, library, sub-library, repo, sub-repo, package, dependency,
>> etc. is not helpful to understanding each other.
>> A library is a collection of code in Boost that is reviewed and
>> accepted/rejected by Boost as a community. A library is maintained by
>> individuals who are the library maintainers. The code is managed in a
>> separate git repository that is included as a git submodule in the libs
>> folder of the Boost master repository. A library contains the library's
>> main module in the subdirectories include, src, test, build, and doc. In
>> addition, a library may contain a number of additional directories
>> containing optional modules that depend on the main module; these are
>> called sub-libraries.
> You've defined "library" in terms of "module" and "sub-library" which
> have not yet been defined.
Right, module should most likely be defined first, as its definition
depends less, if at all, on the library definition.
> What is a "main module"?
For library A, the main module lives in
Each sub-library contains a module as well; sub-library A/x lives in:
All these modules are modules of library A, but the main module is a
sort of focal point. It holds the Boost library's primary features.
Sub-libraries are there to provide optional utilities that depend on or
create a bridge to other modules, whether Boost or external modules.
Sub-libraries could be used for purposes other than modularization, e.g.
logical partitioning of a library's facilities. But even if that is
useful, it is off-topic, so I will leave it.
> I need to understand that to understand what's included in a library.
> More on module's definition below.
>> A library may contain related code in sub-libraries that should be
>> treated as separate modules to limit the dependencies incurred if they
>> are part of the library's main module. The sub-library has its own
>> module structure containing its own include, src, test, build, and doc
>> directories. A sub-library is part of the library and is maintained by
>> the library's maintainers.
> I need to understand "module" to understand "sublibrary"
> (which needn't be hyphenated, BTW).
OK. Actually, I am struggling with the temptation to use submodule
rather than sublibrary as the term here, as it really is more logical to
me. Then you get the "main module" vs. the "submodule(s)" inside a
library. But I try to avoid using submodule due to the danger of a mixup
with the git thing of the same name. One option would be to simply call
both the main module and the sublibrary "modules", with no main vs. sub
relationship implied. If there is more than one module in the library,
we require that they live in separate subdirectories or levels in the
directory tree.
>> Unit of deployment of boost source code and/or pre-build libraries,
> I assume you meant "pre-built" rather than "pre-build" here.
>> documentation etc. Typically there may be a one-to-one relationship
>> between packages and modules, but it is possible to deploy more than
>> one module in a package or break one module into more than one package.
> The current packaging model puts all modules into one package, so it's
> more than possible, it's the norm.
>> A version controlled directory structure containing checked out or
>> modified files in a working directory and a database of the repository
>> history and relationships to other repositories. In a git working
>> directory, the database is in the .git subdirectory or is pointed to by
>> a .git file.
> The usual meaning of "repository", at least in my experience is the
> managed history in a certain control tool, not the files in a workspace.
Well, yes and no... in git what you are referring to is a "bare
repository". But it is not important to me. We could call a repository
with a working directory "dressed up" -- just kidding. I just think
most developers will think of the working directory when they clone or
update their repository, so that is why I put it the way I did. If we
include this in a normative definition, we should try to be precise. The
simplest way is to leave these details out if they do not add anything
to the subject at hand.
>> I suggest we do not use this term to mean sub-library. Use the term
>> sub-library or git submodule instead.
> If the VCS ever changes again, the tool-specific name of this
> entity will probably change. It would be better to provide an
> abstraction. That is, formalize "subrepository" and note that a
> git submodule is a subrepository.
Good point. But, my take here was that we do not need the term
sub-repository, hence I don't really see the need for an abstraction
either. If the discussion is about VCS, we have git repository and git
submodule. If the discussion is about source code structure and
organization we have libraries and modules. As stated above, maybe
sublibrary is not needed, we can simply use module.
>> An organized set of Boost library code that can be handled in a uniform
>> manner by Boost tools. A module shall contain the include, test,
>> and doc directories. Modules that are not header-only shall also contain
>> the src directory that is used to build one or more corresponding
>> library files.
> How is a module distinct from a library?
A library can have more than one module. If it has only one, library and
module are more or less the same.
> Both are defined in terms of the directories they contain. Each is
> defined in terms of the other.
Module take 2:
An organized set of Boost code that can be handled in a uniform
manner by Boost tools. A module shall contain the include, test,
build, and doc directories. Modules that are not header-only shall also
contain the src directory, which contains the sources used to build the
static and dynamic library files that the user will link with.
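To make this definition checkable by a tool, here is a minimal sketch in Python. The function name missing_module_dirs and the header_only flag are hypothetical names invented for illustration; nothing like this exists in the Boost tools as far as this definition is concerned, and the set of required directories is simply copied from the "Module take 2" text above.

```python
from pathlib import Path

# Directories every module shall contain, per the "Module take 2" text.
REQUIRED_DIRS = ("include", "test", "build", "doc")


def missing_module_dirs(module_root, header_only=True):
    """Return the required directories that are absent from module_root.

    Hypothetical helper for illustration only; not part of any Boost tool.
    """
    required = list(REQUIRED_DIRS)
    if not header_only:
        # Modules that build library files also need a src directory.
        required.append("src")
    root = Path(module_root)
    return [name for name in required if not (root / name).is_dir()]
```

An empty result means the directory satisfies the definition; a non-empty result names what is missing, which a tool could report per module or sub-library.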
>> I suggest we do not use this term to mean sub-library; use sub-library
>> instead. If it is not clearly given by context, use git submodule if
>> you have a git repository tracked using a git submodule in mind.
> Until I better understand the difference between "library" and "module",
> I can't say whether I agree with your conclusion on submodule.
Hopefully some of this is clearer now.
>> Handling of dependencies is where I struggle the most with seeing a
>> clear path forward. In particular, what determines the nodes and edges
>> in the dependency graphs we care about? And what are we going to use
>> the dependency graph for?
>> Test, Example, and Doc Dependencies:
>> First of all, if test, example and doc code is part of the module and
>> incurs additional requirements, we certainly do not always want to track
>> those dependencies as the module's dependencies. A separate dependency
>> graph node for test code seems to be a solution if there is a real need
>> to track it at all. Documentation can also clearly be treated separately
>> if need be. However, given this, the module as defined above is no
>> longer the node in the dependency graph. But that is probably just the
> Test and doc dependencies should certainly be tracked separately, if at all.
>> Lib Dependencies:
>> Modules that are not header-only have source files in the src directory
>> that are compiled into one or more library files (ignoring variants
>> directly supported by Boost.Build). Separate dependency graph nodes may
>> be appropriate here to distinguish dependencies at link and compile
>> time. But there are many possible facets of this, so I think the real
>> use-cases for the dependency graph should drive the requirements for
>> what the nodes and edges shall model. In addition, dependencies may vary
>> with the configuration of the target environment. It is not clear if or
>> how such external dependencies should be tracked; however, starting with
>> the Jamfile lib dependencies is certainly a good start. It may be that
>> most package management systems have what is needed for the rest, so it
>> is a matter of bridging these worlds.
> I should think dependencies would be computed at the logical grouping
> represented by library or module, depending on what those terms actually
Yes, I do agree with that; I was just trying to point out some additional
potential aspects. I was not saying we need to care about them if
they are not needed. Module has that role, as in modularization.
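As a rough sketch of what "separate tracking" could look like while still keeping the module as the node, the Python below labels each edge with a kind instead of adding extra nodes. Everything here is an assumption made for illustration: the class name DependencyGraph, the edge kinds "include", "link" and "test", and the rule that test edges are followed only from the module being asked about.

```python
from collections import defaultdict


class DependencyGraph:
    """Module-level dependency graph with labelled edges (illustrative only)."""

    # Edge kinds that a user of a module always inherits transitively.
    RUNTIME_KINDS = ("include", "link")

    def __init__(self):
        # Maps (module, kind) to the set of modules it depends on.
        self._edges = defaultdict(set)

    def add(self, module, depends_on, kind="include"):
        self._edges[(module, kind)].add(depends_on)

    def closure(self, module, extra_root_kinds=()):
        """Return the transitive dependencies of module.

        extra_root_kinds (e.g. ("test",)) are followed only from the root
        module; dependencies of dependencies use RUNTIME_KINDS only.
        """
        seen = set()
        stack = [(module, self.RUNTIME_KINDS + tuple(extra_root_kinds))]
        while stack:
            current, kinds = stack.pop()
            for kind in kinds:
                for dep in self._edges.get((current, kind), ()):
                    if dep not in seen:
                        seen.add(dep)
                        stack.append((dep, self.RUNTIME_KINDS))
        seen.discard(module)
        return seen
```

With this shape, a packaging tool would call closure("a") to get what a user of module a needs, and a test runner would call closure("a", ("test",)) to get the larger set, without the test requirements leaking into the module's own dependencies.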
> I presume one will choose to build components by such logical entities.
Maybe, but we need to define "component" and what that means if we are
going to use it. Actually, to me, with regard to Boost, component is
more or less synonymous with module. Maybe components are more about how
they are deployed and re-used, and module is more about the separation
of the component's sources from the sources of other components or
modules in the Boost source tree. But there are clearly alternative
definitions of component. Nevertheless, I am not sure we need both
component and module in the Boost terminology dictionary, so I opted for
module, as it has been used more than component in discussions and it
sort of fits with modularization.
>> Include Dependencies:
>> Dependencies in the include directory may cause compile and link time
>> dependencies for the module user. These dependencies are not incurred
>> until a header is included, directly or indirectly, that requires the
>> specific dependency to be met. This could, as some have pointed out,
>> be leveraged to get a very flexible and fine-grained "real" dependency
>> graph in Boost. However, as the actual dependencies are not known
>> before the application developer changes source code, compiles and
>> links, and then understands the cause of the resulting diagnostics, this
>> is not very helpful for packaging minimum required subsets of Boost. I
>> am also afraid the diagnostics for missing headers or object file
>> symbols will not be a very user-friendly solution. However, if that
>> could be fixed somehow to point directly at the missing package, or
>> better, if a package manager could be more or less automatically
>> invoked to fix it, then this may be a path forward. Such fine-grained
>> dependency tracking could greatly reduce the need for sub-libraries.
> I agree that such fine-grained tracking can be a cause of confusion
> and hassles. I normally prefer to think in terms of libraries, not
> optional features. That does little to address problems managing
> dependencies like Date Time's optional dependency on Serialization, however.
>> Separating larger chunks of code in a sub-library may seem reasonable
>> for several reasons, but to separate single headers into their own
>> sub-library only to get a "pretty" graph may clearly be way off the
>> reasonableness scale. Especially, if it can be reasoned that we don't
>> push internal boost structure problems on the helpless application
>> developer to figure out. It seems reasonable to look for facilitation
>> for something much simpler in these cases.
>> For lack of a better
>> term for what some are suggesting, I just invented bridging-header as a
>> term for a mechanism that may help in these situations.
>> Bridging Headers:
>> A bridging header is a C++ header file that bridges facilities in one
>> module with facilities in another module to provide a new convenience
>> facility to users. The bridging header is part of the include directory
>> in one of the two modules and only depends on a minimal required set of
>> features from the two modules to provide the new convenience facility.
>> A bridging header is marked in a to-be-determined way that allows
>> dependency tracking tools to track the set of bridging headers between
>> any two modules as a separate node (a bridge) in the dependency graph.
>> When a user includes a bridging header, it adds the dependencies of both
>> bridged modules; however, it may not be practical to have every bridge
>> tracked by a package manager as a separate package.
> That seems like a decent approach.
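To illustrate how a tracking tool might keep bridges apart from ordinary module edges, here is a small Python sketch. It assumes the to-be-determined marking has already been read into a boolean per header; the function name build_graph, the tuple layout, and the header paths used with it are all hypothetical, not taken from Boost.

```python
def build_graph(headers):
    """Split header-level dependencies into module edges and bridges.

    headers is an iterable of (path, owner, includes, is_bridge) tuples:
    the header path, the module that owns it, the set of modules it
    includes from, and whether it carries the (hypothetical) bridge mark.
    Returns (module_edges, bridges), where module_edges is a set of
    (from_module, to_module) pairs and bridges maps an unordered pair of
    modules to the list of bridging headers between them.
    """
    module_edges = set()
    bridges = {}
    for path, owner, includes, is_bridge in headers:
        for other in includes:
            if other == owner:
                continue  # includes within the same module are not edges
            if is_bridge:
                # A bridge is its own node, keyed by the pair it connects.
                bridges.setdefault(frozenset((owner, other)), []).append(path)
            else:
                module_edges.add((owner, other))
    return module_edges, bridges
```

Fed a marked serialization header owned by date_time, this yields a bridge between date_time and serialization but no date_time to serialization module edge, which is the separation described in the quoted definition.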
The main challenge may be that bridging headers do not fit well with the
dependency tracking model used by many package managers. However, as has
been pointed out in the discussions, any reasonable use of a bridging
header would be in an environment where the other package would be
installed, even if not by enforcement of package-manager dependency
rules. E.g.: DateTime and Serialization packages would naturally be
installed by a user before any attempt to serialize DateTime data types.
So it may not be a big deal if there is no additional
DateTimeSerialization package; it seems almost like the extra package
would just create friction here, as users would have to discover it to
be aware of the bridging facilities. The bridge may just as well be part
of the serialization package or the DateTime package without enforcing
installation of the