Subject: Re: [boost] date_time -> serialization (Was: spirtit -> serialization)
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2014-06-16 19:45:29


Andrey Semashev wrote:

> Thinking about it more, the requirement to build the dependency graph
> before the download may require the cache to be available for any
> given git commit. This is probably a good reason to make the cache
> version controlled.
>
>> Perhaps it could be stored in a git
>> note [1].
>
> I'm not very familiar with git and don't know anything about git
> notes. Maybe they fit for this purpose. But my request would be that
> these notes are not required to be added by maintainers.
>
> [...]
>
>> A
>> slightly more advanced solution would be to have the handler download
>> only the dependency file associated with the release tag using git
>> archive [2], before cloning the entire module (that might not work
>> with git notes, though). For releases, the dependency information
>> could also simply be aggregated in the superproject archive.
>
> Ok. As I said, I specifically did not require any particular means for
> delivering metadata into the tool. If this is possible with git,
> provided that usability is satisfactory, I'm all for it.
>
> [...]
>
> I agree that the cache is a good idea, as long as it's just a cache.
> I'm just saying that its role is auxiliary and it should not be
> managed by developers. [...]

So how about this: we work with two files. For now, let's call them
conditional_deps.txt and deps_cache.txt. Both are optional and
versioned if present. The conditional_deps.txt contains only
toolset/platform annotations and is maintained by humans.

The deps_cache.txt contains only the "bare" header-level dependency
information and is never maintained or even supposed to be read by a
human (perhaps it could be hidden). A commit hook is provided that
module maintainers can opt to add to their module configuration to
have it generated automatically (this won't affect history or be slow;
see below). Libraries that don't have the cache can still be handled
"blindly", as you suggested. In release archives the cache is
(automatically) bundled with the superproject.

Would you find that agreeable?

> Do git notes affect history? If yes, it would be undesirable if
> libraries' history is spammed with automated commits adding notes with
> dependency info.

They don't. You may consider a git note a piece of custom metadata
associated with a commit, although it works a bit differently under
the hood.

The same applies to a deps_cache.txt file: it is created as part of the
commit procedure and included with the same commit object. No
additional commits appear in history. The maintainer does not need to
do anything to make this happen except for installing the hook, once.
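
To make this concrete, here is a minimal sketch of what such a hook could
look like. It is only an illustration: the include/ layout, the
one-line-per-include "header: dependency" cache format and the function
names are assumptions of mine, not an agreed design. It relies on a
pre-commit hook being able to stage the generated file with git add so
that it lands in the commit being created; I believe that holds for a
plain "git commit", but partial commits may behave differently.

```shell
#!/bin/sh
# Hypothetical pre-commit hook sketch (assumed layout and cache format).

# Print one "<header>: <included boost header>" line for every Boost
# include found in the *pp files below directory $1.
generate_deps_cache() {
    grep -rHE --include="*pp" \
        '^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]boost/' "$1" |
        sed -E 's/^([^:]+):[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]([^>"]+)[>"].*$/\1: \2/' |
        LC_ALL=C sort -u
}

# Regenerate deps_cache.txt in the repository root and stage it, so that
# it is included in the commit that triggered the hook.
update_deps_cache() {
    root=$(git rev-parse --show-toplevel) || return 1
    (cd "$root" &&
        generate_deps_cache include > deps_cache.txt &&
        git add deps_cache.txt)
}

# Only act when this script is actually installed as .git/hooks/pre-commit.
case "$0" in
    *pre-commit) update_deps_cache ;;
esac
```

Installing it would amount to copying the script to .git/hooks/pre-commit
and making it executable, once per clone.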

>
>> It is friendly to tell end users in advance what dependencies will be
>> installed, but that can be solved by other means. A very simple
>> solution would be to list the dependencies on the Boost website.
>
> That doesn't really work for obvious reasons: (a) the advertised
> dependencies will get out of sync with reality sooner or later

Of course they would be generated automatically (and that would only be
necessary for global releases).

> and (b)
> you can't realistically request users to consult the website when they
> are about to install a Boost library. The tool should provide that
> information.

Good point.

>
> It is possible that the tool is not able to do that, if the cache is
> not available for the given commit to be checked out. The tool should
> notify the user about this problem but still allow downloading the
> necessary components "blindly", by parsing headers for dependencies.

I believe this shouldn't really be necessary because a commit hook
should be transparent to the maintainer and sufficient to ensure that
the cache always exists. But I agree that this would be a reasonable
fallback option.
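
As a rough sketch of that fallback, the tool could prefer the cache and
only scan headers when it is missing. Everything here is an assumption
for the sake of illustration: the function name, the warning text, and a
cache format of one "header: dependency" line per include. It only lists
the included Boost headers; mapping headers to modules is left out.

```shell
# Hypothetical sketch: list the Boost headers that a checked-out module
# in directory $1 depends on.
module_deps() {
    module_dir=$1
    if [ -f "$module_dir/deps_cache.txt" ]; then
        # Cache present: keep only the dependency column.
        sed -E 's/^[^:]+:[[:space:]]*//' "$module_dir/deps_cache.txt"
    else
        # No cache for this commit: warn, then scan the headers "blindly".
        echo "warning: no deps_cache.txt in $module_dir, scanning headers" >&2
        grep -rhE --include="*pp" \
            '^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]boost/' "$module_dir" |
            sed -E 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]([^>"]+)[>"].*$/\1/'
    fi | LC_ALL=C sort -u
}
```

Either way the caller gets the same sorted, de-duplicated list; only the
warning on stderr tells the two cases apart.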

>
> [...]
>
> Another alternative is to create a new git submodule to store the cache in.

I think that would be a bad idea. The cache should be directly coupled
to the commit. We must avoid rolling our own data structures just to
match the right cache to the right commit.

>
>> The advantage of just storing a plain file in the module directory is
>> that it certainly works, even if you download an archive without git
>> history, and without a need to set up a new FTP server or other web
>> service. I would prefer to start there and investigate prettier
>> solutions later.
>
> We're discussing a mechanism that will require mass changes to the
> libraries and possibly the workflow.

No, I think it shouldn't. My intention is to provide a new layer of
convenience without shaking things up too much. It should make it
easier to introduce other, more transformative changes; not the other
way round.

> [...]
>
> [...] But it might be more difficult to build the
> cache in time for heads of branches; there will be some latency
> between the commit and its metadata.

If the cache is updated by a commit hook, this will not be true. The
cache will always be 100% up-to-date. Committing by itself will not
take notably longer than usual either, because in most cases only a
small number of headers will be affected and this information is
available to the commit hook. Even if the deps_cache.txt needs to be
re-generated entirely and the module is very large, it should take
less than a second. (*)
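
For the common case where only a few headers change, an incremental
update could look roughly like the sketch below. The function name and
the "header: dependency" line format are assumptions, and it reads the
working-tree copy of each header, which is a simplification.

```shell
# Hypothetical sketch: refresh only the cache entries of the given headers.
# Usage: refresh_cache_entries CACHE_FILE HEADER...
refresh_cache_entries() {
    cache=$1
    shift
    new_cache=$(mktemp) || return 1
    [ -f "$cache" ] || : > "$cache"
    for header in "$@"; do
        # Drop the stale entries of this header ...
        awk -v prefix="$header: " 'index($0, prefix) != 1' "$cache" > "$new_cache"
        # ... and re-scan it, unless the header no longer exists.
        if [ -f "$header" ]; then
            sed -nE 's,^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"](boost/[^>"]+)[>"].*$,\1,p' "$header" |
                awk -v h="$header" '{ print h ": " $0 }'
        fi >> "$new_cache"
        LC_ALL=C sort -u "$new_cache" > "$cache"
    done
    rm -f "$new_cache"
}
```

In a hook, the list of headers could come from
"git diff --cached --name-only", run from the repository root.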

Cheers,
Julian

___________

(*) I just tried:

    $ cd PATH_TO/include/boost/math/
    $ time grep -r --include="*pp" "#include" . > ~/test.txt

and it took 87 ms. Disk access is an order of magnitude slower than
in-memory file processing, so I expect this to be fairly
representative of single-module dependency detection even on older
computers.

