
Boost-Build :

From: David Abrahams (david.abrahams_at_[hidden])
Date: 2002-01-06 17:04:02


----- Original Message -----
From: "Matt Armstrong" <matt+dated+1010780889.ea82e9_at_[hidden]>

> I don't think I'll have much time to contribute. My interest in Jam
> is mainly rooted in my job, and I am nearing the end of my slotted
> time to fix the build system up. Soon I'll be doing other things.

Lucky you. Fortunately, my proposal doesn't necessarily require much of a
time commitment on your part. ;-)

> Because of this, my primary interest is in keeping our local
> patches/fixes minimal and well documented (in code with #ifdefs and
> externally in a LOCAL_CHANGES.txt file), with the idea that this will
> ease future merges of the stock upstream code base.

The desire to keep your patches minimal may conflict with my proposal. Of
course, anything which merges improvements from multiple sources will
increase the scope of any single patch set. On the other hand, I'd be more
than willing to prepare #ifdefs for my changes if there was a clear payoff.
I think all of my changes are well-documented, but of course I'd be willing
to improve that if necessary.

> Ironically, the fact that "stock jam" changes so infrequently is
> another argument for continuing in this vein. Any local changes we
> make will likely remain maintainable simply because upstream jam
> doesn't change frequently or radically. This makes it safe to base
> changes off stock jam.

Understood.

> My personal take is that if I were to spend significant time hacking
> on a build tool, it'd be a re-write of jam, not incremental changes to
> it. I'd dump Jam's rule language for something both more traditional
> and powerful, since I am almost constantly annoyed by both the
> limitations and quirks of the Jam language. I think there is a reason
> many of the newer build tools are written directly in popular
> scripting languages like Perl and Python -- when you need to do real
> work you're in a language that can handle it. I have been thinking
> something based on scheme, since a small scheme interpreter optimized
> for Jam (i.e. perhaps no garbage collection, all strings in a newstr
> style pool, etc.) would be pretty small. But, I won't be leading this
> kind of project, since I have other stuff I consider more interesting.

I agree that an approach like that would be superior in many ways (and I'd
probably use Python as a base, since I'm familiar with its source and I
think it makes for more generally accessible code), but:
a. The syntax of Jam rule invocations is good for users
b. A fork like this one is a lot to commit to

On the other hand, many of my patches are aimed at exactly the limitations
and quirks that are probably annoying you (well, the ones that annoy me,
anyway). The ones I consider bug fixes (like using a string "class" to avoid
buffer overruns) are mostly only documented in the code, but most of the new
language features are detailed here:
http://www.boost.org/tools/build/build_system.htm#core_extensions

Some of these are surprisingly powerful. For example, the module support
primitives can be used to implement classes.

Of course, it would be nice to have more data structures than simple lists
of strings, but with the right primitives you can get a lot of power out of
Jam's basic approach.

> I'll append my LOCAL_CHANGES.txt, which lists all our local changes to
> jam in detail. Some of these I've already posted about:
> * New Builtin Rules
>
> ** PWD
>
> A new rule PWD returns the current working directory. Used like
> so:
>
> pwd = [ PWD ] ;
>
> This, together with some Jam logic, can be used to generate a
> fully qualified path name. Currently it is only used to fully
> qualify the tools/bin directory before changing the PATH.
>
> This option is controlled by the OPT_BUILTIN_PWD_EXT #define.

!! A Booster was about to implement this one !!
It's good to know you've done it already.

> ** MATCH
>
> A new rule MATCH does regexp matching on a string, returning the
> result as a list of matches. Used like so:
>
> matches = [ MATCH string : pattern ] ;
>
> matches[1] is the portion of 'string' that 'pattern' matched.
> matches[2], matches[3], etc. hold the portion of 'string' matched
> by parenthesized sub expressions within 'pattern'.

Oh, that works. Ours is slightly different, based on the FTJam version, as
you can see from the link above.

> The syntax of the pattern regexp is identical to that of the
> HDRSCAN variable, since this rule uses Jam's internal regexp
> engine.
>
> The initial purpose of this rule is to allow the implementation of
> a Split rule within Jam, so things like path names can be easily
> decomposed.

Our SUBST rule is used for the same purpose (and a few others). The caching
of compiled regexps turned out to make a big difference in the case of path
splitting, which could happen quite often.
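
For illustration, here is a rough Python model of the two ideas above: a MATCH-style rule that returns the whole match followed by the sub-matches, and a path splitter built on it that reuses compiled patterns. It is only a sketch. Python's `re` syntax is not the same as Jam's internal regexp engine, and the names `compiled`, `match` and `split_path` are hypothetical, not taken from Jam or Boost.Build.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compiled(pattern):
    """Compile each distinct pattern once; later calls reuse the cached object."""
    return re.compile(pattern)


def match(string, pattern):
    """Model of MATCH: element 0 is the text the whole pattern matched,
    followed by the parenthesized sub-matches. Jam lists are 1-based,
    so element 0 here corresponds to matches[1] in the description."""
    m = compiled(pattern).search(string)
    if m is None:
        return []
    return [m.group(0)] + [g or "" for g in m.groups()]


def split_path(path):
    """Split a path into elements by repeatedly matching off the last one."""
    parts = []
    while path:
        m = match(path, r"^(.*)[/\\]([^/\\]*)$")
        if not m:
            # No separator left: what remains is the first element.
            parts.insert(0, path)
            break
        parts.insert(0, m[2])
        path = m[1]
    return parts
```

Because `split_path` uses the same pattern on every iteration, only the first call pays for compilation, which is the effect the caching is meant to achieve.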

> ** W32_GETREG
>
> Available only under WinNT (Win2k as well). Gets a value from the
> registry, like so:
>
> value = [ W32_GETREG list ] ;
>
> This is primarily so Jam can find the location of the Visual C++
> installation from the registry, which makes it a bit easier to get
> a build environment up and running. Otherwise, they would have to
> set the MSVCDIR environment variable, either at Visual C++ install
> time or by running the vcvars32.bat file that comes with Visual
> C++.
>
> This option is controlled by the OPT_BUILTIN_W32_GETREG_EXT
> #define.

Nice to have.

> ** W32_SHORTNAME
>
> Available only under WinNT (Win2k as well). Takes a string
> holding a file name and returns its short name. E.g. "Program
> Files" -> "PROGRA~1" etc. Used like so:
>
> short = [ W32_SHORTNAME longname ] ;
>
> This is primarily useful for shortening the long path name
> supplied by W32_GETREG, which often contains things like "Program
> Files" in it, etc., which confuses Jam later on.
>
> This option is controlled by the OPT_BUILTIN_W32_SHORTNAME_EXT
> #define.

Nice to have.

> * New Features
>
> ** Header Caching
>
> This code is taken from file://guest/craig_mcpheeters/jam/src/ on the
> Perforce public depot. Many thanks to Craig McPheeters for making his
> code available. It is delimited by the OPT_HEADER_CACHE_EXT #define
> within the code.

I definitely want this.

> Jam has a facility to scan source files for other files they might
> include. This code implements a cache of these scans, so the entire
> source tree need not be scanned each time jam is run. This brings the
> following benefits:
>
> - If a file would otherwise be scanned multiple times in a
> single jam run (because the same file is represented by
> multiple targets, perhaps each with a different grist), it
> will now be scanned only once. In this way, things are
> faster even if the cache file is not present when Jam is
> run.
>
> - If a cache entry is present in the cache file when Jam
> starts, and the file has not changed since the last time it
> was scanned, Jam will not bother to re-scan it. This
> markedly reduces Jam startup times for large projects.
>
> This code has improvements over Craig McPheeters' original
> version. I've described all of these changes to Craig and he
> intends to incorporate them back into his version. The changes
> are:
>
> - The actual name of the cache file is controlled by the
> HCACHEFILE Jam variable. If HCACHEFILE is left unset (the
> default), reading and writing of a cache file is not
> performed. The cache is always used internally regardless
> of HCACHEFILE, which helps when HDRGRIST causes the same
> file to be scanned multiple times.
>
> Setting LOCATE and SEARCH on the HCACHEFILE works as
> well, so you can place it anywhere on disk you like or even
> search for it in several directories. You may also set it
> in your environment to share it amongst all your projects.
>
> - The .jamdeps file is in a new format that allows binary data
> to be in any of the fields, in particular the file names.
> The original code would break if a file name contained the
> '@' or '\n' characters. The format is also versioned,
> allowing upgrades to automatically ignore old .jamdeps
> files. The format remains human readable. In addition,
> care has been taken to not add the entry into the header
> cache until the entire record has been successfully read from
> the file.
>
> - The cache stores the value of HDRPATTERN with each cache
> entry, and it is compared along with the file's date to
> determine if there is a cache hit. If the HDRPATTERN does
> not match, it is treated as a cache miss. This allows
> HDRPATTERN to change without worrying about stale cache
> entries. It also allows the same file to be scanned
> multiple times with different HDRPATTERN values.
>
> - Each cache entry is given an "age" which is the maximum
> number of times a given header cache entry can go unused
> before it is purged from the cache. This helps clean up old
> entries in the .jamdeps file when files move around or are
> removed from your project.
>
> You control the maximum age with the HCACHEMAXAGE variable.
> If set to 0, no cache aging is performed. Otherwise it is
> the number of times jam must be run before an unused cache
> entry is purged. The default for HCACHEMAXAGE if left unset
> is 100.
>
> - Jambase itself is changed.
>
> SubDir now always sets HDRGRIST to $(SOURCE_GRIST) so header
> scanning can deal with multiple header files of the same
> name in different directories. With the header cache, this
> no longer incurs a performance penalty -- a given file
> will still only be scanned once.
>
> The FGristSourceFiles rule is now just an alias for
> FGristFiles. Header files do not necessarily have global
> visibility, and the header cache eliminates any performance
> penalty this might otherwise incur.
>
> Because of all these improvements, the following claims can be
> made about this header cache implementation that can not be made
> about Craig McPheeters' original version.
>
> - The semantics of a Jam run will never be different because of
> the header cache (the HDRPATTERN check ensures this).
>
> - It will never be necessary to delete .jamdeps to fix obscure
> jam problems or purge old entries.

Given the MATCH rule, you could also make a distinction between #include
"..." and #include <...>, as is necessary for some compilers. We are doing
that in Boost.Build.
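
As a reading aid, the cache-hit test and the aging scheme from the quoted header cache description can be modelled in a few lines of Python. This is a toy sketch under my own assumptions, not the C code from the patch: the class and method names are hypothetical, the on-disk .jamdeps format is left out, and I assume an entry is purged once it has gone unused for `max_age` consecutive runs.

```python
class HeaderCache:
    """Toy model of the header cache described above (hypothetical names)."""

    def __init__(self, max_age=100):
        self.max_age = max_age   # models HCACHEMAXAGE; 0 disables aging
        self.entries = {}        # file name -> cached scan result
        self.used = set()        # names looked up during the current run

    def lookup(self, name, mtime, pattern, scan):
        """Return the includes of `name`; call scan(name, pattern) only
        when the timestamp or the HDRPATTERN value differs (a cache miss)."""
        self.used.add(name)
        entry = self.entries.get(name)
        if entry and entry["mtime"] == mtime and entry["pattern"] == pattern:
            return entry["includes"]
        includes = scan(name, pattern)
        self.entries[name] = {"mtime": mtime, "pattern": pattern,
                              "includes": includes, "age": 0}
        return includes

    def end_of_run(self):
        """Reset the age of entries used in this run, age the rest, and
        purge entries that have been unused for max_age runs."""
        for name in list(self.entries):
            entry = self.entries[name]
            entry["age"] = 0 if name in self.used else entry["age"] + 1
            if self.max_age and entry["age"] >= self.max_age:
                del self.entries[name]
        self.used.clear()
```

Keeping the pattern in the entry is what makes a change to HDRPATTERN show up as a miss rather than as a stale hit.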

> ** Exporting Jam variables to the environment using ENVEXPORT.
>
> This change causes the global value of the ENVEXPORT variable to
> take on special meaning. It becomes a list of Jam variables that
> are to be exported into the environment.
>
> For example, if ENVEXPORT is equal to the list FOO BAR BAZ, then
> the environment variables FOO, BAR, and BAZ will be set to
> whatever values the Jam global variables of the same name were set
> to.
>
> If a Jam global variable holds a list, the entire list is exported
> to the environment. When the variable's name ends with "PATH",
> "Path" or "path", then the list elements are concatenated together
> with the SPLITPATH character separating elements (SPLITPATH is ';'
> under Windows and ':' under Unix), otherwise the list elements are
> concatenated with a single space.
>
> By default, the value of ENVEXPORT is the empty list, so no
> environment variables are exported by default.
>
> This option is controlled by the OPT_ENVIRONMENT_EXPORT_EXT
> #define.

We do the same thing by embedding lines like:

$(SHELL_SET)$(VARNAME)=$(VALUES)
$(SHELL_EXPORT)$(VARNAME)

in our build actions, with SHELL_SET and SHELL_EXPORT set appropriately to
the platform. Joining paths with a SPLITPATH character in Jam code is not
much of a burden. Is that too limiting for some reason?

Also, would your feature be used as follows?

ENVEXPORT on $(<) = PATH LD_LIBRARY_PATH ;
PATH on $(<) = ... ;

...such that the build actions on $(<) now run in an environment with PATH
and LD_LIBRARY_PATH taken from the corresponding on-target Jam variables?
The operation of ENVEXPORT is not clear from your text above.
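
To make the quoted joining rules concrete, here is a small Python sketch of the global ENVEXPORT behaviour as I read it. It does not answer the on-target question above. The function name and the dictionary representation of Jam globals are hypothetical, and skipping variables that are not set is my assumption, since the text does not say what happens to them.

```python
def export_environment(jam_globals, env, splitpath):
    """Copy the variables named in the global ENVEXPORT list into env.

    jam_globals maps variable names to lists of strings, env is the
    environment mapping to update, and splitpath stands in for SPLITPATH
    (';' under Windows, ':' under Unix).
    """
    for name in jam_globals.get("ENVEXPORT", []):
        if name not in jam_globals:
            continue  # assumption: unset variables are not exported
        # Names ending in PATH, Path or path are joined with SPLITPATH;
        # every other list is joined with a single space.
        is_path = name.endswith(("PATH", "Path", "path"))
        separator = splitpath if is_path else " "
        env[name] = separator.join(jam_globals[name])
    return env
```

With ENVEXPORT left empty nothing is exported, which matches the stated default.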

> ** The :X variable expansion
>
> Expanding a variable with :X will change all \ chars in the
> variable to / chars.
>
> E.g.
>
> foo = "a\\b\\c" ;
> bar = $(foo:X) ;
> # bar is now "a/b/c"
>
> This is useful when dealing with cygwin tools that expect path
> elements to be unix style, I guess.
>
> FIXME: is this truly necessary? Or can it be solved in Jam?
> E.g. we might be able to use the Split rule to get around this.

It can often be solved in Jam, but not reliably. For example, binding will
cause backslashes to appear in paths on NT. I prefer the :\ and :/ syntax
someone else is using. It can actually be important to have backslashes in
some cases (e.g. MakeLocate).

> ** Human Readable Dependency Output
>
> This code is taken from file://guest/craig_mcpheeters/jam/src/ on the
> Perforce public depot. Many thanks to Craig McPheeters for making his
> code available. It is delimited by the OPT_GRAPH_DEBUG_EXT #define
> within the code.
>
> With this option, debug level 10 will print out the entire dependency
> tree in a form that is more easily understood than jam's debug level
> 6.

I've already grabbed this excellent work as well.

> ** Target Fate Change Debugging
>
> This code is taken from file://guest/craig_mcpheeters/jam/src/ on the
> Perforce public depot. Many thanks to Craig McPheeters for making his
> code available. It is delimited by the OPT_GRAPH_DEBUG_EXT #define
> within the code.
>
> With this option, debug level 11 prints out target fate changes as
> they occur (and why they occur). This helps debug mysterious "why
> is THAT file getting rebuilt" problems.
>
> ** Improved ...patience...
>
> This changes the ...patience... lines to be printed out after the
> first 100 and every subsequent 1000 files have been header scanned.
> Previously, ...patience... was printed out for every 1000 targets.
>
> This change both reduces the number of ...patience... lines printed,
> and makes them more accurately reflect the work being done.
>
> This change is enabled with the OPT_IMPROVED_PATIENCE_EXT #define.
>
> ** Improved debug level help
>
> This change is delimited by the OPT_IMPROVE_DEBUG_LEVEL_HELP_EXT
> #define within the code.
>
> The -h option to jam now prints out what each of the debug levels do.
>
> ** Print Total Time
>
> This change is delimited by the OPT_PRINT_TOTAL_TIME_EXT #define
> within the code.
>
> If the total time jam runs is greater than 10 seconds, the time is
> printed when jam exits. This helps people back up claims that the
> build is too slow and they need a faster machine. ;-)
>
> ** Improved HdrRule treatment
>
> A new 3rd argument to HdrRule is the bound name of the 1st
> argument to HdrRule. This allows HdrRule to extend the search
> path for headers to include all directories headers have been
> found in so far.
>
> E.g. if a source file does "#include <foo/bar/baz.h>" and the
> baz.h header is found in $(TOP)/include, this change allows
> HdrRule to add $(TOP)/include/foo/bar to the HDRSEARCH path. This
> way, if baz.h does #include "goo.h", any goo.h in
> $(TOP)/include/foo/bar will be found.
>
> The default Jambase makes use of this new argument to extend
> HDRSEARCH on the header files.
>
> This feature is enabled with the OPT_HDRRULE_BOUNDNAME_ARG_EXT
> #define.

Useful, though I've tried to implement things in terms of powerful
primitives, thus my "BINDRULE on..." feature which lets you do the same
thing.
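
The search path extension described in the quoted text comes down to taking the directory of the bound header name and appending it to the search list. A minimal Python sketch follows. The function name is hypothetical, and forward-slash paths are assumed for simplicity.

```python
import posixpath


def extend_hdrsearch(hdrsearch, bound_name):
    """Return hdrsearch with the directory of bound_name appended,
    unless that directory is empty or already present."""
    directory = posixpath.dirname(bound_name)
    if directory and directory not in hdrsearch:
        return hdrsearch + [directory]
    return hdrsearch
```

In the baz.h example above, binding foo/bar/baz.h under an include directory adds that directory's foo/bar subdirectory, so a later #include "goo.h" from baz.h can be found there.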

> ** Improved "compile" debug output.
>
> With level 5 jam debugging, a jam rule execution trace is
> printed. This extends that debugging output to include:
>
> - when a new rule is defined (with a special note when the new
> rule re-defines a pre-existing rule).
>
> - when a new actions is defined (with a special note when the
> new actions re-defines a pre-existing actions).
>
> - when an included Jamfile ends.
>
> This makes it possible to write scripts that process Jam debugging
> output that look for potential errors, such as re-defining a rule
> or action that is part of Jambase.
>
> This feature is enabled with OPT_IMPROVE_DEBUG_COMPILE_EXT.

My -d+5 output prints the filename and line being executed in a format which
allows editors and tools to jump to that location. I am now able to "step
through" a Jam execution with emacs.

> ** "Lazy" targets
>
> Because the Windows NT shell (cmd.exe) sucks, it is often best to
> break up complex operations into many actions. Examples include
> creating various response files and linker definition files for
> the link step of a compile.
>
> The problem with this is that these files may not always be
> rebuilt when necessary. It is difficult to construct a
> straightforward chain of actions that guarantees that all the
> response files get built whenever the final link
> makes use of them.
>
> Stock Jam provides two main ways to accomplish this:
>
> - Mark the response files TEMPORARY and remove them with
> RmTemps after the link. This is problematic since removing
> them just adds mystery to the final link process for the
> typical engineer. People often want to look at the files to
> see exactly how the link occurs.
>
> - Perform the final link with several actions that take a list
> of the final image and all the response files in $(<). Each
> action would build one of the elements in $(<). This is an
> obtuse hack that is difficult to explain and maintain.
>
> The solution presented here is a new built in rule LAZY. When
> called like this:
>
> LAZY target ;
>
> "target" is marked "lazy".
>
> When Jam decides that a given target is to be built, it now checks
> all of its direct dependencies to see if they are marked lazy. If they
> are, the lazy dependencies are also marked for rebuilding, and their
> direct dependencies are similarly considered, and so on.
>
> This affords the benefits of marking targets TEMPORARY (that they
> will be rebuilt whenever the targets that depend on them are rebuilt)
> without the negatives (that they get deleted after the build).
>
> BUGS:
>
> There is a bug in this implementation that I do not believe will
> lead to practical problems. Consider the following set of
> dependencies.
>
> d -> b* (d depends on b*)
> c -> b*
> b* -> a
>
> Consider b* to be marked "lazy". The current implementation will
> correctly rebuild b whenever either d or c is rebuilt. However,
> it does not guarantee that BOTH d and c get built whenever b* is
> updated. If b* is updated only because it is lazy, some of its
> dependents may not be updated. For example, if c is updated and
> b* is marked for update because it is lazy, then d may not be
> updated. If d is marked for update and b* is marked for update
> because it is lazy, c may not be marked for update. I call this a
> bug since it shouldn't be necessary to run Jam twice to satisfy
> all dependencies.
>
> A simple way to work around this is to mark b* with NOTFILE. This
> will cause b*'s time stamp to no longer be considered. This is
> arguably a reasonable thing to do, since these files are rarely
> edited by hand and whenever they are used they are rebuilt.
> Another workaround is to mark the final linked image with LEAF,
> which will usually have a similar effect of removing b*'s time
> stamp from consideration. Another workaround is to avoid having a
> LAZY file with more than one dependent target (this is usually the
> case anyway, which is the major reason I don't consider this
> problem serious).
>
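The LAZY propagation described above can be sketched in Python as a walk over the dependency graph. This is only a model under one assumption: I read the text as "when a target is to be built, the lazy targets it depends on are marked too", which is what the d, c, b* example implies. The function name and graph representation are hypothetical.

```python
def mark_for_rebuild(target, depends_on, lazy, marked=None):
    """Mark `target` for rebuilding, then follow its direct dependencies
    and mark every lazy one as well, recursively.

    depends_on maps a target to the list of targets it depends on;
    lazy is the set of targets on which LAZY was called.
    """
    marked = set() if marked is None else marked
    if target in marked:
        return marked
    marked.add(target)
    for dependency in depends_on.get(target, []):
        if dependency in lazy:
            mark_for_rebuild(dependency, depends_on, lazy, marked)
    return marked
```

With the graph from the BUGS section (d and c depend on b, b depends on a, b is lazy), marking d also marks b, but nothing in this walk reaches c, which is the gap that section describes.
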
> * Operational Changes
>
> ** Versioning
>
> We add a PATCHED_VERSION variable that indicates which local version
> of the custom jam is in use.
>
> The variable is a list. PATCHED_VERSION[1] is the major version,
> PATCHED_VERSION[2] is the minor version.
>
> As you might expect, major version increments indicate
> non-backwards compatible changes (elimination of builtin rules or
> other features, changing features in an incompatible way, etc.).
> Minor version increments indicate the addition of backwards
> compatible features and bug fixes.
>
> It is expected that a project's Jamrules will check the
> PATCHED_VERSION variable and check for a major version mismatch,
> and ensure the minor version is not too low.
>
> This option is enabled with the OPT_PATCHED_VERSION_VAR_EXT
> #define.
>
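The check that a project's Jamrules is expected to perform can be outlined as follows. This is a Python sketch of the logic only, not Jam code. The function name and error messages are hypothetical, and the version is passed as a list of strings because Jam variables are lists of strings.

```python
def check_patched_version(patched_version, required_major, minimum_minor):
    """Return None if the version is acceptable, otherwise an error string.

    patched_version models the PATCHED_VERSION list: [major, minor].
    """
    if len(patched_version) < 2:
        return "PATCHED_VERSION is not set; a patched jam is required"
    major = int(patched_version[0])
    minor = int(patched_version[1])
    if major != required_major:
        # A different major version means incompatible changes.
        return "major version %d found, %d required" % (major, required_major)
    if minor < minimum_minor:
        # Features or fixes the project relies on are missing.
        return "minor version %d found, at least %d required" % (minor, minimum_minor)
    return None
```

Requiring an exact major version but only a minimum minor version follows from the compatibility rules stated above.
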
> ** Maximum Command Length for NT
>
> Jam ships with a maximum command line length of 996 for Windows
> NT. Windows NT 4.0 and greater can handle command line lengths of
> at least 10240 characters long (perhaps longer, no tests have been
> done).
>
> This change increases the maximum command line length to 10240 for
> Windows NT 4.0 and greater.
>
> Caveat: the default Windows NT 4.0 command shell can only handle
> commands up to 1-2k bytes long for many of its own internal
> commands, such as del and echo. So this feature has spurred the
> implementation of jamshell.c, a simple shell that lives in
> tools/jamshell.
>
> This option is enabled with the OPT_FIX_NT_BATCH_EXT #define.

Hmm. I just set maxline to 2047, since workarounds for line length
limitations are needed anyway.

> * Bug Fixes
>
> ** Windows NT Batch File Naming Bug
>
> This code is taken from file://guest/craig_mcpheeters/jam/src/ on
> the Perforce public depot. Many thanks to Craig McPheeters for
> making his code available. It is delimited by the
> OPT_FIX_NT_BATCH_EXT #define within the code.
>
> Running jam multiple times on the same machine could break because
> jam's temp batch file names were of the form jamtmpXX.bat, where
> XX begins at 00 and increases numerically.
>
> This fix adds the jam process's own PID to the temp batch file
> name, allowing multiple copies of jam.exe to run simultaneously
> without interfering with each other.

Hmm, I guess I'll need that fix.

> ** Improper handling of "on target" values during header scanning
>
> Setting any "on target" variables for $(<) within a HdrRule will
> actually set the global values for those variables and the "on
> target" values will remain unchanged.
>
> Why? Jam implements "on target" variables by swapping the current
> global values with the target specific values (see pushsettings()
> in rules.c) and then unswapping them when the target is no longer
> in scope (see popsettings() in rules.c).
>
> This works fine if the target variables are not changed between
> calls to pushsettings() and popsettings(). But, when scanning for
> header file dependencies, the HDRRULE is run, and so the "on
> target" variables of $(<) can be set.
>
> Doing so will actually cause the global value of the variable to
> be set. Why? Because the target's value will be swapped with the
> global value in the popsettings() call after the HdrRule is
> called. The value set on $(<) will either not change (if the same
> variable was previously set on the target), or be taken from the
> global setting (if the variable had never been set on the target
> before).
>
> This problem occurs with the default Jambase's HdrRule when any
> file includes itself. In this case, $(<) will also be present in
> $(>).
>
> This fix makes a copy of the target's "on target" variables and
> uses the copy with pushsettings() and popsettings() in make.c's
> make0() function. An alternate fix would be to freeze the "on
> target" variables of $(<) within a HdrRule, disallowing any
> modifications.
>
> This code is enabled with the OPT_FIX_TARGET_VARIABLES_EXT
> #define.

I know that I'll need to get this one.
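
The swap mechanism is easy to misread, so here is a small Python model of the bug and of the copy-based fix as the quoted text describes them. It is a sketch only: dictionaries stand in for Jam's settings lists, and the names `swap_settings`, `scan_buggy` and `scan_fixed` are hypothetical rather than functions from rules.c or make.c.

```python
def swap_settings(settings, global_vars):
    """Model of pushsettings()/popsettings(): exchange each value stored
    in `settings` with the global variable of the same name."""
    for name in settings:
        settings[name], global_vars[name] = (
            global_vars.get(name, []), settings[name])


def scan_buggy(target_settings, global_vars, hdr_rule):
    """Push and pop using the target's own settings."""
    swap_settings(target_settings, global_vars)   # push
    hdr_rule(target_settings)                     # "VAR on $(<) = ..."
    # The value just set on the target is now swapped into the globals.
    swap_settings(target_settings, global_vars)   # pop


def scan_fixed(target_settings, global_vars, hdr_rule):
    """Push and pop using a copy, so writes made by the rule stay on
    the target and the globals are restored exactly."""
    saved = dict(target_settings)
    swap_settings(saved, global_vars)             # push
    hdr_rule(target_settings)
    swap_settings(saved, global_vars)             # pop
```

Running both variants with a rule that assigns a variable on the target shows the difference: the buggy version ends with the new value in the globals and the old value on the target, while the fixed version leaves the globals as they were.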

I guess it would be a very good idea for Boost to prepare a comprehensive
document like this one, which includes both the bug fixes and the
enhancements. Thanks for discussing this with me,

Dave

 

