
Boost-Build :

From: Jurko Gospodnetić (jurko.gospodnetic_at_[hidden])
Date: 2008-01-09 19:33:22


   Hi Rene.

> OK. But I think what you didn't realize is that the BBv2 code must be
> able to work with various bjam versions. Not only does Boost regression
> testing function this way. But it is very common for users to mix and
> match bjam and BBv2, for the plain reason that they tend to install
> system provided bjam's which is usually older version than the src
> included with BBv2.

   Ouch, ok. :-) I assumed trunk testers used the trunk version, release
testers used the release version and users the version coming bundled
with the Boost distribution they want to build...

>>> It is too late to change this behavior. We have releases that rely on it
>>> and would likely break in a variety of ways.
>> Ouch, this does not feel right. Does not have tests, can not be used
>> for the purpose I read it should be used for (testing action output) and
>> can not be changed... :-(
>
> I'm not sure I understand your comment.

   Sorry, what I meant is that in some newsgroup messages I read your
reply saying that __ACTION_RULE__ can be used to access that output
should it be needed (I believe this was related to the change that made
quiet actions no longer display their output). See below for an example
of what I meant when I said that it cannot be used for that, at least
not in a robust manner...

> It's designed to faithfully
> capture the output of the actions run. Nothing more.

   OK, if that is how you defined them, then they do work correctly:
they give you the exact byte sequence received on the output/error
streams of the executed child processes. But that does not make the
string usable in a Jam variable, as I explain below...

>> Could you at least add tests demonstrating the behavior you need?
>
> I'll try to add some bjam tests. Sorry the tests for bjam are slim but I
> only recently started on adding those. But I thought the behavior was
> obvious for __ACTION_RULE__ and __TIMING_RULE__ (and SYSTEM/SHELL)
> especially since they are all documented.

   OK, if you ignore the line-ending problem, then it is 'obvious'.
However, I believe you will run into problems as soon as you try to
prepare any sort of test case where the action output contains newlines
on Windows. See below for more details.

>> Btw. the problem with the original/reverted implementation is that
>> the output string contains newline sequences encoded in the way that the
>> action encoded them - most likely \n or \r\n based on the host platform
>> and/or standard C library.
>
> It's not based on the C library, or strictly speaking the host platform.
> It is a precise reproduction of the output produced by the programs run
> in actions.

   What I meant there was 'what you get is the EXACT output the executed
command produced, including all newline sequences encoded in the EXACT
way the executed command encoded them - most likely \n or \r\n based on
the host platform and/or standard library that command used'. This means
that if the action executes 'echo Rene' on Windows you'll get 'Rene\r\n'.

>> Jam then cannot handle such strings correctly; for example, the
>> MATCH rule does not handle \n intuitively in its regex format (the
>> internal regex implementation treats it as an 'or').
>
> There are many examples in code where we handle MATCH and newlines (and
> just about any Jam code dealing with newlines) by always using the
> platform newline with something like:
>
> NL = "
> " ;

   I do not believe this is true. Grepping through the Boost source
tree for 'MATCH.*nl' finds only one occurrence: the one in the
python.jam module's is-cygwin-symlink rule, and I am not even sure that
one works correctly, although I have not taken the time to check.
Again, see below for a more detailed example...

>> E.g. If you have an action displaying the text: 'Something' (minus
>> the quotes) and a newline on *NIX it would get output (and collected via
>> a pipe by bjam) as 'Something\n' and on Windows as 'Something\r\n'. Then
>> if you attempt to pass it to the registered __ACTION_RULE__ you get that
>> expanded value in the $(output) variable. If you now try to output that
>> value from bjam it gets output ok on *NIX but on Windows the \n gets
>> expanded again so you get 'Something\r\r\n'.
>
> AFAIK if you echo it from within bjam it doesn't re-expand the newlines.
> How/where is it output such that it's getting reparsed?

   I believe Jam handles newlines in the following manner:

   1. When it parses its input files it reads them as text files and
automatically converts native newline character sequences to \n characters.

   2. As a consequence of 1., all newlines in Jam variables defined
inside Jam scripts are represented by a single \n character, no matter
what platform you are on. I verified this by debugging the Jam
executable and seeing how it received parameters containing newlines
in built-in rules.

   3. When Jam does any sort of output (including the ECHO and EXIT
rules as well as file output using the :E= variable modifier) it treats
it as text output (most likely it simply opens the output file in text
mode), so any \n characters get expanded to the native OS newline
character sequence.

   That is why, if you somehow get a \r\n character combination into a
Jam variable, outputting that variable will produce the \r\r\n
character combination. Such a combination is not immediately visible
on the console (it displays the same as \r\n), but try piping the
output to a file and looking at it with a hex editor. Note that writing
this file out to the console (e.g. using the type command) produces the
same output as Jam.
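   The three steps above can be modelled in a few lines. The following
is a minimal simulation in Python (the language of the Boost.Build test
scripts), not bjam's actual C code; it assumes a Windows host whose
native newline is \r\n and models text-mode I/O as plain string
replacement. The helper names read_as_text and write_as_text are
hypothetical.

```python
# Hypothetical model of Jam's text-mode newline handling; not bjam source code.
# Assumption: the host is Windows, so the native newline sequence is "\r\n".
NATIVE_NEWLINE = "\r\n"


def read_as_text(raw: str, native: str = NATIVE_NEWLINE) -> str:
    """Step 1: reading a Jamfile in text mode turns each native newline into '\\n'."""
    return raw.replace(native, "\n")


def write_as_text(value: str, native: str = NATIVE_NEWLINE) -> str:
    """Step 3: text-mode output expands every '\\n' into the native newline."""
    return value.replace("\n", native)


# Step 2: a newline written inside a Jamfile ends up as a single "\n".
script_nl = read_as_text('NL = "\r\n" ;')

# A script-defined value survives output cleanly...
from_script = write_as_text("Something\n")

# ...but action output captured verbatim from a pipe already contains "\r\n",
# so its "\n" gets expanded a second time.
from_action = write_as_text("Something\r\n")

print(repr(script_nl))    # 'NL = "\n" ;'
print(repr(from_script))  # 'Something\r\n'
print(repr(from_action))  # 'Something\r\r\n'
```

   The model only covers the newline translation itself; it says nothing
about how bjam stores or passes the strings internally.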

>> Some alternatives to my solution that could possibly better suit you:
>>
>> (1.) Update the command output passed to the __ACTION_RULE__ rule so
>> that all \r\n sequences get converted to \n? That would not solve the
>> MATCH problem but would solve the output problem.
>
> Not an option. I want the returned string to accurately reflect the
> content really produced. It would also cause a discrepancy with how we
> handle newlines everywhere else in bjam.

   Actually... it would not cause a discrepancy, since, as noted above,
Jam always represents newlines internally as single \n characters.
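   To make alternative (1.) concrete, here is a minimal sketch, again
modelled in Python rather than in bjam's C sources, of collapsing \r\n
pairs into the single \n that Jam uses internally before the output
reaches __ACTION_RULE__. The function names are hypothetical, and lone
\n terminators (e.g. from Cygwin tools) are deliberately left untouched.

```python
# Hypothetical sketch of alternative (1.); bjam itself would do this in C.
def normalize_newlines(raw: str) -> str:
    """Collapse each "\\r\\n" pair into the single "\\n" Jam uses internally.

    Lone "\\n" terminators (e.g. from Cygwin tools) are left as they are.
    """
    return raw.replace("\r\n", "\n")


def write_as_text(value: str, native: str = "\r\n") -> str:
    """Model of text-mode output on Windows: every "\\n" becomes the native newline."""
    return value.replace("\n", native)


captured = "XXX\r\nYYY\r\nZZZ\r\n"          # what the echo action produces on Windows
normalized = normalize_newlines(captured)   # now matches a Jam-script string
print(repr(normalized))
print(repr(write_as_text(normalized)))      # round-trips to the captured bytes
```

   After normalization the output looks exactly like a string built
inside a Jam script, and writing it back out in text mode no longer
produces \r\r\n.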

>> (2.) Update the failing tests that parse the action output so that
>> they specify the -d+1 option and expect to see action names besides
>> their output. This is the 'easy way' to fix these tests but one which I
>> have not done only because they pointed me to __ACTION_RULE__ problems.
>>
>> Any other suggestions?
>
> (3) Normalize the __ACTION_RULE__ output as needed where you are
> processing it. I'm guessing this is a problem with Python reparsing the
> output. Hence fixing it on that end seems more appropriate.

   Nope, this has nothing to do with Python. This output is produced by
Jam and can be reproduced without using Python at all. I was a bit
paranoid after you suggested Python might be the cause and wanted to
make sure, so I prepared a separate program that collects Jam's output
via a pipe, and got the same results. For an example of how to
demonstrate these problems see below...
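   For reference, a program of that kind can be as small as the
following sketch (Python, not the program actually used here). It reads
the child's stdout in binary mode so that no newline translation happens
on the collecting side, and prints the bytes in hex so that \r (0d) and
\n (0a) are visible. The demo command is a placeholder; substitute the
bjam command line under test.

```python
import subprocess
import sys


def collect_raw_output(command):
    """Run `command` (a list of arguments) and return its stdout exactly as
    received through the pipe; bytes are not decoded, so no newline
    translation happens on this side."""
    result = subprocess.run(command, stdout=subprocess.PIPE, check=False)
    return result.stdout


def hex_view(data: bytes) -> str:
    """Space-separated hex bytes, so that \\r (0d) and \\n (0a) stand out."""
    return " ".join(f"{byte:02x}" for byte in data)


if __name__ == "__main__":
    # Placeholder child process; replace with the bjam invocation being examined.
    raw = collect_raw_output([sys.executable, "-c", "print('Something')"])
    print(repr(raw))
    print(hex_view(raw))
```

   A text-mode child on Windows typically ends the line with 0d 0a;
pointing the same collector at the bjam test attached below should make
the 0d 0d 0a sequences visible.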

   I'm also guessing that you see the same \r\r\n newline character
sequences in the Boost Build XML files generated on Windows for
regression testing, but most likely those get handled correctly by
something further down the tool-chain.

> There are a few variations on the theme of post-processing the output of
> __ACTION_RULE__ that can apply. But I don't know enough about how you
> want to use it to suggest something more specific.

   Hmmm, I can't think of a way to, for example, split the output
currently provided to __ACTION_RULE__ into lines from a Jam script.

   One possible hack would be to assume that on Windows newlines are
always represented as \r\n, which need not be the case: imagine running
any Cygwin utility from your action, as most of them use *NIX-style \n
newline characters. Under that assumption you could match up to the
first newline plus one (arbitrary) character before it.

   Another would be to provide Jam scripts with a way to define a \r
character, which would then allow scripts to match \r\n directly.
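   The difference between the 'one character before the first newline'
hack and a terminator-agnostic split can be sketched as follows. This
uses Python's re module purely as a model (Jam's MATCH has different
syntax and semantics), and both function names are hypothetical. The
hack works for \r\n output but silently eats the last character of each
line of Cygwin-style \n output.

```python
import re


def split_lines_assuming_crlf(output: str) -> list:
    """The hack: treat the newline plus the one character before it as the
    line terminator, i.e. assume every line ends in "\\r\\n"."""
    # '.' does not match "\n" here (no re.DOTALL), but it does match "\r".
    return re.findall(r"(.*).\n", output)


def split_lines_any_newline(output: str) -> list:
    """Accept both "\\r\\n" and a bare "\\n" as the line terminator."""
    lines = re.split(r"\r?\n", output)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing terminator does not start another line
    return lines


windows_output = "XXX\r\nYYY\r\nZZZ\r\n"
cygwin_output = "XXX\nYYY\nZZZ\n"

print(split_lines_assuming_crlf(windows_output))  # ['XXX', 'YYY', 'ZZZ']
print(split_lines_assuming_crlf(cygwin_output))   # ['XX', 'YY', 'ZZ']
print(split_lines_any_newline(windows_output))    # ['XXX', 'YYY', 'ZZZ']
print(split_lines_any_newline(cygwin_output))     # ['XXX', 'YYY', 'ZZZ']
```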

>> Ok, in conclusion, I'll give up on the __ACTION_RULE__ changes,
>> correct the failing tests as described above under alternative (2.) and
>> prepare a separate test demonstrating the __ACTION_RULE__ problem which
>> you can then accept or discard...
>
> OK.

   The tests have been corrected and committed, and I'm attaching here
a 'manual test' demonstrating the different behaviors I mentioned in
this mail. This is the code I'll convert into one or more tests the
next time I get the chance. Currently I am not really sure what the
correct behavior should be, and the only automated test I can think of
is one making sure that the output collected from the OS's echo command
matches the same text with an appended newline in a Jam script. Any
comments/suggestions would be welcome...

   The code demonstrates the following (the numbers below refer to the
numbered lines of the test's output; run the test and it'll be obvious):

     * First, lines 1.-3. show that Jam holds only a single character
as the newline character sequence in its variables, such as the NL
variable you mentioned.

     * Then lines 4.-10. and 11.-17. show that MATCH treats newline
characters the same as '|' characters in its regular expressions.

     * Lines 18.-24. and 25.-31. then show that one can actually match
a newline by escaping it or placing it inside square brackets [].

     * Next, the script runs an action that executes three OS echo
commands displaying XXX, YYY and ZZZ, each on its own line, separated by
native newline character sequences.

     * Then the __ACTION_RULE__ runs and displays all the data available
to it. If you pipe this part to a file and view it with a hex editor
you'll notice that the newline character sequences used in the output
data are \r\r\n (most regular editors will display this as an extra
empty line, while the console does not, as \r there just moves the
cursor back to the start of the current line).

     * Lines 32.-36. then use MATCH to show that the newline character
sequences in the output provided to __ACTION_RULE__ consist of two
characters, unlike those seen earlier in Jam variables such as NL.

     * And finally, lines 37.-41. show what happens if you try to
match the output provided to __ACTION_RULE__ using the only newline
sequence that a Jam script can construct: a single \n.
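   The contrast behind lines 32.-41. can be reproduced outside Jam with
Python's re module. This is a model, not Jam's regex engine: it
assumes, as lines 1.-3. suggest, that '.' matches newline characters
(hence re.DOTALL), and it writes the literal newline directly, whereas
the Jam script has to escape it because an unescaped newline acts as '|'
in MATCH patterns. The action output is assumed to use \r\n terminators
as on Windows.

```python
import re

# Assumed data: action output as __ACTION_RULE__ receives it on Windows, versus
# the same text built inside a Jam script (a single "\n" per newline).
ACTION_OUTPUT = "XXX\r\nYYY\r\nZZZ\r\n"
SCRIPT_VALUE = "XXX\nYYY\nZZZ"


def first_group(pattern: str, text: str):
    """Return capture group 1 of the first match, or None if nothing matches.
    re.DOTALL lets '.' match newline characters as well."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


for pattern in ("(YYY..)", "(YYY\n)", "(YYY.\n.)"):
    print(repr(pattern),
          repr(first_group(pattern, ACTION_OUTPUT)),
          repr(first_group(pattern, SCRIPT_VALUE)))
```

   With these inputs, (YYY..) captures 'YYY\r\n' from the action output
but 'YYY\nZ' from the script value; (YYY\n) matches only the script
value; and (YYY.\n.) matches only the action output, because an extra
'.' is needed to consume the \r.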

   I believe all of this works just fine on *NIX though. :-)

   Also, note that since I guess no one actually uses this, and all the
tests bringing it up have already been fixed in a different way, this
has mainly become 'something that was interesting to research while I
was trying to get friendly with Boost Build internals'. :-) It could
even be resolved by defining that the output provided to __ACTION_RULE__
is not meant to be parsed inside Jam scripts.

   Hope this helps.

   Best regards,
     Jurko Gospodnetić


# Disable the 'default toolset' warning.
import feature ;
feature.extend toolset : nonExistantToolset ;


################################################################################
#
# Utility rules.
#
################################################################################

rule display-match ( index : re : string )
{
    local match-result = [ MATCH $(re) : $(string) ] ;
    echo $(index).: /$(match-result)/ ;
}
IMPORT $(__name__) : display-match : : display-match ;


################################################################################
#
# This part demonstrates problems MATCHing strings containing newline
# sequences.
#
################################################################################

rule test-newline-matching ( )
{
    local nl = "
" ;

    local original = "XXX
YYY
ZZZ" ;

    ECHO "------------------" ;
    display-match 1 : "(YYY)" : $(original) ;
    display-match 2 : "(YYY.)" : $(original) ;
    display-match 3 : "(YYY..)" : $(original) ;
    ECHO "------------------" ;
    display-match 4 : "(YYY$(nl))" : $(original) ;
    display-match 5 : "(YYY$(nl)#)" : $(original) ;
    display-match 6 : "(YYY$(nl)X)" : $(original) ;
    display-match 7 : "(YYY$(nl)XXX)" : $(original) ;
    display-match 8 : "(YYY$(nl)Z)" : $(original) ;
    display-match 9 : "(YYY$(nl)ZZZ)" : $(original) ;
    display-match 10 : "(###$(nl)ZZZ)" : $(original) ;
    ECHO "------------------" ;
    display-match 11 : "(YYY|)" : $(original) ;
    display-match 12 : "(YYY|#)" : $(original) ;
    display-match 13 : "(YYY|X)" : $(original) ;
    display-match 14 : "(YYY|XXX)" : $(original) ;
    display-match 15 : "(YYY|Z)" : $(original) ;
    display-match 16 : "(YYY|ZZZ)" : $(original) ;
    display-match 17 : "(###|ZZZ)" : $(original) ;
    ECHO "------------------" ;
    display-match 18 : "(YYY[$(nl)])" : $(original) ;
    display-match 19 : "(YYY[$(nl)]#)" : $(original) ;
    display-match 20 : "(YYY[$(nl)]X)" : $(original) ;
    display-match 21 : "(YYY[$(nl)]XXX)" : $(original) ;
    display-match 22 : "(YYY[$(nl)]Z)" : $(original) ;
    display-match 23 : "(YYY[$(nl)]ZZZ)" : $(original) ;
    display-match 24 : "(###[$(nl)]ZZZ)" : $(original) ;
    ECHO "------------------" ;
    display-match 25 : "(YYY\\$(nl))" : $(original) ;
    display-match 26 : "(YYY\\$(nl)#)" : $(original) ;
    display-match 27 : "(YYY\\$(nl)X)" : $(original) ;
    display-match 28 : "(YYY\\$(nl)XXX)" : $(original) ;
    display-match 29 : "(YYY\\$(nl)Z)" : $(original) ;
    display-match 30 : "(YYY\\$(nl)ZZZ)" : $(original) ;
    display-match 31 : "(###\\$(nl)ZZZ)" : $(original) ;
    ECHO "------------------" ;
}


################################################################################
#
# This part demonstrates that __ACTION_RULE__ receives the action output
# together with unprocessed raw newline character sequences. The problem will
# not show up unless you run the test on an OS like Windows that uses a
# multi-character newline sequence.
#
# When \r\n newline sequences get output the \n character will get re-expanded
# again and the output will contain \r\r\n sequences. You can check this out by
# piping the output into a file and viewing it in any hex editor.
#
################################################################################

rule test-output-display ( )
{
    import notfile ;

    module
    {
        rule display-action-info ( args * : xml-file : command status start end user system : output ? )
        {
            ECHO "targets :" /$(targets)/ ;
            ECHO "args :" /$(args)/ ;
            ECHO "xml-file:" /$(xml-file)/ ;
            ECHO "command :" /$(command)/ ;
            ECHO "status :" /$(status)/ ;
            ECHO "start :" /$(start)/ ;
            ECHO "end :" /$(end)/ ;
            ECHO "user :" /$(user)/ ;
            ECHO "system :" /$(system)/ ;
            ECHO "output :" /$(output)/ ;

            local nl = "
" ;

            # Let's try matching the output a bit to show that it contains
            # multi-character newline sequences even though this is not the
            # case with ordinary Jam script variables constructed with newlines
            # in them.
            ECHO "-----------------------------------------------------------" ;
            display-match 32 : "(YYY)" : $(output) ;
            display-match 33 : "(YYY.)" : $(output) ;
            display-match 34 : "(YYY..)" : $(output) ;
            display-match 35 : "(YYY...)" : $(output) ;
            display-match 36 : "(YYY....)" : $(output) ;
            ECHO "-----------------------------------------------------------" ;
            display-match 37 : "(YYY)" : $(output) ;
            display-match 38 : "(YYY\\$(nl))" : $(output) ;
            display-match 39 : "(YYY\\$(nl).)" : $(output) ;
            display-match 40 : "(YYY.\\$(nl).)" : $(output) ;
            display-match 41 : "(YYY.\\$(nl)..)" : $(output) ;
            ECHO "-----------------------------------------------------------" ;
        }
    }

    rule build-command ( targets * : sources * : properties * )
    {
        __ACTION_RULE__ on $(targets) = display-action-info ;
    }

    actions build-command
    {
        echo XXX
        echo YYY
        echo ZZZ
    }

    notfile testTarget : @build-command ;
}


################################################################################
#
# main()
# ------
#
################################################################################

test-newline-matching ;
test-output-display ;


Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk