

Subject: [Boost-build] Upgrading the lexer
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2018-01-13 22:57:27


AMDG

I've implemented a new lexer that handles whitespace
more intelligently.
See https://github.com/boostorg/build/tree/scanner-upgrade

Example:
import testing;
rule mytest(sources*:requirements*)
{
  sources+=[glob x.cpp];
  requirements+=<link>shared:<define>YY;
  run $(sources):::$(requirements);
}
mytest test.cpp;
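
For comparison, here is a sketch of the same example as the current lexer needs it, with every token set off by whitespace. The rule and file names are the ones from the example above. The spacing reflects existing Boost.Build syntax, not anything taken from the scanner-upgrade branch.

```jam
import testing ;

# Same rule as above; each token is separated by whitespace.
rule mytest ( sources * : requirements * )
{
    sources += [ glob x.cpp ] ;
    requirements += <link>shared:<define>YY ;
    # The two empty fields are the args and input-files parameters of run.
    run $(sources) : : : $(requirements) ;
}

mytest test.cpp ;
```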

Details:

The following symbols are always their own tokens
when not quoted or escaped:
'{', '}', ';'

These symbols are independent tokens in contexts where
the grammar allows them:
'<', '>', '>=', '<=', '=', '[', ']', '*', '+', '?', '+=', '?=', ':'

Spaces will not break tokens inside variable expansions
like $(x:J= ). This is not a breaking change because such
input currently causes a hard error.
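
A minimal sketch of the join case, assuming a hypothetical list variable x. The quoted form is what works with the current lexer. The unquoted form is what the rule above is meant to permit, and it is left commented out because it has not been checked against the branch.

```jam
local x = a b c ;

# Current lexer: the separator has to be quoted, otherwise the
# space ends the token in the middle of the expansion.
ECHO $(x:J=" ") ;

# New lexer (assumption based on the rule above): the space stays
# inside the expansion, so this should be equivalent.
# ECHO $(x:J= ) ;
```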

In order to reduce the amount of breakage I've
also added the following special rules:
- A ':' is not a keyword when it appears in a token
  which appears to be either a conditional property
  like <link>shared:<define>X_DLL or a Windows absolute
  path like C:\\Users
- A '>' is not a keyword if it closes a matching '<',
  to allow uses like:
  if <link>shared in $(properties)

Most of the issues appear in regular expressions,
which must be quoted in most cases:
WRONG: [ MATCH ([.]) : $(x) ]
RIGHT: [ MATCH "([.])" : $(x) ]
I don't want to work around this, because it's too
ambiguous and, unlike conditional properties, it
appears relatively rarely in Jamfiles.
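
A slightly fuller sketch of the quoted form, using a hypothetical variable x and a made-up pattern. Quoting keeps characters such as '[', ']' and '*' inside a single token. Per the symbol list above, they could presumably be split into separate tokens otherwise.

```jam
local x = foo.cpp ;

# The pattern is quoted, so the lexer sees it as one token
# regardless of the brackets and the '*' inside it.
local base = [ MATCH "(.*)[.]cpp" : $(x) ] ;

ECHO $(base) ;  # the captured group, here "foo"
```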

This is a major breaking change, so I'm planning to
split it into three steps:

Step 1. Issue a warning for all tokens that will be handled differently.
Step 2. Turn the warning into an error.
Step 3. Enable the new lexer.

The scanner-upgrade branch is currently set to step 1.

Thoughts?

In Christ,
Steven Watanabe

