
Boost-Build :

From: Samuel Krempp (krempp_at_[hidden])
Date: 2004-01-22 16:58:13


At this point, a Jam file can contain any characters in variable names, rule names,
etc., except for a few exclusions ('<', ...), right?

Was that a design choice, or was restricting the set of valid characters never
even considered?

I got bitten today because I copy-pasted a non-breaking space character
from my newsreader into Emacs, and bjam neither treated it as a space
nor warned me about the presence of a non-ASCII character.

Details: this character has the value 0xA0, and KNode probably inserts it
automatically instead of a regular space around ':' in my French
ISO-8859-15 locale. So the following line:
path-constant TAMERE<A0>: /tmp ;
looks in Emacs exactly as if the 0xA0 were a space, but bjam then
treats the ':' as part of the name and says:

*** argument error
* rule path-constant ( name : value )
* called with: ( TAMERE : /tmp )
* extra argument /tmp
--------

and you're completely mystified.
(The trick is that this line:
* called with: ( TAMERE : /tmp )
really is
* called with: ( TAMERE<A0>: /tmp )
but of course you just see a space instead of the 0xA0.)
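Part of the confusion comes from the diagnostic echoing the raw byte. As an illustration only, here is a C++ sketch of a hypothetical `escape_non_ascii` helper (bjam itself is written in C, and nothing like this function is claimed to exist in it) that renders every byte outside printable ASCII in the same <A0> notation used above:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of bjam: copy printable ASCII (0x20..0x7E)
// unchanged and render every other byte as <XX> in hex, so an invisible
// 0xA0 becomes visible in an error message.
std::string escape_non_ascii(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c <= 0x7E) {
            out += static_cast<char>(c);
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "<%02X>", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Applied to the argument in the error above, `escape_non_ascii("TAMERE\xA0:")` gives `TAMERE<A0>:`, so the stray byte would show up in the "called with" line instead of looking like a space.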

So I think bjam should either take the locale into account when parsing,
or impose some charset restrictions.

I have often wished I could use French accents in my C++ variable names
(because "initialisé" means "initialised", while "initialise" is the verb
conjugated in the 3rd person, so you cannot make good French names for your
functions in this kind of situation without the accent. And this situation
happens all the time, for all "first group" verbs). So it pains me to be the
one to suggest banning non-ASCII characters from Jam files.

Maybe bjam could ban (or warn about) all characters that are isspace() in the
current locale. But some locales may also have several variants of other
special-meaning characters, like ':' or ';', and then isspace() won't save us.
Still, I think the locale provides comparison functions that could do the job.
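The isspace() idea could be sketched in C++ with `std::isspace(char, const std::locale&)`, flagging bytes that the locale calls whitespace but that are not the ASCII whitespace a Jam tokenizer is assumed to split on. `locale_only_spaces` and `NbspIsSpace` are hypothetical names. Whether a real ISO-8859-15 locale classifies 0xA0 as a space depends on the platform's locale data (some C libraries may not), so the sketch installs a small illustrative facet that does, standing in for such a locale:

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

// Hypothetical check, not part of bjam: offsets of bytes that `loc` classifies
// as whitespace but that are not ASCII whitespace.
std::vector<std::size_t> locale_only_spaces(const std::string& text,
                                            const std::locale& loc) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        bool ascii_space = c == ' ' || c == '\t' || c == '\n' ||
                           c == '\r' || c == '\f' || c == '\v';
        if (!ascii_space && std::isspace(c, loc)) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Illustrative facet (an assumption): the classic "C" table, plus 0xA0
// marked as a space, imitating a locale that treats the non-breaking space
// as whitespace.
struct NbspIsSpace : std::ctype<char> {
    static const mask* make_table() {
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[0xA0] |= space;
        return table.data();
    }
    NbspIsSpace() : std::ctype<char>(make_table()) {}
};
```

With `std::locale loc(std::locale::classic(), new NbspIsSpace);`, the `path-constant` line above yields one hit at offset 20 (the 0xA0), while under the plain classic locale it yields none. This only covers whitespace look-alikes; the ':' and ';' variants would need a different test.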

Well, for now, why not just make bjam reject (or at least warn about) any
non-ASCII character by default, and provide an option to disable that?

Ideally, this option would have three settings:
1. ban all non-ASCII characters
   1'. warn instead of banning
2. ban characters that compare equal to any bjam special character in the
   current locale (this saves the user from hidden traps)
   2'. just warn
3. don't restrict the charset in any way.

Having {1, 3} would already be very useful, and enough for most cases.
{3} alone is too dangerous on any system where several character values look
alike (be it space, ':', ';' or whatever important character).
{1} alone is not nice for people who work with localised character sets.
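A rough C++ sketch of settings 1, 1' and 3 follows; the locale-based settings 2 and 2' are left out. `CharsetPolicy`, `Diagnostic` and `check_charset` are hypothetical names, not existing bjam options or functions. The scan only reports where bytes >= 0x80 occur and whether the file should be rejected:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option values: Ban = setting 1, Warn = 1', Allow = 3.
enum class CharsetPolicy { Ban, Warn, Allow };

struct Diagnostic {
    std::size_t line;    // 1-based line number
    std::size_t column;  // 1-based column, counted in bytes
    unsigned byte;       // offending byte value, e.g. 0xA0
};

// Scan a Jam source buffer for non-ASCII bytes. Every hit is returned so it
// can be reported; `ok` becomes false only under Ban when something was found.
std::vector<Diagnostic> check_charset(const std::string& src,
                                      CharsetPolicy policy, bool& ok) {
    std::vector<Diagnostic> found;
    ok = true;
    if (policy == CharsetPolicy::Allow) {
        return found;  // setting 3: no restriction at all
    }
    std::size_t line = 1, column = 1;
    for (unsigned char c : src) {
        if (c >= 0x80) {
            found.push_back({line, column, c});
        }
        if (c == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }
    if (policy == CharsetPolicy::Ban && !found.empty()) {
        ok = false;
    }
    return found;
}
```

With `CharsetPolicy::Ban`, the `path-constant` line above would be reported at line 1, column 21 (byte 0xA0) and rejected; with `Warn` the same diagnostic would be produced but `ok` stays true, so parsing could continue.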

-- 
Samuel
 

Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk