From: John Maddock (john_at_[hidden])
Date: 2005-07-31 04:53:30


> I think there is a problem in the regex standardization proposal
> regarding the begin- and end-of-line assertions (^ and $). It seems that
> there is no way to customize their behavior via the traits. This is
> inconsistent with the word boundary assertion, which is implementable in
> terms of the word character class (\w). The author of a traits class
> should be able to specify which characters are line separators.

Understood.

> A simple fix would be to add a character class for line separator
> characters. Then, ^ and $ could be implemented in terms of
> lookup_classname and isctype, just as the word boundary assertion is.
>
> This leaves out an important corner case, though: \r is a line separator
> only if it is not immediately followed by a \n. I haven't yet come up
> with a traits interface that is clean enough and general enough to satisfy.
> I'm open to suggestions.

The corner case you mention is too important to leave out IMO. In Boost-1.33,
line boundaries will follow the Unicode recommendations:

http://www.unicode.org/reports/tr18/#Line_Boundaries

Are there any situations that this does not handle?
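
To make the \r\n corner case concrete, here is a minimal sketch of how ^ and $
could be tested against a fixed set of line terminators. It is not taken from
the Boost.Regex sources. The function names (is_line_terminator, at_line_start,
at_line_end) are hypothetical, the terminator set (LF, VT, FF, CR, NEL, LS, PS)
is my reading of the TR18 list, and it assumes multi-line matching over UTF-32
text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed terminator set, based on the TR18 line-boundary list:
// LF, VT, FF, CR, NEL (U+0085), LS (U+2028), PS (U+2029).
inline bool is_line_terminator(char32_t c)
{
    return c == U'\n' || c == U'\v' || c == U'\f' || c == U'\r'
        || c == 0x85 || c == 0x2028 || c == 0x2029;
}

// Hypothetical helper: would '^' match at position pos of s?
inline bool at_line_start(const std::u32string& s, std::size_t pos)
{
    if (pos > s.size()) return false;
    if (pos == 0) return true;               // start of input
    const char32_t prev = s[pos - 1];
    if (!is_line_terminator(prev)) return false;
    // CR followed by LF is a single terminator: no boundary between them.
    if (prev == U'\r' && pos < s.size() && s[pos] == U'\n') return false;
    return true;
}

// Hypothetical helper: would '$' match at position pos of s?
inline bool at_line_end(const std::u32string& s, std::size_t pos)
{
    if (pos > s.size()) return false;
    if (pos == s.size()) return true;        // end of input
    const char32_t next = s[pos];
    if (!is_line_terminator(next)) return false;
    // LF preceded by CR is the second half of CRLF: no boundary here.
    if (next == U'\n' && pos > 0 && s[pos - 1] == U'\r') return false;
    return true;
}
```

With the input "a\r\nb\rc", $ matches before the \r at offset 1 but not between
the \r and \n at offset 2, and ^ matches at offset 3 (after the full \r\n) and
at offset 5 (after the lone \r). A single "line separator" character class
cannot express this on its own, since the test needs the neighbouring
character as well.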

BTW, it's possible to go on adding "customisation points" to the traits
class almost indefinitely, but you have to draw the line somewhere. At the
moment I'm not sure which side of the line this one falls on :-)

John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk