From: Edward Diener (eddielee_at_[hidden])
Date: 2004-01-21 22:01:55


Luke Stebbing wrote:
> On Wed, 21 Jan 2004, "Daryle Walker" <darylew_at_[hidden]> wrote:
>> Are Boost.Regex objects and strings semantically equivalent?
>> ...
>> From what little I know of Regex, I think the answer is "no". (So
>> keep the constructor explicit.)
>
> I agree that from an interface perspective, that's the primary
> question, but I think the answer is "yes". Take a look at a string
> literal (char const* const literal, technically):
>
> "Hello world"
>
> Conceptually, this is a string. On the other hand:
>
> "He(\\w)+ world"
>
> Conceptually, this can be considered as either a string or a regular
> expression, depending on the context, and if a set of programmers were
> selected and shown that literal, I believe the overwhelming response
> would be "that's a regular expression".

I disagree with this. A regular expression is not a string, although a
string may be a regular expression. But because I say 'may', I am in favor
of an explicit constructor. An explicit constructor accentuates the fact
that one is choosing to regard a string as a basis for building a regular
expression.
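
As a minimal sketch of what an explicit constructor buys, the following
uses std::regex from a C++11 standard library as a stand-in for
Boost.Regex (an assumption; its string constructors are declared
explicit). The helper contains_match is a hypothetical name used only
for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// An explicit constructor means a string never silently becomes a regex...
static_assert(!std::is_convertible<std::string, std::regex>::value,
              "no implicit string -> regex conversion");
// ...but the programmer can still ask for the conversion by name.
static_assert(std::is_constructible<std::regex, std::string>::value,
              "explicit construction from a string is allowed");

// Hypothetical helper that takes a regex, not a string.
bool contains_match(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re);
}

// contains_match("Hello world", "He(\\w)+ world");             // does not compile
// contains_match("Hello world", std::regex("He(\\w)+ world")); // compiles
```

The call site has to spell out std::regex(...), which is exactly the
point at which the programmer chooses to regard the string as a
regular expression.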

>
> A careful reader will note that "Hello world" is also a regular
> expression, but a degenerate one (literal matching, no special
> characters) that we usually just call a string. The primary attribute
> that distinguishes regular expressions from stripped down const
> strings is where they are /used/, not what they are.

That, and the fact that regular expressions have a set of rules for
determining how the string should be interpreted.
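
A small sketch of that difference, again assuming std::regex as a
stand-in for Boost.Regex: the same characters give different answers
depending on whether the regular expression rules are applied. The
function names found_as_string and found_as_regex are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Interpret `pattern` as plain characters: an ordinary substring search.
bool found_as_string(const std::string& text, const std::string& pattern) {
    return text.find(pattern) != std::string::npos;
}

// Interpret the same characters under regular expression rules,
// where '.' stands for any single character.
bool found_as_regex(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

With the text "abc" and the pattern "a.c", the string search finds
nothing while the regular expression search succeeds.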

> This is
> emphasized by the fact that aside from locale/allocator information
> (which is associated with basic_string as well, but in a different
> way) and mark_count (which is an inspector function that gives
> information about parens grouping for a given regex), regex member
> functions form a proper subset of string member functions and are
> semantically identical.

A very small subset and, I believe, an afterthought to Dr. Maddock's
original regular expression implementation. Giving regular expressions
semantics more like those of strings and containers makes them easier to
learn and draws on prior art that has been found to be effective in
working with the C++ library.

>
> Context gives a regular expression meaning, and if the danger of
> expensive, accidental conversions is avoided (and I believe this is
> the case), implicit conversions should be allowed.

I agree with the first part but not with the second, and my objection is
about concepts rather than accidents. A regular expression is a combination
of a string and a set of rules. A simple string is a very general-purpose
mechanism, while a regular expression is a much more specific concept. I
think it is right to force the programmer to state explicitly that a string
is to be treated as the string part of a regular expression, rather than, as
in Perl, to incorporate regular expressions into strings themselves and so
make it easy to regard a type of string as a regular expression. In other
words, I like the boundary line between one concept and another. Perhaps in
the future regular expressions will be specified via certain kinds of tokens
as classes, rather than as a string, without the concept of a regular
expression, as a pattern of significant tokens and literal character values,
being lost. In that case, keeping the distinction between "string" and
"regular expression" will be even more important.
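
To illustrate what "tokens as classes" might look like, here is a
deliberately tiny sketch. Everything in it (Token, lit, any, matches) is
hypothetical and not part of any library; it supports only literal
characters and a single-character wildcard, with no repetition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: a pattern element is either one literal
// character or a wildcard that accepts any single character.
struct Token {
    enum Kind { Literal, AnyChar };
    Kind kind;
    char ch;  // meaningful only when kind == Literal
};

inline Token lit(char c) { return Token{Token::Literal, c}; }
inline Token any() { return Token{Token::AnyChar, '\0'}; }

// Whole-string match: each token must accept the character at its position.
bool matches(const std::vector<Token>& pattern, const std::string& text) {
    if (pattern.size() != text.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i].kind == Token::Literal && pattern[i].ch != text[i]) {
            return false;
        }
    }
    return true;
}
```

Here a pattern is a sequence of tokens and never a string, so lit('.')
is an ordinary dot that needs no escaping and any() is the wildcard.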

