Subject: Re: [boost] Review Request: Introduction of boost::string namespace and string-conversion functions
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2009-02-14 01:50:03


On Fri, Feb 13, 2009 at 22:20, Vladimir Batov <batov_at_[hidden]> wrote:
>
> Having a few discussions between us :-) I'd honestly have expected a bit
> more from you. Essentially your email states "what you are doing is wrong...
> so wrong". Do you think it lifts my spirit to keep doing that or at least it
> helps me to find the "right" way to do that? Who needs lectures without
> suggestions?
>

To answer a rhetorical question with a rhetorical question, does it
lift someone's spirits to hear "Well, 5 people didn't explicitly
reiterate your concern, so it must not be important"?

A dozen-email exchange without a response from the thread author does
not imply "People largely seem content [so] we are settling on
boost::string". It would have been easy to use "I'll stick
with" instead of "we are settling on" to sound less domineering.

I'd like to think that those discussions between us are evidence that
each dissenting opinion should (ideally) be argued through to the end,
as it may lead to important observations.

It feels like you would like my comments on the thread, so I'll
provide them. I didn't earlier as my objection was to form rather
than technical aspects. Apologies if I re-tread ground unnecessarily.

- Process

From the start, this thread has irked me, as including "review
request" in the subject seems to flout the normal submission
process[1] for no apparent reason. You've mentioned having used a
similar component many times, yet provided no "Preliminary
submission". You appear to be holding the "Discuss, refine, resubmit.
Repeat until satisfied" processes after the formal request, instead of
before. One or the other could be "not as conventional", but both
together seem excessive.

[1] http://www.boost.org/development/submissions.html

- Concept

You initially described the goal as follows:

"My immediate interest/proposal in that namespace would be the
string-conversion functionality. The basics of that functionality are
currently handled by lexical_cast but need extended and unambiguously
tagged as *string*-related conversions."

My recollection is that lexical_cast was fine (or at least mostly
acceptable) for conversion to strings, but conversion from strings was
a much harder problem. Boost already has a library for that problem:
Spirit. With Spirit2, in fact, my understanding is that it has a
consistent way to handle both directions as well. (Arguably serialization
does as well, but there any parsing is an implementation detail.)

This makes it feel like the problem is already mostly solved, if in
greater detail than this library actually calls for. That makes me
think of another similar library, Bimap. Matias could have invented
and implemented his own way of doing it, which may have provided the
base functionality more quickly, but instead built on MultiIndex,
ultimately getting a much more flexible library with less duplicated
effort.

The proposed library is essentially a very simple lexer and
pretty-printer, so it feels like the right way to build it is using
Spirit2 components in such a way that gradually increasing complexity
in the conversions provides a path to something that could be extracted
and used in a full Spirit grammar, should requirements eventually
surpass what the helper library provides. (Orders that progress from
simple numeric IDs up to, many iterations later, a full DSL.)

Unfortunately, I don't know enough about Spirit2 to provide an outline
of how this might look.

Perhaps it would be a good idea to step back and decide on the
boundaries for the functionality first. So far all I can tell is
"string conversions allowing default values on failure and not needing
default constructors". Does it allow formatting? Is this just for
scalars? If it's for more, can I give different formatting for
different elements of containers or structures? Where's the line
where it gives up and says "use spirit"? Some major implementation
issues will affect the interface, such as the choice of piggy-backing
off operator<< or using a different method, so that's also important.
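
To make the boundary question concrete, here is a minimal sketch of the
one requirement I can identify (a default value on failure, no default
constructor needed), piggy-backing off operator>>. The name `parse_or`
is my own invention for illustration, not the proposed interface, and
the rejection of trailing characters is an assumption on my part.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, NOT the proposed library's interface.
// Converts `text` to T via operator>>, returning `fallback` on failure.
template <typename T>
T parse_or(const std::string& text, const T& fallback)
{
    std::istringstream in(text);
    T value(fallback);      // copy-constructed, so T needs no default constructor
    if (!(in >> value))
        return fallback;    // extraction failed
    char extra;
    if (in >> extra)
        return fallback;    // assumption: leftover non-space characters are an error
    return value;
}

// Usage example.
inline void parse_or_example()
{
    assert(parse_or<int>("42", -1) == 42);
    assert(parse_or<int>("forty-two", -1) == -1);
}
```

Even this much already forces decisions (whitespace, trailing input,
which stream operators are used) that belong in the interface
discussion.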

- Names

Prepositions make great language keywords, but as they're neither
things, actions, nor descriptions, I hate the use of prepositions for
identifiers. To and From are thus both inappropriate.

Regardless, I find the current use of to/from confusing. to<int> and
from<int> read equivalently, so there's no insight there. I think
that in the context of the string namespace, the `to` function seems
like it would have to convert *to* the entity implied by the context.
Notably, std::bitset has a to_string function that gives the
std::string corresponding to that bitset -- the opposite direction to
the proposed use of `to`. Similarly, everyone that's used JavaScript,
Java, or .NET has been exposed to ToString.
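
For reference, this is the direction std::bitset's to_string goes. The
function names around it are mine, purely for illustration, and the
snippet assumes a C++11 compiler so that to_string() needs no explicit
template arguments.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// std::bitset::to_string converts the bitset *to* a string.
inline std::string bits_to_text()
{
    std::bitset<8> bits(5u);
    return bits.to_string();    // "00000101"
}

// The opposite direction is a constructor, not a to/from function.
inline unsigned long text_to_bits()
{
    std::bitset<8> bits(std::string("00000101"));
    return bits.to_ulong();     // 5
}

// Usage example.
inline void bitset_example()
{
    assert(bits_to_text() == "00000101");
    assert(text_to_bits() == 5ul);
}
```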

I think, though, that the fact that a choice exists is just further
proof that a preposition and a noun do not make a good name for an
action; we want verbs. Spirit uses parse. Perhaps Spirit2 has already
found a good verb for the pretty printer?

The next in the big three is `is`. At least it's a verb, though I
think it's the one that conveys the least information of any of them.
To me, `to be` implies some sort of simultaneity, and all the uses of
it I've seen have reflected that. A string is a string, not an int.
Its contents may have a useful interpretation as an int, but that's a
separate issue. For example, LLVM has an "isa<T>(x)" function[2] that
performs a similar function to dynamic_cast<T*>(x)!=0. Similarly,
VB.NET has a "typeof x is T" operator[3] for the same functionality.
It also has a "x Is y" operator[4] for reference equality. All three
use a far stronger state of being than "well, I can do some work to
turn that into one of those".

[2] http://llvm.org/docs/ProgrammersManual.html#isa
[3] http://msdn.microsoft.com/en-us/library/ex56dsed.aspx
[4] http://msdn.microsoft.com/en-us/library/kb136x1y.aspx
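
To illustrate the kind of "is" I mean, here is a sketch of a type test
written as dynamic_cast<T*>(x) != 0. The `is_a` name and the Shape
classes are hypothetical, and this is only a stand-in: LLVM's isa<>
uses LLVM's own RTTI scheme rather than dynamic_cast.

```cpp
#include <cassert>

// Hypothetical class hierarchy for illustration only.
struct Shape  { virtual ~Shape() {} };
struct Circle : Shape {};
struct Square : Shape {};

// Hypothetical stand-in for an isa-style test: true only when the
// object already *is* a T, with no conversion work involved.
template <typename T, typename U>
bool is_a(U* object)
{
    return dynamic_cast<T*>(object) != 0;
}

// Usage example.
inline void is_a_example()
{
    Circle circle;
    Shape* shape = &circle;
    assert(is_a<Circle>(shape));
    assert(!is_a<Square>(shape));
}
```

Nothing here turns the object into something else, which is the
distinction I'm drawing against an `is` that really means "could be
parsed as".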

can_parse_as<T> or similar seems like a plausible form for the name,
though I don't really like it.

I can't, however, think of a need to know this that wouldn't result in
me wanting the parsed value. Check-then-parse seems almost certain to
be wasteful -- particularly since I haven't seen any proposed code
snippets that mention conversions other than int <-> string.
(--syntax-only is of course useful in a compiler, but this library is
not the right tool for anything close to that level of job.)
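
A sketch of why check-then-parse looks wasteful to me, assuming a
stream-based implementation. Both `try_parse` and this body for
`can_parse_as` are hypothetical; I have not seen the proposed
implementation. The point is only that the check has to do the whole
parse anyway.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical single-pass interface: reports success and, on success,
// stores the parsed value in `out`. On failure `out` is left unchanged.
template <typename T>
bool try_parse(const std::string& text, T& out)
{
    std::istringstream in(text);
    T value(out);
    char extra;
    if (!(in >> value) || (in >> extra))
        return false;       // bad input or trailing characters
    out = value;
    return true;
}

// Hypothetical check-only form: it performs the full parse and then
// throws the result away, so a later conversion repeats the work.
// Assumption: T is value-initializable here (fine for int-like types).
template <typename T>
bool can_parse_as(const std::string& text)
{
    T scratch = T();
    return try_parse(text, scratch);
}

// Usage example.
inline void try_parse_example()
{
    int n = 0;
    assert(try_parse("123", n) && n == 123);
    assert(!try_parse("12x", n) && n == 123);
    assert(can_parse_as<int>("5"));
}
```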

[5] http://www.boost.org/doc/libs/1_38_0/libs/type_traits/doc/html/boost_typetraits/reference/is_convertible.html

(And the fact that I keep feeling obliged to put these names in
`backticks` for clarity is further evidence that the name should be
more explicit.)

I think that's enough from me for now,
~ Scott

