Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Peter Dimov (pdimov_at_[hidden])
Date: 2011-01-19 23:43:48


Dave Abrahams wrote:
> IIUC, you're talking about changing the abstraction presented by
> std::string to "sequence of individually addressable and mutable chars
> that by convention represents text encoded as utf-8."

Something like that. string is just char[] with value semantics. It doesn't
necessarily hold a valid UTF-8 sequence.
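To make the "char[] with value semantics" point concrete, here is a minimal sketch. The function name string_is_just_bytes is made up for illustration and is not from any library; it only uses standard std::string operations.

```cpp
#include <string>

// Hypothetical demo function (name invented for this sketch).
// Returns true if std::string behaves as a plain byte container
// with value semantics.
bool string_is_just_bytes() {
    // 0xC3 opens a two-byte UTF-8 sequence, but '(' is not a
    // continuation byte, so these two bytes are not valid UTF-8.
    // std::string stores them without complaint.
    std::string s = "\xC3(";

    std::string copy = s;  // value semantics: an independent copy
    copy[0] = 'a';         // individually addressable, mutable chars

    // The original still holds the invalid bytes; only the copy changed.
    return s.size() == 2 && s[0] == '\xC3' && copy == "a(";
}
```

Nothing in std::string checks the encoding, so any invariant such as "valid UTF-8" would have to be maintained by convention or by a wrapper type.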

> I would prefer to be handling something that presents the abstraction
> "character string." I'm not sure exactly what that looks like, but
> I'm pretty sure the "individually addressable and mutable chars" part
> should go. I'd like to see an interface that prevents corrupting the
> underlying data such that it no longer represents a valid sequence of
> characters (or at least makes it highly unlikely that such corruption
> could happen accidentally). Furthermore, there are lots of string-y
> things I'd want to do that aren't provided—or aren't provided well—by
> std::string, e.g. if (s1.starts_with(s2)) {...}
>
> Does this make more sense?

It makes sense in the abstract. But there is no way to protect against
corruption without also setting an invariant that the sequence is not
corrupted (represents valid UTF-8), and I don't usually need such a string
in the interfaces we're discussing, although it can certainly be useful on
its own. The interfaces that talk to the OS need to be able to carry
arbitrary char sequences (in the POSIX case). Even an interface that
displays the string, one that by necessity must interpret it as UTF-8,
should preferably handle invalid UTF-8 and display some placeholders instead
of the invalid subsequence - it's better for the user to see parts of the
string than nothing at all. It's even worse to abort the whole operation
with an invalid_utf8 exception.
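As a rough sketch of the "display placeholders instead of throwing" approach, something like the following could work. The names utf8_sequence_length and to_displayable are hypothetical, and emitting one U+FFFD per offending byte is just one possible policy that I'm assuming here; other replacement policies exist.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1 to 4) of the well-formed UTF-8 sequence
// starting at s[i], or 0 if the bytes there are not well-formed.
// Precondition: i < s.size().
std::size_t utf8_sequence_length(const std::string& s, std::size_t i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte

    if (b0 < 0x80) return 1;             // ASCII
    else if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
        if (b0 == 0xED) hi = 0x9F;       // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
        if (b0 == 0xF4) hi = 0x8F;       // reject values above U+10FFFF
    } else {
        return 0;                        // stray continuation or invalid lead byte
    }

    if (s.size() - i < len) return 0;    // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if (b < lo || b > hi) return 0;
        lo = 0x80;                       // later bytes use the normal range
        hi = 0xBF;
    }
    return len;
}

// Hypothetical helper: copies valid sequences through unchanged and
// substitutes U+FFFD (encoded as EF BF BD) for each invalid byte.
std::string to_displayable(const std::string& s) {
    static const char replacement[] = "\xEF\xBF\xBD";
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = utf8_sequence_length(s, i);
        if (n == 0) {
            out += replacement;
            ++i;                         // skip one bad byte and resynchronize
        } else {
            out.append(s, i, n);
            i += n;
        }
    }
    return out;
}
```

The input string is never modified or rejected, so the interface that carries it can still pass arbitrary char sequences around; only the display step interprets the bytes as UTF-8.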

I don't particularly like string's mutable chars, but they don't mutate
themselves without my telling them to, so things tend to work out fine. :-)


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk