Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-19 21:44:44

On Wed, 19 Jan 2011 14:32:31 -0800
Mostafa <mostafa_working_away_at_[hidden]> wrote:

>> operator* has a long history of providing the contents of a variable,
>> even in C, and is a lot less typing to boot. But if you have any
>> technical arguments against it, I'm listening.
> Can we stick to std::string conventions as closely as possible? It
> makes using whatever new string library that much easier, clearer,
> and more maintainable.

Is there a conventional way to get the data stored in an std::string? ;-)

> From usage, it's not readily apparent what operator* is supposed to
> do in the context of strings, e.g.:
> utf8_t myStr(...);
> some_api_foo(*myStr);
> Even as an experienced programmer, if I were new to whatever library
> uses some_api_foo, I would be scratching my head at "*myStr", and I
> would be forced to look up utf8_t::operator* or some_api_foo to
> figure it out.

I'd lean toward encoded(), or at least coded(), if operator* is out. If
you know anything about UTF-8, it's sufficiently descriptive. If you
don't, then nothing that's short enough to type on a regular basis is
going to eliminate the need for documentation.
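
To make the trade-off concrete, here is a minimal sketch of a hypothetical utf8_t wrapper (an illustration only, not an existing Boost class) that exposes its UTF-8 bytes both through operator* and through a named encoded() accessor. some_api_foo is a stand-in for any API that only accepts std::string; its body here is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical UTF-8 string wrapper, for illustration only.
class utf8_t {
public:
    explicit utf8_t(std::string bytes) : bytes_(std::move(bytes)) {}

    // Terse, smart-pointer-style access: *s yields the raw UTF-8 bytes.
    const std::string& operator*() const { return bytes_; }

    // Named alternative: the same bytes, but the call site says what it gets.
    const std::string& encoded() const { return bytes_; }

private:
    std::string bytes_;  // UTF-8 code units (validation omitted in this sketch)
};

// Hypothetical stand-in for a third-party API that only understands std::string.
std::size_t some_api_foo(const std::string& s) { return s.size(); }
```

Both some_api_foo(*myStr) and some_api_foo(myStr.encoded()) pass the same object; the only difference is how much the reader has to know about utf8_t to understand the call.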

> What about:
> utf8_t::cu_str
> where cu_str stands for "code-unit string".

If we need a code-point iterator, a name based on "code unit" could
easily be confused with it by anyone not already very familiar with
Unicode.
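
The two terms count different things. In UTF-8, "é" (U+00E9) is stored as the two bytes 0xC3 0xA9, so "héllo" is six code units but five code points. A minimal sketch, assuming valid UTF-8 input; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each UTF-8 code point occupies 1 to 4 code units (bytes). Continuation
// bytes have the bit pattern 10xxxxxx; every other byte starts a new
// code point. Assumes the input is valid UTF-8 (no validation here).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (char c : utf8) {
        unsigned char b = static_cast<unsigned char>(c);
        if ((b & 0xC0) != 0x80) {
            ++n;  // ASCII or lead byte: begins a code point
        }
    }
    return n;
}

// The number of code units is simply the number of bytes.
std::size_t count_code_units(const std::string& utf8) {
    return utf8.size();
}
```

A code-point iterator would step over exactly the boundaries this loop counts, which is why names built on "code unit" and "code point" sit so close together for newcomers.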

> I'm a big fan of conveying your intent in code. For the same reason
> I strongly disagree with utf8_t::str. utf8_t is already a string class,
> and a generic-sounding "str" method on it doesn't convey what kind of
> string it returns.

While that's true (and I'm not a fan of str() in this context either),
it does have the advantage of implying that it returns an std::string,
based on the conventions of std::stringstream.
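
For reference, this is the standard-library precedent: std::ostringstream::str() returns a copy of the stream's contents as a std::string. The describe function below is only an illustrative example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds text with a string stream, then extracts it as std::string via str().
std::string describe(const std::string& name, int count) {
    std::ostringstream os;
    os << name << ": " << count;
    return os.str();  // returns a std::string copy of the buffered text
}
```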

Chad Nelson
Oak Circle Software, Inc.