Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-16 21:12:07


On Sun, 16 Jan 2011 20:10:57 +0100
Robert Kawulak <robert.kawulak_at_[hidden]> wrote:

>> From: Chad Nelson
>> http://www.oakcircle.com/toolkit.html
>>
>> I've released it under the Boost license, so anyone may use it as
>> they wish.
>
> A very nice and useful utility. Anyway, I'll share some comments, just
> in case you want to hear some. ;-)

I'm always interested in comments -- thanks!

> "Be warned, if you try to convert a UTF-coded value to ASCII, each
> decoded character must fit into an unsigned eight-bit type. If it
> doesn't, the library will throw an \c oakcircle::unicode::will_not_fit
> exception."
>
> I think that exception is not always appropriate. A better solution
> would be a policy-based class design or additional conversion function
> accepting an error policy. This way the user could tell the converter
> to use some "similarly looking" or "invalid" character instead of
> throwing when exact conversion is not possible.

And if I were going to submit it for review, that's exactly what I'd
want too. That code was written solely for my own use, and for other
programmers who work with my company's code later, despite how the
documentation makes it look.
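
For illustration, a conversion function that accepts an error policy,
as suggested above, might look roughly like the sketch below. Every
name in it (to_eight_bit, throw_on_error, replace_with, decode_one) is
hypothetical, and the will_not_fit type is a local stand-in rather than
the real oakcircle::unicode::will_not_fit. The decoder is simplified
and assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Local stand-in for the exception named in the quoted documentation.
struct will_not_fit : std::runtime_error {
    will_not_fit() : std::runtime_error("code point does not fit in 8 bits") {}
};

// Policy: throw when a code point cannot be represented.
struct throw_on_error {
    char operator()(std::uint32_t) const { throw will_not_fit(); }
};

// Policy: substitute a caller-chosen replacement character.
struct replace_with {
    char replacement;
    char operator()(std::uint32_t) const { return replacement; }
};

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumption: the input is well-formed; no validation is done here.
inline std::uint32_t decode_one(const std::string& s, std::size_t& i) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Convert UTF-8 to one byte per code point, delegating every code
// point above 0xFF to the supplied policy.
template <typename ErrorPolicy = throw_on_error>
std::string to_eight_bit(const std::string& utf8,
                         ErrorPolicy on_error = ErrorPolicy()) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        std::uint32_t cp = decode_one(utf8, i);
        out += (cp <= 0xFF) ? static_cast<char>(cp) : on_error(cp);
    }
    return out;
}
```

Calling to_eight_bit(text) keeps the throwing behaviour, while
to_eight_bit(text, replace_with{'?'}) substitutes a question mark for
anything that does not fit.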

> "Note that, like pointers, they can hold a null value as well, created
> by passing \c boost::none to the type's constructor or setting it equal
> to that value."
>
> I don't feel the interface with pointer semantics is the most suitable
> here. Are there any practical advantages from being able to have a
> null string?

Nope. That's there solely so that certain functions can use it to return
an error value, using the same semantics as Boost.Optional, without
explicitly wrapping it in a Boost.Optional. If I were going to submit
it for review, I'd probably remove that completely.

> Even if so, one could use an actual pointer or boost::optional anyway.

I did use Boost.Optional at first, but for my code, I found it easier to
build that into the classes.
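
To make the two alternatives concrete, here is a rough sketch of each.
It uses std::optional from C++17 as a stand-in for Boost.Optional so
that it compiles without Boost, and every name in it (as_ascii,
nullable_text) is hypothetical rather than part of the toolkit.

```cpp
#include <optional>
#include <string>
#include <utility>

// Alternative 1: wrap the result in an optional. An empty optional
// signals the error case. Hypothetical helper: succeeds only when
// every byte is 7-bit ASCII.
inline std::optional<std::string> as_ascii(const std::string& s) {
    for (unsigned char c : s)
        if (c >= 0x80) return std::nullopt;
    return s;
}

// Alternative 2: build the null state into the string-like class
// itself, so functions can return it directly with optional-like
// semantics.
class nullable_text {
public:
    nullable_text() : is_null_(true) {}  // null value
    explicit nullable_text(std::string v)
        : value_(std::move(v)), is_null_(false) {}

    explicit operator bool() const { return !is_null_; }

    // Precondition (not checked in this sketch): the object is not null.
    const std::string& operator*() const { return value_; }

private:
    std::string value_;
    bool is_null_;
};
```

Both are tested the same way at the call site, with
"if (result) use(*result);". The difference is whether the null state
lives in a wrapper or in the class.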

> Moreover, it would be nice if the proper encoding of the underlying
> string was the classes' invariant. Currently the classes cannot
> guarantee this because they allow for direct access to the value which
> may be freely changed by the user with no respect to the encoding.

As I said, this was written solely for my company's code. I know how to
ensure that changes to the internal data are consistent with the type,
and the design makes doing so awkward enough that people will carefully
scrutinize any code that does it, so a code review should catch any
problems easily. But again, if I were to submit it to Boost, I'd
likely change that first.
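
A class that treats valid encoding as an invariant, as requested above,
could be sketched along these lines. The names (checked_utf8,
is_structurally_valid_utf8) are hypothetical, and the check is
deliberately simplified: it verifies only the lead-byte and
continuation-byte structure, and does not reject overlong forms,
surrogates, or values above U+10FFFF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified structural check; see the caveats in the text above.
inline bool is_structurally_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = (c < 0x80)         ? 1
                        : ((c >> 5) == 0x06) ? 2
                        : ((c >> 4) == 0x0E) ? 3
                        : ((c >> 3) == 0x1E) ? 4
                                             : 0;
        if (len == 0 || i + len > s.size()) return false;
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += len;
    }
    return true;
}

class checked_utf8 {
public:
    // The only way in: the bytes are validated once, at construction.
    explicit checked_utf8(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_structurally_valid_utf8(bytes_))
            throw std::invalid_argument("not valid UTF-8");
    }

    // Read-only access, so callers cannot break the encoding.
    const std::string& raw() const { return bytes_; }

    // Mutation only through operations that preserve validity:
    // concatenating two valid sequences yields a valid sequence.
    void append(const checked_utf8& other) { bytes_ += other.bytes_; }

private:
    std::string bytes_;
};
```

The trade-off is that every modifying operation has to be provided by
the class itself, which is where fuller string emulation comes in.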

I'd also want to add full string emulation. Right now it only partly
emulates a string, and for any real work you're likely to need to
access the internal data.

-- 
Chad Nelson
Oak Circle Software, Inc.